Skip to content

Latest commit

 

History

History
 
 

arcface_torch

Distributed Arcface Training in Pytorch

This is a deep learning library that makes face recognition efficient, and effective, which can train tens of millions identity on a single server.

Requirements

How to Training

To train a model, run train.py with the path to the configs.
The example commands below show how to run distributed training.

1. To run on a machine with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=12581 train.py configs/ms1mv3_r50_lr02

2. To run on 2 machines with 8 GPUs each:

Node 0:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=0 --master_addr="ip1" --master_port=12581 train.py configs/webface42m_r100_lr01_pfc02_bs4k_16gpus

Node 1:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr="ip1" --master_port=12581 train.py configs/webface42m_r100_lr01_pfc02_bs4k_16gpus

Download Datasets or Prepare Datasets

Model Zoo

  • The models are available for non-commercial research purposes only.
  • All models can be found in here.
  • Baidu Yun Pan: e8pw
  • OneDrive

Performance on IJB-C and ICCV2021-MFR

ICCV2021-MFR testset consists of non-celebrities so we can ensure that it has very few overlap with public available face recognition training set, such as MS1M and CASIA as they mostly collected from online celebrities. As the result, we can evaluate the FAIR performance for different algorithms.

For ICCV2021-MFR-ALL set, TAR is measured on all-to-all 1:1 protocal, with FAR less than 0.000001(e-6). The globalised multi-racial testset contains 242,143 identities and 1,624,305 images.

Datasets Backbone MFR-ALL IJB-C(1E-4) IJB-C(1E-5) Training Throughout log
MS1MV3 mobileface 65.76 94.44 91.85 ~13000 log|config
Glint360K mobileface 69.83 95.17 92.58 -11000 log|config
WebFace42M-PartialFC-0.2 mobileface 73.80 95.40 92.64 (16GPUs)~18583 log|config
MS1MV3 r100 85.39 97.00 95.36 ~3400 log|config
Glint360K r100 90.86 97.53 96.43 ~5000 log|config
WebFace42M-PartialFC-0.2 r50(bs4k) 93.83 97.53 96.16 (8 GPUs)~5900 log|config
WebFace42M-PartialFC-0.2 r50(bs8k) 93.96 97.46 96.12 (16GPUs)~11000 log|config
WebFace42M-PartialFC-0.2 r50(bs4k) 94.04 97.48 95.94 (32GPUs)~17000 log|config
WebFace42M-PartialFC-0.2 r100(bs4k) 96.69 97.85 96.63 (16GPUs)~5200 log|config
WebFace42M-PartialFC-0.2 r200 - - - - log|config

PartialFC-0.2 means negivate class centers sample rate is 0.2.

Speed Benchmark

arcface_torch can train large-scale face recognition training set efficiently and quickly. When the number of classes in training sets is greater than 1 Million, partial fc sampling strategy will get same accuracy with several times faster training performance and smaller GPU memory. Partial FC is a sparse variant of the model parallel architecture for large sacle face recognition. Partial FC use a sparse softmax, where each batch dynamicly sample a subset of class centers for training. In each iteration, only a sparse part of the parameters will be updated, which can reduce a lot of GPU memory and calculations. With Partial FC, we can scale trainset of 29 millions identities, the largest to date. Partial FC also supports multi-machine distributed training and mixed precision training.

Image text

More details see speed_benchmark.md in docs.

1. Training speed of different parallel methods (samples / second), Tesla V100 32GB * 8. (Larger is better)

- means training failed because of gpu memory limitations.

Number of Identities in Dataset Data Parallel Model Parallel Partial FC 0.1
125000 4681 4824 5004
1400000 1672 3043 4738
5500000 - 1389 3975
8000000 - - 3565
16000000 - - 2679
29000000 - - 1855

2. GPU memory cost of different parallel methods (MB per GPU), Tesla V100 32GB * 8. (Smaller is better)

Number of Identities in Dataset Data Parallel Model Parallel Partial FC 0.1
125000 7358 5306 4868
1400000 32252 11178 6056
5500000 - 32188 9854
8000000 - - 12310
16000000 - - 19950
29000000 - - 32324

Citations

@inproceedings{deng2019arcface,
  title={Arcface: Additive angular margin loss for deep face recognition},
  author={Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={4690--4699},
  year={2019}
}
@inproceedings{an2020partical_fc,
  title={Partial FC: Training 10 Million Identities on a Single Machine},
  author={An, Xiang and Zhu, Xuhan and Xiao, Yang and Wu, Lan and Zhang, Ming and Gao, Yuan and Qin, Bin and
  Zhang, Debing and Fu Ying},
  booktitle={Arxiv 2010.05222},
  year={2020}
}