[CVPR2023] TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers

This repository is an official implementation of TokenHPE.

Overview

We propose a novel critical minority relationship-aware method based on the Transformer architecture in which the facial part relationships can be learned. Specifically, we design several orientation tokens to explicitly encode the basic orientation regions. Meanwhile, a novel token guide multi-loss function is designed to guide the orientation tokens as they learn the desired regional similarities and relationships.

Preparation

Environments

python == 3.9, torch >= 1.10.1, CUDA ==11.2

Datasets

Follow the 6DRepnet to prepare the datasets:

300W-LP, AFLW2000 from here.
BIWI (Biwi Kinect Head Pose Database) from here.

Store them in the datasets directory.

For 300W-LP and AFLW2000 we need to create a filenamelist.

python create_filename_list.py --root_dir datasets/300W_LP

The BIWI datasets needs be preprocessed by a face detector to cut out the faces from the images. You can use the script provided here. For 7:3 splitting of the BIWI dataset you can use the equivalent script here. The cropped image size is set to 256.

Download weights

Download trained weights from gdrive You can choose to use the pretrained ViT-B/16 weigthts for the feature extractor. (optional)

Directory structure

After preparation, you will be able to see the following directory structure:

TokenHPE
├── datasets
│   ├── 300W_LP
│     ├── files.txt
│     ├── ...
│   ├── AFLW2000 
│     ├── files.txt
│     ├── ... 
│   ├── ...
├── weights
│   ├── TokenHPEv1-ViTB-224_224-lyr3.tar
├── figs
├── create_filename_list.py
├── datasets.py
├── README.md
├── ...

Training & Evaluation

Download trained weight from gdrive, then you can evaluate the model following:

python test.py  --batch_size 64 \
                --dataset ALFW2000 \
                --data_dir datasets/AFLW2000 \
                --filename_list datasets/AFLW2000/files.txt \
                --model_path ./weights/TokenHPEv1-ViTB-224_224-lyr3.tar \
                --show_viz False

You can train the model following:

python train.py --batch_size 64 \
                --num_epochs 60 \
                --lr 0.00001 \
                --dataset Pose_300W_LP \
                --data_dir datasets/300W_LP \
                --filename_list datasets/300W_LP/files.txt

Inference & Visualization

You can get the visualizations following:

python inference.py  --model_path ./weights/TokenHPEv1-ViTB-224_224-lyr3.tar \
                     --image_path img_path_here

Main results

We provide some results on AFLW2000 with models trained on 300W_LP. These models are trained on one TITAN V GPU.

config	MAE	VMAE	training	download
TokenHPEv1-ViT/B-224*224-lyr3	4.81	6.09	~24hours	gdrive

Acknowledgement

Many thanks to the authors of 6DRepnet. We reuse their code for data preprocessing and evaluation which greatly reduced redundant work.

Citation

If you find our work useful, please cite the paper:

@InProceedings{Zhang_2023_CVPR,
    author    = {Zhang, Cheng and Liu, Hai and Deng, Yongjian and Xie, Bochen and Li, Youfu},
    title     = {TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {8897-8906}
}

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
figs		figs
LICENSE		LICENSE
README.md		README.md
ViT_model.py		ViT_model.py
create_filename_list.py		create_filename_list.py
datasets.py		datasets.py
inference.py		inference.py
loss.py		loss.py
model.py		model.py
requirements.txt		requirements.txt
test.py		test.py
train.py		train.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

[CVPR2023] TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers

Overview

Preparation

Environments

Datasets

Download weights

Directory structure

Training & Evaluation

Inference & Visualization

Main results

Acknowledgement

Citation

About

Releases

Packages

Languages

License

zc2023/TokenHPE

Folders and files

Latest commit

History

Repository files navigation

[CVPR2023] TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers

Overview

Preparation

Environments

Datasets

Download weights

Directory structure

Training & Evaluation

Inference & Visualization

Main results

Acknowledgement

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages