Dynamic 3D Gaze from Afar: Deep Gaze Estimation from Temporal Eye-Head-Body Coordination, CVPR 2022

This repository provides an implementation of our CVPR 2022 paper Dynamic 3D Gaze from Afar: Deep Gaze Estimation from Temporal Eye-Head-Body Coordination. If you use our code or data, please cite our paper.

Please note that this is research software and may contain bugs or other issues; use it at your own risk. If you experience major problems with it, you may contact us, but please note that we do not have the resources to address every issue.

@InProceedings{Nonaka_2022_CVPR,
    author    = {Nonaka, Soma and Nobuhara, Shohei and Nishino, Ko},
    title     = {Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {2192-2201}
}

Prerequisites

We tested our code with Python 3.7 on Ubuntu 20.04 LTS. Our code depends on the following modules.

  • numpy
  • opencv_python
  • matplotlib
  • tqdm
  • pytorch
  • torchvision
  • pytorch_lightning
  • efficientnet_pytorch
  • albumentations

Please browse conda.yaml to find the specific versions we used in our test environment, or you can simply run

$ conda env create --file conda.yaml

in your (virtual) environment.

Alternatively, you can use gafa.def to build a Singularity container by

$ singularity build --fakeroot ./singularity/gafa.sif ./singularity/gafa.def
INFO:    Starting build...
Getting image source signatures
(snip)
INFO:    Creating SIF file...
INFO:    Build complete: ./gafa.sif

The generated container file (gafa.sif in this example) will be around 7.7 GB.

Gaze from Afar (GAFA) dataset

License

The GAFA dataset is provided under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

Ethical approval

The GAFA dataset was collected with the approval of the Research Ethics Committee of the Graduate School of Informatics, Kyoto University (KUIS-EAR-2020-002).

Download

We provide both the raw data and preprocessed data. As the raw data is very large (1.7 TB), we recommend using the preprocessed data (5.9 GB), which contains cropped human images. If you would like to use your own cropping or head detection models, please use the raw data.

The large files above (Library and Courtyard) are also available as 128 GB chunks.

The chunks are generated by

$ split -b 128G -d library.tar.gz library.tar.gz.
$ split -b 128G -d courtyard.tar.gz courtyard.tar.gz.

and can be merged by

$ cat library.tar.gz.0[0-7] > library.tar.gz
$ cat courtyard.tar.gz.0[0-3] > courtyard.tar.gz

  • Library
  • Courtyard

Raw data

We provide the raw surveillance videos, calibration data (camera intrinsics and extrinsics), and annotation data (gaze/head/body directions). Our dataset contains five daily scenes: lab, library, kitchen, courtyard, and living room.

The data is organized as follows.

data/raw_data
├── library/
│   ├──1026_3/
│   │   ├── Camera1_0/
│   │   │   ├── 000000.jpg
│   │   │   ├── 000001.jpg
│   │   │    ...
│   │   ├── Camera1_0.pkl
│   │    ...
│   │   ├── Camera1_1/
│   │   ├── Camera1_1.pkl
│   │    ...
│   │   ├── Camera2_0/
│   │   ├── Camera2_0.pkl
│   │    ...
│   ├──1028_2/
│   ...
├── lab/
├── courtyard/
├── living_room/
├── kitchen/
└── intrinsics.npz

Data is stored in five scenes (e.g. library/).

  • Each scene is subdivided by shooting session (e.g. 1026_3/).
  • Each session is stored frame by frame, with images taken from each camera (e.g. Camera1_0, Camera2_0, ..., Camera8_0).
    • For example, Camera1_0/000000.jpg and Camera2_0/000000.jpg contain images of a person as seen from each camera at the same time.
    • In some cases, we further divided a session into multiple subsessions (e.g. Camera1_1, Camera1_2, ...).
  • The calibration data of each camera and the annotation data (gaze, head, and body directions) in the camera coordinate system are stored in a pickle file (e.g. Camera1_0.pkl); see the loading sketch below.
  • The cameras share a single set of intrinsic parameters, given in intrinsics.npz.
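
As a quick sanity check after downloading, you can inspect a per-camera annotation file and the shared intrinsics with a few lines of Python. This is only a minimal sketch: the paths are examples taken from the tree above, and no assumptions are made about the internal keys of the .pkl/.npz files, so the snippet simply prints whatever structure it finds.

import pickle
import numpy as np

# Example paths from the directory tree above; adjust to your download location.
pkl_path = "data/raw_data/library/1026_3/Camera1_0.pkl"
intrinsics_path = "data/raw_data/intrinsics.npz"

# Per-camera annotations (gaze/head/body directions and calibration, in camera coordinates).
with open(pkl_path, "rb") as f:
    annotation = pickle.load(f)
print(type(annotation))
if isinstance(annotation, dict):
    for key, value in annotation.items():
        print(key, type(value), getattr(value, "shape", ""))

# Shared camera intrinsics.
intrinsics = np.load(intrinsics_path)
for key in intrinsics.files:
    print(key, intrinsics[key].shape)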

Preprocessed data

The preprocessed data are stored in a format similar to the raw data, but each frame contains cropped person images.

We also provide the script we used to preprocess the raw data as data/preprocessed/preprocess.py.

Usage

Demo

Please open the following Jupyter notebooks.

  • demo.ipynb: Demo code for end-to-end gaze estimation with our proposed model.
  • dataset-demo.ipynb: Demo code to visualize annotations of the GAFA dataset.

If you have built your Singularity container as described above, you can launch Jupyter Notebook inside the container by

$ singularity exec ./singularity/gafa.sif jupyter notebook --port 12345
    To access the notebook, open this file in a browser:
        file:/// ...
    Or copy and paste one of these URLs:
        http://localhost:12345/?token= ...
     or http://127.0.0.1:12345/?token= ...

Or, you can launch a bash shell inside the container by

$ singularity shell --nv ./singularity/gafa.sif

Evaluation with the GAFA dataset

Please download the weights of the pretrained models from here, and place the .pth files in models/weights.

You can then evaluate the accuracy of the estimated gaze direction with our model.

python eval.py --gpus 1   # if no GPU is available, set --gpus to -1

# Result
MAE (3D front):  20.697409
MAE (2D front):  17.393957
MAE (3D back):  23.210892
MAE (2D back):  21.901659
MAE (3D all):  21.688675
MAE (2D all):  20.889896
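
The MAE values reported above are mean angular errors in degrees between the estimated and ground-truth gaze directions. As a reference, a minimal sketch of such a mean angular error computation is given below; eval.py is the authoritative implementation, and the function name here is only for illustration.

import numpy as np

def mean_angular_error_deg(pred, gt):
    # pred, gt: arrays of shape (N, 3) (or (N, 2)) holding gaze direction vectors.
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    # Clip to avoid NaNs from floating-point error in arccos.
    cos_sim = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos_sim)).mean()

# Example: one prediction 30 degrees off and one exact -> mean of 15 degrees.
pred = np.array([[0.5, 0.0, np.sqrt(3) / 2], [0.0, 0.0, 1.0]])
gt = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
print(mean_angular_error_deg(pred, gt))  # -> 15.0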

Training with the GAFA dataset

You can train our model with the GAFA dataset by

python train.py \
   --epoch 10 \
   --n_frames 7 \
   --gpus 2

Training consumes about 24 GB of VRAM per GPU and takes about 24 hours. As our model is implemented with the distributed data parallel (DDP) functionality of PyTorch Lightning, you can speed up training by adding more GPUs.

If your GPUs have less than 24 GB of VRAM, please switch to data parallel mode by changing strategy="ddp" to strategy="dp" in train.py.
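
For reference, this switch corresponds to the strategy argument of the PyTorch Lightning Trainer. The sketch below only illustrates that argument, assuming a PyTorch Lightning 1.5-1.9 API in which both the "ddp" and "dp" strategies are available; the actual Trainer construction lives in train.py and may differ.

import pytorch_lightning as pl

# Illustration only: the real Trainer is built inside train.py.
# strategy="ddp" runs one model replica per GPU process (the default here);
# strategy="dp" splits each batch across GPUs within a single process,
# which lowers the per-GPU memory requirement.
trainer = pl.Trainer(
    max_epochs=10,   # corresponds to --epoch above
    gpus=2,          # corresponds to --gpus above
    strategy="ddp",  # change to "dp" if each GPU has less than 24 GB of VRAM
)
# trainer.fit(model, datamodule)  # model and data module are defined in train.py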
