Contents

List of public algorithms and datasets

1) Public Datasets and Challenges

For Face Alignment or Landmark Detection

  • Flickr-Faces-HQ (FFHQ) Dataset: Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN). The dataset consists of 70,000 high-quality PNG images at 1024×1024 resolution and contains considerable variation in terms of age, ethnicity and image background. It also has good coverage of accessories such as eyeglasses, sunglasses, hats, etc. The images were crawled from Flickr, thus inheriting all the biases of that website, and automatically aligned and cropped using dlib. (CVPR2019) A Style-Based Generator Architecture for Generative Adversarial Networks

For Head Pose Estimation

  • BIWI RGBD-ID Dataset: The BIWI RGBD-ID Dataset is an RGB-D dataset of people, targeted at long-term people re-identification from RGB-D cameras. It contains 50 training and 56 testing sequences of 50 different people.
  • 300W-LP & AFLW2000-3D: 300W-LP has the synthesized large-pose face images from 300W. AFLW2000-3D is the fitted 3D faces of the first 2000 AFLW samples, which can be used for 3D face alignment evaluation.
  • CMU Panoptic Studio Dataset: Currently, 480 VGA videos, 31 HD videos, 3D body pose, and calibration data are available. PointCloud DB from 10 Kinects (with corresponding 41 RGB videos) is also available (6+ hours of data). Please refer to the official website for details. Dataset paper link Panoptic studio: A massively multiview system for social interaction capture.

For Head Detection Only

  • HollywoodHeads dataset: HollywoodHeads is a head detection dataset. It contains 369846 human heads annotated in 224740 video frames from 21 Hollywood movies.
  • Brainwash dataset: The Brainwash dataset is related to face detection. It contains 11917 images with 91146 labeled people.
  • SCUT-HEAD-Dataset-Release: SCUT-HEAD is a large-scale head detection dataset, including 4405 images labeled with 111251 heads. The dataset consists of two parts. PartA includes 2000 images sampled from monitor videos of classrooms in a university with 67321 heads annotated. PartB includes 2405 images crawled from the Internet with 43930 heads annotated.

For Head Detection or Crowd Counting

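  • ShanghaiTech dataset: Dataset appeared in Single Image Crowd Counting via Multi Column Convolutional Neural Network(MCNN) in CVPR2016. [Overview]: It contains 1198 annotated images with a total of 330165 people, split into Part A and Part B. Part A contains 482 images, all downloaded from the Internet and showing highly congested crowd scenes, with crowd counts ranging from 33 to 3139; its training set contains 300 images and its test set 182 images. Part B contains 716 images of relatively sparse crowd scenes, captured by fixed cameras on streets, with crowd counts ranging from 12 to 578; its training set contains 400 images and its test set 316 images.
  • UCF-QNRF - A Large Crowd Counting Data Set: It contains 1535 images which are divided into train and test sets of 1201 and 334 images respectively. Paper is published in ECCV2018. [Overview]: This is the most recently released and largest crowd dataset. It contains 1535 dense crowd images from Flickr, web search and Hajj footage. The dataset covers a wide range of scenes with rich variation in viewpoint, lighting and density, and counts range from 49 to 12865, which makes it more difficult and realistic. The image resolution is also high, which leads to large variation in head size.
  • UCSD Pedestrian Dataset: Video of people on pedestrian walkways at UCSD, and the corresponding motion segmentations. Currently two scenes are available. [Overview]: It consists of 2000 frames captured by a surveillance camera, each of size 238×158. The density is relatively low, ranging from 11 to 46 people per image, about 25 on average. Frames 601 to 1400 form the training set and the remaining frames form the test set.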
  • Megvii CrowdHuman: CrowdHuman is a benchmark dataset to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, rich-annotated and contains high diversity. CrowdHuman contains 15000, 4370 and 5000 images for training, validation, and testing, respectively. There are a total of 470K human instances from train and validation subsets and 23 persons per image, with various kinds of occlusions in the dataset. Each human instance is annotated with a head bounding-box, human visible-region bounding-box and human full-body bounding-box. We hope our dataset will serve as a solid baseline and help promote future research in human detection tasks.
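The split sizes quoted in the crowd counting entries above can be sanity-checked with a few lines of Python. This is only a minimal sketch: the numbers are copied from the list, while the function names and the assumption that UCSD frames are numbered starting at 1 are illustrative and do not come from any dataset's official tooling.

```python
# Split sizes as quoted in the list above: (train images, test images).
SPLITS = {
    "ShanghaiTech Part A": (300, 182),
    "ShanghaiTech Part B": (400, 316),
    "UCF-QNRF": (1201, 334),
}


def total_images(name):
    """Return train + test image count for one of the listed datasets."""
    train, test = SPLITS[name]
    return train + test


def ucsd_split(num_frames=2000, train_first=601, train_last=1400):
    """Return (train, test) frame indices for the UCSD split described above.

    Assumption: frames are numbered 1..num_frames (1-based).
    """
    train = list(range(train_first, train_last + 1))
    train_set = set(train)
    test = [i for i in range(1, num_frames + 1) if i not in train_set]
    return train, test


if __name__ == "__main__":
    for name in SPLITS:
        print(name, total_images(name))
    train_frames, test_frames = ucsd_split()
    print("UCSD train/test frames:", len(train_frames), len(test_frames))
```

The two ShanghaiTech parts add up to the 1198 images quoted for the whole dataset, and the UCSD rule leaves 800 training frames and 1200 test frames.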

2) Pioneers and Experts

👍Michael Black; 👍Jian Sun; 👍Gang YU; 👍Yuliang Xiu 修宇亮; 👍(website) face-rec

3) Related Materials (Papers, Sources Code, Blogs, Videos and Applications)

-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-

▶ Beautify Face

Materials

Papers

-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-

▶ Body Orientation Estimation

Materials

Papers

  • TUD(CVPR2010) Monocular 3D Pose Estimation and Tracking by Detection [paper link][TUD Dataset]

  • (ICCV2015) Uncovering Interactions and Interactors: Joint Estimation of Head, Body Orientation and F-Formations From Surveillance Videos [paper link]

  • AKRF-VW(IJCV2017) Growing Regression Tree Forests by Classification for Continuous Object Pose Estimation [paper link]

  • CPOEHK(ISCAS2019) Continuous Pedestrian Orientation Estimation using Human Keypoints [paper link]

  • ❤ MEBOW(CVPR2020) MEBOW: Monocular Estimation of Body Orientation in the Wild [paper link][project link][codes|official][COCO-MEBOW dataset, Body Orientation Estimation]

  • PedRecNet(IV2022) PedRecNet: Multi-task deep neural network for full 3D human pose and orientation estimation [paper link][codes|official]

-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-

▶ Crowd Counting

Materials

Papers

-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-

▶ Eye Gaze Estimation

Materials

Papers

  • HGM(CVPR2018) A Hierarchical Generative Model for Eye Image Synthesis and Eye Gaze Estimation [paper link]

  • ETH-XGaze(ECCV2020) ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation [arxiv link][project link][Codes|PyTorch(official)]

  • EVE(ECCV2020) Towards End-to-end Video-based Eye-tracking [arxiv link][project link][Codes|PyTorch(official)]

  • MTGLS(WACV2022) MTGLS: Multi-Task Gaze Estimation With Limited Supervision [paper link]

  • RUDA(CVPR2022) Generalizing Gaze Estimation With Rotation Consistency [paper link]

  • ❤ GazeOnce/MPSGaze(CVPR2022) GazeOnce: Real-Time Multi-Person Gaze Estimation [paper link][codes|official][The MPSGaze is a synthetic dataset (ETH-XGaze + WiderFace) containing full images (instead of only cropped faces) that provides ground truth 3D gaze directions for multiple people in one image.]

  • ❤ GAFA(CVPR2022) Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination [paper link][project link][codes|official][The GAze From Afar (GAFA) dataset consists of surveillance videos of freely moving people with automatically annotated 3D gaze, head, and body orientations.]

  • NeRF-Gaze(arxiv2022) NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation [paper link][HKVision]

  • GazeNeRF(arxiv2022) GazeNeRF: 3D-Aware Gaze Redirection with Neural Radiance Fields [paper link][ETH]

  • PARKS-Gaze(arxiv2023) Towards Precision in Appearance-based Gaze Estimation in the Wild [paper link][code|official][PARKS-Gaze dataset]

  • CUDA-GHR(WACV2023) CUDA-GHR: Controllable Unsupervised Domain Adaptation for Gaze and Head Redirection [paper link][code|official]

  • 👍PJAE(ICCV2023) Interaction-aware Joint Attention Estimation Using People Attributes [paper link][arxiv link][project link][code|official][Japan, Toyota Technological Institute and University of Hyogo]

-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-

▶ Face Alignment

Materials

Datasets

Papers

-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-

▶ Face Detection

Materials

Datasets

Papers

-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-

▶ Face Recognition

Materials

Papers

-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-

▶ Face Reconstruction (3D)

Materials

Datasets

Survey

  • Survey of optimization-based methods(CGForum2018) State of the Art on Monocular 3D Face Reconstruction, Tracking, and Applications [paper link][pdf page]

  • Survey of face models(TOG2020) 3D Morphable Face Models—Past, Present, and Future [paper link][pdf page]

  • Survey of regression-based methods(CSReview2021) Survey on 3D face reconstruction from uncalibrated images [paper link][pdf page]

  • Survey on SOTA 3D reconstruction with single RGB image (arxiv2022) State of the Art in Dense Monocular Non-Rigid 3D Reconstruction [paper link]

Papers (Conference and Journal)

-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-

▶ Hand/Head/Person Detection

Materials

Papers

▶Applications

  • ThroughHand (CHI2021) ThroughHand: 2D Tactile Interaction to Simultaneously Recognize and Touch Multiple Objects [paper link][a novel tactile interaction that enables users with visual impairments to interact with multiple dynamic objects in real time, utilize the potential of the human tactile sense, enable users to perceive the objects using the palm]

  • 👍SoloFinger (CHI2021) SoloFinger: Robust Microgestures while Grasping Everyday Objects [paper link][project link][Input / Spatial Interaction / Practice Support, 36 everyday hand-object actions, simple SoloFinger gestures can relieve the need for complex finger configurations or delimiting gestures]

  • Gaze-Supported (CHI2021) Gaze-Supported 3D Object Manipulation in Virtual Reality [paper link][Input / Spatial Interaction / Practice Support, investigates integration, coordination, and transition strategies of gaze and hand input for 3D object manipulation in VR, help guide the design of future VR systems that incorporate gaze input for 3D object manipulation]

  • ARnnotate (UIST2022)(CCF-A) ARnnotate: An Augmented Reality Interface for Collecting Custom Dataset of 3D Hand-Object Interaction Pose Estimation [paper link][pdf link][Purdue University, application in Augmented Reality]

  • Ubi-TOUCH (UIST2023)(CCF-A) Ubi-TOUCH: Ubiquitous Tangible Object Utilization through Consistent Hand-object interaction in Augmented Reality [paper link][Purdue University, application in Augmented Reality]

  • InstruMentAR (CHI2023) InstruMentAR: Auto-Generation of Augmented Reality Tutorials for Operating Digital Instruments Through Recording Embodied Demonstration [paper link][pdf link][Purdue University, application in Augmented Reality]

▶Body/Person

including Crowd Person Detection, Pedestrian Detection and Crowded Pedestrian Detection

  • ReInspect, Lhungarian(CVPR2016) End-To-End People Detection in Crowded Scenes [arxiv link]

  • PRNet(ECCV2020) Progressive Refinement Network for Occluded Pedestrian Detection [paper link][code|official][for Crowded Human Detection]

  • Pedestron(CVPR2021) Generalizable Pedestrian Detection: The Elephant In The Room [paper link][code|official][Pedestrian Detection]

  • OTP-NMS(TIP2023) OTP-NMS: Towards Optimal Threshold Prediction of NMS for Crowded Pedestrian Detection [paper link][CrowdHuman and CityPersons datasets, HNU]

  • VLPD(CVPR2023) VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision [arxiv link][code|official][Vision-Language semantic self-supervision for context-aware Pedestrian Detection]

  • LSFM (Localized Semantic Feature Mixers)(CVPR2023) Localized Semantic Feature Mixers for Efficient Pedestrian Detection in Autonomous Driving [paper link][Caltech, CityPersons, Euro City Persons, and TJU-Traffic-Pedestrian datasets][LSFM beats the human baseline for the first time in the history of pedestrian detection]

  • SSCP (Sample Selection for Crowded Pedestrians)(arxiv2023.05) Selecting Learnable Training Samples is All DETRs Need in Crowded Pedestrian Detection [arxiv link][Crowdhuman and Citypersons datasets]

  • OPL (Optimal Proposal Learning)(CVPR2023) Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection [paper link][code is not available][BUPT]

  • LOAF (ICCV2023) Large-Scale Person Detection and Localization Using Overhead Fisheye Cameras [paper link][project link][arxiv link][code|official][dataset, BUPT-PRIV]

▶Hand Part

including Hand Detection, Hand Tracking, Hand-Object Contact, Hand Pressure Estimation, Hand-Object Interaction, Hand Contact Reconstruction and Hand-Object Manipulation

▶Head Part

including Head Detection, Head Counting

▶Human Parts

including Human-Parts Detection, Human Activity Understanding, Human and Object Reconstruction, Human-Aware Object Placement, Human-Scene Contact, Human-Object Contact, Human-Object Interaction Tracking and Close Human Interaction

  • DID-Net(ACCV2018) Detector-in-Detector: Multi-level Analysis for Human-Parts [paper link][code | official][HumanParts dataset]

  • PROX(ICCV2019) Resolving 3D Human Pose Ambiguities with 3D Scene Constraints [paper link][project link][MPII, The contact constraint encourages specific parts of the body to be in contact with scene surfaces if they are close enough in distance and orientation.]

  • Hier-R-CNN(TIP2020) Hier R-CNN: Instance-Level Human Parts Detection and A New Benchmark [paper link][code|official][Mask R-CNN as Backbone][FCOS as Hier Branch which needs many hand-crafted tricks][COCOHumanParts dataset]

  • ContactDynamics(ECCV2020) Contact and Human Dynamics from Monocular Video [paper link][project link][code|official][Stanford University, Adobe Research]

  • PaStaNet(CVPR2020) PaStaNet: Toward Human Activity Knowledge Engine [paper link][project link][SJTU, body-part state annotations in the context of HOI][HAKE 1.0 (Human Activity Knowledge Engine) dataset]

  • CHORE(ECCV2022) CHORE: Contact, Human and Object Reconstruction from a Single RGB Image [paper link][project link][MPII, single-person, reason the interactions and recover the spatial arrangement, fine-grained contacts between the human and the object]

  • MOVER(CVPR2022) Human-Aware Object Placement for Visual Environment Reconstruction [paper link][project link][code|official][human-scene interactions (HSIs), MPII]

  • 👍BSTRO(Body-Scene contact TRansfOrmer)(CVPR2022) Capturing and Inferring Dense Full-Body Human-Scene Contact [paper link][project link][code|official][dataset RICH, Interaction-Contact-Humans, MPII, single-person]

  • HAKE(TPAMI2023) HAKE: A Knowledge Engine Foundation for Human Activity Understanding [paper link][arxiv link][project link][HAKE 2.0 (Human Activity Knowledge Engine) dataset]

  • 👍HOT(CVPR2023) Detecting Human-Object Contact in Images [paper link][project link][Max Planck Institute, HOT dataset, single-person]

  • VisTracker(CVPR2023) Visibility Aware Human-Object Interaction Tracking from Single RGB Camera [arxiv link][project link][MPII, An approach to jointly track the human, the object and the contacts between them, in 3D, from a monocular RGB video.]

  • Hi4D(Humans interacting in 4D)(CVPR2023) Hi4D: 4D Instance Segmentation of Close Human Interaction [arxiv link][project link][ETH Zürich, A dataset of humans in close physical interaction]

-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-

▶ Hand Pose Estimation

also 2D/3D Hand Keypoints Detection or Hand Shape Estimation or 3D Hand Shape and Pose Regression

Materials

  • 👍 (github)(Hand3DResearch) Recent Progress in 3D Hand Tasks [github link]
  • (github) awesome 3d human reconstruction --> 3d_human_hand [github link]

Datasets

Papers

▶Hand Related Survey
▶Hand Modeling Methods
  • MANO (TOG2017, SIGGRAPH ASIA 2017) Embodied Hands: Modeling and Capturing Hands and Bodies Together [paper link][arxiv link][project link (keep updating)][MPII, It attempts to learn hand shape variation with Linear Blend Skinning (LBS) [SIGGRAPH 2000]][it learns from a large variety of high-quality hand scans and represents the geometric changes in the low-dimensional pose and shape space]

  • NIMBLE (TOG2022) NIMBLE: A Non-rigid Hand Model with Bones and Muscles [paper link][arxiv link][project link][code|official][ShanghaiTech University]
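The MANO entry above mentions Linear Blend Skinning (LBS). As a rough illustration of the plain LBS step only, the sketch below blends per-bone rigid transforms with per-vertex weights. It is not MANO itself: the learned shape and pose components are omitted, and all function names, the toy two-bone setup and the data layout are assumptions made for this example.

```python
import math


def rot_z(angle):
    """3x3 rotation matrix about the z axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_rigid(transform, point):
    """Apply a rigid transform (rotation, translation) to a 3D point."""
    rotation, translation = transform
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


def linear_blend_skinning(vertices, weights, transforms):
    """Plain LBS: each vertex becomes the weighted sum of its per-bone images.

    vertices:   list of [x, y, z] rest-pose points
    weights:    weights[v][k] is the influence of bone k on vertex v
                (each row is expected to sum to 1)
    transforms: one (rotation, translation) pair per bone
    """
    skinned = []
    for vertex, vertex_weights in zip(vertices, weights):
        blended = [0.0, 0.0, 0.0]
        for w, transform in zip(vertex_weights, transforms):
            moved = apply_rigid(transform, vertex)
            for i in range(3):
                blended[i] += w * moved[i]
        skinned.append(blended)
    return skinned


if __name__ == "__main__":
    # Toy setup: bone 0 stays fixed, bone 1 rotates 90 degrees about z.
    bones = [
        (rot_z(0.0), [0.0, 0.0, 0.0]),
        (rot_z(math.pi / 2), [0.0, 0.0, 0.0]),
    ]
    rest = [[1.0, 0.0, 0.0]]
    print(linear_blend_skinning(rest, [[0.5, 0.5]], bones))
```

With equal weights the vertex lands halfway between its two rigidly transformed positions, which also shows the familiar shrinkage of LBS near bent joints.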

▶Hand Keypoints Detection
  • hand3d(ICCV2017) Learning to Estimate 3D Hand Pose From Single RGB Images [paper link][arxiv link][project link][code|official][University of Freiburg, new dataset Rendered Hand Pose Dataset (RHD), 3D Hand Keypoints Detection]
▶3D Hand Reconstruction

also 3D Hand Shape and Pose Regression

  • (ECCV2018) Hand Pose Estimation via Latent 2.5D Heatmap Regression [paper link][arxiv link][No code is available, NVIDIA]

  • (CVPR2019) 3D Hand Shape and Pose From Images in the Wild [paper link][arxiv link][No code is available, based on MANO]

  • (CVPR2019) Pushing the Envelope for RGB-Based Dense 3D Hand Pose Estimation via Neural Rendering [paper link][arxiv link][No code is available, based on MANO]

  • Hand+Object(CVPR2019) H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions [paper link][arxiv link][No code is available, 6DoF Object Pose Estimation + 3D Hand Keypoints Detection]

  • 👍hand-graph-cnn(CVPR2019) 3D Hand Shape and Pose Estimation From a Single RGB Image [paper link][arxiv link][code|official][based on MANO, 2D/3D Hand Keypoints Detection + 3D Hand Mesh]

  • HAMR(ICCV2019) End-to-End Hand Mesh Recovery From a Monocular RGB Image [paper link][arxiv link][code|official][based on MANO, 2D/3D Hand Keypoints Detection + 3D Hand Mesh]

  • 👍MobileHand(ICONIP2020) MobileHand: Real-time 3D Hand Shape and Pose Estimation from Color Image [paper link][project link][code|official][Nanyang Technological University][based on MANO, 2D/3D Hand Keypoints Detection + 3D Hand Mesh]

  • I2L-MeshNet(ECCV2020) I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image [paper link][arxiv link][code|official][Seoul National University, whole body and related hands]

  • mesh_hands(CVPR2020) Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild [paper link][arxiv link][project link][based on MANO]

  • RGB2Hands(SIGGRAPH Asia 2020) RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [paper link][arxiv link][project link][new dataset RGB2Hands, based on MANO]

  • InterShape(ICCV2021) Interacting Two-Hand 3D Pose and Shape Reconstruction From Single Color Image [paper link][pdf link][project link][code|official][Yangang Wang, based on MANO, using the dataset InterHand2.6M]

  • 👍MobRecon(CVPR2022) MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image [paper link][arxiv link][code|official][Kuaishou Technology]

  • IntagHand(CVPR2022) Interacting Attention Graph for Single Image Two-Hand Reconstruction [paper link][arxiv link][code|official][based on MANO, using the dataset InterHand2.6M]

  • 👍HandOccNet(CVPR2022) HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network [paper link][arxiv link][code|official][based on MANO]

  • MeMaHand(CVPR2023) MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction [paper link][arxiv link][ByteDance, No code is available, based on MANO, compared to IntagHand and InterShape]

  • Im2Hands(CVPR2023) Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes [paper link][arxiv link][project link][code|official][KAIST, compared to IntagHand, based on HALO: A Skeleton-Driven Neural Occupancy Representation for Articulated Hands (3DV 2021) and Occupancy Networks]

  • ACR(CVPR2023) ACR: Attention Collaboration-Based Regressor for Arbitrary Two-Hand Reconstruction [paper link][arxiv link][code|official][Tencent AI Lab, based on MANO, compared to IntagHand]

  • InterWild(CVPR2023) Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild [paper link][arxiv link][code|official][facebookresearch, single author Gyeongsik Moon, based on MANO, compared to IntagHand]

  • H2ONet(CVPR2023) H2ONet: Hand-Occlusion-and-Orientation-aware Network for Real-time 3D Hand Mesh Reconstruction [paper link][code|official][CUHK, first author Hao XU (徐昊), tested on datasets DexYCB and HO3D]

  • DIR (ICCV2023 Oral) Decoupled Iterative Refinement Framework for Interacting Hands Reconstruction from a Single RGB Image [paper link][arxiv link][project link][code|official][PICO IDL ByteDance + BUPT]

  • 👍HaMeR(CVPR2024)(arxiv2023.12) Reconstructing Hands in 3D with Transformers [arxiv link][project link][code|official][the first author Georgios Pavlakos, University of California, Berkeley, a new dataset HInt which is built by sampling frames from New Days of Hands, EpicKitchens-VISOR and Ego4D and annotating the hands with 2D keypoints.]

  • HMP(WACV2024) HMP: Hand Motion Priors for Pose and Shape Estimation From Video [paper link][project link][code|official][MPII, taking video as the input, tested on datasets HO3D and DexYCB, mainly focusing on hand occlusions]

  • Ev2Hands(3DV2024) 3D Pose Estimation of Two Interacting Hands from a Monocular Event Camera [arxiv link][project link][code|official][MPII, a new synthetic large-scale dataset of two interacting hands, Ev2Hands-S, and a new real benchmark with real event streams and ground-truth 3D annotations, Ev2Hands-R.]

  • OHTA(CVPR2024)(arxiv2024.02) OHTA: One-shot Hand Avatar via Data-driven Implicit Priors [arxiv link][project link][code|official][ByteDance][To test OHTA’s performance for the challenging in-the-wild images, they take the whole-body version of MSCOCO for experiments. They utilize the pose estimation results provided by InterWild and generate the masks using SAM]

▶Sign Language Understanding

including Sign Language Recognition and Sign Language Translation

  • BSL(ECCV2020) BSL-1K: Scaling Up Co-articulated Sign Language Recognition Using Mouthing Cues [paper link]

  • HMA(AAAI2021) Hand-Model-Aware Sign Language Recognition [paper link][Sign Language Recognition (SLR)]

  • SignBERT (ICCV2021) SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition [paper link][arxiv link][Sign Language Recognition (SLR)]

  • 👍SignBERT+ (TPAMI2023) SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding [paper link][arxiv link][project link][Sign Language Understanding (SLU)]

-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-|-+-

▶ Head Pose Estimation

Materials

Datasets

Papers(Survey)

  • Survey(TPAMI2009) Head Pose Estimation in Computer Vision: A Survey [paper link][CSDN blog]

  • Survey(SPI2021) Head pose estimation: A survey of the last ten years [paper link]

  • Survey(PR2022) Head pose estimation: An extensive survey on recent techniques and applications [paper link]

Papers(Journal)

  • HyperFace(TPAMI2017) HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition [paper link]

  • (Neurocomputing2018) Appearance based pedestrians head pose and body orientation estimation using deep learning [paper link][eight orientation bins]

  • HeadFusion(TPAMI2018) HeadFusion: 360 Head Pose Tracking Combining 3D Morphable Model and 3D Reconstruction [paper link]

  • QuatNet(TMM2019) Quatnet: Quaternion-based head pose estimation with multiregression loss [paper link][unit quaternion representation]

  • (IVC2020) Improving head pose estimation using two-stage ensembles with top-k regression [paper link]

  • MLD(TPAMI2020) Head Pose Estimation Based on Multivariate Label Distribution [paper link]

  • MNN(TPAMI2021) Multi-Task Head Pose Estimation in-the-Wild [paper link][codes|Tensorflow / C++]

  • MFDNet(TMM2021) MFDNet: Collaborative Poses Perception and Matrix Fisher Distribution for Head Pose Estimation [paper link][matrix representation]

  • 2DHeadPose(NN2023) 2DHeadPose: A simple and effective annotation method for the head pose in RGB images and its dataset [paper link][codes|official][annotation tool, dataset, and source code]

  • 6dof_face(TIP2023) Towards 3D Face Reconstruction in Perspective Projection: Estimating 6DoF Face Pose from Monocular Image [paper link][code|official]

  • CIT(IJCV2023) Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose [paper link][code|official][SYSU, Facial Landmark + Head Pose]

  • TokenHPE(TIP2023) Orientation Cues-Aware Facial Relationship Representation for Head Pose Estimation via Transformer [paper link][The journal version of the conference paper TokenHPE(CVPR2023)]

  • OPAL(SRHP+WRHP)(PR2024) On the representation and methodology for wide and short range head pose estimation [paper link][arxiv link][Universidad Politécnica de Madrid]

  • HeadDiff(TIP2024) HeadDiff: Exploring Rotation Uncertainty with Diffusion Models for Head Pose Estimation [paper link][Ningxia University]

  • HHP-Net-Plus(CVIU2024) Head pose estimation with uncertainty and an application to dyadic interaction detection [paper link][code|official][Università degli Studi di Genova, Italy, the extended journal of HHP-Net(WACV2022)]
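Several entries above are tagged by the rotation representation they use (Euler angles, unit quaternion, matrix). The sketch below shows how the three relate for one fixed convention. The convention R = Rz(yaw) · Ry(pitch) · Rx(roll) is an assumption chosen for illustration; head pose datasets and the papers listed here each define their own axes and angle order, so check them before reusing this.

```python
import math


def euler_to_quaternion(yaw, pitch, roll):
    """Unit quaternion (w, x, y, z) for R = Rz(yaw) @ Ry(pitch) @ Rx(roll).

    Angles are in radians. The axis convention is an assumption.
    """
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    return (
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    )


def quaternion_to_matrix(q):
    """3x3 rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def euler_to_matrix(yaw, pitch, roll):
    """Rotation matrix built directly as Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    rz = [[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]]
    ry = [[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]]
    rx = [[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]]
    return matmul(matmul(rz, ry), rx)


if __name__ == "__main__":
    angles = (math.radians(30), math.radians(-10), math.radians(5))
    print(euler_to_quaternion(*angles))
    print(euler_to_matrix(*angles))
```

Going through the quaternion or building the matrix directly should give the same rotation, which is a convenient check when converting labels between representations.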

Papers(Conference)

  • (ITSC2014) Head detection and orientation estimation for pedestrian safety [paper link]

  • Dlib(68 points)(CVPR2014) One Millisecond Face Alignment with an Ensemble of Regression Trees [paper link]

  • 3DDFA(CVPR2016) Face Alignment Across Large Poses: A 3D Solution [paper link]

  • FAN(12 points)(ICCV2017) How Far Are We From Solving the 2D & 3D Face Alignment Problem? (And a Dataset of 230,000 3D Facial Landmarks) [paper link]

  • KEPLER(FG2017) KEPLER: Keypoint and Pose Estimation of Unconstrained Faces by Learning Efficient H-CNN Regressors [paper link]

  • FasterRCNN+regression(ACCV2018) Simultaneous Face Detection and Head Pose Estimation: A Fast and Unified Framework [paper link][dataset|AFW and AFLW datasets: from coarse face pose, using subcategories to generate 12 clusters, to fine Euler angle prediction][following HyperFace]

  • WNet(ACCVW2018) WNet: Joint Multiple Head Detection and Head Pose Estimation from a Spectator Crowd Image [paper link][dataset|spectator crowd S-HOCK dataset: rough orientation labels]

  • SSR-Net-MD(IJCAI2018) SSR-Net: A Compact Soft Stagewise Regression Network for Age Estimation [paper link][codes|Tensorflow+Dlib+MTCNN][Inspiring the FSA-Net]

  • HopeNet(CVPRW2018) Fine-Grained Head Pose Estimation Without Keypoints [arxiv link][Codes|PyTorch(official)][CSDN blog]

  • HeadPose(FG2019) Improving Head Pose Estimation with a Combined Loss and Bounding Box Margin Adjustment [paper link][codes|TensorFlow]

  • FSA-Net(CVPR2019) FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation from a Single Image [paper link][Codes|Keras&Tensorflow(official)][Codes|PyTorch(unofficial)]

  • PADACO(ICCV2019) Deep Head Pose Estimation Using Synthetic Images and Partial Adversarial Domain Adaption for Continuous Label Spaces [paper link][project link][SynHead and BIWI --> SynHead++, SynBiwi+, Biwi+]

  • WHENet(BMVC2020) WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose [arxiv link][Codes|Keras&Tensorflow(official)][codes|PyTorch(unofficial)][codes|DMHead(unofficial)]

  • RAFA-Net(ACCV2020) Rotation Axis Focused Attention Network (RAFA-Net) for Estimating Head Pose [paper link][codes|keras+tensorflow]

  • FDN(AAAI2020) FDN: Feature decoupling network for head pose estimation [paper link]

  • Rankpose(BMVC2020) RankPose: Learning Generalised Feature with Rank Supervision for Head Pose Estimation [paper link][codes|PyTorch][vector representation]

  • 3DDFA_V2(ECCV2020) Towards Fast, Accurate and Stable 3D Dense Face Alignment [paper link][codes|PyTorch 3DDFA_V2][3D Dense Face Alignment, 3D Face Reconstruction, 3DMM, Lightweight]

  • EVA-GCN(CVPRW2021) EVA-GCN: Head Pose Estimation Based on Graph Convolutional Networks [paper link][codes|PyTorch]

  • TriNet(WACV2021) A Vector-Based Representation to Enhance Head Pose Estimation [paper link][codes|Tensorflow+Keras][vector representation]

  • img2pose(CVPR2021) img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation [paper link][codes|PyTorch]

  • OsGG-Net(ACMMM2021) OsGG-Net: One-step Graph Generation Network for Unbiased Head Pose Estimation [paper link][codes|PyTorch]

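  • (KSE2021) Simultaneous face detection and 360 degree head pose estimation [paper link][The paper uses an FPN plus multi-task approach to detect heads and estimate head pose at the same time. The datasets used are mainly CMU-Panoptic, 300W-LP and BIWI. For the head pose representation, it uses the rotation matrix in addition to Euler angles.]

  • (KSE2021) UET-Headpose: A sensor-based top-view head pose dataset [paper link][The whole paper describes the hardware system used to collect the dataset, but the dataset has not been released. The HPE algorithm is FSA-Net, extended to a full-range 360° single-person head pose estimation method following the idea in WHENet.]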

  • (FG2021) Relative Pose Consistency for Semi-Supervised Head Pose Estimation [paper link][pdf link][Semi-Supervised]

  • HeadPosr(FG2021) HeadPosr: End-to-end Trainable Head Pose Estimation using Transformer Encoders [paper link][arxiv link][Naina Dhingra]

  • SynergyNet(3DV2021) Synergy between 3DMM and 3D Landmarks for Accurate 3D Facial Geometry [paper link][project link][codes|PyTorch]

  • MOS(BMVC2021) MOS: A Low Latency and Lightweight Framework for Face Detection, Landmark Localization, and Head Pose Estimation [paper link][codes|PyTorch][re-annotate the WIDER FACE with head pose label]

  • LwPosr(WACV2022) LwPosr: Lightweight Efficient Fine Grained Head Pose Estimation [paper link][Naina Dhingra]

  • HHP-Net(WACV2022) HHP-Net: A Light Heteroscedastic Neural Network for Head Pose Estimation With Uncertainty [paper link][codes|TensorFlow]

  • 6DRepNet(ICIP2022) 6D Rotation Representation For Unconstrained Head Pose Estimation [paper link][codes|PyTorch+RepVGG][Journal Version (6DRepNet360) --> Towards Robust and Unconstrained Full Range of Rotation Head Pose Estimation][vector representation]

  • DAD-3DNet(CVPR2022) DAD-3DHeads: A Large-scale Dense, Accurate and Diverse Dataset for 3D Head Alignment from a Single Image [paper link][project link👍][codes|official PyTorch][benchmark challenge👍][DAD-3DHeads dataset, by pinatafarm][used as an off-the-shelf head pose estimator in HairNeRF(ICCV2023)]

  • TokenHPE(CVPR2023) TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers [paper link][code|official][Transformer-based method]

  • DSFNet(CVPR2023) DSFNet: Dual Space Fusion Network for Occlusion-Robust Dense 3D Face Alignment [paper link][arxiv link][paperwithcode link][code|official][Head Pose Estimation + Face Alignment + 3D Face Reconstruction]

  • PFA(arxiv2023.08) 3D Face Alignment Through Fusion of Head Pose Information and Features [arxiv link][Soongsil University]

  • OrdinalRegression(ICASSP2024) Language-Driven Ordinal Learning for Imbalanced Head Pose Estimation [paper link][Ningxia University]

  • StructuredLight(ICASSP2024) Adaptive Head Pose Estimation with Real-Time Structured Light [paper link][Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China]

  • FaceXFormer(arxiv2024.03) FaceXFormer: A Unified Transformer for Facial Analysis [arxiv link][project link][code|official][Johns Hopkins University]
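The 6DRepNet entry above is built around a 6D rotation representation. A common way to turn such a 6D vector into a valid rotation matrix is Gram-Schmidt orthogonalization of two 3D vectors, sketched below. This is an illustration under stated assumptions and not the repository's code: the function names are made up, and treating the 6 values as the first two columns of the matrix is an assumed layout.

```python
import math


def _normalize(v):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def six_d_to_matrix(six_d):
    """Map 6 raw numbers to a 3x3 rotation matrix via Gram-Schmidt.

    Assumption: six_d = a1 (3 values) followed by a2 (3 values), read as
    un-normalized versions of the first two matrix columns.
    """
    a1, a2 = list(six_d[:3]), list(six_d[3:6])
    b1 = _normalize(a1)
    proj = _dot(b1, a2)
    # Remove the component of a2 along b1, then normalize.
    b2 = _normalize([a2[i] - proj * b1[i] for i in range(3)])
    # The third column is fixed by the right-hand rule.
    b3 = _cross(b1, b2)
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


if __name__ == "__main__":
    print(six_d_to_matrix([2.0, 0.0, 0.0, 1.0, 3.0, 0.0]))
```

Because any 6 numbers with two non-parallel halves map to an orthonormal matrix, a network can regress them freely without an explicit orthogonality constraint.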