
Monocular Depth Estimation Rankings
and 2D to 3D Video Conversion Rankings

List of Rankings

Each ranking includes only the best model per method; the main evaluation metrics are sketched in short code examples after the lists below.

Monocular Depth Estimation Rankings

  1. DA-2K (mostly 1500×2000): Acc (%)>=86
  2. UnrealStereo4K (3840×2160): AbsRel<=0.04
  3. MVS-Synth (1920×1080): AbsRel<=0.06
  4. HRSD (1920×1080): AbsRel<=0.08
  5. Middlebury2021 (1920×1080): SqRel<=0.5
  6. NYU-Depth V2 (640×480): OPW<=0.31
  7. NYU-Depth V2 (640×480): AbsRel<=0.058
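
For context, AbsRel and SqRel in the thresholds above are the standard per-pixel relative depth errors. Below is a minimal NumPy sketch of both, assuming `pred` and `gt` are aligned depth maps of the same shape and that invalid ground-truth pixels are stored as zeros; the function and variable names are illustrative, not taken from any benchmark's official evaluation code.

```python
import numpy as np

def absrel_sqrel(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Absolute relative error and squared relative error over valid pixels.

    AbsRel = mean(|pred - gt| / gt)
    SqRel  = mean((pred - gt)^2 / gt)
    """
    valid = gt > 0                      # assumption: zero marks missing ground truth
    pred, gt = pred[valid], gt[valid]
    absrel = float(np.mean(np.abs(pred - gt) / gt))
    sqrel = float(np.mean((pred - gt) ** 2 / gt))
    return absrel, sqrel
```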

2D to 3D Video Conversion Rankings

I. Video Inpainting Rankings

  • (to do)

II. Light Field Video Reconstruction from Monocular Video Rankings

  1. 👑 4DLFVD with up to 10×10 real light field views✔️: LPIPS😍 (no data)
    This will be the King of all rankings; we look forward to results from ambitious researchers.
  2. 4DLFVD with up to 10×10 real light field views✔️: PSNR😞 (no data)
  3. Hybrid with 7×7 synthetic light field views✖️: LPIPS😍 (no data)
  4. Hybrid with 7×7 synthetic light field views✖️: PSNR😞>=32dB
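
The light field rankings are scored with standard image-reconstruction metrics between synthesized and ground-truth views. Here is a minimal sketch of both, assuming 8-bit RGB views as H×W×3 NumPy arrays and the `lpips` PyTorch package; the helper names and the AlexNet LPIPS variant are illustrative choices, not prescribed by any of the benchmarks above.

```python
import numpy as np
import torch
import lpips  # pip install lpips

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# LPIPS expects NCHW float tensors scaled to [-1, 1]; lower is better.
lpips_fn = lpips.LPIPS(net="alex")

def lpips_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    def to_tensor(im: np.ndarray) -> torch.Tensor:
        return torch.from_numpy(im).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    with torch.no_grad():
        return lpips_fn(to_tensor(pred), to_tensor(gt)).item()
```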

Appendices


DA-2K (mostly 1500×2000): Acc (%)>=86

| RK | Model | Acc (%) ↑ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | Depth Anything V2 Giant (CVPR; ENH: arXiv)<br>Backbone: DINOv2 (ViT-G/14) | 97.4 {1} (arXiv) | Pretraining: BlendedMVS & Hypersim & IRS & TartanAir & VKITTI 2<br>Training: BDD100K & Google Landmarks & ImageNet-21K & LSUN & Objects365 & Open Images V7 & Places365 & SA-1B | GitHub Stars<br>ENH: GitHub Stars | - | - |
| 2 | GeoWizard (arXiv)<br>Backbone: Stable Diffusion v2 | 88.1 {1} (arXiv) | Hypersim & Replica & 3D Ken Burns & Objaverse & proprietary | GitHub Stars | - | - |
| 3 | Marigold (CVPR)<br>Backbone: Stable Diffusion v2 | 86.8 {1} (arXiv) | Hypersim & Virtual KITTI | GitHub Stars | - | - |
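
DA-2K scores relative depth by pairwise ordering: each image carries sparsely annotated point pairs, and Acc (%) is the share of pairs whose depth order the model predicts correctly. Below is a rough sketch of that computation, assuming `depth` is a predicted metric depth map (nearer points have smaller values) and `pairs` is a list of ((y1, x1), (y2, x2), closer) annotations; the data layout here is illustrative, not the benchmark's actual file format.

```python
import numpy as np

def pairwise_accuracy(depth: np.ndarray, pairs) -> float:
    """Percentage of annotated point pairs whose depth order is predicted correctly.

    Each pair is ((y1, x1), (y2, x2), closer), where closer is 0 or 1 and marks
    which of the two points is nearer to the camera. Assumes metric depth,
    so the nearer point has the smaller value.
    """
    correct = 0
    for (y1, x1), (y2, x2), closer in pairs:
        predicted_closer = 0 if depth[y1, x1] < depth[y2, x2] else 1
        correct += int(predicted_closer == closer)
    return 100.0 * correct / len(pairs)
```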

Back to Top | Back to the List of Rankings

UnrealStereo4K (3840×2160): AbsRel<=0.04

| RK | Model | AbsRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | ZoeDepth +PFR=128 (arXiv; ENH: CVPR) | 0.0388 {1} (CVPR) | ENH: UnrealStereo4K | GitHub Stars<br>ENH: GitHub Stars | - | - |

Back to Top | Back to the List of Rankings

MVS-Synth (1920×1080): AbsRel<=0.06

| RK | Model | AbsRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | ZoeDepth +PFR=128 (arXiv; ENH: CVPR) | 0.0589 {1} (CVPR) | ENH: MVS-Synth | GitHub Stars<br>ENH: GitHub Stars | - | - |

Back to Top | Back to the List of Rankings

HRSD (1920×1080): AbsRel<=0.08

| RK | Model | AbsRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | DPT-B + R + AL (ICCV; ENH: CVPRW) | 0.074 {1} (CVPRW) | ENH: HRSD | GitHub Stars<br>ENH: - | - | - |

Back to Top | Back to the List of Rankings

Middlebury2021 (1920×1080): SqRel<=0.5

| RK | Model | SqRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | LeReS-GBDMF (CVPR; ENH: AAAI) | 0.444 {1} (AAAI) | ENH: HR-WSI | GitHub Stars<br>ENH: GitHub Stars | - | - |

Back to Top | Back to the List of Rankings

NYU-Depth V2 (640×480): OPW<=0.31

| RK | Model | OPW ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | FutureDepth (arXiv)<br>Backbone: Swin-L | 0.303 {4} (arXiv) | NYU-Depth V2 | - | - | - |
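
OPW is a temporal-consistency measure rather than a per-frame accuracy. One common formulation, stated here only as an assumption since the ranking's source may define it differently, warps each next-frame depth map back to the current frame with optical flow and averages the absolute difference over non-occluded pixels. A simplified nearest-neighbour sketch with precomputed flow and validity masks (all names are illustrative):

```python
import numpy as np

def opw(depths, flows, masks) -> float:
    """Optical-flow-based warping error over consecutive depth maps.

    depths: list of (H, W) depth maps; flows[t]: (H, W, 2) forward flow from
    frame t to t+1; masks[t]: (H, W) bool validity mask excluding occlusions.
    """
    errors = []
    for t in range(len(depths) - 1):
        h, w = depths[t].shape
        ys, xs = np.mgrid[0:h, 0:w]
        xw = np.clip(np.round(xs + flows[t][..., 0]).astype(int), 0, w - 1)
        yw = np.clip(np.round(ys + flows[t][..., 1]).astype(int), 0, h - 1)
        warped_next = depths[t + 1][yw, xw]      # nearest-neighbour warp
        diff = np.abs(warped_next - depths[t])
        errors.append(float(diff[masks[t]].mean()))
    return float(np.mean(errors))
```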

Back to Top | Back to the List of Rankings

NYU-Depth V2 (640×480): AbsRel<=0.058

| RK | Model | AbsRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1-2 | BetterDepth (arXiv)<br>Backbone: Depth Anything & Marigold | 0.042 {1} (arXiv) | Hypersim & Virtual KITTI | - | - | - |
| 1-2 | Metric3D v2 CSTM_label (ICCV; ENH: arXiv)<br>Backbone: DINOv2 with registers (ViT-L/14) | 0.042 {1} (arXiv) | DDAD & Lyft & Driving Stereo & DIML & Argoverse2 & Cityscapes & DSEC & Mapillary PSD & Pandaset & UASOL & Virtual KITTI & Waymo & Matterport3d & Taskonomy & Replica & ScanNet & HM3d & Hypersim | GitHub Stars | - | - |
| 3 | Depth Anything Large (CVPR)<br>Backbone: DINOv2 (ViT-L/14) | 0.043 {1} (CVPR) | Pretraining: BlendedMVS & DIML & HR-WSI & IRS & MegaDepth & TartanAir<br>Training: BDD100K & Google Landmarks & ImageNet-21K & LSUN & Objects365 & Open Images V7 & Places365 & SA-1B | GitHub Stars | - | - |
| 4 | MiDaS v3.1 BEiTL-512 (TPAMI; ENH: arXiv)<br>Backbone: BEiT512-L (ViT-L/16) | 0.048 {1} (CVPR) | Pretraining: ReDWeb & HR-WSI & BlendedMVS & NYU-Depth V2 & KITTI<br>Training: ReDWeb & DIML & 3D Movies & MegaDepth & WSVD & TartanAir & HR-WSI & ApolloScape & BlendedMVS & IRS & NYU-Depth V2 & KITTI | GitHub Stars | - | PyTorch<br>GitHub Stars |
| 5 | GeoWizard (arXiv)<br>Backbone: Stable Diffusion v2 | 0.052 {1} (arXiv) | Hypersim & Replica & 3D Ken Burns & Objaverse & proprietary | GitHub Stars | - | - |
| 6 | Marigold (CVPR)<br>Backbone: Stable Diffusion v2 | 0.055 {1} (CVPR) | Hypersim & Virtual KITTI | GitHub Stars | - | - |
| 7 | GenPercept (arXiv)<br>Backbone: Stable Diffusion v2.1 | 0.056 {1} (arXiv) | Hypersim & Virtual KITTI | GitHub Stars | - | - |
| 8 | NeWCRFs + LightedDepth (CVPR; ENH: CVPR) | 0.057 {2} (CVPR) | ENH: NYU-Depth V2 | GitHub Stars<br>ENH: GitHub Stars | - | - |
| 9 | UniDepth-V (CVPR)<br>Backbone: DINOv2 (ViT-L/14) | 0.0578 {1} (CVPR) | A2D2 & Argoverse2 & BDD100k & CityScapes & DrivingStereo & Mapillary PSD & ScanNet & Taskonomy & Waymo | GitHub Stars | - | - |

Back to Top | Back to the List of Rankings

Hybrid with 7×7 synthetic light field views✖️: PSNR😞>=32dB

| RK | Model | PSNR ↑ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | LFVRT (ECCV)<br>MDE: DPT (ICCV)<br>Backbone: ViT | 32.66 {3+1D} (ECCV) | GoPro & TAMULF | GitHub Stars<br>MDE: GitHub Stars | - | - |

📝 Note: The above ranking includes only one model, as the other methods are image-based and lack temporal information, which makes them unsuitable for light field video reconstruction from monocular video.

Back to Top | Back to the List of Rankings

Appendix 3: List of all research papers from the above rankings

| Method | Paper | Venue |
|---|---|---|
| BetterDepth | BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation | arXiv |
| Depth Anything | Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | CVPR |
| Depth Anything V2 | Depth Anything V2 | arXiv |
| DPT | Vision Transformers for Dense Prediction | ICCV |
| FutureDepth | FutureDepth: Learning to Predict the Future Improves Video Depth Estimation | arXiv |
| GBDMF | Multi-Resolution Monocular Depth Map Fusion by Self-Supervised Gradient-Based Composition | AAAI |
| GenPercept | Diffusion Models Trained with Large Data Are Transferable Visual Models | arXiv |
| GeoWizard | GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image | arXiv |
| LeReS | Learning to Recover 3D Scene Shape from a Single Image | CVPR |
| LightedDepth | LightedDepth: Video Depth Estimation in light of Limited Inference View Angles | CVPR |
| LFVRT | Synthesizing Light Field Video from Monocular Video | ECCV |
| Marigold | Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | CVPR |
| Metric3D | Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image | ICCV |
| Metric3D v2 | Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | arXiv |
| MiDaS | Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer | TPAMI |
| MiDaS v3.1 | MiDaS v3.1 – A Model Zoo for Robust Monocular Relative Depth Estimation | arXiv |
| NeWCRFs | Neural Window Fully-connected CRFs for Monocular Depth Estimation | CVPR |
| PatchFusion | PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation | CVPR |
| R + AL | High-Resolution Synthetic RGB-D Datasets for Monocular Depth Estimation | CVPRW |
| UniDepth | UniDepth: Universal Monocular Metric Depth Estimation | CVPR |
| ZoeDepth | ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth | arXiv |

Back to Top | Back to the List of Rankings