Vision | |||
2014 | VAE | Kingma and Welling | [✓] Training on MNIST [✓] Encoder output visualization [✓] Decoder output visualization |
2015 | CAM | Zhou et al. | [✓] Application to GoogleNet |
2016 | Gatys et al., 2016 | Gatys et al. | [✓] Application to VGGNet-19 |
YOLO | Redmon et al. | [✗] Training on VOC 2012 [✗] Class probability map [✗] Ground truth visualization on grid |
|
DCGAN | Radford et al. | [✓] Training on CelebA at 64 × 64 [✓] Sampling [✓] Latent space interpolation |
|
Noroozi et al., 2016 | Noroozi et al. | [✓] Architecture [✓] Chromatic aberration [✓] Permutation set |
|
Zhang et al., 2016 | Zhang et al. | [✓] Empirical probability distribution [✗] Color space |
|
2014, 2017 | Conditional GAN, WGAN-GP | Mirza et al., Gulrajani et al. | [✓] Training on MNIST |
2016, 2017 | PixelCNN, VQ-VAE | Oord et al., Oord et al. | [✓] Training on Fashion MNIST [✓] Training on CIFAR-10 |
2017 | Pix2Pix | Isola et al. | [✓] Training on Google Maps [✓] Training on Facades [✗] Inference on larger resolution |
CycleGAN | Zhu et al. | [✓] Training on Monet to photo [✓] Training on Van Gogh to photo [✓] Training on Cezanne to photo [✓] Training on Ukiyo-e to photo [✓] Training on Horse to zebra [✓] Training on Summer to winter |
|
Noroozi et al., 2017 | Noroozi et al. | [✓] Contrastive loss | |
2018 | PGGAN | Karras et al. | [✓] Training on CelebA-HQ at 512 × 512 |
DeepLab v3 | Chen et al. | [✓] Training on VOC 2012 [✓] Prediction on VOC 2012 validation set [✓] Average mIoU |
|
PixelLink | Deng et al. | [✓] Architecture [✓] Instance-balanced cross entropy loss [✓] Post-processing |
|
RotNet | Gidaris et al. | [✓] Attention map visualization | |
2020 | STEFANN | Roy et al. | [✓] FANnet architecture [✓] Training FANnet on Google Fonts [✓] Custom Google Fonts dataset [✓] Average SSIM |
DDPM | Ho et al. | [✓] Training on CelebA at 32 × 32 [✓] Training on CelebA at 64 × 64 [✓] Denoising process visualization [✓] Linear interpolation sampling [✓] Coarse-to-fine sampling |
|
DDIM | Song et al. | [✓] Sampling [✓] Spherical interpolation sampling [✓] Interpolation on grid sampling [✓] Truncated normal |
|
ViT | Dosovitskiy et al. | [✓] Training on CIFAR-10 [✓] Training on CIFAR-100 [✓] Attention Roll-out [✓] Position embedding similarity [✓] Position embedding interpolation Extra: [✓] CutOut [✓] Hide-and-Seek [✓] CutMix |
|
SimCLR | Chen et al. | [✓] Normalized temperature-scaled cross entropy loss [✓] Data augmentation [✓] Pixel intensity histogram |
|
DETR | Carion et al. | [✓] Architecture [✗] Batch normalization freezing [✗] Data preparation [✗] Training on COCO 2017 |
|
2021 | Improved DDPM | Nichol and Dhariwal | [✓] Cosine diffusion schedule |
Classifier-Guidance | Dhariwal and Nichol | [✗] AdaGN [✗] BigGAN Upsample/Downsample [✗] Improved DDPM sampling [✗] Conditional/Unconditional models [✗] Super-resolution model [✗] Interpolation |
|
ILVR | Choi et al. | [✓] Sampling from single reference [✓] Sampling from various scale factors [✓] Sampling from various conditioning range |
|
SDEdit | Meng et al. | [✓] User input stroke simulation | |
MAE | He et al. | [✓] MAE architecture for pre-training [✗] MAE architecture for self-supervised learning [✗] Training on ImageNet-1K [✗] Fine-tuning [✗] Linear probing |
|
Copy-Paste | Ghiasi et al. | [✓] Large scale jittering [✓] Copy-Paste (within mini-batch) [✗] Gaussian filter |
|
ViViT | Arnab et al. | ||
2022 | CFG | Ho et al. | |
Language | |||
2017 | Transformer | Vaswani et al. | [✓] Architecture [✓] Position encoding visualization |
2019 | BERT | Devlin et al. | [✓] BookCorpus data pre-processing [✓] Architecture [✓] Masked language modeling [✓] SQuAD data pre-processing [✓] SWAG data pre-processing |
Sentence-BERT | Reimers et al. | [✓] Classification loss [✓] Regression loss [✓] Contrastive loss [✓] STSb data pre-processing [✓] WikiSection data pre-processing [✗] NLI data pre-processing |
|
RoBERTa | Liu et al. | [✓] BookCorpus data pre-processing [✓] Masked language modeling [✗] BookCorpus data pre-processing (SEGMENT-PAIR + NSP) [✗] BookCorpus data pre-processing (SENTENCE-PAIR + NSP) [✓] BookCorpus data pre-processing (FULL-SENTENCES) [✗] BookCorpus data pre-processing (DOC-SENTENCES) |
|
Vision-Language | |||
2021 | CLIP | Radford et al. | [✓] Training on Flickr8k + Flickr30k [✓] Zero-shot classification on ImageNet1k (mini) [✓] Linear classification on ImageNet1k (mini) |