
[models] Vit: fix intermediate size scale and unify TF to PT #1063

Merged · 3 commits into mindee:main · Sep 19, 2022

Conversation

@felixdittrich92 (Contributor) commented Sep 16, 2022

This PR:

  • fix the intermediate size scaling: previously the PFF (MLP) hidden dim was always 768 (the same as d_model), but the base model uses 3072, i.e. a ×4 scale
  • unify the TF implementation with PT

Any feedback is welcome 🤗
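A quick back-of-the-envelope count shows how much the ×4 scale matters for the PFF (MLP) block. This is an illustrative sketch, not doctr's actual code; `linear_params` is a hypothetical helper counting a dense layer's weights plus biases:

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Parameter count of a dense layer: weights + biases."""
    return n_in * n_out + n_out

d_model = 768

# Before the fix: the MLP hidden dim stayed at d_model (768)
mlp_before = linear_params(d_model, d_model) + linear_params(d_model, d_model)

# After the fix: hidden dim scaled x4 to 3072, as in ViT-Base
hidden = 4 * d_model
mlp_after = linear_params(d_model, hidden) + linear_params(hidden, d_model)

print(mlp_before)  # 1181184 params per MLP block
print(mlp_after)   # 4722432 params per MLP block
```

Repeated across the 12 encoder layers, this accounts for the bulk of the gap to ViT-Base's ~86M parameters.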

PT: (@frgfm thanks for torch-scan 👍 )

__________________________________________________________________________________________
Layer                        Type                  Output Shape              Param #
==========================================================================================
visiontransformer            VisionTransformer     (-1, 126)                 0
├─0                          PatchEmbedding        (-1, 65, 768)             37,632
├─1                          EncoderBlock          (-1, 65, 768)             85,022,208
├─2                          ClassifierHead        (-1, 126)                 96,894
==========================================================================================
Trainable params: 85,207,422
Non-trainable params: 0
Total params: 85,207,422
------------------------------------------------------------------------------------------
Model size (params + buffers): 325.04 Mb
Framework & CUDA overhead: 1575.00 Mb
Total RAM usage: 1900.04 Mb
------------------------------------------------------------------------------------------
Floating Point Operations on forward: 6.25 MFLOPs
Multiply-Accumulations on forward: 405.06 kMACs
Direct memory accesses on forward: 102.08 MDMAs
__________________________________________________________________________________________

TF:

_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 patch_embedding (PatchEmbedding)    (1, 65, 768)             88,320

 encoder_block (EncoderBlock)        (1, 65, 768)             85,022,208

 classifier_head (ClassifierHead)    (1, 126)                 96,894


=================================================================
Total params: 85,207,422
Trainable params: 85,207,422
Non-trainable params: 0

As you can see, the models are nearly identical; the only structural difference is the patch embedding (PT uses a linear projection, TF a Conv2D projection).
Compared with timm's implementation, our PT model uses ~6.5 GB VRAM vs ~7 GB for timm's.
The TF model uses ~15 GB VRAM. @frgfm do you know any reason why? 😅
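On the patch-embedding difference: a Conv2D with kernel size = stride = patch size computes the same projection as unfolding the patches and applying a Linear layer, so the parameter counts match. A sketch with the numbers from this PR (32×32×3 input, 4×4 patches, d_model = 768; pure arithmetic, not doctr code):

```python
patch, channels, d_model = 4, 3, 768

# PT-style: flatten each 4x4x3 patch into a 48-dim vector, project with Linear
linear_proj = (patch * patch * channels) * d_model + d_model  # weights + bias

# TF-style: Conv2D(filters=768, kernel_size=4, strides=4) over the image
conv_proj = (patch * patch * channels) * d_model + d_model    # identical count

print(linear_proj, conv_proj)  # 37632 37632, matching the PT summary
```

The TF PatchEmbedding count of 88,320 would then be this projection plus the positional embeddings and class token folded into the same layer: 37,632 + 65 × 768 + 768 = 88,320 (my reading of the two summaries, not verified against the code).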

For comparison, timm's implementation:

__________________________________________________________________________________________
Layer                        Type                  Output Shape              Param #        
==========================================================================================
visiontransformer            VisionTransformer     (-1, 126)                 0              
├─patch_embed                PatchEmbed            (-1, 64, 768)             37,632         
├─pos_drop                   Dropout               (-1, 65, 768)             0              
├─blocks                     Sequential            (-1, 65, 768)             85,054,464     
├─norm                       LayerNorm             (-1, 65, 768)             1,536          
├─fc_norm                    Identity              (-1, 768)                 0              
├─head                       Linear                (-1, 126)                 96,894         
==========================================================================================
Trainable params: 85,241,214
Non-trainable params: 0
Total params: 85,241,214
------------------------------------------------------------------------------------------
Model size (params + buffers): 325.17 Mb
Framework & CUDA overhead: 1575.00 Mb
Total RAM usage: 1900.17 Mb
------------------------------------------------------------------------------------------
Floating Point Operations on forward: 10.71 MFLOPs
Multiply-Accumulations on forward: 2.66 MMACs
Direct memory accesses on forward: 108.10 MDMAs

Training with this PR (the TF setup is mostly identical):

(doctr-dev) felix@felix-GS66-Stealth-11UH:~/Desktop/doctr$ python3 /home/felix/Desktop/doctr/references/classification/train_pytorch.py vit_b
2022-09-16 10:03:43.296373: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
Namespace(amp=False, arch='vit_b', batch_size=64, device=None, epochs=10, export_onnx=False, find_lr=False, font='FreeMono.ttf,FreeSans.ttf,FreeSerif.ttf', input_size=32, lr=0.001, name=None, pretrained=False, push_to_hub=False, resume=None, sched='cosine', show_samples=False, test_only=False, train_samples=1000, val_samples=20, vocab='french', wb=False, weight_decay=0, workers=None)
Validation set loaded in 0.2844s (2520 samples in 40 batches)
Train set loaded in 0.2508s (126000 samples in 1968 batches)
Validation loss decreased inf --> 0.991241: saving state...                                                                                                     
Epoch 1/10 - Validation loss: 0.991241 (Acc: 70.79%)
Validation loss decreased 0.991241 --> 0.758742: saving state...                                                                                                
Epoch 2/10 - Validation loss: 0.758742 (Acc: 77.86%)
Epoch 3/10 - Validation loss: 1.20299 (Acc: 71.55%)                                                                                                             
Validation loss decreased 0.758742 --> 0.347141: saving state...                                                                                                
Epoch 4/10 - Validation loss: 0.347141 (Acc: 86.87%)
Validation loss decreased 0.347141 --> 0.308255: saving state...                                                                                                
Epoch 5/10 - Validation loss: 0.308255 (Acc: 88.69%)
Validation loss decreased 0.308255 --> 0.277491: saving state...                                                                                                
Epoch 6/10 - Validation loss: 0.277491 (Acc: 89.88%)
Validation loss decreased 0.277491 --> 0.153586: saving state...                                                                                                
Epoch 7/10 - Validation loss: 0.153586 (Acc: 94.96%)
Validation loss decreased 0.153586 --> 0.0993369: saving state...                                                                                               
Epoch 8/10 - Validation loss: 0.0993369 (Acc: 96.79%)
Validation loss decreased 0.0993369 --> 0.0867528: saving state...                                                                                              
Epoch 9/10 - Validation loss: 0.0867528 (Acc: 97.06%)
Validation loss decreased 0.0867528 --> 0.0744964: saving state...                                                                                              
Epoch 10/10 - Validation loss: 0.0744964 (Acc: 97.90%)
(doctr-dev-tf) felix@felix-GS66-Stealth-11UH:~/Desktop/doctr$ python3 /home/felix/Desktop/doctr/references/classification/train_tensorflow.py vit_b
Namespace(amp=False, arch='vit_b', batch_size=64, epochs=10, export_onnx=False, find_lr=False, font='FreeMono.ttf,FreeSans.ttf,FreeSerif.ttf', input_size=32, lr=0.001, name=None, pretrained=False, push_to_hub=False, resume=None, show_samples=False, test_only=False, train_samples=1000, val_samples=20, vocab='french', wb=False, workers=None)
Validation set loaded in 1.145s (2520 samples in 40 batches)
Train set loaded in 1.148s (126000 samples in 1968 batches)
Validation loss decreased inf --> 0.142181: saving state...                                                                                                     
Epoch 1/10 - Validation loss: 0.142181 (Acc: 95.83%)
Validation loss decreased 0.142181 --> 0.0494551: saving state...                                                                                               
Epoch 2/10 - Validation loss: 0.0494551 (Acc: 98.21%)
Validation loss decreased 0.0494551 --> 0.0102294: saving state...                                                                                              
Epoch 3/10 - Validation loss: 0.0102294 (Acc: 99.44%)

@felixdittrich92 felixdittrich92 self-assigned this Sep 16, 2022
@felixdittrich92 felixdittrich92 added this to the 0.6.0 milestone Sep 16, 2022
@felixdittrich92 felixdittrich92 added labels Sep 16, 2022: type: bug (Something isn't working), module: models (Related to doctr.models), framework: pytorch (Related to PyTorch backend), framework: tensorflow (Related to TensorFlow backend), topic: character classification (Related to the task of character classification)

codecov bot commented Sep 16, 2022

Codecov Report

Merging #1063 (717c4ba) into main (a95baaa) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1063      +/-   ##
==========================================
+ Coverage   95.16%   95.17%   +0.01%     
==========================================
  Files         141      141              
  Lines        5827     5821       -6     
==========================================
- Hits         5545     5540       -5     
+ Misses        282      281       -1     
Flag Coverage Δ
unittests 95.17% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
doctr/models/classification/vit/pytorch.py 100.00% <ø> (ø)
doctr/models/modules/transformer/pytorch.py 100.00% <ø> (ø)
doctr/models/modules/transformer/tensorflow.py 99.03% <ø> (ø)
doctr/models/modules/vision_transformer/pytorch.py 100.00% <ø> (ø)
doctr/models/classification/vit/tensorflow.py 100.00% <100.00%> (+2.17%) ⬆️
doctr/transforms/modules/base.py 94.59% <0.00%> (ø)


@frgfm (Collaborator) left a comment

Great work Felix 👏
One comment related to the ViT PRs, and another on a docstring typo!

Regarding TF: judging from the graph, the patch embedding is not memory-efficient (it's the only structural difference).

Review threads:
- doctr/models/classification/vit/tensorflow.py (resolved)
- doctr/models/classification/vit/pytorch.py (outdated, resolved)
- doctr/models/classification/vit/tensorflow.py (outdated, resolved)
@odulcy-mindee (Collaborator) left a comment

Thanks @felixdittrich92 ! 👍

@felixdittrich92 felixdittrich92 merged commit 4e763da into mindee:main Sep 19, 2022
@felixdittrich92 felixdittrich92 deleted the vit-bug branch September 19, 2022 06:51
@frgfm (Collaborator) left a comment

Thanks Felix 🙏

@felixdittrich92 felixdittrich92 mentioned this pull request Sep 26, 2022
85 tasks