
Finetuning with arbitrary input size -> strange behaviour #19

Answered by andreped
andreped asked this question in Q&A

Increasing the batch size via gradient accumulation (increasing it directly was initially challenging due to OOM issues), lowering the learning rate to 1e-4 with the Adam optimizer, and using cross-entropy loss instead of focal loss seems to have resolved the issue, at least for GCViT.
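
For reference, a minimal TensorFlow/Keras sketch of that recipe: Adam at 1e-4, plain cross-entropy instead of focal loss, and gradient accumulation to emulate a larger batch size under memory limits. The `create_model` factory, `train_dataset` pipeline, and `ACCUM_STEPS` value are placeholders for illustration, not from this thread; swap in the GCViT model and data pipeline being fine-tuned.

```python
import tensorflow as tf

ACCUM_STEPS = 8  # effective batch size = per-step batch size * ACCUM_STEPS (assumed value)

model = create_model(num_classes=2)  # hypothetical model factory (e.g. a GCViT classifier)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)                     # lowered LR
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)    # CE instead of focal

# Buffers that sum gradients across ACCUM_STEPS small batches.
accum_grads = [tf.Variable(tf.zeros_like(v), trainable=False)
               for v in model.trainable_variables]

@tf.function
def train_step(images, labels, apply_update):
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        # Scale so the accumulated gradient matches the gradient of one large batch.
        loss = loss_fn(labels, logits) / ACCUM_STEPS
    grads = tape.gradient(loss, model.trainable_variables)
    for buf, g in zip(accum_grads, grads):
        buf.assign_add(g)
    if apply_update:  # Python bool, resolved at trace time (two traces total)
        optimizer.apply_gradients(zip(accum_grads, model.trainable_variables))
        for buf in accum_grads:
            buf.assign(tf.zeros_like(buf))
    return loss

for step, (images, labels) in enumerate(train_dataset):  # train_dataset: a tf.data.Dataset
    train_step(images, labels, apply_update=(step + 1) % ACCUM_STEPS == 0)
```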

That a larger batch size is more crucial for ViTs makes sense, as they work fundamentally differently from CNNs, acting more as low-pass filters whereas CNNs act as high-pass filters.

It will be interesting to see whether the same applies to the other ViTs I tried, but at least this resolved my initial concern.
