Bug fixes and tweaks for a stronger baseline #7

Merged 7 commits into batra-mlp-lab:master on Feb 11, 2019

Conversation

abhshkdz (Member)

A few bug fixes and tweaks for a stronger baseline.

This improves MRR from 0.5845 to 0.6155 and NDCG from 0.5070 to 0.5315 on val.

Changes:

  • Switched off dropout during evaluation on val in train.py.
  • Shuffling batches during training (passing shuffle=True to the DataLoader).
  • Explicitly clearing the GPU memory cache with torch.cuda.empty_cache(). This has a negligible time cost on a single GPU, fits batch sizes of up to 32 x the number of GPUs, and gives some speed-up when training with larger batch sizes.
  • Added a linear learning rate warm-up (https://arxiv.org/abs/1706.02677), followed by multi-step decay (a sketch of this schedule is at the end of this description).
  • Using a multi-layer LSTM + dropout for the decoder.
  • Switched from dot-product attention to a richer element-wise multiplication + fc layer attention. (The network can learn dot-product attention if it needs to.) A sketch of this and the decoder change follows this list.
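
Here's a minimal sketch of what the new attention and decoder roughly look like. Module names, dimensions, and hyperparameters below are illustrative stand-ins, not the actual code in this PR:

```python
import torch
import torch.nn as nn

class ElementwiseAttention(nn.Module):
    """Score each image region via an element-wise product with the query,
    followed by a fully-connected layer, instead of a plain dot product."""

    def __init__(self, query_dim, feat_dim, hidden_dim):
        super().__init__()
        self.query_proj = nn.Linear(query_dim, hidden_dim)
        self.feat_proj = nn.Linear(feat_dim, hidden_dim)
        self.score_fc = nn.Linear(hidden_dim, 1)

    def forward(self, query, features):
        # query: (batch, query_dim); features: (batch, num_regions, feat_dim)
        q = self.query_proj(query).unsqueeze(1)   # (batch, 1, hidden)
        f = self.feat_proj(features)              # (batch, regions, hidden)
        scores = self.score_fc(q * f)             # (batch, regions, 1)
        weights = torch.softmax(scores, dim=1)    # attention over regions
        return (weights * features).sum(dim=1)    # attended image feature

# Multi-layer LSTM decoder; nn.LSTM's dropout argument adds dropout between
# the stacked layers.
decoder_rnn = nn.LSTM(input_size=300, hidden_size=512,
                      num_layers=2, dropout=0.5, batch_first=True)

# Shape check with dummy inputs: 4 dialogs, 36 image regions of dim 2048.
attn = ElementwiseAttention(query_dim=512, feat_dim=2048, hidden_dim=512)
attended = attn(torch.randn(4, 512), torch.randn(4, 36, 2048))  # -> (4, 2048)
```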

I've updated the config yaml. Will likely update with a trained model on trainval + numbers on test-std in 2-3 days.
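
For reference, a rough sketch of the training-loop tweaks (dropout off during eval, shuffling, cache clearing, and the warm-up + multi-step schedule). The model, datasets, and schedule values are toy stand-ins so the snippet runs on its own; the real ones live in the config yaml:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins so the sketch runs end to end; the real model and datasets
# come from the repo and its config yaml.
model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(0.5), nn.Linear(10, 1))
train_dataset = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
val_dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)  # shuffle each epoch
val_loader = DataLoader(val_dataset, batch_size=32)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# Linear warm-up over the first couple of epochs, then multi-step decay.
# (The paper warms up per iteration; doing it per epoch keeps the sketch short.)
warmup_epochs, milestones, gamma = 2, [4, 7, 10], 0.5

def lr_multiplier(epoch):
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs                 # linear ramp to base LR
    return gamma ** sum(epoch >= m for m in milestones)    # multi-step decay

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_multiplier)

for epoch in range(12):
    model.train()                        # dropout active during training
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()

    if torch.cuda.is_available():
        torch.cuda.empty_cache()         # release cached GPU memory before eval

    model.eval()                         # dropout switched off for val evaluation
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
```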

@kdexd merged commit de30951 into batra-mlp-lab:master on Feb 11, 2019