Bug fixes and tweaks for a stronger baseline #7

Merged 7 commits into batra-mlp-lab:master on Feb 11, 2019

Conversation

abhshkdz (Member)

A few bug fixes and tweaks for a stronger baseline.

This improves MRR from 0.5845 to 0.6155 and NDCG from 0.5070 to 0.5315 on val.

Changes:

  • Switched off dropout during evaluation on val in train.py.
  • Shuffling batches during training (passing shuffle=True to the DataLoader).
  • Explicitly clearing the GPU memory cache with torch.cuda.empty_cache(). This has a negligible time cost on a single GPU, fits batch sizes of up to 32 x the number of GPUs, and gives some speed-up when training with larger batch sizes.
  • Added a linear learning rate warm-up (https://arxiv.org/abs/1706.02677), followed by multi-step decay (a sketch of this schedule is at the end of this description).
  • Using a multi-layer LSTM + dropout for the decoder.
  • Switched from dot-product attention to a richer element-wise multiplication + fc layer attention. (The network can learn dot-product attention if it needs to.) A sketch of this and the decoder change follows this list.
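
Here's a minimal sketch of what the new attention and decoder roughly look like. Module names, dimensions, and hyperparameters below are illustrative stand-ins, not the actual code in this PR:

```python
import torch
import torch.nn as nn

class ElementwiseAttention(nn.Module):
    """Score each image region via an element-wise product with the query,
    followed by a fully-connected layer, instead of a plain dot product."""

    def __init__(self, query_dim, feat_dim, hidden_dim):
        super().__init__()
        self.query_proj = nn.Linear(query_dim, hidden_dim)
        self.feat_proj = nn.Linear(feat_dim, hidden_dim)
        self.score_fc = nn.Linear(hidden_dim, 1)

    def forward(self, query, features):
        # query: (batch, query_dim); features: (batch, num_regions, feat_dim)
        q = self.query_proj(query).unsqueeze(1)   # (batch, 1, hidden)
        f = self.feat_proj(features)              # (batch, regions, hidden)
        scores = self.score_fc(q * f)             # (batch, regions, 1)
        weights = torch.softmax(scores, dim=1)    # attention over regions
        return (weights * features).sum(dim=1)    # attended image feature

# Multi-layer LSTM decoder; nn.LSTM's dropout argument adds dropout between
# the stacked layers.
decoder_rnn = nn.LSTM(input_size=300, hidden_size=512,
                      num_layers=2, dropout=0.5, batch_first=True)

# Shape check with dummy inputs: 4 dialogs, 36 image regions of dim 2048.
attn = ElementwiseAttention(query_dim=512, feat_dim=2048, hidden_dim=512)
attended = attn(torch.randn(4, 512), torch.randn(4, 36, 2048))  # -> (4, 2048)
```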

I've updated the config yaml. Will likely update with a trained model on trainval + numbers on test-std in 2-3 days.
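
For reference, a rough sketch of the training-loop tweaks (dropout off during eval, shuffling, cache clearing, and the warm-up + multi-step schedule). The model, datasets, and schedule values are toy stand-ins so the snippet runs on its own; the real ones live in the config yaml:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins so the sketch runs end to end; the real model and datasets
# come from the repo and its config yaml.
model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(0.5), nn.Linear(10, 1))
train_dataset = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
val_dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)  # shuffle each epoch
val_loader = DataLoader(val_dataset, batch_size=32)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# Linear warm-up over the first couple of epochs, then multi-step decay.
# (The paper warms up per iteration; doing it per epoch keeps the sketch short.)
warmup_epochs, milestones, gamma = 2, [4, 7, 10], 0.5

def lr_multiplier(epoch):
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs                 # linear ramp to base LR
    return gamma ** sum(epoch >= m for m in milestones)    # multi-step decay

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_multiplier)

for epoch in range(12):
    model.train()                        # dropout active during training
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()

    if torch.cuda.is_available():
        torch.cuda.empty_cache()         # release cached GPU memory before eval

    model.eval()                         # dropout switched off for val evaluation
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
```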

@kdexd merged commit de30951 into batra-mlp-lab:master on Feb 11, 2019