Support training multiple models in parallel on each TPU core. #1539
Labels: feature (Is an improvement or enhancement), help wanted (Open to be worked on), priority: 0 (High priority task)
🚀 Feature
When training models with the K-Fold method, it would be beneficial to train each fold's model in parallel on a separate TPU core. There should be a way to assign a model training process to a particular TPU core, similar to
gpus=[0,2]
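For comparison, here is a minimal sketch of the existing GPU selection in the Trainer; the TPU argument in the comment is only the requested, hypothetical analogue and is not an existing Lightning option.

from pytorch_lightning import Trainer

# Existing behaviour: pin training to GPU devices 0 and 2.
trainer = Trainer(gpus=[0, 2])

# Requested behaviour (hypothetical argument, not in Lightning today):
# pin this run to a single TPU core, e.g. core 1.
# trainer = Trainer(tpus=[1])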
Motivation
I came across this kernel by Abhishek Thakur, in which he trains multiple RoBERTa models in parallel, one on each TPU core. Training was fast because it utilized all cores. I tried doing the same with Lightning but realized I can't select a specific TPU core with it.
Pitch
I am not entirely sure about the API. Perhaps built-in support for the K-Fold method, where the dataset is split accordingly and a model is trained on each TPU core in a K-Fold manner. Or simply support for selecting a single core and training on it, e.g.
tpus=[1]
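For context, below is a minimal sketch (outside Lightning) of how independent per-core training can currently be done with torch_xla's multiprocessing API, spawning one process per core and using the core ordinal as the fold index. The helpers make_model and make_fold_loader are hypothetical placeholders for the user's model and K-Fold data split.

import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

def _train_one_fold(index):
    # `index` is the TPU core ordinal (0-7); reuse it as the fold id.
    device = xm.xla_device()
    model = make_model().to(device)          # hypothetical: fresh model per core
    loader = make_fold_loader(fold=index)    # hypothetical: data split for this fold
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
        xm.mark_step()  # flush the lazily-built XLA graph; no cross-core
                        # all-reduce, so each core trains its own model

if __name__ == '__main__':
    # One process per TPU core; each process trains a separate fold.
    xmp.spawn(_train_one_fold, nprocs=8, start_method='fork')

The feature request is essentially for Lightning to offer this kind of per-core assignment through the Trainer instead of requiring the raw torch_xla loop above.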
Additional context
Kernel by Abhishek Thakur:
https://www.kaggle.com/abhishek/super-duper-fast-pytorch-tpu-kernel