Discussed in #1126
Originally posted by pkese October 28, 2023
If anyone is interested...
I made a small language model inspired by https://github.com/karpathy/nanoGPT in both PyTorch and TorchSharp.
The model has two transformer layers totalling 150k parameters and is trained on Shakespeare's text.
I found that going to smaller data types improves training time, as does PyTorch's `torch.compile`, which is not available in TorchSharp.
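For reference, `torch.compile` in PyTorch 2.x is a one-line wrapper around the model; a minimal sketch (the tiny `nn.Sequential` below is just a placeholder, not the actual nanoGPT-style model from the post):

```python
import torch
import torch.nn as nn

# Placeholder model; the real nanoGPT-style model from the post is not shown here.
model = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64)).cuda()

# torch.compile (PyTorch 2.x) JIT-compiles and fuses the model's graph for faster
# training steps; TorchSharp currently has no equivalent of this call.
model = torch.compile(model)
```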
Here are some benchmarks of model training times (minutes and seconds) with CUDA on a small GPU (RTX 3070).
I couldn't achieve the same bf16 functionality with TorchSharp.
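The usual way to get bf16 training in PyTorch is autocast around the forward pass; a minimal sketch along those lines (an assumption about the setup, not the exact code from the post):

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(128, 128).to(device)            # placeholder for the real model
x = torch.randn(32, 128, device=device)
target = torch.randn(32, 128, device=device)

# Run the forward pass and loss in bfloat16 via autocast; parameters and gradients
# stay in float32. bf16 needs an Ampere-class GPU (an RTX 3070 qualifies).
with torch.amp.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), target)
loss.backward()
```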
I don't quite understand why default TorchSharp code is slower than default PyTorch code.
After I set `torch.backends.cuda.matmul.allow_tf32 = true` in both Python and TorchSharp, I get comparable performance (see the first vs. second column of results).
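For completeness, this is the Python side of that switch; per the post, TorchSharp exposes the same matmul property. The cuDNN flag below is an extra that is often set alongside it, not something the post mentions:

```python
import torch

# Let matmuls use TensorFloat-32 on Ampere GPUs: close to fp16 throughput with
# fp32-like range, at the cost of a little mantissa precision.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True  # cuDNN counterpart (assumption; mainly matters for convolutions)
```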
If someone is interested, I can publish the code.
(I was also trying to get TorchScript models working on both sides, which messed up the code quite a bit ... and I might want to revert that.)
BTW, the TorchScript model was about 1% slower to train in PyTorch and crashed in TorchSharp.
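For context, the usual TorchScript round trip is to script (or trace) the model in Python, save it, and load the saved file from TorchSharp with its `torch.jit.load`; a rough sketch of the Python side (placeholder model, not the actual one):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # placeholder; the real model is not shown here

# torch.jit.script compiles the module to TorchScript; torch.jit.trace is the
# other common route. The saved file can then be loaded from TorchSharp.
scripted = torch.jit.script(model)
scripted.save("model.ts")
```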
I wonder why TorchSharp turned out SOOOO slow.
Did you profile? Can you share the code?