Is GPU throughput reasonable? #192
@Crispig, thanks for your question. The TFLOPS on 16xA100-40GB is quite low. What is the batch size? A 10B model is too small for ZeRO-Infinity with NVMe offload, given the overheads of parameter partitioning and NVMe offload. You should get much better performance with ZeRO-Offload. There are several factors to consider in order to understand and improve ZeRO-Infinity performance.
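For reference, a minimal sketch of a DeepSpeed configuration for the ZeRO-Offload setup suggested above, i.e. ZeRO-3 with parameters and optimizer state offloaded to CPU RAM rather than NVMe. The key names follow the current DeepSpeed JSON schema; the batch size is only illustrative, and the older Megatron-LM-v1.1.5 example scripts may expose these options under different flags:

```python
# Illustrative DeepSpeed config (as a Python dict) for ZeRO-3 with CPU
# offload ("zero-offload") instead of NVMe offload ("zero-infinity").
# Key names follow current DeepSpeed documentation; values are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,  # raise if GPU memory allows
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        # Keep parameters and optimizer state in pinned CPU RAM, so SSD
        # read/write bandwidth is no longer the bottleneck.
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}
```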
Thank you very much for your reply!
Maybe I am too late here, but this old Megatron has been deprecated. Can you kindly try the latest code and recipes from here? https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples/azure
I am currently running some tests with ZeRO-3 Infinity and have run into some problems; I would appreciate your help.
Machine configuration: two nodes, each with one A100-PCIE-40GB GPU, 126 GB RAM (about 60 GB actually available at runtime), and a 1 TB SSD (Samsung 980)
Benchmark code: /DeepSpeedExamples/Megatron-LM-v1.1.5-ZeRO3/
Model case tested:
HIDDEN_SIZE / NUM_ATTN_HEADS / NUM_LAYERS / BATCHSIZE = 4096 / 16 / 50 / 8 (model size ~10B; a quick estimate is sketched below)
GPU memory usage: 13395 / 40537 MB
RAM usage: 109 / 126 GB (60 GB at idle)
80 GB of swap files stored on the NVMe file system
Effective TFLOPS per GPU: about 1.5
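As a sanity check on the stated model size, here is a rough parameter-count estimate using the common ~12 · num_layers · hidden_size² approximation for GPT-style transformers (the approximation, not the benchmark script itself, is assumed here; it ignores embedding and bias terms):

```python
# Rough parameter count for HIDDEN_SIZE=4096, NUM_LAYERS=50, using the
# common 12 * num_layers * hidden_size**2 approximation for GPT-style
# transformers (ignores embedding and bias terms).
hidden_size, num_layers = 4096, 50
params = 12 * num_layers * hidden_size ** 2
print(f"~{params / 1e9:.1f}B parameters")  # prints ~10.1B, consistent with "10B"
```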
Questions:
1. Is the GPU throughput achieved under the current environment configuration reasonable, and can throughput be increased by raising the batch size or adjusting other settings?
2. The effective TFLOPS per GPU calculated by the flops_calculator in DeepSpeedExamples is about 1.5 TFLOPS, but the FLOPS per GPU measured by the DeepSpeed profiler is 2.32 GFLOPS. (deepspeed _profile.txt was generated by the DeepSpeed profiler, and train.log is the output produced during training.)
deepspeed _profile.txt
train.log
I hope you can help me. Thank you very much!
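For context on the discrepancy in question 2, here is a sketch of the effective-TFLOPS estimate used by the flops calculator in DeepSpeedExamples (the forward + backward + recompute model-FLOPs formula from the Megatron-LM papers). The sequence length, vocabulary size, and iteration time below are assumptions for illustration only; substitute the actual values from train.log:

```python
# Sketch of the effective-TFLOPS estimate used by flops_calculator in
# DeepSpeedExamples: model FLOPs for forward + backward (+ one extra forward
# when activation checkpointing is on), divided by wall time and GPU count.
def effective_tflops(batch_size, seq_len, num_layers, hidden_size,
                     vocab_size, iter_time_s, num_gpus,
                     checkpoint_activations=True):
    factor = 4 if checkpoint_activations else 3  # recompute = one extra forward
    flops_per_iter = (24 * factor * batch_size * seq_len
                      * num_layers * hidden_size ** 2
                      * (1 + seq_len / (6 * hidden_size)
                         + vocab_size / (16 * num_layers * hidden_size)))
    return flops_per_iter / (iter_time_s * num_gpus * 1e12)

# Assumed values for illustration: seq_len=1024, vocab_size=50257,
# 2 GPUs (one per node), and a made-up 60 s per iteration.
print(f"{effective_tflops(8, 1024, 50, 4096, 50257, 60.0, 2):.2f} TFLOPS/GPU")
```

Note that the two reported numbers are computed differently: the profiler counts operations over its own timing window, while the calculator applies the analytic formula above to the end-to-end iteration time, so some discrepancy between them is expected.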