transformer_main.py - Low TPU usage V3-8 #8671
Comments
Hi, what TPU topology are you using? e.g. TPU v3-8?

> Utilization of TPU Matrix Units (higher is better): 11.9%
Hi, I'm using a TPU v3-8. This is the link for the profiler reports: I generated two of them, just in case. I used exactly the same generate_data.py script, only changing the source data to my parallel training files (2.48M lines). I have tried changing the batch size, but it didn't make any difference. Thanks
The sorting and top-k ops come from the training metrics. It is surprising that they are that slow. See models/official/nlp/transformer/metrics.py, line 137 (at commit 7f92635).
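For context on why metrics would dominate a step: top-k accuracy metrics are usually built on tf.nn.top_k over the vocabulary dimension, which implies a partial sort of the logits on every step. A minimal sketch of such a metric (the function name, shapes, and the pad-id-0 convention are assumptions for illustration, not the exact code at the referenced line):

```python
import tensorflow as tf

def padded_accuracy_topk(logits, labels, k=5):
    """Per-token top-k accuracy with padding masked out.

    logits: float tensor, shape (batch, length, vocab_size)
    labels: int tensor, shape (batch, length); assumes id 0 == PAD
    Returns (accuracies, weights), both shaped (batch, length).
    """
    weights = tf.cast(tf.not_equal(labels, 0), tf.float32)
    # tf.nn.top_k performs a partial sort over the vocab axis on every
    # step; this is the kind of op the profiler flags as slow on TPU.
    _, top_k_ids = tf.nn.top_k(logits, k=k)
    labels_expanded = tf.cast(labels[..., tf.newaxis], top_k_ids.dtype)
    hits = tf.reduce_any(tf.equal(top_k_ids, labels_expanded), axis=-1)
    return tf.cast(hits, tf.float32), weights
```

Removing such metrics from the training loop takes the sort/top-k out of the hot path, which matches the utilization jump reported in the next comment.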
In fact, when I set --enable_metrics_in_training to False, it did increase TPU usage to around 40% (TPU type: TPU v3). I wonder if it is possible to get even better TPU usage. Thanks,
> Utilization of TPU Matrix Units (higher is better): 38.9%

That already looks OK to me.
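For reference, the flag is passed when launching training, e.g.:
python3 transformer_main.py --distribution_strategy=tpu --tpu=$TPU_NAME --data_dir=$DATA_DIR --model_dir=$MODEL_DIR --enable_metrics_in_training=false
(every flag here except --enable_metrics_in_training is an assumption based on the official transformer instructions; adapt the paths and names to your setup).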
Prerequisites
Please answer the following questions for yourself before submitting an issue.
1. The entire URL of the file you are using
https://github.com/tensorflow/models/blob/master/official/nlp/transformer/transformer_main.py
2. Describe the bug
I followed the instructions to use TPUs to train a transformer model, but I only get around 12% TPU utilization when running the code.
3. Steps to reproduce
TPU type: TPU v3
Number of TPU cores: 8 (Replica count = 8, num cores per replica = 1)
TPU idle time (lower is better): 0.058%
Utilization of TPU Matrix Units (higher is better): 11.9%
Step time: 209ms (avg), 209ms (min), 209ms (max)
Infeed percentage: 0.048% (avg), 0.048% (min), 0.048% (max)
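(Note on reproducing these figures: they resemble the summary printed by TPU profiling tooling; at the time, the cloud-tpu-profiler package's capture_tpu_profile tool could report MXU utilization and step times, e.g. capture_tpu_profile --tpu=$TPU_NAME --monitoring_level=2; the exact flags are an assumption, so check the tool's --help.)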
4. Expected behavior
I would expect TPU usage to be higher; it seems as though only one TPU core is being used.
5. Additional context
Memory usage is also low, around 12 GB, which again suggests that only one TPU core is being used.
6. System information
== check python ===================================================
python version: 3.7.3
python branch:
python build version: ('default', 'Dec 20 2019 18:57:59')
python compiler version: GCC 8.3.0
python implementation: CPython
== check os platform ===============================================
os: Linux
os kernel version: #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07)
os release version: 4.19.0-9-cloud-amd64
os platform: Linux-4.19.0-9-cloud-amd64-x86_64-with-debian-10.4
linux distribution: ('debian', '10.4', '')
linux os distribution: ('debian', '10.4', '')
mac version: ('', ('', '', ''), '')
uname: uname_result(system='Linux', node='garden', release='4.19.0-9-cloud-amd64', version='#1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07)', machine='x86_64', processor='')
architecture: ('64bit', 'ELF')
machine: x86_64
== are we in docker =============================================
No
== compiler =====================================================
c++ (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.