Drastic drop in GPU speed after approximately 10 games #80

Open
alanballard opened this issue Jul 3, 2020 · 3 comments

alanballard commented Jul 3, 2020

I'm getting an unexpected, but reliable, drop in GPU speed after running roughly 11 games of Chapter07/01_dqn_basic.py with the --cuda option. For the first 10 games I get speeds comparable to the textbook's, but the speed halves in game 11 and drops to a third in games 12+. I see similar behavior when running Chapter06/02_dqn_pong.py on the GPU. This happens every time I run the code.

I'm running Python 3.6.10 on Windows 10, with all the textbook's required packages except PyTorch. The textbook recommends pytorch==0.4.0, but I couldn't get it to run, so I installed pytorch==1.1.0 with cudatoolkit==10.0. I'm using an NVIDIA TITAN RTX, and my PC has 128 GB of RAM.
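Before digging further, it can help to confirm which interpreter, PyTorch build, and CUDA runtime are actually active in the environment. This is a generic sanity check, not something from the book's code:

```python
# Generic environment sanity check (not from the book's code).
import sys

print("Python:", sys.version.split()[0])
try:
    import torch  # may be missing or CPU-only in a fresh environment
    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))
except ImportError:
    print("PyTorch not installed")
```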

I also get similar speeds (~400 f/s) for the first 10 or so games when I run the code without the --cuda option, but after that it drops to around 10 f/s.

I have no idea what could be the cause of this. I'm relatively new to RL, CUDA and Python, so I'm not sure if it's a problem with the example code or something on my end.

Any ideas? Has anyone else reported this, or is it just me?

[Screenshot: 01_dqn_basic.py --cuda performance]


Shmuma commented Jul 3, 2020 via email

alanballard (Author) commented

Thank you for your reply. It's been a struggle since I'm learning Python and RL at the same time (not to mention debugging GPU issues), but I've really enjoyed your book so far.

I'm using an NVIDIA TITAN RTX, so I would expect performance at least as good as a GTX 1080 Ti, if not better. I've updated the drivers, but there is no change in performance.

If I use the --cuda option, my GPU usage never exceeds 3% and my CPU usage is approximately 60% for all games. However, the speed drops from ~400 f/s to ~80 f/s after (approximately) game 10 or 11.

If I do not use the --cuda option, then my GPU usage is about 1% (same as internet browsing) and my CPU goes to 100% after game 10. As before, the speed for games #1-#10 is ~400 f/s, but without the cuda option, the speed drops to ~10 f/s after game 10 or 11.

So, for the first 10 games, I can achieve performance close to the textbook's whether I use the --cuda option or not. In either case, the speed dramatically drops after game 10, and my GPU usage is never greater than 3% regardless of which option I choose.

This behavior is almost identical to that reported here:
Issue #32

Maxim (or anyone else who might be reading this), when you have time, would you mind running Chapter07/01_dqn_basic.py --cuda from the 1st edition of the book and answer these questions:

  1. In Chapter 7, p. 167, you only report up to the 9th game. Can you let it run until game 15 or so and see whether you also experience a significant drop in speed in the game 10-15 range?
  2. What % of your GPU and CPU are you using when you execute Chapter07/01_dqn_basic.py --cuda?

I'm not convinced that it's a bad thing that my GPU usage is so low. It's possible that even with maximum parallelization using the current code, I simply can't use more than 3% of my available GPU resources playing pong. If that's the case, then there may be another reason that the speed is so slow during training. I don't think there's any problem with your code, but it may be an issue of old packages vs. new Windows/GPU specs, or maybe old packages doing something unwelcome in my environment (like unnecessarily copying tensors).
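One low-tech way to test the "something is eating time" theory is to bracket each stage of the loop with a timer and compare totals (env stepping vs. the training pass vs. tensor copies). A minimal sketch; the stage names and the commented usage are hypothetical, not the book's code:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)  # accumulated seconds per stage

@contextmanager
def timed(stage):
    """Accumulate wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[stage] += time.perf_counter() - start

# Hypothetical usage inside the training loop:
#   with timed("env_step"):
#       next_state, reward, done, _ = env.step(action)
#   with timed("train"):
#       loss.backward(); optimizer.step()
# Printing `totals` every few games shows which stage dominates.
```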

I'm going to clone the git for your 2nd edition book, create a Python environment with the new package requirements and re-test the 2nd-edition version of the code there. That should at least let me know whether it's a package version issue.

Thank you for your help.


Shmuma commented Jul 4, 2020

Below are benchmarks of the first-edition code on my hardware (1080 Ti, NVIDIA driver 440.100, CUDA 10.2, Ubuntu):

804: done 1 games, mean reward -21.000, speed 585.91 f/s, eps 0.99
1732: done 2 games, mean reward -20.500, speed 722.61 f/s, eps 0.98
2797: done 3 games, mean reward -20.000, speed 721.65 f/s, eps 0.97
3665: done 4 games, mean reward -20.250, speed 721.01 f/s, eps 0.96
4453: done 5 games, mean reward -20.400, speed 719.14 f/s, eps 0.96
5328: done 6 games, mean reward -20.333, speed 720.16 f/s, eps 0.95
6146: done 7 games, mean reward -20.429, speed 721.55 f/s, eps 0.94
7269: done 8 games, mean reward -20.375, speed 718.43 f/s, eps 0.93
8534: done 9 games, mean reward -20.111, speed 712.01 f/s, eps 0.91
9495: done 10 games, mean reward -20.200, speed 722.72 f/s, eps 0.91
10461: done 11 games, mean reward -20.273, speed 244.70 f/s, eps 0.90
11408: done 12 games, mean reward -20.250, speed 147.07 f/s, eps 0.89
12264: done 13 games, mean reward -20.308, speed 146.79 f/s, eps 0.88
13239: done 14 games, mean reward -20.286, speed 146.91 f/s, eps 0.87
14048: done 15 games, mean reward -20.333, speed 146.55 f/s, eps 0.86

During training, GPU utilisation is about 35%:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0 Off |                  N/A |
| 34%   63C    P2    76W / 250W |    583MiB / 11178MiB |     34%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   37C    P8     9W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      4404      C   python3                                      573MiB |
+-----------------------------------------------------------------------------+

Without the --cuda option, I get 510 f/s during replay buffer population (the first 10 games), and then the speed decreases to 15 f/s. So the speed-up is 80-100 times, as it should be.
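For anyone else hitting this thread: the drop after ~10 games is expected behaviour, not a fault. The example performs no gradient updates until the replay buffer holds REPLAY_START_SIZE transitions (10,000 in the first-edition hyperparameters), which is roughly what the first 10 games produce; after that, every step also pays for a forward/backward pass. A toy model of that effect (not the book's code; the per-game step count and the 5x training-cost factor are arbitrary illustrations):

```python
from collections import deque

REPLAY_START_SIZE = 10_000  # warm-up threshold (first-edition hyperparameter)

def simulate_fps(steps_per_game=900, games=15):
    """Toy model: environment-only steps are cheap; once the buffer is
    full enough to start training, each step also pays a training cost.
    Returns the relative speed of each game (1.0 = pure env speed)."""
    buffer = deque(maxlen=100_000)
    speeds = []
    for _ in range(games):
        cost = 0.0
        for _ in range(steps_per_game):
            buffer.append(0)  # stand-in for storing a transition
            cost += 1.0       # env interaction cost
            if len(buffer) >= REPLAY_START_SIZE:
                cost += 4.0   # forward/backward pass cost (illustrative)
        speeds.append(steps_per_game / cost)
    return speeds

speeds = simulate_fps()
# games before the buffer fills run at full env speed;
# games after it pay the training cost on every step
```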

I see your hardware is much better than mine (my CPU is an i5-6600K and the system has 32 GB of RAM), so your numbers should be better.

I'd start with general system/GPU troubleshooting by running standard deep learning benchmarks (like this one: https://github.com/ryujaehun/pytorch-gpu-benchmark) and comparing your numbers against theirs.

You might also try the Chapter09 examples from the second edition. That chapter is devoted to applying GPU/PyTorch tricks to speed up the Pong game, so it has plenty of numbers to compare against.
Here is the summary of chapter 9 samples performance on my hardware: https://www.dropbox.com/s/qz1tghrv1029efv/Chapter09-benchmarks.png?dl=0
