Fix redundant outputs via Logging in DDP training #500
Conversation
@NanoCode012 I think changing only the multi-GPU affected regions makes sense. Should we wait for the test / detect refactor first, or do this one first? This PR does not affect test.py and detect.py. "Scanning labels" messages should show up twice (as it treats the train and val datasets as independent, even if they both point to the same images). I see it show up 3 times, but this is not a huge problem. Also, depending on your console, tqdm messages may sometimes accidentally repeat by themselves even on a single GPU.
Okay I see!
I think so; then we can encapsulate it under one PR. However, you did mention that you want PRs in small chunks... Edit: I can create a new PR later to deal with the upcoming changes if you want. I will set the unit test to run tomorrow in case I broke something. Edit 2: Unit test succeeded.
@NanoCode012 just tried some multi-GPU training myself and saw the redundant output phenomenon. Let's see if we can resolve these conflicts and get this merged. Can you do a rebase on your side to origin/master?
@NanoCode012 BTW, actual DDP ops seem to be working flawlessly. My use case is 2x V100 training of v5x. I'll report the time difference once I get a few epochs completed.
nvidia-smi from epoch 0 and later, to test the high device 0 memory usage issue.
Epoch 0 train:
Epoch 0 test:
Epoch 1 train:
Just re-ran CI tests. Error on non-persistent buffers. Since this PR predates the 1.6 update, this seems logical. Ok, this PR will need some work to properly rebase with origin/master. Perhaps the simplest step is to merge #504 first (which just re-passed all CI tests now), and then tackle this one. @NanoCode012 sound good?
Hi @glenn-jocher, the reason this PR is not ready is that if we merge the "multi-node" PR later, I will have to redo another "logging" PR to address logging under multi-node.
Yep! Sounds good!
Thanks @glenn-jocher, with multi-node merged, I will get working on this PR to bring it up to speed and add multi-node logging. It should be done by tomorrow or the day after, as it is late here.
@NanoCode012 great, no rush. Glad to see we are making steady progress :)
Hi @glenn-jocher, for the GPU memory issue that was mentioned in #610, tkianai and I ran tests, and we found out that it spiked after the epoch 1 test: #610 (comment) and #610 (comment). Specifically, the issue lies in ... I am not sure why it does not happen for you. Maybe it is because your maximum memory is 16 GB and it is already training at 14 GB (near its maximum)? I will run my own test with 2 GPUs as comparison. I think this should be in a separate issue. Edit: Add table. Command:
python -m torch.distributed.launch --nproc_per_node 2 train.py --batch-size 64 --data coco.yaml --cfg yolov5s.yaml --weights ''
Edit 2: Due to losing track of time, I did not get to track down the in-betweens; however, we see that it doesn't spike as badly. I am confused why it spiked before. Was it because of 8 GPUs? Edit 3: The rebase may take a bit longer now. Got some other work to handle.
DDP time difference: 50 min/epoch on 1x V100 with --batch 16; 2x GPUs need 58% of the min/epoch of 1 GPU (about 29 min/epoch). Awesome, DDP seems to be working great!! 👍
Hi @glenn-jocher, I've done a rebase and fixed some leftovers. Sorry it took a while; I was a bit occupied with other stuff. Logging output for 4-GPU DDP:
Current master output for 4-GPU DDP:
Logging output on a non-master machine for multi-node training:
Edit: Sorry for the multiple small commits. I just kept finding small inconsistencies while viewing the differences via the site and was correcting them. Please squash when merging. Unit tests passed. This PR is finally ready. Please tell me if you want anything changed/explained.
@NanoCode012 thanks! I had an interesting idea that might make this change a little easier on the eyes. If we redefine print at the beginning of the functions that use logger.info, would this cause any problems? I thought this might improve readability, i.e.:

```python
def function():
    print = logger.info
    # logger.info('hello')
    print('hello')
```

Could this work, or am I forgetting something?
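For context, a minimal self-contained sketch of the idea (names here are illustrative, not from the PR): assigning print = logger.info shadows the built-in only within that function's local scope.

```python
import logging

logging.basicConfig(format='%(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)

def train_step():
    print = logger.info      # shadows the built-in print inside this function only
    print('hello')           # actually calls logger.info('hello')

train_step()                 # 'hello' is emitted through the logging handler
print('plain print')         # outside the function, the built-in print is untouched
```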
I think it may be possible, but I'm not sure it's good for code readability later on. If someone else reads the code, they may miss the "logging" assignment.
@NanoCode012 yes, maybe you are right. I've never used the logging package before, but replacing it with print would obscure the command. Ok, I'll go ahead and merge!
@NanoCode012 it looks like the update works error-free, but some of the screen printing is different. For example, Colab isn't showing the CUDA information anymore, and Fusing ... should be followed by an updated model.info() showing the new parameter count. I'll take a look. Before:
After:
Hi Glenn, I'll take a look. It seems we have to call set_logging in the test/detect scripts as well. We might've missed other scripts too. Not sure what's the best choice. Maybe call set_logging in the select_device function, because they go hand in hand.
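A minimal sketch of that suggestion, with simplified signatures (the repo's actual utilities may differ): let select_device configure logging first, so every entry point gets both in one call.

```python
import logging
import torch

def set_logging(rank=-1):
    # Verbose logs only on the master process; other ranks log warnings only
    logging.basicConfig(format='%(message)s',
                        level=logging.INFO if rank in (-1, 0) else logging.WARN)

def select_device(device='', rank=-1):
    set_logging(rank)  # logging setup and device selection go hand in hand
    logger = logging.getLogger(__name__)
    cuda = device.lower() != 'cpu' and torch.cuda.is_available()
    logger.info('Using CUDA' if cuda else 'Using CPU')
    return torch.device('cuda:0' if cuda else 'cpu')
```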
Hi @glenn-jocher. New fix: master...NanoCode012:logging-fix. Could not let detect.py ...

test.py:
yolo.py:
export.py:

The solution is not the cleanest. It would be better to add
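To illustrate the duplication being described, a hypothetical sketch of the per-script fix (the import path and call site are assumptions, not the PR's exact code): each entry point makes the same setup call itself.

```python
# detect.py (sketch) -- test.py, yolo.py and export.py would repeat the same lines
from utils.general import set_logging, select_device  # hypothetical import path

if __name__ == '__main__':
    set_logging()                # configure logging before anything prints
    device = select_device('')   # then pick the device
    # ... rest of the script
```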
@NanoCode012 looks good! Can you submit a PR? Fusing is ok, I understand.
* Change print to logging
* Clean function set_logging
* Add line spacing
* Change leftover prints to log
* Fix scanning labels output
* Fix rank naming
* Change leftover print to logging
* Reorganized DDP variables
* Fix type error
* Make quotes consistent
* Fix spelling
* Clean function call
* Add line spacing
* Update datasets.py
* Update train.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
This PR fixes the second point of #463.
Tested on coco128
Old output (current master):
New output:
The Old Output still looks somewhat clean because it is only outputting from 2 devices; 4 or 8 creates a mess. I'm not sure whether I should use logging everywhere for consistency, or only in the places that are affected by multi-GPU training.

To do:
* Decide whether to change to logging everywhere or not. (Decision: only multi-GPU areas)
* Wait for the test/detect refactor for multi-GPU
* Wait for the multi-node DDP support merge
* Fix logging for multi-node

Edit: There is a "Scanning labels.." message which repeats, but it's part of tqdm, not print. Not sure how to handle it right now.
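One common way to handle that (a generic sketch, not necessarily what this PR ended up doing): gate the tqdm progress bar on rank, so only the master process draws it.

```python
from tqdm import tqdm

def scan_labels(files, rank=-1):
    # Only the master process (rank -1 or 0) shows the progress bar;
    # other DDP workers iterate over the same files silently.
    pbar = tqdm(files, desc='Scanning labels') if rank in (-1, 0) else files
    for f in pbar:
        pass  # parse the label file here
```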
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhanced logging information for better tracking of model training processes.
📊 Key Changes
* Added logging module imports to various files.
* Replaced print statements with logger.info to standardize logging.
* Added a set_logging function to configure logging based on rank (necessary for distributed training setups).
* Renamed the rank variable for consistency in train.py.
* Updated create_dataloader methods across train.py and datasets.py to ensure correct data processing in distributed environments.

🎯 Purpose & Impact
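As a rough illustration of the last point (a generic rank-aware pattern, not the PR's exact code): in DDP, each process should draw a distinct shard of the dataset, typically via DistributedSampler.

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def create_dataloader(dataset, batch_size, rank=-1, workers=8):
    # rank == -1 means single-process training; otherwise shard per DDP rank.
    # DistributedSampler requires torch.distributed to be initialized.
    sampler = DistributedSampler(dataset) if rank != -1 else None
    return DataLoader(dataset,
                      batch_size=batch_size,
                      shuffle=(sampler is None),  # the sampler handles shuffling in DDP
                      sampler=sampler,
                      num_workers=workers)
```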