CUDATreeLearner: free GPU memory in destructor if any allocated #4963

denmoroz · 2022-01-20T20:51:17Z

Fixes the memory-freeing issue described here: allocated GPU memory is not released after a training cycle completes, which leads to constant memory growth when train() is invoked multiple times inside the same process (for example, during a hyperparameter search).
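A minimal sketch of the failure mode and the fix pattern, assuming a stand-in allocator instead of the real CUDA runtime. `FakeDevice`, `ToyLearner`, and `BytesLeftAfterRuns` are hypothetical names used only for illustration; they are not part of LightGBM. The point is that a learner which releases its device buffers in its destructor (only if something was allocated) leaves no memory behind after repeated training runs.

```cpp
#include <cstddef>
#include <new>

// Stand-in for a device allocator (assumption: a real learner would call
// the CUDA runtime here). It only tracks how many bytes are outstanding.
struct FakeDevice {
  std::size_t bytes_in_use = 0;
  void* Alloc(std::size_t n) {
    bytes_in_use += n;
    return ::operator new(n);
  }
  void Free(void* p, std::size_t n) {
    bytes_in_use -= n;
    ::operator delete(p);
  }
};

// Hypothetical learner that owns one device buffer.
class ToyLearner {
 public:
  ToyLearner(FakeDevice* dev, std::size_t n)
      : dev_(dev), size_(n), buf_(dev->Alloc(n)) {}

  // Release the buffer on destruction, but only if one was allocated.
  ~ToyLearner() {
    if (buf_ != nullptr) {
      dev_->Free(buf_, size_);
    }
  }

  ToyLearner(const ToyLearner&) = delete;
  ToyLearner& operator=(const ToyLearner&) = delete;

 private:
  FakeDevice* dev_;
  std::size_t size_;
  void* buf_;
};

// Simulates several train() calls in one process and reports how many
// bytes are still allocated on the fake device afterwards.
inline std::size_t BytesLeftAfterRuns(int runs, std::size_t bytes_per_run) {
  FakeDevice dev;
  for (int i = 0; i < runs; ++i) {
    ToyLearner learner(&dev, bytes_per_run);  // destroyed at end of iteration
  }
  return dev.bytes_in_use;
}
```

Without the destructor body, `bytes_in_use` would grow by `bytes_per_run` on every iteration, which is the growth pattern described above.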

ghost · 2022-01-20T20:51:34Z

All CLA requirements met.

denmoroz · 2022-01-21T10:15:21Z

Oops, it seems some tests are not passing because they were cancelled by a timeout. Trying to restart.

denmoroz · 2022-01-21T13:01:15Z

Still getting

🤔

jmoralez · 2022-01-21T15:55:02Z

Thank you for your contribution!

Oops, it seems some tests are not passing because they were cancelled by a timeout.

That is due to #4948 and will be solved by #4953; once that is merged we'll let you know to update this branch and everything should succeed.

denmoroz · 2022-01-21T16:10:54Z

@jmoralez
Oh, I see.

once that is merged we'll let you know to update this branch and everything should succeed

Thanks 🙏

jmoralez · 2022-02-11T01:48:26Z

Hi @denmoroz. #4953 has been merged, can you please update this branch to include those changes?

shiyu1994 · 2022-02-16T13:07:29Z

src/treelearner/cuda_tree_learner.cpp

@@ -63,6 +63,43 @@ CUDATreeLearner::CUDATreeLearner(const Config* config)
 }

 CUDATreeLearner::~CUDATreeLearner() {
+  #pragma omp parallel for schedule(static, num_gpu_)


With a static schedule and a chunk size of num_gpu_, I think only one thread will be used. So can we simply remove this line?

denmoroz · 2022-02-19

@shiyu1994
To be honest, I just copy-pasted the code from here, which uses omp parallel, so I assumed it was done intentionally. My expectation was that the cleanup for each GPU device would be executed in parallel for efficiency (if num_gpu_ > 1). Do you think we should actually remove this?

shiyu1994 · 2022-02-20

After experimenting, I find that schedule(static, num_gpu_) results in only a single thread doing the work: with a chunk size of num_gpu_ and num_gpu_ total iterations, the whole loop is one chunk, so the for loop is not parallelized.
I think it is OK to keep the omp parallel in this PR, and we can fix this in another PR. Thank you!
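The observation above can be checked with a small model of how a static schedule distributes iterations. This is a sketch, not OpenMP itself: it assumes the round-robin rule that the OpenMP specification gives for `schedule(static, chunk_size)` (iterations are split into chunks of `chunk_size`, and chunks are assigned to threads in thread-number order). `StaticScheduleOwner` and `ThreadsDoingWork` are hypothetical helper names.

```cpp
#include <cstddef>
#include <set>

// Thread number that executes iteration `iter` under schedule(static, chunk),
// assuming chunks of `chunk` consecutive iterations are handed out
// round-robin to threads 0, 1, ..., num_threads - 1.
inline int StaticScheduleOwner(int iter, int chunk, int num_threads) {
  return (iter / chunk) % num_threads;
}

// Number of distinct threads that receive at least one iteration.
inline std::size_t ThreadsDoingWork(int num_iters, int chunk, int num_threads) {
  std::set<int> owners;
  for (int i = 0; i < num_iters; ++i) {
    owners.insert(StaticScheduleOwner(i, chunk, num_threads));
  }
  return owners.size();
}
```

With `num_iters == chunk` (the `num_gpu_` case discussed here), every iteration falls in chunk 0 and is owned by thread 0, whereas a chunk size of 1 spreads the same iterations across as many threads as there are iterations (up to `num_threads`).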

denmoroz · 2022-02-19T11:11:56Z

Hi @denmoroz. #4953 has been merged, can you please update this branch to include those changes?

@jmoralez Sure! Sorry for the delayed response, I just noticed that you had asked for an update.

UPD: Done

shiyu1994

LGTM. Thanks for your contribution!

github-actions · 2023-08-23T14:10:47Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

CUDATreeLearner: free GPU memory in destructor if any allocated

4e9c9e9

denmoroz requested review from btrotta, guolinke, hzy46, shiyu1994 and tongwu-sh as code owners January 20, 2022 20:51

Minor changes: checking for num_gpu_feature_groups is not needed

0804d40

jameslamb requested review from jameslamb and StrikerRUS January 20, 2022 22:55

Trigger CI again

86b36cc

StrikerRUS removed their request for review January 22, 2022 20:15

StrikerRUS added the fix and awaiting review labels Jan 22, 2022

shiyu1994 reviewed Feb 16, 2022

StrikerRUS removed the awaiting review label Feb 16, 2022

denmoroz and others added 3 commits February 19, 2022 14:16

Merge branch 'microsoft:master' into fix-4952

c553276

Merge branch 'master' into fix-4952

fd66751

Merge branch 'fix-4952' of github.com:denmoroz/LightGBM into fix-4952

70162d1

shiyu1994 approved these changes Feb 20, 2022

shiyu1994 merged commit 0db573c into microsoft:master Feb 20, 2022

jameslamb mentioned this pull request Oct 7, 2022

[DO NOT MERGE] Release v3.3.3 #5525

Closed

40 tasks

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
