automatic batch size for dp test #1165

Merged (3 commits) on Sep 24, 2021

@njzjz commented Sep 22, 2021

Resolves #1149.

We start with nbatch * natoms = 1024 (a different initial value can be set) and double it on each iteration until an OOM error is caught.

One small issue is that catching the TF OOM error is somewhat slow. This is a TF problem and I don't know how to resolve it. Luckily, we only need to catch it once.
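
The doubling strategy described above can be sketched roughly as follows. This is a minimal illustration, not the code added in this PR: the class name `DoublingBatchSize`, the method name `execute`, the `func(nbatch, start_index)` calling convention, and the `FakeOOMError` exception are all assumptions made for the example. In TensorFlow an OOM surfaces as `tf.errors.ResourceExhaustedError`; a stand-in exception is used here so the sketch runs without TensorFlow.

```python
from typing import Any, Callable, Tuple


class FakeOOMError(Exception):
    """Hypothetical stand-in for the framework's out-of-memory error."""


class DoublingBatchSize:
    """Illustrative sketch only; not the class added in this PR."""

    def __init__(self, initial_batch_size: int = 1024) -> None:
        # Budget for nbatch * natoms, as in the description above.
        self.current_batch_size = initial_batch_size
        self.maximum_working_batch_size = 0
        self.minimal_not_working_batch_size = 2**31

    def execute(
        self, func: Callable[[int, int], Any], start_index: int, natoms: int
    ) -> Tuple[int, Any]:
        """Run func(nbatch, start_index) and return (nbatch, result)."""
        while True:
            nbatch = max(self.current_batch_size // natoms, 1)
            try:
                result = func(nbatch, start_index)
            except FakeOOMError:
                if nbatch == 1:
                    # Even a single frame does not fit; nothing left to shrink.
                    raise
                # Remember the failing budget, then halve and retry.
                self.minimal_not_working_batch_size = min(
                    self.minimal_not_working_batch_size, self.current_batch_size
                )
                self.current_batch_size //= 2
                continue
            self.maximum_working_batch_size = max(
                self.maximum_working_batch_size, self.current_batch_size
            )
            # Keep doubling only while the doubled budget is not known to fail.
            if 2 * self.current_batch_size < self.minimal_not_working_batch_size:
                self.current_batch_size *= 2
            return nbatch, result
```

In this sketch, once a budget has failed it is never retried, so under a fixed memory limit the OOM error is caught only once and later calls reuse the largest budget known to work.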

@njzjz requested a review from amcadmus on September 22, 2021 23:43

codecov-commenter commented Sep 22, 2021

Codecov Report

Merging #1165 (57c4d3c) into devel (53f1567) will increase coverage by 0.13%.
The diff coverage is 89.65%.

@@            Coverage Diff             @@
##            devel    #1165      +/-   ##
==========================================
+ Coverage   75.94%   76.08%   +0.13%     
==========================================
  Files          90       91       +1     
  Lines        7172     7226      +54     
==========================================
+ Hits         5447     5498      +51     
- Misses       1725     1728       +3     
Impacted Files                 Coverage Δ
deepmd/entrypoints/test.py     11.90% <25.00%> (+0.36%) ⬆️
deepmd/utils/sess.py           54.54% <50.00%> (+4.54%) ⬆️
deepmd/utils/batch_size.py     96.00% <96.00%> (ø)
deepmd/infer/deep_pot.py       68.55% <100.00%> (ø)
deepmd/utils/errors.py         100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 53f1567...57c4d3c.

        self.maximum_working_batch_size = 0
        self.minimal_not_working_batch_size = 2**31

    def execuate(self, callable: Callable, start_index: int, natoms: int) -> Tuple[int, tuple]:

typo: execuate -> execute

@amcadmus left a comment

Could you please add a unittest for the AutoBatchSize?

@njzjz requested a review from amcadmus on September 23, 2021 03:24