[feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage #8669

Merged (9 commits) on Apr 8, 2022

@xinyiZzz xinyiZzz commented Mar 25, 2022

Proposed changes

Issue Number: close #7196

Problem Summary:

Based on #8605, this PR separates the memory usage of each operator out from the Query/Load/StorageEngine mem tracker.
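
As a rough illustration of what "switching the TLS mem tracker" means, the sketch below charges allocations to whichever tracker is currently installed in a thread-local pointer, and uses an RAII guard to swap in an operator-level tracker for a scope. All names here (`MemTracker`, `ScopedSwitchTracker`, `record_alloc`, `tls_current_tracker`) are hypothetical and chosen for illustration; they are not taken from the PR, and the sketch does not roll child consumption up into a parent tracker.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical tracker: a label plus a running byte count.
struct MemTracker {
    std::string label;
    int64_t consumption = 0;
    explicit MemTracker(std::string l) : label(std::move(l)) {}
    void consume(int64_t bytes) { consumption += bytes; }
};

// The tracker that allocations on this thread are currently charged to.
thread_local MemTracker* tls_current_tracker = nullptr;

// RAII guard: charge allocations to `tracker` while the guard is alive,
// then restore whichever tracker was active before.
class ScopedSwitchTracker {
public:
    explicit ScopedSwitchTracker(MemTracker* tracker) : _old(tls_current_tracker) {
        tls_current_tracker = tracker;
    }
    ~ScopedSwitchTracker() { tls_current_tracker = _old; }
    ScopedSwitchTracker(const ScopedSwitchTracker&) = delete;
    ScopedSwitchTracker& operator=(const ScopedSwitchTracker&) = delete;

private:
    MemTracker* _old;
};

// Stand-in for the allocation hook: charge the current thread's tracker.
void record_alloc(int64_t bytes) {
    if (tls_current_tracker != nullptr) tls_current_tracker->consume(bytes);
}
```

With a query-level guard in place, wrapping an operator's work in a nested guard charges that work to the operator's tracker only, and the query-level tracker becomes current again when the nested guard goes out of scope.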

Checklist(Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know)
  2. Has unit tests been added: (Yes/No/No Need)
  3. Has document been added or modified: (Yes/No/No Need)
  4. Does it need to update dependencies: (Yes/No)
  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

1. Accuracy verification

Env: 1 FE, 1 BE
Test Set: ssb LINEORDER, 600w (6 million) rows
set track_new_delete=true
set memory_verbose_track=true
set parallel_fragment_exec_instance_num=10
set exec_mem_limit = 21474836480
// The level is used to decide whether a tracker is shown on the web page.
// Each MemTracker has a level less than or equal to its parent's; it can only be set explicitly.
// TASK covers query, import, compaction, etc.
enum class MemTrackerLevel { OVERVIEW = 0, TASK, INSTANCE, VERBOSE };
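
A minimal sketch of how such a level could gate what the web page shows, assuming a tracker is displayed when its level is at or below the configured `mem_tracker_level`. The `TrackerInfo` struct and `visible_labels` function are hypothetical names for illustration, and the comparison rule is an assumption inferred from the test runs in this description rather than from the PR's code.

```cpp
#include <string>
#include <vector>

enum class MemTrackerLevel { OVERVIEW = 0, TASK, INSTANCE, VERBOSE };

// Hypothetical record describing one tracker for display purposes.
struct TrackerInfo {
    std::string label;
    MemTrackerLevel level;
};

// Return the labels of trackers whose level is at or below `configured`.
// Scoped enums of the same type compare by their underlying values.
std::vector<std::string> visible_labels(const std::vector<TrackerInfo>& trackers,
                                        MemTrackerLevel configured) {
    std::vector<std::string> out;
    for (const auto& t : trackers) {
        if (t.level <= configured) out.push_back(t.label);
    }
    return out;
}
```

Under this assumption, OVERVIEW shows only the top-level trackers and each higher setting adds one more layer of detail, up to per-operator trackers at VERBOSE.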

Test SQL

select k1, count(1) from  ( select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY ) a group by k1 limit 10;
  1. mem_tracker_level=0 (default OVERVIEW)

image

  2. mem_tracker_level=1 (TASK)

image

  3. mem_tracker_level=2 (INSTANCE)

image

image

  4. mem_tracker_level=3 (VERBOSE)

image

image

image

image

Test Load

Stream load ssb::LINEORDER, 600w (6 million) rows
TODO: At present, the load task tracker reports a negative value. Either some malloc'd memory is not being recorded, or memory allocated elsewhere is being freed under this tracker.
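
One way a tracker can end up negative, in line with the second possibility in the TODO, is sketched below: memory is allocated while one tracker is current and freed while another is current. The names (`Tracker`, `current`, `on_alloc`, `on_free`, `cross_tracker_free`) are hypothetical, and this is only an illustration of the accounting effect, not a diagnosis of the actual load path.

```cpp
#include <cstdint>

// Hypothetical tracker holding only a byte count.
struct Tracker {
    int64_t consumption = 0;
};

// Tracker that the hooks charge on this thread.
thread_local Tracker* current = nullptr;

void on_alloc(int64_t bytes) {
    if (current != nullptr) current->consumption += bytes;
}

void on_free(int64_t bytes) {
    if (current != nullptr) current->consumption -= bytes;
}

// Allocate under `other`, free under `load`: the totals cancel globally,
// but `load` is left with a negative balance.
void cross_tracker_free(Tracker& other, Tracker& load, int64_t bytes) {
    current = &other;
    on_alloc(bytes);
    current = &load;
    on_free(bytes);
    current = nullptr;
}
```

The sum over both trackers stays at zero, which is why a negative value on one tracker usually points at a matching surplus on another.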

  1. mem_tracker_level=0 (default OVERVIEW)

image

  2. mem_tracker_level=1 (TASK)

image

  3. mem_tracker_level=2 (INSTANCE)

image

  4. mem_tracker_level=3 (VERBOSE)

image

image

image

2. Performance Testing

As of now, the new memory statistics framework incurs a performance penalty of roughly 1%-2%.

Env: 1 FE, 1 BE
Test Set: ssb LINEORDER, 600w (6 million) rows
set parallel_fragment_exec_instance_num = 10;
set exec_mem_limit = 21474836480;

TEST 1 - Big query

Test SQL:

select k1, count(1) from  ( select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY union all  select LO_CUSTKEY as k1, count(1), max(LO_ORDERKEY) from LINEORDER2 group by LO_CUSTKEY ) a group by k1 limit 10;

Result:

  1. jmeter thread=1
  • track_new_delete=false, Avg time: 586 (ms)
  • track_new_delete=true, memory_verbose_track=false, Avg time: 591 (ms), slowdown: 0.8%
  • track_new_delete=true, memory_verbose_track=true, Avg time: 598 (ms), slowdown: 2.04%, switch_thread_mem_tracker_count=5,204,990
  2. jmeter thread=10, qps: 2.4/s
  • track_new_delete=false, Avg time: 4078 (ms)
  • track_new_delete=true, memory_verbose_track=false, Avg time: 4104 (ms), slowdown: 0.6%
  • track_new_delete=true, memory_verbose_track=true, Avg time: 4143 (ms), slowdown: 1.59%, switch_thread_mem_tracker_count=22,990,941

TEST 2 - Small query

Test SQL:

select LO_ORDERKEY as k1, count(1), max(LO_CUSTKEY)  from LINEORDER2 group by LO_ORDERKEY limit 1;

set untracked_mem_limit_mbytes=2M;
Result:

  1. jmeter thread=100, qps: 60.0/s
  • track_new_delete=false, memory_verbose_track=false, Avg time: 1535 (ms)
  • track_new_delete=true, memory_verbose_track=false, Avg time: 1549 (ms), slowdown: 0.9%
  • track_new_delete=true, memory_verbose_track=true, Avg time: 1555 (ms), slowdown: 1.3%, switch_thread_mem_tracker_count=11,825,571
  2. jmeter thread=200, qps: 58.1/s
  • track_new_delete=false, memory_verbose_track=false, Avg time: 3118 (ms)
  • track_new_delete=true, memory_verbose_track=false, Avg time: 3154 (ms), slowdown: 1.15%
  • track_new_delete=true, memory_verbose_track=true, Avg time: 3176 (ms), slowdown: 1.86%, switch_thread_mem_tracker_count=24,561,031
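
The percentages in these results appear to be the relative increase in average latency over the track_new_delete=false baseline. A small helper for reproducing them is sketched here; `slowdown_percent` is a hypothetical name, and the formula is an assumption based on the reported figures.

```cpp
#include <cassert>
#include <cmath>

// Relative slowdown, in percent, of `measured_ms` against `baseline_ms`.
double slowdown_percent(double baseline_ms, double measured_ms) {
    assert(baseline_ms > 0.0);  // a baseline of zero would divide by zero
    return (measured_ms - baseline_ms) / baseline_ms * 100.0;
}
```

For the single-thread big query, `slowdown_percent(586, 598)` evaluates to roughly 2.05, in line with the reported 2.04%; for the 200-thread small query, `slowdown_percent(3118, 3176)` evaluates to roughly 1.86.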

3. Future

  1. More accurate statistics
  • At present, many VERBOSE trackers have not had their accuracy verified independently; they only broadly meet expectations.
  2. More comprehensive coverage of all Doris tasks
  • For example, verifying the accuracy of the StorageEngine task tracker.
  3. Faster tracking
  • As of now, the new memory statistics framework incurs a performance penalty of up to about 2%.
  4. Memory leak monitoring
  • memory_leak_detection is currently still experimental.
  5. More details will be improved gradually
  • For example, the load task statistics miss part of the malloc memory, resulting in a negative value.

@morningman morningman added area/memory-consumption dev/backlog waiting to be merged in future dev branch labels Mar 25, 2022
@xinyiZzz xinyiZzz changed the title [feature] (memory) Switch TLS mem tracker to separate more detailed memory usage (Part2) [feature-wip] (memory) Switch TLS mem tracker to separate more detailed memory usage (Part2) Mar 30, 2022
@xinyiZzz xinyiZzz force-pushed the switch_tls_tracker2 branch 2 times, most recently from 82a3cde to d16b88c Compare March 31, 2022 09:14

xinyiZzz commented Apr 3, 2022

cc @morningman @yangzhg

The purpose of this PR is to add thread mem tracker switches in all appropriate places while preserving overall performance and accuracy.

At present, the statistics for Load tasks and some other details are inaccurate; they will be fixed in a follow-up.

In addition, a problem seen in local testing has not been completely resolved. The commit "Temporarily fix thread mem tracker missing" works around it for now, and I will do my best to fix it properly as soon as possible. (It only affects accuracy slightly.)

@xinyiZzz xinyiZzz requested a review from morningman April 3, 2022 23:07
@@ -217,14 +217,14 @@ inline void ThreadMemTrackerMgr::cache_consume(int64_t size) {
if (_untracked_mem >= config::mem_tracker_consume_min_size_bytes ||
_untracked_mem <= -config::mem_tracker_consume_min_size_bytes) {
DCHECK(_untracked_mems.find(_tracker_id) != _untracked_mems.end());
start_thread_mem_tracker = false;
Contributor

Please add a comment explaining why start_thread_mem_tracker = false; is set here.

Contributor Author

It avoids infinite recursion when mem_tracker.consume/try_consume itself makes a temporary memory allocation.

The comments in tcmalloc_hook.h mention this: allocating memory inside a hook causes the hook to be entered again, leading to infinite recursion.

This relies on all memory allocated inside mem_tracker.consume/try_consume being freed promptly; otherwise that memory is missed by tracking.

I will add the comments later, when making the uniform modifications.
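
The guard described in this reply can be sketched as follows. This is a simplified model under stated assumptions, not the PR's code: `ThreadTrackerState`, `hook_on_alloc`, `tracker_consume`, and `kFlushThresholdBytes` are hypothetical stand-ins, and the "temporary allocation" inside consume is simulated by calling the hook directly.

```cpp
#include <cstdint>

// Hypothetical flush threshold; the real value comes from configuration.
constexpr int64_t kFlushThresholdBytes = 1024;

struct ThreadTrackerState {
    int64_t untracked = 0;     // bytes seen by the hook but not yet flushed
    int64_t tracked = 0;       // bytes flushed to the stand-in tracker
    bool hook_enabled = true;  // cleared while flushing to block re-entry
    int flushes = 0;           // number of completed flushes
};

thread_local ThreadTrackerState tls_state;

void hook_on_alloc(int64_t bytes);

// Stand-in for mem_tracker.consume(): it allocates temporarily, which
// re-enters the hook. Without the guard this would recurse forever,
// because `untracked` is still above the threshold at that point.
void tracker_consume(int64_t bytes) {
    hook_on_alloc(16);   // simulated temporary allocation inside consume
    tls_state.tracked += bytes;
    hook_on_alloc(-16);  // matching free of the temporary allocation
}

void hook_on_alloc(int64_t bytes) {
    if (!tls_state.hook_enabled) return;  // re-entrant call: ignore it
    tls_state.untracked += bytes;
    if (tls_state.untracked >= kFlushThresholdBytes ||
        tls_state.untracked <= -kFlushThresholdBytes) {
        tls_state.hook_enabled = false;   // guard against recursion
        tracker_consume(tls_state.untracked);
        tls_state.untracked = 0;
        ++tls_state.flushes;
        tls_state.hook_enabled = true;
    }
}
```

Because the hook ignores everything that happens while the guard is cleared, any allocation made inside consume that outlives the flush would never be counted, which is the "tracking miss" the reply warns about.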

@morningman morningman left a comment

LGTM


github-actions bot commented Apr 7, 2022

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 7, 2022

github-actions bot commented Apr 7, 2022

PR approved by anyone and no changes requested.

@xinyiZzz xinyiZzz changed the title [feature-wip] (memory) Switch TLS mem tracker to separate more detailed memory usage (Part2) [feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage Apr 7, 2022
@morningman morningman merged commit 519305c into apache:master Apr 8, 2022
weizhengte pushed a commit to weizhengte/incubator-doris that referenced this pull request Apr 22, 2022
…rate more detailed memory usage (apache#8669)

Based on apache#8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
zhengshiJ pushed a commit to zhengshiJ/incubator-doris that referenced this pull request Apr 27, 2022
…rate more detailed memory usage (apache#8669)

Based on apache#8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
starocean999 pushed a commit to starocean999/incubator-doris that referenced this pull request May 19, 2022
…rate more detailed memory usage (apache#8669)

Based on apache#8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
Labels
approved Indicates a PR has been approved by one committer. area/memory-consumption dev/backlog waiting to be merged in future dev branch reviewed
Development

Successfully merging this pull request may close these issues.

[Feature] Refactored memory statistics framework MemTracker
2 participants