-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add instrumentation for DataTable Creation #11942
Add instrumentation for DataTable Creation #11942
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11942 +/- ##
============================================
+ Coverage 61.44% 61.49% +0.05%
+ Complexity 1141 207 -934
============================================
Files 2385 2385
Lines 129189 129216 +27
Branches 19998 20001 +3
============================================
+ Hits 79381 79465 +84
+ Misses 44061 43951 -110
- Partials 5747 5800 +53
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 38 files with indirect coverage changes 📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today! |
65a961c
to
bdd423d
Compare
@@ -215,6 +219,14 @@ private byte[] serializeResponse(ServerQueryRequest queryRequest, InstanceRespon | |||
byte[] responseByte = null; | |||
try { | |||
responseByte = instanceResponse.toDataTable().toBytes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we verify whether there's memory allocation in toBytes()? Do we need to instrument that part as well? Meanwhile, IMO it might be helpful to very briefly explain why we only added instrumentation only for groupby and selection operation utils in the PR description @SabrinaZhaozyf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we verify whether there's memory allocation in toBytes()? Do we need to instrument that part as well?
Good point. There's memory allocation in toBytes (mostly from writeLeadingSections
) and we have seen OOM here as well. Added instrumentation to serializeStringDictionary in toBytes.
IMO it might be helpful to very briefly explain why we only added instrumentation only for groupby and selection operation utils in the PR description
Done.
List<DataTable> dataTables = new ArrayList<>(); | ||
while (numIterations < 100) { | ||
// build data table with sampling | ||
dataTables.add(SelectionOperatorUtils.getDataTableFromRows(rows, dataSchema, false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add DataTable.toBytes() as well here? Meanwhile, the testing would probably fit better to the real use case if we call SelectionOperatorUtils.getDataTableFromRows(rows, dataSchema, false)
only once in each thread and submit multiple runner threads. Similar for the groupby case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated both tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for addressing the comments.
Could you take a look at the tests? They are failing and the log size looks pretty large,
Maybe set the log level to info in the new tests?
c414350
to
d3dc89f
Compare
The failed tests were due to a flaky test which has been fixed here. Thank you for the review! @jasperjiaguo |
When the
InstanceResponseBlock
is creating DataTable based on the final results in the combine operator, that data table creation is hitting OOM (see examples below). This PR adds instrumentation for this path under the query killing framework.Instrument is added the impl of
BaseResultsBlock.getDataTable()
, specially toGroupByResultsBlock
,SelectionResultsBlock
, andDistinctResultsBlock
which are the most memory-intensive.Skipped
AggregationResultsBlock
,ExceptionResultsBlock
,ExplainResultsBlock
,MetadataResultsBlock
as they don't have as much memory overhead.Examples from heap dumps
Selection
GroupBy
Testing
ResourceManagerAccountingTest
verify that memory usage is sampled, interruption is checked, andEarlyTerminationException
is thrown ingetDataTable
by manually setting up the accountant after rows/result blocks are created (only samples for data table creation). Without instrumentation, tests will throw OOM exception.Below is an example of query exception in the response when a kill happened exactly at data table creation. While it is not deterministic, to reproduce this, you can set server
accounting.oom.critical.heap.usage.ratio
to be higher inOfflineClusterMemBasedServerQueryKillingTest
and try runningtestSelectionOnlyOOM
.cc @jasperjiaguo @siddharthteotia