Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add query context parameter for segment load wait #15076

Merged
merged 2 commits into from
Oct 5, 2023

Conversation

adarshsanjeev
Copy link
Contributor

@adarshsanjeev adarshsanjeev commented Oct 4, 2023

#14322 added the feature of allowing the waiting for segments to be loaded to the MSQ controller. This adds a context parameter to control if this behaviour should be used. It also optimizes the query to make fewer calls to the broker by clubbing all the versions into the same request.

Notes:

  • Add waitTillSegmentsLoad as a query context parameter. If this is true, the controller queries the broker and waits till the segments created (if any) have been loaded by the load rules. The controller also provides this information in the live reports and task reports. If this is false, the controller exits immediately after finishing the query.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added Area - Documentation Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Oct 4, 2023
Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments.
Overall LGTM.

@@ -246,6 +246,7 @@ The following table lists the context parameters for the MSQ task engine:
| `durableShuffleStorage` | SELECT, INSERT, REPLACE <br /><br />Whether to use durable storage for shuffle mesh. To use this feature, configure the durable storage at the server level using `druid.msq.intermediate.storage.enable=true`). If these properties are not configured, any query with the context variable `durableShuffleStorage=true` fails with a configuration error. <br /><br /> | `false` |
| `faultTolerance` | SELECT, INSERT, REPLACE<br /><br /> Whether to turn on fault tolerance mode or not. Failed workers are retried based on [Limits](#limits). Cannot be used when `durableShuffleStorage` is explicitly set to false. | `false` |
| `selectDestination` | SELECT<br /><br /> Controls where the final result of the select query is written. <br />Use `taskReport`(the default) to write select results to the task report. <b> This is not scalable since task reports size explodes for large results </b> <br/>Use `durableStorage` to write results to durable storage location. <b>For large results sets, its recommended to use `durableStorage` </b>. To configure durable storage see [`this`](#durable-storage) section. | `taskReport` |
| `segmentLoadWait` | INSERT, REPLACE<br /><br /> Whether the controller should wait for segments to be loaded before exiting. If this is true, the controller queries the broker and waits till the segments created (if any) have been loaded by the load rules. The controller also provides this information in the live reports and task reports. If this is false, the controller exits immediately after finishing the query. | `false` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

segmentHandedOff ? How does this name sound ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or waitTillSegmentsLoad

@@ -159,29 +159,18 @@ public void waitForSegmentsToLoad()
return;
}

Iterator<String> iterator = versionsToAwait.iterator();
log.info(
log.debug(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we log.info this every minute ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And update the log message to say that even if the task is cancelled, the segments will continue to load.

{
Request request = brokerClient.makeRequest(HttpMethod.POST, "/druid/v2/sql/");
SqlQuery sqlQuery = new SqlQuery(StringUtils.format(LOAD_QUERY, datasource, version),
SqlQuery sqlQuery = new SqlQuery(StringUtils.format(LOAD_QUERY, datasource, versionsInClauseString),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While running ut's I also saw


2023-10-05T11:21:28,837 WARN [query-2cd60052-3011-47ae-87be-ec47100f649d-segment-load-waiter-0] org.apache.druid.msq.exec.SegmentLoadStatusFetcher - Exception occurred while waiting for segments to load. Exiting.
java.lang.NullPointerException: null
	at org.apache.druid.msq.exec.SegmentLoadStatusFetcher.fetchLoadStatusForVersion(SegmentLoadStatusFetcher.java:262) ~[classes/:?]
	at org.apache.druid.msq.exec.SegmentLoadStatusFetcher.lambda$waitForSegmentsToLoad$0(SegmentLoadStatusFetcher.java:174) ~[classes/:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) ~[guava-31.1-jre.jar:?]
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74) ~[guava-31.1-jre.jar:?]
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) ~[guava-31.1-jre.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]
2023-10-05T11:21:28,846 INFO [main] org.apache.druid.msq.exec.WorkerImpl - Stopping gracefully for taskId [query-2cd60052-3011-47ae-87be-ec47100f649d-worker0_0]
2023-10-05T11:21:28,848 INFO [main] org.apache.druid.msq.test.MSQTestBase - found generated segments: DataSegment{binaryVersion=9, id=foo1_-146136543-09-08T08:23:32.096Z_146140482-04-24T15:36:27.903Z_test, loadSpec={type=>local, path=>/var/folders/sx/n0v16tnj6mvf6hd14l6j6dth0000gn/T/junit8066366800031570368/localsegments/foo1/-146136543-09-08T08:23:32.096Z_146140482-04-24T15:36:27.903Z/test/0/index/}, dimensions=[dim_mv], metrics=[], shardSpec=NumberedShardSpec{partitionNum=0, partitions=100}, lastCompactionState=null, size=1037}
2023-10-05T11:21:28,859 INFO [main] org.apache.druid.msq.test.MSQTestBase - Found spec: {
  "query" : {
  

Lets fix it in this PR as well.

@@ -246,6 +246,7 @@ The following table lists the context parameters for the MSQ task engine:
| `durableShuffleStorage` | SELECT, INSERT, REPLACE <br /><br />Whether to use durable storage for shuffle mesh. To use this feature, configure the durable storage at the server level using `druid.msq.intermediate.storage.enable=true`). If these properties are not configured, any query with the context variable `durableShuffleStorage=true` fails with a configuration error. <br /><br /> | `false` |
| `faultTolerance` | SELECT, INSERT, REPLACE<br /><br /> Whether to turn on fault tolerance mode or not. Failed workers are retried based on [Limits](#limits). Cannot be used when `durableShuffleStorage` is explicitly set to false. | `false` |
| `selectDestination` | SELECT<br /><br /> Controls where the final result of the select query is written. <br />Use `taskReport`(the default) to write select results to the task report. <b> This is not scalable since task reports size explodes for large results </b> <br/>Use `durableStorage` to write results to durable storage location. <b>For large results sets, its recommended to use `durableStorage` </b>. To configure durable storage see [`this`](#durable-storage) section. | `taskReport` |
| `segmentLoadWait` | INSERT, REPLACE<br /><br /> Whether the controller should wait for segments to be loaded before exiting. If this is true, the controller queries the broker and waits till the segments created (if any) have been loaded by the load rules. The controller also provides this information in the live reports and task reports. If this is false, the controller exits immediately after finishing the query. | `false` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems very geared toward developers, and not the users. We can remove the fact that it queries the broker (since that is an implementation detail).
WDYT of something along the lines:

Suggested change
| `segmentLoadWait` | INSERT, REPLACE<br /><br /> Whether the controller should wait for segments to be loaded before exiting. If this is true, the controller queries the broker and waits till the segments created (if any) have been loaded by the load rules. The controller also provides this information in the live reports and task reports. If this is false, the controller exits immediately after finishing the query. | `false` |
| `segmentLoadWait` | INSERT, REPLACE<br /><br /> If set, the ingest query waits for the generated segment to be loaded before exiting, else the ingest query exits without waiting. The task and live reports contain the information (about ??) if this flag is set. The drawback is that the tasks stall till the segments are loaded, however ensures that... (advantage) | `false` |

Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM!!

@cryptoe cryptoe merged commit 7e987e3 into apache:master Oct 5, 2023
74 checks passed
abhishekagarwal87 pushed a commit that referenced this pull request Oct 11, 2023
This relies on the work done in #14322 and #15076. It allows the user to set waitTillSegmentsLoad in the query context (if they want, else it defaults to true) and shows the results in the UI :
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
LakshSingla pushed a commit to LakshSingla/druid that referenced this pull request Oct 14, 2023
This relies on the work done in apache#14322 and apache#15076. It allows the user to set waitTillSegmentsLoad in the query context (if they want, else it defaults to true) and shows the results in the UI :
LakshSingla added a commit that referenced this pull request Oct 16, 2023
This relies on the work done in #14322 and #15076. It allows the user to set waitTillSegmentsLoad in the query context (if they want, else it defaults to true) and shows the results in the UI :

Co-authored-by: Sébastien <sebastien@imply.io>
ektravel pushed a commit to ektravel/druid that referenced this pull request Oct 16, 2023
Add segmentLoadWait as a query context parameter. If this is true, the controller queries the broker and waits till the segments created (if any) have been loaded by the load rules. The controller also provides this information in the live reports and task reports. If this is false, the controller exits immediately after finishing the query.
ektravel pushed a commit to ektravel/druid that referenced this pull request Oct 16, 2023
This relies on the work done in apache#14322 and apache#15076. It allows the user to set waitTillSegmentsLoad in the query context (if they want, else it defaults to true) and shows the results in the UI :
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
Add segmentLoadWait as a query context parameter. If this is true, the controller queries the broker and waits till the segments created (if any) have been loaded by the load rules. The controller also provides this information in the live reports and task reports. If this is false, the controller exits immediately after finishing the query.
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
This relies on the work done in apache#14322 and apache#15076. It allows the user to set waitTillSegmentsLoad in the query context (if they want, else it defaults to true) and shows the results in the UI :
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - Batch Ingestion Area - Documentation Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants