Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: query from deep storage #14609

Merged
merged 43 commits into from
Aug 4, 2023
Merged

Conversation

317brian
Copy link
Contributor

@317brian 317brian commented Jul 18, 2023

WIP but there's enough here to start the review process

  • Adds a new page for query from deep storage and puts it in the Druid SQL section of Querying
  • Updates SQL API reference to include the query from deep storage endpoint
  • Updates mentions of deep storage that previously stated that you couldn't query from it
  • Moves durable storage info (mostly) out of SQL-based ingestion and to operations since the info is now shared between SQL-based ingestion and query from deep storage. (Note that this is part of breaking up some of the info in there and moving toward an information architecture where SQL-based ingestion is the default for batch.)
  • Updates webconsole screenshots for datasource and segments

This PR has:

  • been self-reviewed.


To enable durable storage, you need to set the following common service properties:

```
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cryptoe are there any new durable storage configs specifically for query from deep storage? Or are results written to the tempDir property?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tempDir is used more as a staging directory before pushing out bytes to s3. Its not related to results of the query.

@vtlim vtlim requested a review from cryptoe July 18, 2023 23:06
@317brian 317brian marked this pull request as ready for review July 20, 2023 17:09
docs/api-reference/sql-api.md Show resolved Hide resolved
docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
docs/querying/query-from-deep-storage.md Outdated Show resolved Hide resolved
docs/querying/query-from-deep-storage.md Outdated Show resolved Hide resolved
docs/querying/query-from-deep-storage.md Outdated Show resolved Hide resolved
docs/querying/query-from-deep-storage.md Outdated Show resolved Hide resolved
docs/querying/query-from-deep-storage.md Outdated Show resolved Hide resolved
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review. Will finish the next round by eod.

You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules.

Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query.
Note that at least one segment of a datasource must be available on a Historical process so that the broker can plan your query. A quick way to check this is that data source should be visible on the druid console.


- The only supported value for `resultFormat` is JSON.
- Only the user who submits a query can see the results.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The response why execution mode is async is this pojo: https://github.com/apache/druid/blob/master/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/entity/SqlStatementResult.java

We might want to mention that in the response payload.

```

Returns information about the query associated with the given query ID. The response matches the response from the POST API if the query is accepted or running. The response for a completed query includes the same information as an in-progress query with several additions:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get query status response and the postReq response is the same when executionMode=async

docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
docs/api-reference/sql-api.md Show resolved Hide resolved
docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
docs/design/architecture.md Outdated Show resolved Hide resolved
docs/operations/durable-storage.md Show resolved Hide resolved
docs/operations/durable-storage.md Show resolved Hide resolved
docs/operations/rule-configuration.md Outdated Show resolved Hide resolved
@317brian 317brian requested a review from cryptoe July 31, 2023 20:41
docs/operations/durable-storage.md Outdated Show resolved Hide resolved
docs/operations/durable-storage.md Outdated Show resolved Hide resolved
docs/operations/durable-storage.md Outdated Show resolved Hide resolved
docs/operations/rule-configuration.md Outdated Show resolved Hide resolved
docs/querying/query-from-deep-storage.md Outdated Show resolved Hide resolved
docs/querying/query-from-deep-storage.md Outdated Show resolved Hide resolved
docs/querying/query-from-deep-storage.md Outdated Show resolved Hide resolved
docs/tutorials/tutorial-query-deep-storage.md Outdated Show resolved Hide resolved
docs/tutorials/tutorial-query-deep-storage.md Outdated Show resolved Hide resolved
317brian and others added 2 commits July 31, 2023 16:17
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments.
Rest all LGTM.
Thanks for adding the tutorial page.

docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
docs/api-reference/sql-api.md Show resolved Hide resolved
docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
docs/multi-stage-query/reference.md Outdated Show resolved Hide resolved
docs/operations/durable-storage.md Outdated Show resolved Hide resolved

To enable durable storage, you need to set the following common service properties:

```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tempDir is used more as a staging directory before pushing out bytes to s3. Its not related to results of the query.

317brian and others added 3 commits August 1, 2023 13:48
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
@317brian 317brian requested a review from cryptoe August 2, 2023 17:26
@adarshsanjeev adarshsanjeev mentioned this pull request Aug 3, 2023
10 tasks
Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left 2 comments.

docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
docs/api-reference/sql-api.md Outdated Show resolved Hide resolved
@317brian
Copy link
Contributor Author

317brian commented Aug 4, 2023

Can we go ahead and merge this? That CI failure looks unrelated to my PR since it is complaining about Intellij

@cryptoe cryptoe merged commit 3b5b6c6 into apache:master Aug 4, 2023
9 of 10 checks passed
@cryptoe
Copy link
Contributor

cryptoe commented Aug 4, 2023

Thanks for the PR @317brian.

@cryptoe cryptoe added this to the 27.0 milestone Aug 4, 2023
317brian added a commit to 317brian/druid that referenced this pull request Aug 4, 2023
* cold tier wip

* wip

* copyedits

* wip

* copyedits

* copyedits

* wip

* wip

* update rules page

* typo

* typo

* update sidebar

* moves durable storage info to its own page in operations

* update screenshots

* add apache license

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* add query from deep storage tutorial stub

* address some of the feedback

* revert screenshot update. handled in separate pr

* load rule update

* wip tutorial

* reformat deep storage endpoints

* rest of tutorial

* typo

* cleanup

* screenshot and sidebar for tutorial

* add license

* typos

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* rest of review comments

* clarify where results are stored

* update api reference for durablestorage context param

* Apply suggestions from code review

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>

* comments

* incorporate apache#14720

* address rest of comments

* missed one

* Update docs/api-reference/sql-api.md

* Update docs/api-reference/sql-api.md

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com>
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
(cherry picked from commit 3b5b6c6)
AmatyaAvadhanula pushed a commit that referenced this pull request Aug 4, 2023
* docs: query from deep storage (#14609)

* cold tier wip

* wip

* copyedits

* wip

* copyedits

* copyedits

* wip

* wip

* update rules page

* typo

* typo

* update sidebar

* moves durable storage info to its own page in operations

* update screenshots

* add apache license

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* add query from deep storage tutorial stub

* address some of the feedback

* revert screenshot update. handled in separate pr

* load rule update

* wip tutorial

* reformat deep storage endpoints

* rest of tutorial

* typo

* cleanup

* screenshot and sidebar for tutorial

* add license

* typos

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* rest of review comments

* clarify where results are stored

* update api reference for durablestorage context param

* Apply suggestions from code review

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>

* comments

* incorporate #14720

* address rest of comments

* missed one

* Update docs/api-reference/sql-api.md

* Update docs/api-reference/sql-api.md

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com>
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
(cherry picked from commit 3b5b6c6)

* change code tab styles to Docusaurus2 format from Docu1 format

* fix spelling file

* doc cleanup

(cherry picked from commit 2218a13)

---------

Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
317brian added a commit to 317brian/druid that referenced this pull request Aug 4, 2023
* cold tier wip

* wip

* copyedits

* wip

* copyedits

* copyedits

* wip

* wip

* update rules page

* typo

* typo

* update sidebar

* moves durable storage info to its own page in operations

* update screenshots

* add apache license

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* add query from deep storage tutorial stub

* address some of the feedback

* revert screenshot update. handled in separate pr

* load rule update

* wip tutorial

* reformat deep storage endpoints

* rest of tutorial

* typo

* cleanup

* screenshot and sidebar for tutorial

* add license

* typos

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* rest of review comments

* clarify where results are stored

* update api reference for durablestorage context param

* Apply suggestions from code review

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>

* comments

* incorporate apache#14720

* address rest of comments

* missed one

* Update docs/api-reference/sql-api.md

* Update docs/api-reference/sql-api.md

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com>
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
(cherry picked from commit 3b5b6c6)
(cherry picked from commit ed83c56)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants