Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backport] Doc fixes for query from deep storage and MSQ (#15313) #15319

Merged
merged 1 commit into from
Nov 3, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 0 additions & 4 deletions docs/multi-stage-query/known-issues.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,10 +42,6 @@ an [UnknownError](./reference.md#error_UnknownError) with a message including "N
- `GROUPING SETS` are not implemented. Queries using these features return a
[QueryNotSupported](reference.md#error_QueryNotSupported) error.

- For some `COUNT DISTINCT` queries, you'll encounter a [QueryNotSupported](reference.md#error_QueryNotSupported) error
that includes `Must not have 'subtotalsSpec'` as one of its causes. This is caused by the planner attempting to use
`GROUPING SET`s, which are not implemented.

- The numeric varieties of the `EARLIEST` and `LATEST` aggregators do not work properly. Attempting to use the numeric
varieties of these aggregators lead to an error like
`java.lang.ClassCastException: class java.lang.Double cannot be cast to class org.apache.druid.collections.SerializablePair`.
Expand Down
4 changes: 4 additions & 0 deletions docs/querying/query-from-deep-storage.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,10 @@ title: "Query from deep storage"

Druid can query segments that are only stored in deep storage. Running a query from deep storage is slower than running queries from segments that are loaded on Historical processes, but it's a great tool for data that you either access infrequently or where the low latency results that typical Druid queries provide is not necessary. Queries from deep storage can increase the surface area of data available to query without requiring you to scale your Historical processes to accommodate more segments.

## Prerequisites

Query from deep storage requires the Multi-stage query (MSQ) task engine. Load the extension for it if you don't already have it enabled before you begin. See [enable MSQ](../multi-stage-query/index.md#load-the-extension) for more information.

## Keep segments in deep storage only

Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. However, to take advantage of the cost savings that querying from deep storage provides, make sure not all your segments get loaded onto Historical processes.
Expand Down
2 changes: 2 additions & 0 deletions docs/tutorials/tutorial-query-deep-storage.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,8 @@ To run the queries in this tutorial, replace `ROUTER:PORT` with the location of

For more general information, see [Query from deep storage](../querying/query-from-deep-storage.md).

If you are trying this feature on an existing cluster, make sure query from deep storage [prerequisites](../querying/query-from-deep-storage.md#prerequisites) are met.

## Load example data

Use the **Load data** wizard or the following SQL query to ingest the `wikipedia` sample datasource bundled with Druid. If you use the wizard, make sure you change the partitioning to be by hour.
Expand Down