From d99fffa19a0cd02161956e8f9a194563d01baf4d Mon Sep 17 00:00:00 2001
From: Karan Kumar
Date: Fri, 3 Nov 2023 10:52:20 +0530
Subject: [PATCH] Doc fixes for query from deep storage and MSQ (#15313)

Minor updates to the documentation. Added prerequisites. Removed a known issue in MSQ since its no longer valid.

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
(cherry picked from commit 5036af6fb3433ab98ce965c064184168fb28436b)
---
 docs/multi-stage-query/known-issues.md        | 4 ----
 docs/querying/query-from-deep-storage.md      | 4 ++++
 docs/tutorials/tutorial-query-deep-storage.md | 2 ++
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/multi-stage-query/known-issues.md b/docs/multi-stage-query/known-issues.md
index 570a7f58fa4e..f4e97dc23dad 100644
--- a/docs/multi-stage-query/known-issues.md
+++ b/docs/multi-stage-query/known-issues.md
@@ -42,10 +42,6 @@ an [UnknownError](./reference.md#error_UnknownError) with a message including "N
 - `GROUPING SETS` are not implemented. Queries using these features return a
   [QueryNotSupported](reference.md#error_QueryNotSupported) error.
 
-- For some `COUNT DISTINCT` queries, you'll encounter a [QueryNotSupported](reference.md#error_QueryNotSupported) error
-  that includes `Must not have 'subtotalsSpec'` as one of its causes. This is caused by the planner attempting to use
-  `GROUPING SET`s, which are not implemented.
-
 - The numeric varieties of the `EARLIEST` and `LATEST` aggregators do not work properly. Attempting to use the numeric
   varieties of these aggregators lead to an error like
   `java.lang.ClassCastException: class java.lang.Double cannot be cast to class org.apache.druid.collections.SerializablePair`.
diff --git a/docs/querying/query-from-deep-storage.md b/docs/querying/query-from-deep-storage.md
index fba131eba312..c9c97f780ddb 100644
--- a/docs/querying/query-from-deep-storage.md
+++ b/docs/querying/query-from-deep-storage.md
@@ -24,6 +24,10 @@ title: "Query from deep storage"
 
 Druid can query segments that are only stored in deep storage. Running a query from deep storage is slower than running queries from segments that are loaded on Historical processes, but it's a great tool for data that you either access infrequently or where the low latency results that typical Druid queries provide is not necessary. Queries from deep storage can increase the surface area of data available to query without requiring you to scale your Historical processes to accommodate more segments.
 
+## Prerequisites
+
+Query from deep storage requires the Multi-stage query (MSQ) task engine. Load the extension for it if you don't already have it enabled before you begin. See [enable MSQ](../multi-stage-query/index.md#load-the-extension) for more information.
+
 ## Keep segments in deep storage only
 
 Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. However, to take advantage of the cost savings that querying from deep storage provides, make sure not all your segments get loaded onto Historical processes.
diff --git a/docs/tutorials/tutorial-query-deep-storage.md b/docs/tutorials/tutorial-query-deep-storage.md
index d4dfe5e69987..dfb4de22eb01 100644
--- a/docs/tutorials/tutorial-query-deep-storage.md
+++ b/docs/tutorials/tutorial-query-deep-storage.md
@@ -31,6 +31,8 @@ To run the queries in this tutorial, replace `ROUTER:PORT` with the location of
 
 For more general information, see [Query from deep storage](../querying/query-from-deep-storage.md).
 
+If you are trying this feature on an existing cluster, make sure query from deep storage [prerequisites](../querying/query-from-deep-storage.md#prerequisites) are met.
+
 ## Load example data
 
 Use the **Load data** wizard or the following SQL query to ingest the `wikipedia` sample datasource bundled with Druid. If you use the wizard, make sure you change the partitioning to be by hour.
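As a quick illustration of the workflow these doc changes point to, here is a minimal Python sketch of running a query from deep storage. It is an illustrative sketch under stated assumptions rather than part of the patch: it assumes a Router reachable at `ROUTER:PORT` (as in the tutorial), the `wikipedia` example datasource, the `requests` library, and Druid's asynchronous `sql/statements` API, which query from deep storage uses.

```python
import requests

# Assumed Router address; replace ROUTER:PORT with your cluster's Router host and port.
ROUTER_URL = "http://ROUTER:PORT"

# Prerequisite described in the docs above: the MSQ task engine must be available,
# i.e. the druid-multi-stage-query extension is loaded on the cluster.

# Submit the query asynchronously through the sql/statements API so it can run
# against segments that are only available in deep storage.
submit = requests.post(
    f"{ROUTER_URL}/druid/v2/sql/statements",
    json={
        "query": "SELECT channel, COUNT(*) AS cnt FROM wikipedia GROUP BY channel",
        "context": {"executionMode": "ASYNC"},
    },
)
submit.raise_for_status()
query_id = submit.json()["queryId"]

# Check the statement's status; once it reports SUCCESS, fetch the results.
status = requests.get(f"{ROUTER_URL}/druid/v2/sql/statements/{query_id}").json()
print(status["state"])  # e.g. ACCEPTED, RUNNING, SUCCESS
if status["state"] == "SUCCESS":
    results = requests.get(f"{ROUTER_URL}/druid/v2/sql/statements/{query_id}/results").json()
    print(results)
```

In practice you would poll the status endpoint until the state reaches `SUCCESS` or `FAILED` before requesting results.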