diff --git a/docs/api-reference/sql-ingestion-api.md b/docs/api-reference/sql-ingestion-api.md
index a9cceb8d4d96..3a7f52088b5b 100644
--- a/docs/api-reference/sql-ingestion-api.md
+++ b/docs/api-reference/sql-ingestion-api.md
@@ -3,6 +3,8 @@ id: sql-ingestion-api
 title: SQL-based ingestion API
 sidebar_label: SQL-based ingestion
 ---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+<Tabs>
+
+<TabItem value="1" label="HTTP">
-<!--HTTP-->
 ```
 POST /druid/v2/sql/task
@@ -69,7 +72,10 @@ POST /druid/v2/sql/task
 }
 ```
-<!--curl-->
+</TabItem>
+
+<TabItem value="2" label="curl">
+
 ```bash
 # Make sure you replace `username`, `password`, `your-instance`, and `port` with the values for your deployment.
 curl --location --request POST 'https://<username>:<password>@<your-instance>:<port>/druid/v2/sql/task' \
 --header 'Content-Type: application/json' \
 --data-raw '{"query": "..."}'
 ```
+</TabItem>
+
+<TabItem value="3" label="Python">
+
 ```python
 import json
@@ -108,7 +117,9 @@
 print(response.text)
 ```
-<!--END_DOCUSAURUS_CODE_TABS-->
+</TabItem>
+
+</Tabs>
 
 #### Response
@@ -132,22 +143,29 @@ You can retrieve the status of a query to see if it is still running, completed successfully, failed, or got canceled.
 
 #### Request
 
-<!--DOCUSAURUS_CODE_TABS-->
+<Tabs>
+
+<TabItem value="1" label="HTTP">
-<!--HTTP-->
 ```
 GET /druid/indexer/v1/task/<taskId>/status
 ```
-<!--curl-->
+</TabItem>
+
+<TabItem value="2" label="curl">
+
 ```bash
 # Make sure you replace `username`, `password`, `your-instance`, `port`, and `taskId` with the values for your deployment.
 curl --location --request GET 'https://<username>:<password>@<your-instance>:<port>/druid/indexer/v1/task/<taskId>/status'
 ```
-<!--Python-->
+</TabItem>
+
+<TabItem value="3" label="Python">
+
 ```python
 import requests
@@ -163,7 +181,9 @@
 response = requests.get(url, headers=headers, data=payload, auth=('USER', 'PASSWORD'))
 
 print(response.text)
 ```
-<!--END_DOCUSAURUS_CODE_TABS-->
+</TabItem>
+
+</Tabs>
 
 #### Response
@@ -208,22 +228,29 @@ For an explanation of the fields in a report, see [Report response fields](#report-response-fields).
 
 #### Request
 
-<!--DOCUSAURUS_CODE_TABS-->
+<Tabs>
+
+<TabItem value="1" label="HTTP">
-<!--HTTP-->
 ```
 GET /druid/indexer/v1/task/<taskId>/reports
 ```
-<!--curl-->
+</TabItem>
+
+<TabItem value="2" label="curl">
+
 ```bash
 # Make sure you replace `username`, `password`, `your-instance`, `port`, and `taskId` with the values for your deployment.
 curl --location --request GET 'https://<username>:<password>@<your-instance>:<port>/druid/indexer/v1/task/<taskId>/reports'
 ```
-<!--Python-->
+</TabItem>
+
+<TabItem value="3" label="Python">
+
 ```python
 import requests
@@ -236,7 +263,9 @@
 response = requests.get(url, headers=headers, auth=('USER', 'PASSWORD'))
 
 print(response.text)
 ```
-<!--END_DOCUSAURUS_CODE_TABS-->
+</TabItem>
+
+</Tabs>
 
 #### Response
@@ -511,7 +540,7 @@ The response shows an example report for a query.
         "0": 1,
         "1": 1,
         "2": 1
-      },
+      },
       "totalMergersForUltimateLevel": 1,
       "progressDigest": 1
     }
@@ -587,22 +616,29 @@ The following table describes the response fields when you retrieve a report for a query task.
 
 #### Request
 
-<!--DOCUSAURUS_CODE_TABS-->
+<Tabs>
+
+<TabItem value="1" label="HTTP">
-<!--HTTP-->
 ```
 POST /druid/indexer/v1/task/<taskId>/shutdown
 ```
-<!--curl-->
+</TabItem>
+
+<TabItem value="2" label="curl">
+
 ```bash
 # Make sure you replace `username`, `password`, `your-instance`, `port`, and `taskId` with the values for your deployment.
 curl --location --request POST 'https://<username>:<password>@<your-instance>:<port>/druid/indexer/v1/task/<taskId>/shutdown'
 ```
-<!--Python-->
+</TabItem>
+
+<TabItem value="3" label="Python">
+
 ```python
 import requests
@@ -618,7 +654,9 @@
 response = requests.post(url, headers=headers, data=payload, auth=('USER', 'PASSWORD'))
 
 print(response.text)
 ```
-<!--END_DOCUSAURUS_CODE_TABS-->
+</TabItem>
+
+</Tabs>
 
 #### Response
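The three request styles above compose into a simple submit-then-poll workflow. The following sketch is not part of the patch itself; it is a minimal illustration that assumes a Router at `localhost:8888`, placeholder credentials, and a placeholder `INSERT` query — adjust all of these for a real deployment:

```python
import time

import requests

BASE = "http://localhost:8888"   # assumed Router address
AUTH = ("USER", "PASSWORD")      # placeholder credentials

# Submit a SQL-based ingestion task; the response carries the task ID.
payload = {"query": "INSERT INTO example_target SELECT * FROM example_source PARTITIONED BY DAY"}
response = requests.post(f"{BASE}/druid/v2/sql/task", json=payload, auth=AUTH)
response.raise_for_status()
task_id = response.json()["taskId"]

# Poll the status endpoint until the task leaves the RUNNING state.
while True:
    status = requests.get(f"{BASE}/druid/indexer/v1/task/{task_id}/status", auth=AUTH).json()
    state = status["status"]["status"]
    if state != "RUNNING":
        break
    time.sleep(5)

print(state)
# Once the task is terminal, the reports endpoint has the full detail.
report = requests.get(f"{BASE}/druid/indexer/v1/task/{task_id}/reports", auth=AUTH).json()
```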
diff --git a/docs/querying/datasource.md b/docs/querying/datasource.md
index e348bc81c660..c1ef350613b5 100644
--- a/docs/querying/datasource.md
+++ b/docs/querying/datasource.md
@@ -3,6 +3,10 @@ id: datasource
 title: "Datasources"
 ---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
-<!--DOCUSAURUS_CODE_TABS-->
-<!--SQL-->
+<Tabs>
+
+<TabItem value="1" label="SQL">
 ```sql
 SELECT column1, column2 FROM "druid"."dataSourceName"
 ```
-<!--Native-->
+</TabItem>
+
+<TabItem value="2" label="Native">
 ```json
 {
   "queryType": "scan",
@@ -48,7 +55,8 @@ SELECT column1, column2 FROM "druid"."dataSourceName"
   "intervals": ["0000/3000"]
 }
 ```
-<!--END_DOCUSAURUS_CODE_TABS-->
+</TabItem>
+</Tabs>
 
 The table datasource is the most common type. This is the kind of datasource you get when you perform
 [data ingestion](../ingestion/index.md). Tables are split up into segments, distributed around the cluster,
@@ -72,12 +80,15 @@ To see a list of all table datasources, use the SQL query
 
 ### `lookup`
 
-<!--DOCUSAURUS_CODE_TABS-->
-<!--SQL-->
+<Tabs>
+
+<TabItem value="1" label="SQL">
 ```sql
 SELECT k, v FROM lookup.countries
 ```
-<!--Native-->
+</TabItem>
+
+<TabItem value="2" label="Native">
 ```json
 {
   "queryType": "scan",
@@ -89,7 +100,8 @@ SELECT k, v FROM lookup.countries
   "intervals": ["0000/3000"]
 }
 ```
-<!--END_DOCUSAURUS_CODE_TABS-->
+</TabItem>
+</Tabs>
 
 Lookup datasources correspond to Druid's key-value [lookup](lookups.md) objects. In [Druid SQL](sql.md#from),
 they reside in the `lookup` schema. They are preloaded in memory on all servers, so they can be accessed rapidly.
@@ -112,8 +124,9 @@ use table datasources.
 
 ### `union`
 
-<!--DOCUSAURUS_CODE_TABS-->
-<!--SQL-->
+<Tabs>
+
+<TabItem value="1" label="SQL">
 ```sql
 SELECT column1, column2
 FROM (
@@ -124,7 +137,9 @@ FROM (
   SELECT column1, column2 FROM table3
 )
 ```
-<!--Native-->
+</TabItem>
+
+<TabItem value="2" label="Native">
 ```json
 {
   "queryType": "scan",
@@ -136,7 +151,8 @@ FROM (
   "intervals": ["0000/3000"]
 }
 ```
-<!--END_DOCUSAURUS_CODE_TABS-->
+</TabItem>
+</Tabs>
 
 Unions allow you to treat two or more tables as a single datasource. In SQL, this is done with the
 UNION ALL operator applied directly to tables, called a ["table-level union"](sql.md#table-level). In
 native queries, this is done with a
@@ -158,8 +174,9 @@ use union datasources.
 
 ### `inline`
 
-<!--DOCUSAURUS_CODE_TABS-->
-<!--Native-->
+<Tabs>
+
+<TabItem value="1" label="Native">
 ```json
 {
   "queryType": "scan",
@@ -175,7 +192,8 @@ use inline datasources.
   "intervals": ["0000/3000"]
 }
 ```
-<!--END_DOCUSAURUS_CODE_TABS-->
+</TabItem>
+</Tabs>
 
 Inline datasources allow you to query a small amount of data that is embedded in the query itself. They
 are useful when you want to write a query on a small amount of data without loading it first. They are also
 useful as inputs into a
@@ -193,8 +211,9 @@ use inline datasources.
 
 ### `query`
 
-<!--DOCUSAURUS_CODE_TABS-->
-<!--SQL-->
+<Tabs>
+
+<TabItem value="1" label="SQL">
 ```sql
 -- Uses a subquery to count hits per page, then takes the average.
 SELECT
@@ -202,7 +221,9 @@ FROM
   AVG(hits)
 FROM
   (SELECT page, COUNT(*) AS hits FROM site_traffic GROUP BY page)
 ```
-<!--Native-->
+</TabItem>
+
+<TabItem value="2" label="Native">
 ```json
 {
   "queryType": "timeseries",
@@ -230,7 +251,8 @@ FROM
   ]
 }
 ```
-<!--END_DOCUSAURUS_CODE_TABS-->
+</TabItem>
+</Tabs>
 
 Query datasources allow you to issue subqueries. In native queries, they can appear anywhere that accepts a
 `dataSource` (except underneath a `union`). In SQL, they can appear in the following places, always surrounded
 by parentheses:
@@ -246,8 +268,9 @@ Query datasources allow you to issue subqueries. In native queries, they can appear
 
 ### `join`
 
-<!--DOCUSAURUS_CODE_TABS-->
-<!--SQL-->
+<Tabs>
+
+<TabItem value="1" label="SQL">
 ```sql
 -- Joins "sales" with "countries" (using "store" as the join key) to get sales by country.
 SELECT
@@ -259,7 +282,9 @@ FROM
 GROUP BY
   countries.v
 ```
-<!--Native-->
+</TabItem>
+
+<TabItem value="2" label="Native">
 ```json
 {
   "queryType": "groupBy",
@@ -284,7 +309,8 @@ GROUP BY
   ]
 }
 ```
-<!--END_DOCUSAURUS_CODE_TABS-->
+</TabItem>
+</Tabs>
 
 Join datasources allow you to do a SQL-style join of two datasources. Stacking joins on top of each
 other allows you to join arbitrarily many datasources.
@@ -352,9 +378,9 @@ perform best if `d.field` is a string.
 4. As of Druid {{DRUIDVERSION}}, the join operator must evaluate the condition for each row. In the future, we expect
 to implement both early and deferred condition evaluation, which we expect to improve performance considerably for
 common use cases.
-5. Currently, Druid does not support pushing down predicates (condition and filter) past a Join (i.e. into
-Join's children). Druid only supports pushing predicates into the join if they originated from
-above the join. Hence, the location of predicates and filters in your Druid SQL is very important.
+5. Currently, Druid does not support pushing down predicates (condition and filter) past a Join (i.e. into
+Join's children). Druid only supports pushing predicates into the join if they originated from
+above the join. Hence, the location of predicates and filters in your Druid SQL is very important. Also, as a
+result of this, comma joins should be avoided.
 
 #### Future work for joins
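To make the predicate point above concrete, here is an illustrative sketch, not part of the patch: a hypothetical `sales` table joined to the `countries` lookup, echoing the join example. Because the filter sits above the join, Druid can push it into the join's inputs, which is equivalent to filtering the left input yourself:

```sql
-- Filter written above the join: Druid pushes it into the join inputs.
SELECT countries.v AS country, SUM(sales.revenue) AS country_revenue
FROM sales
INNER JOIN lookup.countries ON sales.store = countries.k
WHERE sales.store = 'store_1'
GROUP BY countries.v

-- Same intent, with the filter placed in the left input explicitly.
SELECT countries.v AS country, SUM(s.revenue) AS country_revenue
FROM (SELECT store, revenue FROM sales WHERE store = 'store_1') AS s
INNER JOIN lookup.countries ON s.store = countries.k
GROUP BY countries.v
```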
@@ -377,15 +403,15 @@ future versions:
 
 Use the `unnest` datasource to unnest a column with multiple values in an array.
 
 For example, you have a source column that looks like this:
 
-| Nested | 
-| -- | 
+| Nested |
+| -- |
 | [a, b] |
 | [c, d] |
 | [e, [f,g]] |
 
 When you use the `unnest` datasource, the unnested column looks like this:
 
-| Unnested | 
+| Unnested |
 | -- |
 | a |
 | b |
diff --git a/docs/querying/nested-columns.md b/docs/querying/nested-columns.md
index 8f13372fdb43..8b9a716eeb5c 100644
--- a/docs/querying/nested-columns.md
+++ b/docs/querying/nested-columns.md
@@ -3,6 +3,10 @@ id: nested-columns
 title: "Nested columns"
 sidebar_label: Nested columns
 ---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
-<!--DOCUSAURUS_CODE_TABS-->
-<!--SQL-->
+<Tabs>
+
+<TabItem value="1" label="SQL">
 ```
 REPLACE INTO deserialized_example OVERWRITE ALL
 WITH source AS (SELECT * FROM TABLE(
@@ -358,12 +363,14 @@ SELECT
   "department",
   "shipTo",
   "details",
-  PARSE_JSON("shipTo") as "shipTo_parsed", 
+  PARSE_JSON("shipTo") as "shipTo_parsed",
   PARSE_JSON("details") as "details_parsed"
 FROM source
 PARTITIONED BY DAY
 ```
-<!--Native-->
+</TabItem>
+
+<TabItem value="2" label="Native">
 ```
 {
   "type": "index_parallel",
@@ -423,7 +430,8 @@ PARTITIONED BY DAY
   }
 }
 ```
-<!--END_DOCUSAURUS_CODE_TABS-->
+</TabItem>
+</Tabs>
 
 ## Querying nested columns
@@ -475,7 +483,7 @@ Example query results:
 
 ### Extracting nested data elements
 
-The `JSON_VALUE` function is specially optimized to provide native Druid level performance when processing nested literal values, as if they were flattened, traditional, Druid column types. It does this by reading from the specialized nested columns and indexes that are built and stored in JSON objects when Druid creates segments. 
+The `JSON_VALUE` function is specially optimized to provide native Druid-level performance when processing nested literal values, as if they were flattened, traditional Druid column types. It does this by reading from the specialized nested columns and indexes that Druid builds and stores for JSON objects when it creates segments.
 
 Some operations using `JSON_VALUE` run faster than those using native Druid columns. For example, filtering numeric types uses the indexes built for nested numeric columns, which are not available for Druid DOUBLE, FLOAT, or LONG columns.
@@ -561,7 +569,7 @@ Example query results:
 
 ### Transforming JSON object data
 
-In addition to `JSON_VALUE`, Druid offers a number of operators that focus on transforming JSON object data: 
+In addition to `JSON_VALUE`, Druid offers a number of operators that focus on transforming JSON object data:
 
 - `JSON_QUERY`
 - `JSON_OBJECT`
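As a quick sketch of these operators in action — using the `deserialized_example` table from the ingestion example above, and assuming the sample data's shape (an `address` object under `shipTo` and a numeric `price` under `details`):

```sql
SELECT
  "department",
  JSON_VALUE("shipTo_parsed", '$.address.country') AS "ship_country",
  JSON_VALUE("details_parsed", '$.price' RETURNING DOUBLE) AS "price",
  JSON_QUERY("shipTo_parsed", '$.address') AS "address_object"
FROM "deserialized_example"
```

`JSON_VALUE` extracts literal values (optionally coerced with `RETURNING`), while `JSON_QUERY` returns the nested object itself.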
diff --git a/docs/tutorials/tutorial-unnest-arrays.md b/docs/tutorials/tutorial-unnest-arrays.md
index 954142f4fa69..e74500b973b9 100644
--- a/docs/tutorials/tutorial-unnest-arrays.md
+++ b/docs/tutorials/tutorial-unnest-arrays.md
@@ -4,6 +4,9 @@ sidebar_label: "Unnesting arrays"
 title: "Unnest arrays within a column"
 ---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+<Tabs>
+
+<TabItem value="1" label="SQL">
+
-<!--SQL-->
 ```sql
 REPLACE INTO nested_data OVERWRITE ALL
@@ -73,10 +77,12 @@ FROM TABLE(
     '[{"name":"t","type":"string"},{"name":"dim1","type":"string"},{"name":"dim2","type":"string"},{"name":"dim3","type":"string"},{"name":"dim4","type":"string"},{"name":"dim5","type":"string"},{"name":"m1","type":"float"},{"name":"m2","type":"double"}]'
   )
 )
-PARTITIONED BY YEAR 
+PARTITIONED BY YEAR
 ```
-<!--Native-->
+</TabItem>
+
+<TabItem value="2" label="Native">
 ```json
 {
@@ -135,7 +141,8 @@ PARTITIONED BY YEAR
   }
 }
 ```
-<!--END_DOCUSAURUS_CODE_TABS-->
+</TabItem>
+</Tabs>
 
 ## View the data
@@ -168,10 +175,10 @@ For more information about the syntax, see [UNNEST](../querying/sql.md#unnest).
 
 The following query returns a column called `d3` from the table `nested_data`.
 `d3` contains the unnested values from the source column `dim3`:
 
 ```sql
 SELECT d3 FROM "nested_data", UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
 ```
 
-Notice the MV_TO_ARRAY helper function, which converts the multi-value records in `dim3` to arrays. It is required since `dim3` is a multi-value string dimension. 
+Notice the MV_TO_ARRAY helper function, which converts the multi-value records in `dim3` to arrays. It is required because `dim3` is a multi-value string dimension.
 
 If the column you are unnesting is not a string dimension, then you do not need to use the MV_TO_ARRAY helper function.
@@ -191,7 +198,7 @@ Another way to unnest a virtual column is to concatenate them with ARRAY_CONCAT:
 SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY_CONCAT(dim4,dim5)) AS example_table(d45)
 ```
 
-Decide which method to use based on what your goals are. 
+Decide which method to use based on your goals.
 
 ### Unnest multiple source expressions
@@ -227,7 +234,7 @@ SELECT d3 FROM (SELECT dim1, dim2, dim3 FROM "nested_data"), UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
 You can specify which rows to unnest by including a filter in your query. The following query:
 
 * Filters the source expression based on `dim2`
-* Unnests the records in `dim3` into `d3` 
+* Unnests the records in `dim3` into `d3`
 * Returns the records for the unnested `d3` that have a `dim2` record that matches the filter
 
 ```sql
@@ -240,7 +247,7 @@ You can also filter the results of an UNNEST clause. The following example unnests
 SELECT * FROM UNNEST(ARRAY[1,2,3]) AS example_table(d1) WHERE d1 IN ('1','2')
 ```
 
-This means that you can run a query like the following where Druid only return rows that meet the following conditions: 
+This means that you can run a query like the following, where Druid only returns rows that meet the following conditions:
 
 - The unnested values of `dim3` (aliased to `d3`) match `IN ('b', 'd')`
 - The value of `m1` is less than 2.
@@ -256,7 +263,7 @@ The query only returns a single row since only one row meets the conditions. You
 
 ### Unnest and then GROUP BY
 
 The following query unnests `dim3` and then performs a GROUP BY on the output `d3`.
 
 ```sql
-SELECT d3 FROM nested_data, UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3) GROUP BY d3 
+SELECT d3 FROM nested_data, UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3) GROUP BY d3
 ```
 
 You can further transform your results by including clauses like `ORDER BY d3 DESC` or LIMIT.
@@ -267,7 +274,7 @@ The following section shows examples of how you can use the unnest datasource in
 
 You can use a single unnest datasource to unnest multiple columns. Be careful when doing this though because it can lead to a very large number of new rows.
 
-### Scan query 
+### Scan query
 
 The following native Scan query returns the rows of the datasource and unnests the values in the `dim3` column by using the `unnest` datasource type:
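The diff breaks off before the native query it introduces. As a reference point only, a scan query over an `unnest` datasource has roughly the following shape in recent Druid versions — the base table, virtual column name, and expression are taken from the tutorial, but verify the exact field names against your Druid release, since the native unnest spec has changed across versions:

```json
{
  "queryType": "scan",
  "dataSource": {
    "type": "unnest",
    "base": {
      "type": "table",
      "name": "nested_data"
    },
    "virtualColumn": {
      "type": "expression",
      "name": "unnest-dim3",
      "expression": "\"dim3\""
    }
  },
  "intervals": ["0000/3000"],
  "columns": ["unnest-dim3"],
  "resultFormat": "compactedList"
}
```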