document new filters and stuff #14760
website/static/css/custom.css
Outdated
.getAPI {
  color: #0073e6;
  font-weight: bold;
}
.postAPI {
  color: #00bf7d;
  font-weight: bold;
}
.deleteAPI {
  color: #f49200;
  font-weight: bold;
}
hmm... idk why this happened, must've been when running mvn commands to spellcheck before I committed
do you want to just restore the changes to this file?
Made some suggestions!
docs/ingestion/schema-design.md
Outdated
@@ -261,6 +261,8 @@ native boolean types, Druid ingests these values as strings if `druid.expression | |||
the [array functions](../querying/sql-array-functions.md) or [UNNEST](../querying/sql-functions.md#unnest). Nested | |||
columns can be queried with the [JSON functions](../querying/sql-json-functions.md). | |||
|
|||
We also highly recommend setting `druid.generic.useDefaultValueForNull=false` when using these columns since it also enables out of the box `ARRAY` type filtering. If this is not set to true, setting `sqlUseBoundsAndSelectors` to `false` on the [SQL query context](../querying/sql-query-context.md) can enable `ARRAY` filtering. |
I don't understand this. First it says set `useDefaultValueForNull` to `false` to enable ARRAY filtering. Then it says if this is set to false (not set to true) you can set something else to enable ARRAY filtering.
oops yeah, this was a mistake
docs/querying/filters.md
Outdated
| -------- | ----------- | -------- | | ||
| `type` | Must be "selector".| Yes | | ||
| `dimension` | Input column or virtual column name to filter. | Yes | | ||
| `value` | String value to match. | No, if not specified the filter will match NULL values. | |
| `value` | String value to match. | No, if not specified the filter will match NULL values. | | |
| `value` | String value to match. | No, if not specified the filter matches NULL values. | |
docs/querying/filters.md
Outdated
|
||
This is the equivalent of `WHERE <dimension_string> = '<dimension_value_string>'` or `WHERE <dimension_string> IS NULL` | ||
(if the `value` is `null`). | ||
The selector filter is limited to only being able to match against `STRING` (single and multi-valued), `LONG`, `FLOAT`, |
The selector filter is limited to only being able to match against `STRING` (single and multi-valued), `LONG`, `FLOAT`, | |
The selector filter can only match against `STRING` (single and multi-valued), `LONG`, `FLOAT`, |
docs/querying/filters.md
Outdated
|
||
The selector filter supports the use of extraction functions, see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details. | ||
When the selector filter matches against numeric inputs, the string `value` will be best effort coerced into a numeric value. |
When the selector filter matches against numeric inputs, the string `value` will be best effort coerced into a numeric value. | |
When the selector filter matches against numeric inputs, the string `value` will be best-effort coerced into a numeric value. |
docs/querying/filters.md
Outdated
|
||
## Logical expression filters | ||
The equality filter is a replacement for the selector filter with the ability to match against any type of column. The equality filter intends to have more SQL compatible behavior than the selector filter and so cannot match NULL values, use the null filter instead. |
The equality filter is a replacement for the selector filter with the ability to match against any type of column. The equality filter intends to have more SQL compatible behavior than the selector filter and so cannot match NULL values, use the null filter instead. | |
The equality filter is a replacement for the selector filter with the ability to match against any type of column. The equality filter is designed to include more SQL-compatible behavior than the selector filter and so can't match null values. To match null values, use the null filter. |
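For anyone following this thread, here is a minimal sketch of the split described in the suggestion above: an equality filter for non-null values and a null filter for null. It is written in Python purely to build the JSON; the helper name `equality_or_null_filter` and the column name `someColumn` are hypothetical. It also assumes the JSON type name `equals` as I understand it from released Druid documentation, whereas the table quoted further down in this review reads "equality", so treat that field as unconfirmed for this revision of the PR.

```python
import json
from typing import Any, Optional


def equality_or_null_filter(column: str, value: Optional[Any], value_type: str = "STRING") -> dict:
    """Build a native filter: a null filter for None, otherwise an equality filter."""
    if value is None:
        # The equality filter cannot match null values, so use the null filter.
        return {"type": "null", "column": column}
    return {
        "type": "equals",  # assumption: type name as in released Druid docs
        "column": column,
        "matchValueType": value_type,
        "matchValue": value,
    }


# Hypothetical column name, for illustration only.
print(json.dumps(equality_or_null_filter("someColumn", "hello")))
print(json.dumps(equality_or_null_filter("someColumn", None)))
```

Keeping the null case in a separate filter type mirrors SQL, where `x = NULL` never matches and `x IS NULL` is required.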
docs/querying/filters.md
Outdated
@@ -449,29 +697,97 @@ The following matches dimension values in `[product_1, product_3, product_5]` fo | |||
|
|||
Druid supports filtering on timestamp, string, long, and float columns. | |||
|
|||
Note that only string columns have bitmap indexes. Therefore, queries that filter on other column types will need to | |||
Note that only string columns and columns produced with the ['auto' ingestion spec](../ingestion/ingestion-spec.md#dimension-objects) also used by [type aware schema discovery](../ingestion/schema-design.md#type-aware-schema-discovery) have bitmap indexes. Therefore, queries that filter on other column types will need to |
Note that only string columns and columns produced with the ['auto' ingestion spec](../ingestion/ingestion-spec.md#dimension-objects) also used by [type aware schema discovery](../ingestion/schema-design.md#type-aware-schema-discovery) have bitmap indexes. Therefore, queries that filter on other column types will need to | |
Note that only string columns and columns produced with the ['auto' ingestion spec](../ingestion/ingestion-spec.md#dimension-objects) also used by [type aware schema discovery](../ingestion/schema-design.md#type-aware-schema-discovery) have bitmap indexes. Queries that filter on other column types must |
docs/querying/filters.md
Outdated
scan those columns. | ||
|
||
### Filtering on multi-value string columns | ||
|
||
All filters will return true if any one of the dimension values is satisfies the filter. |
All filters will return true if any one of the dimension values is satisfies the filter. | |
All filters return true if any one of the dimension values satisfies the filter. |
docs/querying/sql-query-context.md
Outdated
@@ -44,6 +44,7 @@ Configure Druid SQL query planning using the parameters in the table below. | |||
|`enableTimeBoundaryPlanning`|If true, SQL queries will get converted to TimeBoundary queries wherever possible. TimeBoundary queries are very efficient for min-max calculation on `__time` column in a datasource |`druid.query.default.context.enableTimeBoundaryPlanning` on the Broker (default: false)| | |||
|`useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite.<br /><br />This property is provided for backwards compatibility. It is not recommended to use this parameter unless you were depending on the older behavior.|`druid.sql.planner.useNativeQueryExplain` on the Broker (default: true)| | |||
|`sqlFinalizeOuterSketches`|If false (default behavior in Druid 25.0.0 and later), `DS_HLL`, `DS_THETA`, and `DS_QUANTILES_SKETCH` return sketches in query results, as documented. If true (default behavior in Druid 24.0.1 and earlier), sketches from these functions are finalized when they appear in query results.<br /><br />This property is provided for backwards compatibility with behavior in Druid 24.0.1 and earlier. It is not recommended to use this parameter unless you were depending on the older behavior. Instead, use a function that does not return a sketch, such as `APPROX_COUNT_DISTINCT_DS_HLL`, `APPROX_COUNT_DISTINCT_DS_THETA`, `APPROX_QUANTILE_DS`, `DS_THETA_ESTIMATE`, or `DS_GET_QUANTILE`.|`druid.query.default.context.sqlFinalizeOuterSketches` on the Broker (default: false)| | |||
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. | |
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. | | |
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. | |
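As a rough illustration of the table row above, the flag could be set per query through the `context` object of a Druid SQL API request body. This sketch only builds the payload; the helper name `sql_request`, the table `my_table`, and the column `tags` are hypothetical, and the parameter spelling `sqlUseBoundAndSelectors` is taken from this row.

```python
import json


def sql_request(query: str, use_bound_and_selectors: bool) -> dict:
    """Build a SQL API request body with the planner flag set in the query context."""
    return {
        "query": query,
        "context": {"sqlUseBoundAndSelectors": use_bound_and_selectors},
    }


# Hypothetical table and column names, for illustration only.
body = sql_request("SELECT COUNT(*) FROM my_table WHERE tags = ARRAY['a', 'b']", False)
print(json.dumps(body, indent=2))
```

Setting the flag to `False` here corresponds to the case the row describes as required for filtering `ARRAY` typed values.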
docs/querying/filters.md
Outdated
@@ -490,20 +806,31 @@ should be specified as if the timestamp values were strings. | |||
|
|||
If the user wishes to interpret the timestamp with a specific format, timezone, or locale, the [Time Format Extraction Function](./dimensionspecs.md#time-format-extraction-function) is useful. |
If the user wishes to interpret the timestamp with a specific format, timezone, or locale, the [Time Format Extraction Function](./dimensionspecs.md#time-format-extraction-function) is useful. | |
If you want to interpret the timestamp with a specific format, timezone, or locale, use the [Time Format Extraction Function](./dimensionspecs.md#time-format-extraction-function). |
docs/querying/filters.md
Outdated
``` | ||
will successfully match the entire row, but not match a row with value `['a', 'c']`. | ||
|
||
To express this filter in SQL, one would need to use [SQL multi-value string functions](./sql-multivalue-string-functions.md) such as `MV_CONTAINS`, which can be optimized by the planner to the same native filters. |
To express this filter in SQL, one would need to use [SQL multi-value string functions](./sql-multivalue-string-functions.md) such as `MV_CONTAINS`, which can be optimized by the planner to the same native filters. | |
To express this filter in SQL, use [SQL multi-value string functions](./sql-multivalue-string-functions.md) such as `MV_CONTAINS`, which can be optimized by the planner to the same native filters. |
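The multi-value behavior discussed in this hunk can be modeled in a few lines of plain Python. This is a toy model, not Druid code: it assumes the filter elided from the hunk is an AND of two single-value matches (on `a` and on `b`), and the function names are hypothetical.

```python
from typing import Iterable, List


def selector_matches(row_values: List[str], value: str) -> bool:
    """A single-value match succeeds if ANY value in the multi-value row equals it."""
    return any(v == value for v in row_values)


def and_matches(row_values: List[str], values: Iterable[str]) -> bool:
    """AND of several single-value matches evaluated against the same row."""
    return all(selector_matches(row_values, v) for v in values)


print(and_matches(["a", "b", "c"], ["a", "b"]))  # True: both 'a' and 'b' are present
print(and_matches(["a", "c"], ["a", "b"]))       # False: 'b' is missing
```

In SQL terms, `x = 'a' AND x = 'b'` would be contradictory for a scalar column, which is why the doc points to `MV_CONTAINS` for this case.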
Made some suggestions!
Some nits on style
docs/querying/filters.md
Outdated
| -------- | ----------- | -------- | | ||
| `type` | Must be "equality".| Yes | | ||
| `column` | Input column or virtual column name to filter. | Yes | | ||
| `matchValueType` | String specifying the type of value to match, for example `STRING`, `LONG`, `DOUBLE`, `FLOAT`, `ARRAY<STRING>`, `ARRAY<LONG>`, or any other Druid type. The `matchValueType` determines how Druid interprets the `matchValue` to assist in converting to the type of the matched `column`. | Yes | |
| `matchValueType` | String specifying the type of value to match, for example `STRING`, `LONG`, `DOUBLE`, `FLOAT`, `ARRAY<STRING>`, `ARRAY<LONG>`, or any other Druid type. The `matchValueType` determines how Druid interprets the `matchValue` to assist in converting to the type of the matched `column`. | Yes | | |
| `matchValueType` | String specifying the type of value to match. For example, `STRING`, `LONG`, `DOUBLE`, `FLOAT`, `ARRAY<STRING>`, `ARRAY<LONG>`, or any other Druid type. The `matchValueType` determines how Druid interprets the `matchValue` to assist in converting to the type of the matched `column`. | Yes | |
docs/querying/filters.md
Outdated
|
||
Search filters can be used to filter on partial string matches. | ||
### Example: equivalent of `WHERE someColumn = someLongColumn`. |
### Example: equivalent of `WHERE someColumn = someLongColumn`. | |
### Example: equivalent of `WHERE someColumn = someLongColumn` |
docs/querying/filters.md
Outdated
|
||
Note that the bound filter matches null values if you don't specify a lower bound. Use the range filter if you need SQL-compatible behavior. | ||
|
||
### Example: equivalent to `WHERE 21 <= age <= 31`: |
### Example: equivalent to `WHERE 21 <= age <= 31`: | |
### Example: equivalent to `WHERE 21 <= age <= 31` |
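To make the contrast in this hunk concrete, here is a hedged sketch that builds both a bound filter and a range filter for `21 <= age <= 31`. The field names (`dimension`, `ordering`, `column`, `matchValueType`, `lower`, `upper`) follow the native filter documentation as I understand it and are not confirmed by the lines quoted here; the helper names are hypothetical.

```python
import json
from typing import Any, Optional


def bound_filter(dimension: str, lower: Optional[Any] = None,
                 upper: Optional[Any] = None, ordering: str = "numeric") -> dict:
    """Bound filter: bounds are passed as strings and compared using `ordering`."""
    f = {"type": "bound", "dimension": dimension, "ordering": ordering}
    if lower is not None:
        f["lower"] = str(lower)
    if upper is not None:
        f["upper"] = str(upper)
    return f


def range_filter(column: str, value_type: str, lower: Optional[Any] = None,
                 upper: Optional[Any] = None) -> dict:
    """Range filter: bounds keep their native type, declared by matchValueType."""
    f = {"type": "range", "column": column, "matchValueType": value_type}
    if lower is not None:
        f["lower"] = lower
    if upper is not None:
        f["upper"] = upper
    return f


print(json.dumps(bound_filter("age", 21, 31)))
print(json.dumps(range_filter("age", "LONG", 21, 31)))
```

Omitting `lower` in `bound_filter` produces the one-sided form that, per the note above, also matches null values.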
docs/querying/filters.md
Outdated
@@ -303,7 +261,7 @@ The following bound filter expresses the condition `21 <= age <= 31`: | |||
} | |||
``` | |||
|
|||
This filter expresses the condition `foo <= name <= hoo`, using the default lexicographic sorting order. | |||
### Example: equivalent to `WHERE 'foo' <= name <= 'hoo'`, using the default lexicographic sorting order. |
### Example: equivalent to `WHERE 'foo' <= name <= 'hoo'`, using the default lexicographic sorting order. | |
### Example: equivalent to `WHERE 'foo' <= name <= 'hoo'`, using the default lexicographic sorting order |
docs/querying/filters.md
Outdated
@@ -328,7 +286,7 @@ Using strict bounds, this filter expresses the condition `21 < age < 31` | |||
} | |||
``` | |||
|
|||
The user can also specify a one-sided bound by omitting "upper" or "lower". This filter expresses `age < 31`. | |||
### Example: equivalent to `WHERE age < 31`. |
### Example: equivalent to `WHERE age < 31`. | |
### Example: equivalent to `WHERE age < 31` |
docs/querying/filters.md
Outdated
|
||
All filters return true if any one of the dimension values satisfies the filter. | ||
|
||
#### Example: multi-value match behavior. |
#### Example: multi-value match behavior. | |
#### Example: multi-value match behavior |
docs/querying/filters.md
Outdated
converted into a numeric predicate and will be applied to the numeric column values directly. In some cases (such as | ||
the "regex" filter) the numeric column values will be converted to strings during the scan. | ||
|
||
For example, filtering on a specific value, `myFloatColumn = 10.1`: | ||
#### Example: filtering on a specific value, `myFloatColumn = 10.1`: |
#### Example: filtering on a specific value, `myFloatColumn = 10.1`: | |
#### Example: filtering on a specific value, `myFloatColumn = 10.1` |
docs/querying/filters.md
Outdated
"type": "selector",
"dimension": "myFloatColumn",
"value": "10.1"
}
```
|
||
Filtering on a range of values, `10 <= myFloatColumn < 20`: | ||
#### Example: filtering on a range of values, `10 <= myFloatColumn < 20`: |
#### Example: filtering on a range of values, `10 <= myFloatColumn < 20`: | |
#### Example: filtering on a range of values, `10 <= myFloatColumn < 20` |
docs/querying/filters.md
Outdated
"type": "selector",
"dimension": "__time",
"value": "124457387532"
}
```
|
||
Filtering on day of week: | ||
#### Example: filtering on day of week using an extractionFn |
#### Example: filtering on day of week using an extractionFn | |
#### Example: filtering on day of week using an extraction function |
docs/querying/sql-query-context.md
Outdated
@@ -44,6 +44,7 @@ Configure Druid SQL query planning using the parameters in the table below. | |||
|`enableTimeBoundaryPlanning`|If true, SQL queries will get converted to TimeBoundary queries wherever possible. TimeBoundary queries are very efficient for min-max calculation on `__time` column in a datasource |`druid.query.default.context.enableTimeBoundaryPlanning` on the Broker (default: false)| | |||
|`useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite.<br /><br />This property is provided for backwards compatibility. It is not recommended to use this parameter unless you were depending on the older behavior.|`druid.sql.planner.useNativeQueryExplain` on the Broker (default: true)| | |||
|`sqlFinalizeOuterSketches`|If false (default behavior in Druid 25.0.0 and later), `DS_HLL`, `DS_THETA`, and `DS_QUANTILES_SKETCH` return sketches in query results, as documented. If true (default behavior in Druid 24.0.1 and earlier), sketches from these functions are finalized when they appear in query results.<br /><br />This property is provided for backwards compatibility with behavior in Druid 24.0.1 and earlier. It is not recommended to use this parameter unless you were depending on the older behavior. Instead, use a function that does not return a sketch, such as `APPROX_COUNT_DISTINCT_DS_HLL`, `APPROX_COUNT_DISTINCT_DS_THETA`, `APPROX_QUANTILE_DS`, `DS_THETA_ESTIMATE`, or `DS_GET_QUANTILE`.|`druid.query.default.context.sqlFinalizeOuterSketches` on the Broker (default: false)| | |||
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. | |
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. | | |
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull` | |
Description
Adds documentation for the new filters and SQL query context parameter added in #14542. Also rearranges some of the native filter documentation and consistently uses tables to specify filter grammar, similar to what I did in #14497.