document new filters and stuff #14760

Merged Aug 8, 2023 (7 commits)

@clintropolis (Member) commented Aug 5, 2023

Description

Adds documentation for the new filters and the SQL query context parameter added in #14542. Also rearranges some of the native filter documentation and consistently uses tables to specify filter grammar, as I did in #14497.

Comment on lines 102 to 113
.getAPI {
  color: #0073e6;
  font-weight: bold;
}
.postAPI {
  color: #00bf7d;
  font-weight: bold;
}
.deleteAPI {
  color: #f49200;
  font-weight: bold;
}
Member Author

hmm... idk why this happened, must've been when running mvn commands to spellcheck before I committed

Member

do you want to just restore the changes to this file?

@writer-jill (Contributor) left a comment

Made some suggestions!

@@ -261,6 +261,8 @@ native boolean types, Druid ingests these values as strings if `druid.expression
the [array functions](../querying/sql-array-functions.md) or [UNNEST](../querying/sql-functions.md#unnest). Nested
columns can be queried with the [JSON functions](../querying/sql-json-functions.md).

We also highly recommend setting `druid.generic.useDefaultValueForNull=false` when using these columns since it also enables out of the box `ARRAY` type filtering. If this is not set to true, setting `sqlUseBoundsAndSelectors` to `false` on the [SQL query context](../querying/sql-query-context.md) can enable `ARRAY` filtering.
Contributor

I don't understand this. First it says set useDefaultValueForNull to false to enable ARRAY filtering. Then it says if this is set to false (not set to true) you can set something else to enable ARRAY filtering.

Member Author

oops yeah, this was a mistake

| -------- | ----------- | -------- |
| `type` | Must be "selector".| Yes |
| `dimension` | Input column or virtual column name to filter. | Yes |
| `value` | String value to match. | No, if not specified the filter will match NULL values. |
Contributor

Suggested change
| `value` | String value to match. | No, if not specified the filter will match NULL values. |
| `value` | String value to match. | No, if not specified the filter matches NULL values. |


This is the equivalent of `WHERE <dimension_string> = '<dimension_value_string>'` or `WHERE <dimension_string> IS NULL`
(if the `value` is `null`).
The selector filter is limited to only being able to match against `STRING` (single and multi-valued), `LONG`, `FLOAT`,
Contributor

Suggested change
The selector filter is limited to only being able to match against `STRING` (single and multi-valued), `LONG`, `FLOAT`,
The selector filter can only match against `STRING` (single and multi-valued), `LONG`, `FLOAT`,
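
For illustration, here is a minimal sketch of a selector filter using the three properties in the table above. It assumes a hypothetical string column named `someColumn`; the column name and value are not from the PR. It corresponds to `WHERE someColumn = 'hello'`:

```json
{
  "type": "selector",
  "dimension": "someColumn",
  "value": "hello"
}
```

Leaving out `value`, or setting it to `null`, would make the same filter behave like `WHERE someColumn IS NULL`.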


The selector filter supports the use of extraction functions, see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
When the selector filter matches against numeric inputs, the string `value` will be best effort coerced into a numeric value.
Contributor

Suggested change
When the selector filter matches against numeric inputs, the string `value` will be best effort coerced into a numeric value.
When the selector filter matches against numeric inputs, the string `value` will be best-effort coerced into a numeric value.


## Logical expression filters
The equality filter is a replacement for the selector filter with the ability to match against any type of column. The equality filter intends to have more SQL compatible behavior than the selector filter and so cannot match NULL values, use the null filter instead.
Contributor

Suggested change
The equality filter is a replacement for the selector filter with the ability to match against any type of column. The equality filter intends to have more SQL compatible behavior than the selector filter and so cannot match NULL values, use the null filter instead.
The equality filter is a replacement for the selector filter with the ability to match against any type of column. The equality filter is designed to include more SQL-compatible behavior than the selector filter and so can't match null values. To match null values, use the null filter.
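
As a rough sketch of the split described here, the following combines an equality filter and a null filter under an `or` filter, which is roughly `WHERE someColumn = 'hello' OR someColumn IS NULL`. The column name and value are hypothetical. The null filter's shape (`type` plus `column`) is assumed from the Druid native filter documentation, not taken from this hunk.

```json
{
  "type": "or",
  "fields": [
    {
      "type": "equality",
      "column": "someColumn",
      "matchValueType": "STRING",
      "matchValue": "hello"
    },
    {
      "type": "null",
      "column": "someColumn"
    }
  ]
}
```

A selector filter would need only a single object with a null `value` for the second case. The equality filter keeps value matching and null matching separate, which is closer to SQL semantics.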

@@ -449,29 +697,97 @@ The following matches dimension values in `[product_1, product_3, product_5]` fo

Druid supports filtering on timestamp, string, long, and float columns.

Note that only string columns have bitmap indexes. Therefore, queries that filter on other column types will need to
Note that only string columns and columns produced with the ['auto' ingestion spec](../ingestion/ingestion-spec.md#dimension-objects) also used by [type aware schema discovery](../ingestion/schema-design.md#type-aware-schema-discovery) have bitmap indexes. Therefore, queries that filter on other column types will need to
Contributor

Suggested change
Note that only string columns and columns produced with the ['auto' ingestion spec](../ingestion/ingestion-spec.md#dimension-objects) also used by [type aware schema discovery](../ingestion/schema-design.md#type-aware-schema-discovery) have bitmap indexes. Therefore, queries that filter on other column types will need to
Note that only string columns and columns produced with the ['auto' ingestion spec](../ingestion/ingestion-spec.md#dimension-objects) also used by [type aware schema discovery](../ingestion/schema-design.md#type-aware-schema-discovery) have bitmap indexes. Queries that filter on other column types must

scan those columns.

### Filtering on multi-value string columns

All filters will return true if any one of the dimension values is satisfies the filter.
Contributor

Suggested change
All filters will return true if any one of the dimension values is satisfies the filter.
All filters return true if any one of the dimension values satisfies the filter.

@@ -44,6 +44,7 @@ Configure Druid SQL query planning using the parameters in the table below.
|`enableTimeBoundaryPlanning`|If true, SQL queries will get converted to TimeBoundary queries wherever possible. TimeBoundary queries are very efficient for min-max calculation on `__time` column in a datasource |`druid.query.default.context.enableTimeBoundaryPlanning` on the Broker (default: false)|
|`useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite.<br /><br />This property is provided for backwards compatibility. It is not recommended to use this parameter unless you were depending on the older behavior.|`druid.sql.planner.useNativeQueryExplain` on the Broker (default: true)|
|`sqlFinalizeOuterSketches`|If false (default behavior in Druid 25.0.0 and later), `DS_HLL`, `DS_THETA`, and `DS_QUANTILES_SKETCH` return sketches in query results, as documented. If true (default behavior in Druid 24.0.1 and earlier), sketches from these functions are finalized when they appear in query results.<br /><br />This property is provided for backwards compatibility with behavior in Druid 24.0.1 and earlier. It is not recommended to use this parameter unless you were depending on the older behavior. Instead, use a function that does not return a sketch, such as `APPROX_COUNT_DISTINCT_DS_HLL`, `APPROX_COUNT_DISTINCT_DS_THETA`, `APPROX_QUANTILE_DS`, `DS_THETA_ESTIMATE`, or `DS_GET_QUANTILE`.|`druid.query.default.context.sqlFinalizeOuterSketches` on the Broker (default: false)|
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. |
Contributor

Suggested change
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. |
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. |
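
A hedged sketch of setting this parameter for a single query: the body below is the kind of JSON you would POST to the Druid SQL API endpoint (`/druid/v2/sql`), with `sqlUseBoundAndSelectors` placed in `context`. The datasource and column names are hypothetical and only serve to show an `ARRAY` comparison.

```json
{
  "query": "SELECT COUNT(*) FROM \"someDatasource\" WHERE \"someArrayColumn\" = ARRAY['a', 'b']",
  "context": {
    "sqlUseBoundAndSelectors": false
  }
}
```

Setting the value per query like this overrides whatever default is derived from `druid.generic.useDefaultValueForNull` for that request only.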

@@ -490,20 +806,31 @@ should be specified as if the timestamp values were strings.

If the user wishes to interpret the timestamp with a specific format, timezone, or locale, the [Time Format Extraction Function](./dimensionspecs.md#time-format-extraction-function) is useful.
Contributor

Suggested change
If the user wishes to interpret the timestamp with a specific format, timezone, or locale, the [Time Format Extraction Function](./dimensionspecs.md#time-format-extraction-function) is useful.
If you want to interpret the timestamp with a specific format, timezone, or locale, use the [Time Format Extraction Function](./dimensionspecs.md#time-format-extraction-function).

```
will successfully match the entire row, but not match a row with value `['a', 'c']`.

To express this filter in SQL, one would need to use [SQL multi-value string functions](./sql-multivalue-string-functions.md) such as `MV_CONTAINS`, which can be optimized by the planner to the same native filters.
Contributor

Suggested change
To express this filter in SQL, one would need to use [SQL multi-value string functions](./sql-multivalue-string-functions.md) such as `MV_CONTAINS`, which can be optimized by the planner to the same native filters.
To express this filter in SQL, use [SQL multi-value string functions](./sql-multivalue-string-functions.md) such as `MV_CONTAINS`, which can be optimized by the planner to the same native filters.

@vtlim (Member) left a comment

Some nits on style

| -------- | ----------- | -------- |
| `type` | Must be "equality".| Yes |
| `column` | Input column or virtual column name to filter. | Yes |
| `matchValueType` | String specifying the type of value to match, for example `STRING`, `LONG`, `DOUBLE`, `FLOAT`, `ARRAY<STRING>`, `ARRAY<LONG>`, or any other Druid type. The `matchValueType` determines how Druid interprets the `matchValue` to assist in converting to the type of the matched `column`. | Yes |
Member

Suggested change
| `matchValueType` | String specifying the type of value to match, for example `STRING`, `LONG`, `DOUBLE`, `FLOAT`, `ARRAY<STRING>`, `ARRAY<LONG>`, or any other Druid type. The `matchValueType` determines how Druid interprets the `matchValue` to assist in converting to the type of the matched `column`. | Yes |
| `matchValueType` | String specifying the type of value to match. For example, `STRING`, `LONG`, `DOUBLE`, `FLOAT`, `ARRAY<STRING>`, `ARRAY<LONG>`, or any other Druid type. The `matchValueType` determines how Druid interprets the `matchValue` to assist in converting to the type of the matched `column`. | Yes |


Search filters can be used to filter on partial string matches.
### Example: equivalent of `WHERE someColumn = someLongColumn`.
Member

Suggested change
### Example: equivalent of `WHERE someColumn = someLongColumn`.
### Example: equivalent of `WHERE someColumn = someLongColumn`


Note that the bound filter matches null values if you don't specify a lower bound. Use the range filter if you need SQL-compatible behavior.

### Example: equivalent to `WHERE 21 <= age <= 31`:
Member

Suggested change
### Example: equivalent to `WHERE 21 <= age <= 31`:
### Example: equivalent to `WHERE 21 <= age <= 31`

@@ -303,7 +261,7 @@ The following bound filter expresses the condition `21 <= age <= 31`:
}
```

This filter expresses the condition `foo <= name <= hoo`, using the default lexicographic sorting order.
### Example: equivalent to `WHERE 'foo' <= name <= 'hoo'`, using the default lexicographic sorting order.
Member

Suggested change
### Example: equivalent to `WHERE 'foo' <= name <= 'hoo'`, using the default lexicographic sorting order.
### Example: equivalent to `WHERE 'foo' <= name <= 'hoo'`, using the default lexicographic sorting order
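
A sketch of a bound filter matching this heading, assuming a string column called `name` as in the condition. No `ordering` is specified, so the filter falls back to the default lexicographic ordering the heading mentions.

```json
{
  "type": "bound",
  "dimension": "name",
  "lower": "foo",
  "upper": "hoo"
}
```

Both bounds are inclusive here because `lowerStrict` and `upperStrict` are left at their default of `false`.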

@@ -328,7 +286,7 @@ Using strict bounds, this filter expresses the condition `21 < age < 31`
}
```

The user can also specify a one-sided bound by omitting "upper" or "lower". This filter expresses `age < 31`.
### Example: equivalent to `WHERE age < 31`.
Member

Suggested change
### Example: equivalent to `WHERE age < 31`.
### Example: equivalent to `WHERE age < 31`


All filters return true if any one of the dimension values satisfies the filter.

#### Example: multi-value match behavior.
Member

Suggested change
#### Example: multi-value match behavior.
#### Example: multi-value match behavior

converted into a numeric predicate and will be applied to the numeric column values directly. In some cases (such as
the "regex" filter) the numeric column values will be converted to strings during the scan.

For example, filtering on a specific value, `myFloatColumn = 10.1`:
#### Example: filtering on a specific value, `myFloatColumn = 10.1`:
Member

Suggested change
#### Example: filtering on a specific value, `myFloatColumn = 10.1`:
#### Example: filtering on a specific value, `myFloatColumn = 10.1`

"type": "selector",
"dimension": "myFloatColumn",
"value": "10.1"
}
```

Filtering on a range of values, `10 <= myFloatColumn < 20`:
#### Example: filtering on a range of values, `10 <= myFloatColumn < 20`:
Member

Suggested change
#### Example: filtering on a range of values, `10 <= myFloatColumn < 20`:
#### Example: filtering on a range of values, `10 <= myFloatColumn < 20`
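
For comparison, the same condition can be sketched with the newer range filter. This assumes the `lower`, `upper`, and `upperOpen` properties from the Druid range filter documentation, which do not appear in this hunk. `column` and `matchValueType` mirror the equality filter table reviewed earlier.

```json
{
  "type": "range",
  "column": "myFloatColumn",
  "matchValueType": "FLOAT",
  "lower": 10,
  "upper": 20,
  "upperOpen": true
}
```

`upperOpen: true` makes the upper bound exclusive, giving `10 <= myFloatColumn < 20`. The lower bound stays inclusive because `lowerOpen` is left at its default.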

"type": "selector",
"dimension": "__time",
"value": "124457387532"
}
```

Filtering on day of week:
#### Example: filtering on day of week using an extractionFn
Member

Suggested change
#### Example: filtering on day of week using an extractionFn
#### Example: filtering on day of week using an extraction function

@@ -44,6 +44,7 @@ Configure Druid SQL query planning using the parameters in the table below.
|`enableTimeBoundaryPlanning`|If true, SQL queries will get converted to TimeBoundary queries wherever possible. TimeBoundary queries are very efficient for min-max calculation on `__time` column in a datasource |`druid.query.default.context.enableTimeBoundaryPlanning` on the Broker (default: false)|
|`useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite.<br /><br />This property is provided for backwards compatibility. It is not recommended to use this parameter unless you were depending on the older behavior.|`druid.sql.planner.useNativeQueryExplain` on the Broker (default: true)|
|`sqlFinalizeOuterSketches`|If false (default behavior in Druid 25.0.0 and later), `DS_HLL`, `DS_THETA`, and `DS_QUANTILES_SKETCH` return sketches in query results, as documented. If true (default behavior in Druid 24.0.1 and earlier), sketches from these functions are finalized when they appear in query results.<br /><br />This property is provided for backwards compatibility with behavior in Druid 24.0.1 and earlier. It is not recommended to use this parameter unless you were depending on the older behavior. Instead, use a function that does not return a sketch, such as `APPROX_COUNT_DISTINCT_DS_HLL`, `APPROX_COUNT_DISTINCT_DS_THETA`, `APPROX_QUANTILE_DS`, `DS_THETA_ESTIMATE`, or `DS_GET_QUANTILE`.|`druid.query.default.context.sqlFinalizeOuterSketches` on the Broker (default: false)|
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. |
Member

Suggested change
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. |
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull` |

@vtlim vtlim merged commit e57f880 into apache:master Aug 8, 2023
10 checks passed
@clintropolis clintropolis deleted the new-filter-docs branch August 8, 2023 23:09
clintropolis added a commit to clintropolis/druid that referenced this pull request Aug 8, 2023