Commit

docs: update future development blurbs (#16939)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
317brian and vtlim authored Oct 1, 2024
1 parent 878adff commit 1fc82a9
Showing 4 changed files with 11 additions and 17 deletions.
5 changes: 2 additions & 3 deletions docs/design/architecture.md
@@ -105,10 +105,9 @@ for reading from external data sources and publishing new Druid segments.
[**Indexer**](../design/indexer.md) services are an alternative to Middle Managers and Peons. Instead of
forking separate JVM processes per-task, the Indexer runs tasks as individual threads within a single JVM process.

-The Indexer is designed to be easier to configure and deploy compared to the Middle Manager + Peon system and to better enable resource sharing across tasks. The Indexer is a newer feature and is currently designated [experimental](../development/experimental.md) due to the fact that its memory management system is still under
-development. It will continue to mature in future versions of Druid.
+The Indexer is designed to be easier to configure and deploy compared to the MiddleManager + Peon system and to better enable resource sharing across tasks, which can help streaming ingestion. The Indexer is currently designated [experimental](../development/experimental.md).

-Typically, you would deploy either Middle Managers or Indexers, but not both.
+Typically, you would deploy one of the following: MiddleManagers, [MiddleManager-less ingestion using Kubernetes](../development/extensions-contrib/k8s-jobs.md), or Indexers. You wouldn't deploy more than one of these options.

## Colocation of services

3 changes: 1 addition & 2 deletions docs/design/indexer.md
@@ -24,8 +24,7 @@ sidebar_label: "Indexer"
-->

:::info
-The Indexer is an optional and [experimental](../development/experimental.md) feature.
-Its memory management system is still under development and will be significantly enhanced in later releases.
+The Indexer is an optional and experimental feature. If you're primarily performing batch ingestion, we recommend you use either the MiddleManager and Peon task execution system or [MiddleManager-less ingestion using Kubernetes](../development/extensions-contrib/k8s-jobs.md). If you're primarily doing streaming ingestion, you may want to try either [MiddleManager-less ingestion using Kubernetes](../development/extensions-contrib/k8s-jobs.md) or the Indexer service.
:::

The Apache Druid Indexer service is an alternative to the Middle Manager + Peon task execution system. Instead of forking a separate JVM process per-task, the Indexer runs tasks as separate threads within a single JVM process.
2 changes: 1 addition & 1 deletion docs/development/extensions-core/kubernetes.md
@@ -54,7 +54,7 @@ Additionally, this extension has following configuration.

### Gotchas

-- Label/Annotation path in each pod spec MUST EXIST, which is easily satisfied if there is at least one label/annotation in the pod spec already. This limitation may be removed in future.
+- Label/Annotation path in each pod spec MUST EXIST, which is easily satisfied if there is at least one label/annotation in the pod spec already.
- All Druid Pods belonging to one Druid cluster must be inside same kubernetes namespace.
- All Druid Pods need permissions to be able to add labels to self-pod, List and Watch other Pods, create and read ConfigMap for leader election. Assuming, "default" service account is used by Druid pods, you might need to add following or something similar Kubernetes Role and Role Binding.

18 changes: 7 additions & 11 deletions docs/querying/datasource.md
@@ -431,25 +431,21 @@ and how to detect it.
3. One common reason for implicit subquery generation is if the types of the two halves of an equality do not match.
For example, since lookup keys are always strings, the condition `druid.d JOIN lookup.l ON d.field = l.field` will
perform best if `d.field` is a string.
-4. The join operator must evaluate the condition for each row. In the future, we expect
-to implement both early and deferred condition evaluation, which we expect to improve performance considerably for
-common use cases.
+4. The join operator must evaluate the condition for each row.
5. Currently, Druid does not support pushing down predicates (condition and filter) past a Join (i.e. into
Join's children). Druid only supports pushing predicates into the join if they originated from
above the join. Hence, the location of predicates and filters in your Druid SQL is very important.
Also, as a result of this, comma joins should be avoided.

-#### Future work for joins
+#### Limitations for joins

-Joins are an area of active development in Druid. The following features are missing today but may appear in
-future versions:
+Joins in Druid have the following limitations:

-- Reordering of join operations to get the most performant plan.
-- Preloaded dimension tables that are wider than lookups (i.e. supporting more than a single key and single value).
-- RIGHT OUTER and FULL OUTER joins in the native query engine. Currently, they are partially implemented. Queries run
+- The order of joins is not entirely optimized. Join operations are not reordered to get the most performant plan.
+- Preloaded dimension tables that are wider than lookups (i.e. supporting more than a single key and single value) are not supported.
+- RIGHT OUTER and FULL OUTER joins in the native query engine are not fully implemented. Queries run
but results are not always correct.
-- Performance-related optimizations as mentioned in the [previous section](#join-performance).
-- Join conditions on a column containing a multi-value dimension.
+- Join conditions on a column can't contain a multi-value dimension.

### `unnest`
