Support Union in Decoupled planning #17354

kgyrtkirk · 2024-10-15T18:42:14Z

introduces UnionQuery
some changes to enable multiple input datasources for a Query
UnionQuery execution is driven by the QueryExecutor, which could later make it possible to reduce some complexity in ClientQuerySegmentWalker
to service UnionQueryToolChest methods properly, access to other ToolChest-s was needed; most of the related test system refactor is done in Enhance injector usage during conglomerate builds in tests #17331
renamed UnionQueryRunner to UnionDataSourceQueryRunner
QueryRunnerFactoryConglomerate could act as a QueryToolChestWarehouse, which shaves off some unnecessary things here and there
PR contains Enhance injector usage during conglomerate builds in tests #17331

This reverts commit 932d61d.

This reverts commit 27c80e8.

processing/src/test/java/org/apache/druid/query/union/UnionQueryQueryToolChestTest.java

+        .thenReturn((q, ctx) -> (Sequence) scan2.makeResultSequence());
+
+    QueryRunner<UnionResult> unionRunner = toolChest.makeQueryRunner(query, walker);
+    Sequence<UnionResult> results = unionRunner.run(QueryPlus.wrap(query), null);


processing/src/test/java/org/apache/druid/query/union/UnionQueryRunnerTest.java

+  public void test1()
+  {
+
+    UnionQueryRunner uqr = new UnionQueryRunner(null, null);


imply-cheddar · 2024-10-22T16:44:32Z

processing/src/main/java/org/apache/druid/query/union/UnionQuery.java

+  @Override
+  public DateTimeZone getTimezone()
+  {
+    throw DruidException.defensive("Method supported. [%s]", DruidException.getCurrentMethodName());
+  }


Why do we need the method name in the message? It's part of the exception...

Also, your message is perhaps missing a "not"? Are these intended to be "not yet implemented" or "not implemented and should never be implemented"?

I should not have started with this. The exception traces are not always shown, which makes the message less meaningful; putting more detail into the exception might help identify the actual issue.

But that shouldn't be addressed individually like I was doing here; it would be better to improve on these things separately.

replaced all these exception creations with a DruidException#methodNotSupported
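
For illustration only, a helper of this kind could look roughly like the sketch below. The class and method names are hypothetical stand-ins, not the actual DruidException#methodNotSupported, whose signature is not shown in this thread. The idea is that the helper derives the calling method's name itself, so the message stays useful even when the stack trace is not displayed.

```java
// Hypothetical helper; the real DruidException#methodNotSupported may differ.
final class SketchExceptions
{
  private SketchExceptions() {}

  static UnsupportedOperationException methodNotSupported()
  {
    // Index 0 is this helper's own frame; index 1 is the method that called it.
    StackTraceElement caller = new Throwable().getStackTrace()[1];
    return new UnsupportedOperationException(
        "Method [" + caller.getMethodName() + "] is not supported."
    );
  }
}

// Hypothetical query class used only to show the call site.
final class SketchQueryImpl
{
  public String getTimezone()
  {
    throw SketchExceptions.methodNotSupported();
  }
}
```

Each unsupported method then needs a single line, and the wording lives in one place.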

imply-cheddar · 2024-10-22T16:57:47Z

processing/src/main/java/org/apache/druid/query/DefaultQueryRunnerFactoryConglomerate.java

+  @Override
+  @SuppressWarnings("unchecked")
+  public <T, QueryType extends Query<T>> QueryExecutor<T> getQueryExecutor(QueryType query)
+  {
+    QueryToolChest<T, QueryType> toolchest = getToolChest(query);
+    if (toolchest instanceof QueryExecutor) {
+      return (QueryExecutor<T>) toolchest;
+    }
+    return null;
  }


Naming issue: "QueryExecutor" is extremely generic; the name should better reflect where in the lifecycle the query is being executed, so that it's clearer what this is in charge of. We should also have javadoc describing what this is responsible for. I'd suggest something like QueryEntryPointRunner or something similar to that.

Or we name the interface QueryLogic and have a method called entryPoint. This would actually align with something else I'm doing as well, so I kinda like that. Especially as the distinction between the ToolChest and the QueryRunnerFactory has basically disappeared as the system has evolved.

imply-cheddar · 2024-10-22T17:25:56Z

processing/src/main/java/org/apache/druid/query/Query.java

+  default Query<T> withDataSources(List<DataSource> children)
+  {
+    if (children.size() != 1) {
+      throw new IAE("Must have exactly one child");
+    }
+    return withDataSource(Iterables.getOnlyElement(children));
+  }


I would hope that we can eliminate these getDataSources and withDataSources methods from the interface. They are a leaky abstraction. I think any code that would actually need them could be avoided by implementing the QueryExecutor thingie, which means these methods don't need to exist on the interface anymore, right?

these were needed for 2 reasons:

before this PR the DataSource class already had [get|with]Children methods; since this changes the Query to have multiple of them, it felt like communicating it clearly is better - so that all existing places it

I had to also use this in recursivelyClearContext

generateSubqueryIds and insertSubqueryIds use these to set context variables

alternative options could be:

there is possibly another way to not have these methods: provide a visitor interface which could "visit" all queries / datasources optionally rewriting them

it might be worth a try to service these differently from QueryDataSource (so that it does not appear on the main Query interface)

I've done the second approach from above; I'm not sure about the complexity that will cause in the long run, but it's not that bad right now. I had to add 3 instanceof-s:

2 in QueryDataSource

1 in a test related clearRecursivelyContext method
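
For comparison, the first alternative (a visitor that can optionally rewrite nodes) might be sketched as follows. This is an assumption-laden illustration, not code from this PR: every type here is a hypothetical stand-in, and it only models datasources, whereas the real change would also have to cover queries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; not Druid's real DataSource interface.
interface SketchDataSource
{
  List<SketchDataSource> getChildren();

  SketchDataSource withChildren(List<SketchDataSource> children);
}

// Hypothetical visitor: returns the node unchanged or a replacement for it.
interface SketchVisitor
{
  SketchDataSource visit(SketchDataSource node);
}

final class SketchTable implements SketchDataSource
{
  final String name;

  SketchTable(String name) { this.name = name; }

  public List<SketchDataSource> getChildren() { return List.of(); }

  // A leaf has no children to replace.
  public SketchDataSource withChildren(List<SketchDataSource> children) { return this; }
}

final class SketchUnion implements SketchDataSource
{
  final List<SketchDataSource> inputs;

  SketchUnion(List<SketchDataSource> inputs) { this.inputs = inputs; }

  public List<SketchDataSource> getChildren() { return inputs; }

  public SketchDataSource withChildren(List<SketchDataSource> children)
  {
    return new SketchUnion(children);
  }
}

final class SketchRewriter
{
  private SketchRewriter() {}

  // Bottom-up traversal: children are rewritten first, then the node itself.
  static SketchDataSource rewrite(SketchDataSource node, SketchVisitor visitor)
  {
    List<SketchDataSource> rewritten = new ArrayList<>();
    for (SketchDataSource child : node.getChildren()) {
      rewritten.add(rewrite(child, visitor));
    }
    return visitor.visit(node.withChildren(rewritten));
  }
}
```

With something like this, callers that set context variables or ids would pass a visitor instead of needing list-valued accessors on the main interface.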

imply-cheddar · 2024-10-22T17:27:57Z

processing/src/main/java/org/apache/druid/query/QueryExecutor.java

+/**
+ * Executes the query by utilizing the given walker.
+ */
+public interface QueryExecutor<T>
+{
+  QueryRunner<T> makeQueryRunner(
+      Query<T> query,
+      QuerySegmentWalker walker
+  );
+}


I made this comment elsewhere as well, but what do you think about calling this QueryLogic and then the method you are creating is initialEntryPoint()?
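
A rough sketch of what the suggested shape could look like, with a union implementation that fans out to its inputs through the walker. All types are simplified, hypothetical stand-ins for Query, QuerySegmentWalker and QueryRunner; the method name entryPoint is taken from the suggestion above, and the real signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for Druid's Query, QueryRunner and QuerySegmentWalker.
interface SketchQuery {}

interface SketchRunner
{
  List<String> run();
}

interface SketchWalker
{
  SketchRunner runnerFor(SketchQuery query);
}

/**
 * Hypothetical QueryLogic: builds the runner that serves as the initial
 * entry point for executing a query with the given walker.
 */
interface SketchQueryLogic
{
  SketchRunner entryPoint(SketchQuery query, SketchWalker walker);
}

final class SketchLeafQuery implements SketchQuery
{
  final String row;

  SketchLeafQuery(String row) { this.row = row; }
}

final class SketchUnionQuery implements SketchQuery
{
  final List<SketchQuery> inputs;

  SketchUnionQuery(List<SketchQuery> inputs) { this.inputs = inputs; }
}

final class SketchUnionLogic implements SketchQueryLogic
{
  @Override
  public SketchRunner entryPoint(SketchQuery query, SketchWalker walker)
  {
    SketchUnionQuery union = (SketchUnionQuery) query;
    // Run each input through the walker and concatenate the results in input order.
    return () -> {
      List<String> rows = new ArrayList<>();
      for (SketchQuery input : union.inputs) {
        rows.addAll(walker.runnerFor(input).run());
      }
      return rows;
    };
  }
}
```

The point of the shape is that the union-specific orchestration lives behind one method, so the walker does not need to know how a union is executed.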

imply-cheddar · 2024-10-22T18:11:19Z

server/src/main/java/org/apache/druid/server/ClientQuerySegmentWalker.java

+
+          QueryExecutor<Object> subQueryExecutor = conglomerate.getQueryExecutor(subQuery);
+          final QueryRunner subQueryRunner;
+          if (subQueryExecutor != null) {
+            subQueryRunner = subQueryExecutor.makeQueryRunner(subQueryWithSerialization, this);
+          } else {
+            subQueryRunner = subQueryWithSerialization.getRunner(this);
+          }
+
+          queryResults = subQueryRunner


This feels like the wrong place to be checking for the Executor.

Why can't we do it up around line 386?

imply-cheddar · 2024-10-22T18:39:06Z

sql/src/main/java/org/apache/druid/sql/calcite/rel/logical/DruidUnion.java

+    throw DruidException.defensive("XXXOnly Table and Values are supported as inputs for Union [%s]", sources);
+  }


I'm unsure if this is truly a "defensive" exception or if it is indicative of bad user input? Is there a way that a user can do something to cause this exception to get thrown?

If we don't expect it to be seen, then use a message like "Got an input type [%s] that is not supported. This should not happen".

processing/src/test/java/org/apache/druid/query/union/UnionQueryQueryToolChestTest.java

+        .thenReturn((q, ctx) -> (Sequence) scan2.makeResultSequence());
+
+    QueryRunner<Object> unionRunner = toolChest.entryPoint(query, walker);
+    Sequence<Object> results = unionRunner.run(QueryPlus.wrap(query), null);


kgyrtkirk added 30 commits September 30, 2024 14:21

Support UNNEST in decoupled mode

1792656

cleanup

be298b7

some stuff

726eec6

add stuff

fadaa32

updates

6daa7d4

add debug result

1dc39ea

Merge remote-tracking branch 'apache/master' into decouple-unnest

87fc449

Merge branch 'decouple-unnest' into decouple-union

d8f440d

add cast

df44cb5

x

8f9b808

return emptylist

932d61d

Revert "return emptylist"

d62e993

This reverts commit 932d61d.

undo empty

d5ecc0c

use UnionResult for now

4f382e5

add some stuff

44ea85e

make method non-static

24ae912

make non-static more

6035164

make getToolChest method

9ef426f

setwarehouse

2f78fd3

make union work

cae132a

foxes

63caf06

fix serialization

dc25d2f

re-add

60d0408

undo emptylit

43ae60d

x

8f58d49

cant be fixed?

2a4ab81

mods

f96969d

minimalize change

b36e9eb

canMaterializeQuery

27c80e8

Revert "canMaterializeQuery"

5284ad8

This reverts commit 27c80e8.

kgyrtkirk added 6 commits October 17, 2024 14:42

use conglomerate.getQueryExecutor

574b337

use executor

c17a3ef

undo

4a76077

cleanup

15b6987

remove SupportRowSignature

51bbf55

make unionqueryrunnerfactory not necessatry

a3f8445

kgyrtkirk marked this pull request as ready for review October 17, 2024 16:45

kgyrtkirk added 5 commits October 17, 2024 16:49

fix compile

069e700

fix style

66736dd

fix

98cb67a

add test

7fbc842

service ToolChestWarehouse with DefaultQueryRunnerFactoryConglomerate

ab21c40

github-advanced-security bot found potential problems Oct 17, 2024

github-advanced-security bot found potential problems Oct 17, 2024

remove invalid file

1bdcc4f

imply-cheddar reviewed Oct 22, 2024

kgyrtkirk added 7 commits October 23, 2024 11:50

add factory method

863b947

rename method

61ab951

remove getDataSources from Query interface

633adc1

remove field

de4c2df

update message

f4d16a6

simpler exception

d4c9877

transform results in the runner

862493e

github-actions bot added the Area - Streaming Ingestion label Oct 23, 2024

kgyrtkirk added 4 commits October 23, 2024 15:00

cleanup/fix style

689b5b5

move inlined subquery eval for querylogic

3cd56c9

typo

43c1078

rename

515d064

github-advanced-security bot found potential problems Oct 23, 2024
