Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Union in Decoupled planning #17354

Open
wants to merge 159 commits into
base: master
Choose a base branch
from

Conversation

kgyrtkirk
Copy link
Member

  • introduces UnionQuery
  • some changes to enable multiple input datasources for a Query
  • UnionQuery execution is driven by the QueryExecutor - which could later enable to reduce some complexity in ClientQuerySegmentWalker
  • to service UnionQueryToolChest methods properly there was a need to get access to other ToolChest-s ; most of the related test system refactor is done in Enhance injector usage during conglomerate builds in tests #17331
  • renamed UnionQueryRunner to UnionDataSourceQueryRunner
  • QueryRunnerFactoryConglomerate could act as a QueryToolChestWarehouse which shaves of some unnecessary things here and there
  • PR contains Enhance injector usage during conglomerate builds in tests #17331

@kgyrtkirk kgyrtkirk marked this pull request as ready for review October 17, 2024 16:45
.thenReturn((q, ctx) -> (Sequence) scan2.makeResultSequence());

QueryRunner<UnionResult> unionRunner = toolChest.makeQueryRunner(query, walker);
Sequence<UnionResult> results = unionRunner.run(QueryPlus.wrap(query), null);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
CalciteTestBase.makeColumnExpression
should be avoided because it has been deprecated.
public void test1()
{

UnionQueryRunner uqr = new UnionQueryRunner(null, null);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
CalciteTestBase.makeColumnExpression
should be avoided because it has been deprecated.
Comment on lines 134 to 138
@Override
public DateTimeZone getTimezone()
{
throw DruidException.defensive("Method supported. [%s]", DruidException.getCurrentMethodName());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the method name in the message? It's part of the exception...

Also, your message is perhaps missing a not? Are these intended to be "not yet implemented" or "not implemented and should never be implemented"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have not started with this - the exception traces are not always shown; which makes the message less meaningfull. Putting more detail into the exception might help identify the actual issue.

but that shouldn't be addressed individually like I was doing here....it would be better to improve on these things separately.

replaced all these exception creations with a DruidException#methodNotSupported

Comment on lines 61 to 70
@Override
@SuppressWarnings("unchecked")
public <T, QueryType extends Query<T>> QueryExecutor<T> getQueryExecutor(QueryType query)
{
QueryToolChest<T, QueryType> toolchest = getToolChest(query);
if (toolchest instanceof QueryExecutor) {
return (QueryExecutor<T>) toolchest;
}
return null;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming issue: "QueryExecutor" is extremely generic, the name should more reflect where in the lifecycle the query is being executed, so that it's more clear what this is in charge of. We should also have javadoc describing what this is responsible for. I'd suggest something like, QueryEntryPointRunner or something similar to that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we name the interface QueryLogic and have a method called entryPoint. This would actually align with something else I'm doing as well, so I kinda like that. Especially as the distinction between the ToolChest and the QueryRunnerFactory has basically disappeared as the system has evolved.

Comment on lines 302 to 308
default Query<T> withDataSources(List<DataSource> children)
{
if (children.size() != 1) {
throw new IAE("Must have exactly one child");
}
return withDataSource(Iterables.getOnlyElement(children));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods for getDataSources and withDataSourcesI would hope that we can eliminate on the interface. They are a leaky abstraction, I think that any code that would actually need these would be able to be avoided by implementing the QueryExecutor thingie which means that these methods don't need to exist on the interface anymore, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these were needed for 2 reasons:

  • before this PR the DataSource class alredy had [get|with]Children methods ; since this changes the the Query to have multiple of them - it feeled like communicating it clearly is better - so that all existing places it
  • I had to also use this in recursivelyClearContext
  • generateSubqueryIds and insertSubqueryIds use these to set context variables

alternative options could be:

  • there is possibly another way to not have these methods: provide a visitor interface which could "visit" all queries / datasources optionally rewriting them
  • it might worth a try to service these differently from QueryDataSource (so that it does not appear on the main Query interface)

I've done the second approach from above; I'm not sure about the complexity that will cause in the long run but its not that bad right now...I had to add 3 instanceof-s:

  • 2 in QueryDataSource
  • 1 in a test related clearRecursivelyContext method

Comment on lines 22 to 31
/**
* Executes the query by utilizing the given walker.
*/
public interface QueryExecutor<T>
{
QueryRunner<T> makeQueryRunner(
Query<T> query,
QuerySegmentWalker walker
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this comment elsewhere as well, but what do you think about calling this QueryLogic and then the method you are creating is initialEntryPoint()?

Comment on lines 453 to 462

QueryExecutor<Object> subQueryExecutor = conglomerate.getQueryExecutor(subQuery);
final QueryRunner subQueryRunner;
if (subQueryExecutor != null) {
subQueryRunner = subQueryExecutor.makeQueryRunner(subQueryWithSerialization, this);
} else {
subQueryRunner = subQueryWithSerialization.getRunner(this);
}

queryResults = subQueryRunner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like the wrong place to be checking for the Executor.

Why can't we do it up around line 386?

Comment on lines 114 to 115
throw DruidException.defensive("XXXOnly Table and Values are supported as inputs for Union [%s]", sources);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure if this is truly a "defensive" exception or if it is indicative of bad user input? Is there a way that a user can do something to cause this exception to get thrown?

If we don't expect it to be seen, then a message like "Got an input type [%s] that is not supported. This should not happen".

.thenReturn((q, ctx) -> (Sequence) scan2.makeResultSequence());

QueryRunner<Object> unionRunner = toolChest.entryPoint(query, walker);
Sequence<Object> results = unionRunner.run(QueryPlus.wrap(query), null);

Check notice

Code scanning / CodeQL

Unread local variable Note test

Variable 'Sequence results' is never read.
.thenReturn((q, ctx) -> (Sequence) scan2.makeResultSequence());

QueryRunner<Object> unionRunner = toolChest.entryPoint(query, walker);
Sequence<Object> results = unionRunner.run(QueryPlus.wrap(query), null);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
CalciteTestBase.makeColumnExpression
should be avoided because it has been deprecated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants