Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic classes for Query data model #557

Merged
merged 60 commits into from
Nov 20, 2019
Merged

Conversation

fpacifici
Copy link
Contributor

@fpacifici fpacifici commented Oct 31, 2019

The first classes needed to provide the Snuba query class with a proper AST.
This PR represents the Query as a tree. The root is the query. Nodes can be containers
(like selected columns), expressions (simple or nested) or conditions (simple or nested).

The basic idea is to provide three ways to manipulate the Query:

  • One is a simple iterator to go through all the nodes of a given type (like expressions). This is meant to do whatever we do today with get_all_referenced_columns
  • The second is mapping. We would run the mapping function over all the nodes of a given type to replace them (think about replacing columns)
  • [This is not implemented yet] Directly navigate the tree from the root (helped with a find() method to find patterns) to provide additional flexibility.

Limitations (will come in followups - see the size of this PR):

  • Parsing logic is not there yet
  • The top level query navigation methods are not implemented yet
  • formatting logic is not implemented yet

Some notes inline, but as an advice for review:

  • start from the query.py file. Then move to nodes.py, then expressions.py
  • refer to the tests for examples on how to manipulate the expressions.

snuba/query/expressions.py Outdated Show resolved Hide resolved
snuba/query/query.py Outdated Show resolved Hide resolved
@fpacifici fpacifici requested a review from a team October 31, 2019 23:09
snuba/query/collections.py Outdated Show resolved Hide resolved
snuba/query/conditions.py Outdated Show resolved Hide resolved
snuba/query/expressions.py Outdated Show resolved Hide resolved
@fpacifici fpacifici marked this pull request as ready for review November 6, 2019 01:05
@fpacifici fpacifici changed the title (DRAFT) Basic classes for Query data model Basic classes for Query data model Nov 6, 2019
Copy link
Contributor

@tkaemming tkaemming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't had a chance to really digest a lot of this, but here are some initial scattershot thoughts.

The biggest question/confusion I have is around why the distinction between expression and condition is necessary (and I suppose aggregation falls into this category as well.)

snuba/query/collections.py Outdated Show resolved Hide resolved
snuba/query/collections.py Outdated Show resolved Hide resolved
snuba/query/collections.py Outdated Show resolved Hide resolved
snuba/query/collections.py Outdated Show resolved Hide resolved
snuba/query/collections.py Outdated Show resolved Hide resolved
snuba/query/conditions.py Outdated Show resolved Hide resolved
snuba/query/conditions.py Outdated Show resolved Hide resolved
snuba/query/nodes.py Outdated Show resolved Hide resolved
snuba/query/conditions.py Show resolved Hide resolved
snuba/query/expressions.py Outdated Show resolved Hide resolved
snuba/query/nodes.py Outdated Show resolved Hide resolved
@fpacifici
Copy link
Contributor Author

fpacifici commented Nov 15, 2019

Updates:

  • now expressions are immutable, so transform works like a map method
  • all expressions are now iterable. Which removes the NodeContainer
  • use itertools.chain instead of CompositeNodeContainer
  • add a transform method to Query and the relative test.
  • OrderBY is not an expression anymore

chain(*map(lambda orderby: orderby.node, self.__order_by)),
])

def transform_expression(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See test_query_ast.py for a concrete example.

snuba/query/expressions.py Outdated Show resolved Hide resolved
snuba/query/expressions.py Outdated Show resolved Hide resolved
snuba/query/expressions.py Outdated Show resolved Hide resolved
snuba/query/expressions.py Outdated Show resolved Hide resolved
@@ -62,6 +96,58 @@ def __init__(self, body: MutableMapping[str, Any], data_source: RelationalSource
self.__data_source = data_source
self.__prewhere_conditions: Sequence[Condition] = []

# New data model
# TODO: Provide a better typing for this.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still accurate? If it is, what specifically is lacking here at this point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, will remove

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok — one more change (sorry): this line was never removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't forget about this.

Copy link
Contributor Author

@fpacifici fpacifici Nov 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's weird I was sure I removed it. I guess it must have come back during the merge for black.

snuba/query/expressions.py Outdated Show resolved Hide resolved
snuba/query/query.py Outdated Show resolved Hide resolved
snuba/query/query.py Outdated Show resolved Hide resolved
snuba/query/expressions.py Outdated Show resolved Hide resolved
tests/query/test_query_ast.py Outdated Show resolved Hide resolved
@fpacifici
Copy link
Contributor Author

Updates:

  • Aggregations pissed me off enough to make it type check properly that I just removed it. Introducing a structured clickhosue query should help in testing the old and the new formatted results are compatible
  • Support multi parameter group function class ( f(X)(Y) )
  • Add more getters for the fields in the Query (now used only for assertions in tests)
  • remove format methods
  • some type fixes

parameters_group1: Sequence[Expression]
# This is for expressions like f(x)(y).
# None means there is no second gorup of parameters
parameters_group2: Optional[Sequence[Expression]] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to instead structure this as:

function: Union[str, FunctionCall]
parameters: Sequence[Expression]

… or is this too permissive, since this could lead to calls like f(g(x)(y))(z)? (Is that even a valid construct? I haven't tested it yet, so I honestly have no idea.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue with this is not the example you gave. That one is valid, but it was possible before as well by having one parameter of the first group of f, being a FunctionCall.
The issue is that I could express this:

f(x)(y)(z)(y)

My functional programming enthusiast self speaks now:
My interpretation of this syntax in Clickhouse is that that two sets of parameters are just useless syntactic sugar. But now that you mention it, this may actually a proper currying implementation, in which case it would make sense to consider the external call as an application of a function to the function returned by the application of the internal one.
z = f(x)(y) = f(x) returns g(k) -> z = g(y).
Which means f(x)(y)(z)(w) should be allowed.
My functional programming enthusiast self stops speaking now

In practice my concern with expressing currying properly is that, while it is extremely powerful to evaluate expressions, it may make query processing harder.
I think it is worth giving it a shot anyway, I may just split the FunctionCall implementation, instead of using the Union, this will make it clearer for query processors.

# I need to think over naming better. But this is the idea.
class FunctionCall(ABC):

class BasicFunctionCall(FunctionCall):
    name: str
    params[...]

class CurriedFunctionCall(FunctionCall):
    internal: FunctionCall
     params[...]

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue with this is not the example you gave. That one is valid, but it was possible before as well by having one parameter of the first group of f, being a FunctionCall.

Oops — yeah, that was a bad example, but based on the remainder of your comment I think you ended up on the same train of thought that I was.

But now that you mention it, this may actually a proper currying implementation, in which case it would make sense to consider the external call as an application of a function to the function returned by the application of the internal one.

Yeah, this is how I've thought always of it — whether or not that's an accurate reflection of their intent, I'm not sure one way or another.

In practice my concern with expressing currying properly is that, while it is extremely powerful to evaluate expressions, it may make query processing harder.

Do you have any examples why? Nothing sticks out to me immediately but I haven't put much thought into this.

I think it is worth giving it a shot anyway, I may just split the FunctionCall implementation, instead of using the Union, this will make it clearer for query processors.

# I need to think over naming better. But this is the idea.
class FunctionCall(ABC):

class BasicFunctionCall(FunctionCall):
    name: str
    params[...]

class CurriedFunctionCall(FunctionCall):
    internal: FunctionCall
     params[...]

What do you think?

I think this makes sense in general.

I don't think that we need a shared supertype for these — I'd consider just having FunctionCall(Expression) and CurriedFunctionCall(Expression) with the same attributes you described above. I think that not allowing infinitely recursive currying would be an advantage, since it's unclear whether or not ClickHouse allows that (and I don't think we have a use-case for it right now.)

snuba/query/expressions.py Outdated Show resolved Hide resolved
tests/query/test_query_ast.py Show resolved Hide resolved
Comment on lines 53 to 57
def replace_node(self, new_node: Expression) -> OrderBy:
"""
Returns a new OrderBy clause with a new node.
"""
return OrderBy(self.direction, new_node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can callers use dataclass.replace instead of this function, or does that have typing implications? Looking at the signature of replace I could see how some type information would be lost (but I haven't tested this.)

I general I think actually exposing a replace method here (and on Expression) that binds the instance argument of replace to self would be the nicest API from a caller's perspective. I suspect that this pattern will be used a lot in transform functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I relied on dataclasses.replace because, if I defined the replace method in the classes themselves, I did not manage to convince mypy that the types made sense.

snuba/query/query.py Outdated Show resolved Hide resolved
snuba/query/query.py Outdated Show resolved Hide resolved
snuba/query/query.py Outdated Show resolved Hide resolved
snuba/query/query.py Outdated Show resolved Hide resolved
snuba/query/query.py Outdated Show resolved Hide resolved
@fpacifici
Copy link
Contributor Author

Changes:

  • mostly cosmetics
  • CurriedFunctions have their own class.

Copy link
Contributor

@tkaemming tkaemming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One typo and one question/possible change, otherwise this looks good to me. Thanks for working so hard on this!

"""
transformed = replace(
self,
internal=func(self.internal_function),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
internal=func(self.internal_function),
internal_function=func(self.internal_function),

Comment on lines 132 to 139
def transform_expressions(
self,
func: Callable[[Expression], Expression],
) -> None:
"""
Transforms in place the current query object by applying a transformation
function to all expressions contained in this query
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return a new Query to mirror the semantics of transform?

Copy link
Contributor Author

@fpacifici fpacifici Nov 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish , unfortunately we cannot do that just yet:
There are several places during the query execution that rely on Request being a sealed dataclass with a query attribute, thus We cannot swap the Query as of now, since Request is immutable.
This is what drove the QueryProcessor to make changes to the query object in place.

We need first to remove all dependencies on a mutable Query, then we can make Query immutable itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. It'd probably a good thing to note in a comment that this was a design decision made as a side effect of another design decision (so we don't have to remember why it was written this way if/when we can change it later.)

@@ -62,6 +96,58 @@ def __init__(self, body: MutableMapping[str, Any], data_source: RelationalSource
self.__data_source = data_source
self.__prewhere_conditions: Sequence[Condition] = []

# New data model
# TODO: Provide a better typing for this.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't forget about this.

@fpacifici fpacifici merged commit 93a4551 into master Nov 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants