Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support GROUP BY alias of the expression in the SELECT list. #5581

Closed

Conversation

javainthinking
Copy link
Contributor

Seems that group by alias will fail currently.

presto:default> select count(cname), cid as id from ss_dev.course_rt group by id;
Query 20160705_122534_00018_ej4np failed: line 1:63: Column '"id"' cannot be resolved
select count(cname), cid as id from ss_dev.course_rt group by id

@martint
Copy link
Contributor

martint commented Jul 6, 2016

That's the expected behavior. In ANSI SQL, that's not a valid query. The expressions in the select clause are "evaluated" after group by, window functions, etc. so their names are not in scope for those operations.

@javainthinking
Copy link
Contributor Author

@martint yes, we just migrate lots of application from oracle/mysql to presto (currently running to 0.144.1), found that many query failed due to this, so I changed this to fit for those legacy queries.

@electrum
Copy link
Contributor

electrum commented Jul 6, 2016

This feature is problematic. Per the SQL standard, the GROUP BY column names must be resolved based on the input relations. The additional resolution against output names would have to happen after input names. The consequence is that adding a column to a table can break existing queries. Consider the following:

CREATE TABLE t (a bigint);

SELECT a AS b, count(*)
FROM t
GROUP BY b;

ALTER TABLE t ADD COLUMN b bigint;

Re-running the query would now fail with an error: '"a"' must be an aggregate expression or appear in GROUP BY clause

I'm having trouble coming up with an example, but there might be other queries that change behavior and thus silently return incorrect answers.

@electrum
Copy link
Contributor

electrum commented Jul 6, 2016

Note that there's a standard, portable way to do this (without repeating a complex expression in the GROUP BY clause):

SELECT (a + b + c) AS x, (d * e) AS y, count(*)
FROM t
GROUP BY 1, 2;

I realize this doesn't help with applications that have existing queries, but might help for the future.

@electrum
Copy link
Contributor

electrum commented Jul 6, 2016

PostgreSQL supports this, but resolves names against the input relations first, which illustrates the problem I described above:

dphillips=# CREATE TABLE t (a bigint);
CREATE TABLE
dphillips=# SELECT a AS b, count(*) FROM t GROUP BY b;
 b | count 
---+-------
(0 rows)

dphillips=# ALTER TABLE t ADD COLUMN b bigint;
ALTER TABLE
dphillips=# SELECT a AS b, count(*) FROM t GROUP BY b;
ERROR:  column "t.a" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT a AS b, count(*) FROM t GROUP BY b;
               ^

@martint
Copy link
Contributor

martint commented Jul 6, 2016

Actually, GROUP BY with column ordinals is not in the spec.

@javainthinking
Copy link
Contributor Author

Yeah, new column definition b with the same name of the alias will surely lead to the SQL failure, which is expected. This could be categorized to the kinds of "syntactic sugar", which oracle/postgre/mysql implemented in the past decades.
Even, alias within HAVING clause are used widely in many legacy SQL, such as
SELECT count(id) as cnt, col1
FROM table1
WHERE col2 = 2
GROUP BY col1
HAVING cnt > 2;
which is not supported currently,
we should change the SQL to:
SELECT count(id) as cnt, col1
FROM table1
WHERE col2 = 2
GROUP BY col1
HAVING count(id) > 2;

@ghost ghost added the CLA Signed label Jul 12, 2016
@ghost ghost added the CLA Signed label Aug 26, 2016
@stale
Copy link

stale bot commented Apr 3, 2019

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants