Leverage Json index in the group-by clause #11494
Comments
The current implementation of the json index cannot be used within the group-by clause (or select clause) because it doesn't support extracting values; it only supports checking whether a key-value pair exists.
@Jackie-Jiang Today's json index stores the list of mapping from |
Think of the json index as an inverted index. It is extremely expensive to use an inverted index as a forward index, because each value read requires scanning the whole inverted index.
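To make the cost argument concrete, here is a minimal sketch of a toy inverted index. It is not Pinot's actual data structure; the `path=value` key layout and the function names are assumptions made for illustration. Filtering is a single lookup, while recovering the value for one document has to walk the entries.

```python
from typing import Dict, List, Optional

# Hypothetical toy inverted index: "path=value" -> doc ids containing that pair.
inverted: Dict[str, List[int]] = {
    "$.country=US": [0, 2],
    "$.country=DE": [1],
    "$.country=FR": [3],
}


def matching_docs(key: str) -> List[int]:
    """Filter use case: one dictionary lookup returns the posting list."""
    return inverted.get(key, [])


def value_for_doc(doc_id: int, path: str) -> Optional[str]:
    """Forward-index use case: scan entries until one contains doc_id."""
    prefix = path + "="
    for key, doc_ids in inverted.items():
        if key.startswith(prefix) and doc_id in doc_ids:
            return key[len(prefix):]
    return None
```

Calling `value_for_doc` once per row is what makes a row-by-row group-by over an inverted index expensive: in this sketch every call may touch every entry.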
Today Pinot already supports group-by on json-extract-scalar, but in the worst possible way: by parsing every json string. It has caused massive OOMs and timeouts even with a single query. Using the json index would be a huge improvement over that. Pinot's json index today already collects and sorts json path+value entries. A group-by on a json path (e.g. $.a.b) can take advantage of the sorted dictionary to quickly locate the relevant range of dictionary ids. I think doing this will provide a great speedup compared with what we have today. We are working on a proof of concept and will share the test results soon.
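The idea could be sketched roughly as follows. This is a simplified illustration, not the proof of concept mentioned above and not Pinot's implementation: the separator, the flattened-document input, and the function names are all assumptions. It also assumes each document holds at most one value per path and that no filter is applied; with a filter, each posting list would first be intersected with the matching doc ids.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

SEP = "\x00"  # assumed separator between json path and value


def build_index(docs: List[Dict[str, str]]) -> Tuple[List[str], List[List[int]]]:
    """Build a sorted dictionary of path+value keys with parallel posting lists."""
    postings: Dict[str, List[int]] = {}
    for doc_id, doc in enumerate(docs):
        for path, value in doc.items():
            postings.setdefault(path + SEP + value, []).append(doc_id)
    keys = sorted(postings)
    return keys, [postings[k] for k in keys]


def group_by_count(keys: List[str], postings: List[List[int]], path: str) -> Dict[str, int]:
    """count(*) grouped by the value at `path`, read from the index only."""
    prefix = path + SEP
    counts: Dict[str, int] = {}
    # Binary search to the first key for this path, then walk the contiguous range.
    i = bisect_left(keys, prefix)
    while i < len(keys) and keys[i].startswith(prefix):
        counts[keys[i][len(prefix):]] = len(postings[i])
        i += 1
    return counts
```

Because all keys for one path are contiguous in the sorted dictionary, the work is proportional to the number of distinct values under that path rather than to the number of json strings parsed.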
Today Pinot's Json index works in the Filter-By clause (doc).
When a group-by clause involves an indexed Json path, it is not clear whether or how the Json index can be leveraged. The closest option syntax-wise is JSON_EXTRACT_SCALAR(json_data, '$.country', 'STRING', 'null'), but it apparently does not use the Json index.
E.g.
select count(*)
from table
group by JSON_EXTRACT_SCALAR(json_data, '$.country', 'STRING', 'null')