Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregation support improvement #124

Open
Tracked by #817
dai-chen opened this issue Jun 11, 2021 · 1 comment
Open
Tracked by #817

Aggregation support improvement #124

dai-chen opened this issue Jun 11, 2021 · 1 comment
Labels
enhancement New feature or request PPL Piped processing language Priority-High SQL

Comments

@dai-chen
Copy link
Collaborator

dai-chen commented Jun 11, 2021

Is your feature request related to a problem? Please describe.
Currently there are limitations in aggregate query support in both SQL and PPL, including:

  1. Sort by aggregate function results: Composite Aggregation API only supports natural ordering of group fields. Reference. Issue: Select global top N records with grouping by multiple fields #121, ORDER BY doesn't work as expected on cardinality aggregations opendistro-for-elasticsearch/sql#378
  2. Sort null first or last independently: Composite Aggregation API doesn't have such option and only does ASC+NULL_FIRST or DESC+NULL_LAST. Reference.
  3. Only first 1000 buckets returned and users have no control. Issue: Aggregation doesn't honor default query limit size opendistro-for-elasticsearch/sql#1039

The use cases, especially #1, are very common in analytics. The limitation blocks the broad adoption of SQL/PPL.

Describe the solution you'd like
Either provide more complete aggregation or workarounds for different use cases specifically.

Describe alternatives you've considered
N/A

Additional context
Examples:

POST http://localhost:9200/_plugins/_sql/_explain
{
  "query": "SELECT state, AVG(balance) FROM accounts GROUP BY state"
}
{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[state, AVG(balance)]"
    },
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":0,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"state\":{\"terms\":{\"field\":\"state.keyword\",\"missing_bucket\":true,\"order\":\"asc\"}}}]},\"aggregations\":{\"AVG(balance)\":{\"avg\":{\"field\":\"balance\"}}}}}}, searchDone=false)"
        },
        "children": []
      }
    ]
  }
}
@dai-chen dai-chen added enhancement New feature or request Priority-High SQL PPL Piped processing language labels Jun 11, 2021
@dai-chen dai-chen self-assigned this Aug 4, 2021
@penghuo
Copy link
Collaborator

penghuo commented Jan 13, 2022

For 2. Submit feature request in OpenSearch. opensearch-project/OpenSearch#1899
Working on PR now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request PPL Piped processing language Priority-High SQL
Projects
None yet
Development

No branches or pull requests

2 participants