Include statement attributes in EXPLAIN PLAN output #14074

Merged: 17 commits, Apr 17, 2023

@abhishekrb19 (Contributor) commented Apr 12, 2023

This PR adds attributes containing metadata about the query to the EXPLAIN PLAN output. The attributes currently contain two items:

  • statementKind: the overall statement type. Possible values are SELECT, INSERT, and REPLACE.
  • targetDataSource: the target datasource name for DML statements.

A new method, explainAttributes, is added to the SqlStatementHandler interface, and the Select, Insert, and Replace handlers implement it. The value is obtained from the statement handler during query planning and set in the planner context.
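The handler design described above could look roughly like the following. This is a minimal sketch, not the code from this PR: every type name ending in `Sketch` is hypothetical, the return type and `toMap` helper are assumptions, and a single ingest class stands in for the separate Insert and Replace handlers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the attributes value; not the actual Druid class.
final class ExplainAttributesSketch {
  private final String statementKind;
  private final String targetDataSource; // null when the statement has no target

  ExplainAttributesSketch(String statementKind, String targetDataSource) {
    this.statementKind = statementKind;
    this.targetDataSource = targetDataSource;
  }

  // Assumption: a missing target is omitted from the output rather than emitted as null.
  Map<String, Object> toMap() {
    Map<String, Object> map = new LinkedHashMap<>();
    map.put("statementKind", statementKind);
    if (targetDataSource != null) {
      map.put("targetDataSource", targetDataSource);
    }
    return map;
  }
}

// Simplified view of the handler interface: only the new method is shown.
interface StatementHandlerSketch {
  ExplainAttributesSketch explainAttributes();
}

final class SelectHandlerSketch implements StatementHandlerSketch {
  @Override
  public ExplainAttributesSketch explainAttributes() {
    return new ExplainAttributesSketch("SELECT", null);
  }
}

// INSERT and REPLACE differ only in the statement kind here, so one sketch class covers both.
final class IngestHandlerSketch implements StatementHandlerSketch {
  private final String statementKind;
  private final String targetDataSource;

  IngestHandlerSketch(String statementKind, String targetDataSource) {
    this.statementKind = statementKind;
    this.targetDataSource = targetDataSource;
  }

  @Override
  public ExplainAttributesSketch explainAttributes() {
    return new ExplainAttributesSketch(statementKind, targetDataSource);
  }
}
```

In this sketch the planner would call `explainAttributes()` on whichever handler was chosen for the statement, so the planner itself never needs to branch on the statement type.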

For backward compatibility, these attributes are included as a third column in the output, and the docs are updated to reflect the output changes. The column is added to both the legacy and native query plan outputs.

The EXPLAIN PLAN output for a SELECT query is in the README. Below is the output for an INSERT query:

[
  [
    {
      "query": {
        "queryType": "scan",
        "dataSource": {
          "type": "external",
          "inputSource": {
            "type": "s3",
            "uris": [
              "obfuscated-uri"
            ]
          },
          "inputFormat": {
            "type": "json",
            "keepNullColumns": false,
            "assumeNewlineDelimited": false,
            "useJsonNodeReader": false
          },
          "signature": [
            {
              "name": "time2",
              "type": "STRING"
            },
            {
              "name": "channel",
              "type": "STRING"
            },
            {
              "name": "countryName",
              "type": "STRING"
            }
          ]
        },
        "intervals": {
          "type": "intervals",
          "intervals": [
            "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
          ]
        },
        "virtualColumns": [
          {
            "type": "expression",
            "name": "v0",
            "expression": "timestamp_parse(\"time2\",null,'UTC')",
            "outputType": "LONG"
          },
          {
            "type": "expression",
            "name": "v1",
            "expression": "'Japan'",
            "outputType": "STRING"
          }
        ],
        "resultFormat": "compactedList",
        "orderBy": [
          {
            "columnName": "v1",
            "order": "ascending"
          },
          {
            "columnName": "channel",
            "order": "ascending"
          }
        ],
        "filter": {
          "type": "selector",
          "dimension": "countryName",
          "value": "Japan"
        },
        "columns": [
          "channel",
          "time2",
          "v0",
          "v1"
        ],
        "legacy": false,
        "context": {
          "finalizeAggregations": true,
          "groupByEnableMultiValueUnnesting": true,
          "maxNumTasks": 2,
          "queryId": "27058ca9-47be-4d7d-a4c6-580403209801",
          "scanSignature": "[{\"name\":\"channel\",\"type\":\"STRING\"},{\"name\":\"time2\",\"type\":\"STRING\"},{\"name\":\"v0\",\"type\":\"LONG\"},{\"name\":\"v1\",\"type\":\"STRING\"}]",
          "sqlInsertSegmentGranularity": "\"DAY\"",
          "sqlQueryId": "27058ca9-47be-4d7d-a4c6-580403209801",
          "useNativeQueryExplain": true
        },
        "granularity": {
          "type": "all"
        }
      },
      "signature": [
        {
          "name": "v0",
          "type": "LONG"
        },
        {
          "name": "time2",
          "type": "STRING"
        },
        {
          "name": "channel",
          "type": "STRING"
        },
        {
          "name": "v1",
          "type": "STRING"
        }
      ]
    }
  ],
  [
    {
      "name": "my_ds",
      "type": "DATASOURCE"
    },
    {
      "name": "EXTERNAL",
      "type": "EXTERNAL"
    }
  ],
  {
    "statementKind": "INSERT",
    "targetDataSource": "my_ds"
  }
]
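A client that has already deserialized a result like the one above into plain Java collections might read the attributes positionally. The sketch below assumes the three-element layout shown in the example (plan, resources, attributes) and treats a missing third element as "no attributes", which is how output from before this change would look. The class and method names are hypothetical, and JSON parsing itself is left out.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class ExplainRowSketch {
  // Position assumed from the example output: 0 = plan, 1 = resources, 2 = attributes.
  static final int ATTRIBUTES_INDEX = 2;

  // Returns the attributes object, or an empty map when the row has no third element.
  @SuppressWarnings("unchecked")
  static Map<String, Object> attributes(List<?> row) {
    if (row.size() <= ATTRIBUTES_INDEX || !(row.get(ATTRIBUTES_INDEX) instanceof Map)) {
      return Collections.emptyMap();
    }
    return (Map<String, Object>) row.get(ATTRIBUTES_INDEX);
  }

  // Builds a short human-readable description of the statement.
  static String describe(List<?> row) {
    Map<String, Object> attrs = attributes(row);
    Object kind = attrs.get("statementKind");
    Object target = attrs.get("targetDataSource");
    if (kind == null) {
      return "unknown statement";
    }
    return target == null ? String.valueOf(kind) : kind + " into " + target;
  }
}
```

For the INSERT example, `ExplainRowSketch.describe(row)` would yield `INSERT into my_ds`; for a two-element row it falls back to `unknown statement` instead of failing.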

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

// Use testQuery for EXPLAIN (not testIngestionQuery).
testQuery(
PlannerConfig.builder().useNativeQueryExplain(false).build(),
PLANNER_CONFIG_LEGACY_QUERY_EXPLAIN,

Review comment from a contributor:

Nice test!

@zachjsh (Contributor) commented Apr 12, 2023

Seems like the explain plan output would look something like this:

[
  [
    {
      "query": {
        "queryType": "scan",
        "dataSource": {
          "type": "table",
          "name": "foo"
        },
        "intervals": {
          "type": "intervals",
          "intervals": [
            "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
          ]
        },
        "resultFormat": "compactedList",
        "columns": [
          "dim1"
        ],
        "legacy": false,
        "context": {
          "defaultTimeout": 300000,
          "maxScatterGatherBytes": 9223372036854776000,
          "sqlCurrentTimestamp": "2000-01-01T00:00:00Z",
          "sqlQueryId": "dummy",
          "vectorize": "false",
          "vectorizeVirtualColumns": "false"
        },
        "granularity": {
          "type": "all"
        }
      },
      "signature": [
        {
          "name": "dim1",
          "type": "STRING"
        }
      ]
    },
    {
      "query": {
        "queryType": "scan",
        "dataSource": {
          "type": "table",
          "name": "foo"
        },
        "intervals": {
          "type": "intervals",
          "intervals": [
            "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
          ]
        },
        "resultFormat": "compactedList",
        "filter": {
          "type": "selector",
          "dimension": "dim1",
          "value": "42"
        },
        "columns": [
          "dim1"
        ],
        "legacy": false,
        "context": {
          "defaultTimeout": 300000,
          "maxScatterGatherBytes": 9223372036854776000,
          "sqlCurrentTimestamp": "2000-01-01T00:00:00Z",
          "sqlQueryId": "dummy",
          "vectorize": "false",
          "vectorizeVirtualColumns": "false"
        },
        "granularity": {
          "type": "all"
        }
      },
      "signature": [
        {
          "name": "dim1",
          "type": "STRING"
        }
      ]
    },
    {
      "query": {
        "queryType": "scan",
        "dataSource": {
          "type": "table",
          "name": "foo"
        },
        "intervals": {
          "type": "intervals",
          "intervals": [
            "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
          ]
        },
        "resultFormat": "compactedList",
        "filter": {
          "type": "selector",
          "dimension": "dim1",
          "value": "44"
        },
        "columns": [
          "dim1"
        ],
        "legacy": false,
        "context": {
          "defaultTimeout": 300000,
          "maxScatterGatherBytes": 9223372036854776000,
          "sqlCurrentTimestamp": "2000-01-01T00:00:00Z",
          "sqlQueryId": "dummy",
          "vectorize": "false",
          "vectorizeVirtualColumns": "false"
        },
        "granularity": {
          "type": "all"
        }
      },
      "signature": [
        {
          "name": "dim1",
          "type": "STRING"
        }
      ]
    },
    {
      "statementKind": "SELECT"
    },
    {
      "targetDataSource": "null"
    }
  ],
  [
    {
      "name": "foo",
      "type": "DATASOURCE"
    }
  ]
]

I'm wondering if targetDataSource and statementKind should both go into a common metadata section, instead of being separate objects within the returned plan.

@gianm (Contributor) left a comment

Mainly reviewed the API design; I just skimmed the code. For the API, a few comments:

  • For best backwards compat with existing clients, we should add the new info as a third column in the EXPLAIN PLAN result rather than adding it to the queries array.
  • We may want to add more statement-related attributes to EXPLAIN PLAN in the future (for example: partitioning for INSERT / REPLACE, or time chunks for REPLACE). So I suggest wrapping all of these into a model object that we can add more fields to over time.
  • This is a change to a user-facing API, so it needs a documentation update. The relevant docs are querying/sql-translation.md and querying/sql.md.
  • There's also some doc debt on querying/sql.md, as it doesn't mention the new EXPLAIN PLAN format, and has a warning box that seems only relevant to the legacy format. If you fix this in this PR, great; if not, please raise a follow-up issue and tag it with Documentation.

@gianm added the "Needs web console change" label (Backend API changes that would benefit from frontend support in the web console) on Apr 13, 2023
@abhishekrb19 changed the title from "Native EXPLAIN PLAN enhancements" to "Include statement attributes in EXPLAIN PLAN output" on Apr 14, 2023
@abhishekrb19 requested a review from gianm on April 14, 2023 22:43

@abhishekrb19 (Contributor, Author) commented

@gianm @zachjsh thanks for the reviews. Can you please take another look when you get a chance?

@zachjsh (Contributor) left a comment

LGTM

@kfaraz (Contributor) left a comment

Overall LGTM, left some minor comments.

@kfaraz merged commit c98c665 into apache:master on Apr 17, 2023

kfaraz pushed a commit that referenced this pull request on Jun 12, 2023:

This PR adds the following to the ATTRIBUTES column in the explain plan output:
- partitionedBy
- clusteredBy
- replaceTimeChunks

This PR leverages the work done in #14074, which added a new column ATTRIBUTES
to encapsulate all the statement-related attributes.
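
Because all statement-related attributes live in one object, a consumer can pick out the keys it understands and ignore the rest. The sketch below is a hypothetical client-side helper, not Druid code: the key names come from this PR and the commit message above, while the value types and the sample values in the usage note are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class AttributeSummarySketch {
  // Keys named in #14074 and in the follow-up commit message.
  static final List<String> KNOWN_KEYS = Collections.unmodifiableList(Arrays.asList(
      "statementKind", "targetDataSource", "partitionedBy", "clusteredBy", "replaceTimeChunks"));

  // Renders the known attributes that are present, in a fixed order; unknown keys are skipped.
  static List<String> summarize(Map<String, ?> attributes) {
    List<String> lines = new ArrayList<>();
    for (String key : KNOWN_KEYS) {
      Object value = attributes.get(key);
      if (value != null) {
        lines.add(key + "=" + value);
      }
    }
    return lines;
  }
}
```

Given a map holding `statementKind=REPLACE`, `targetDataSource=my_ds`, a made-up `partitionedBy` value, and some key added by a later release, `summarize` returns only the first three entries, so a client written against this PR would keep working as new attributes appear.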

@abhishekagarwal87 added this to the 27.0 milestone on Jul 19, 2023
@vogievetsky removed the "Needs web console change" label on Aug 8, 2023