-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
revert ColumnAnalysis type, add typeSignature and use it for DruidSchema #11895
revert ColumnAnalysis type, add typeSignature and use it for DruidSchema #11895
Conversation
Dimension columns will have type `STRING`, `FLOAT`, `DOUBLE`, or `LONG`. | ||
Metric columns will have type `FLOAT`, `DOUBLE`, or `LONG`, or the name of the underlying complex type such as `hyperUnique` in case of COMPLEX metric. | ||
Timestamp column will have type `LONG`. | ||
All columns will contain a `typeSignature` which is the Druid internal representation of the type information for this column, is what is show in [`INFORMATION_SCHEMA.COLUMNS`](../querying/sql.md#columns-table) table in SQL, and is typically the value used to supply Druid with JSON type information at query or ingest time. This value will be `STRING`, `FLOAT`, `DOUBLE`, `LONG`, or `COMPLEX<typeName>` (e.g. `COMPLEX<hyperUnique>`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
show -> shown
Timestamp column will have type `LONG`. | ||
All columns will contain a `typeSignature` which is the Druid internal representation of the type information for this column, is what is show in [`INFORMATION_SCHEMA.COLUMNS`](../querying/sql.md#columns-table) table in SQL, and is typically the value used to supply Druid with JSON type information at query or ingest time. This value will be `STRING`, `FLOAT`, `DOUBLE`, `LONG`, or `COMPLEX<typeName>` (e.g. `COMPLEX<hyperUnique>`). | ||
|
||
Additionally, columns will have a legacy friendly `type` name. This might match `typeSignature` for some column types (`STRING`, `FLOAT`, `DOUBLE`, or `LONG`) but for COMPLEX columns will only contain the name of the underlying complex type such as `hyperUnique`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding:
New applications should use
typeSignature
, nottype
.
@@ -73,6 +78,12 @@ public String getType() | |||
return type; | |||
} | |||
|
|||
@JsonProperty | |||
public ColumnType getTypeSignature() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider placing this getter above getType
, which I think will put it earlier in the JSON and make it seem more "official".
@@ -181,6 +205,7 @@ public String toString() | |||
{ | |||
return "ColumnAnalysis{" + | |||
"type='" + type + '\'' + | |||
", columnType=" + typeSignature + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not call this typeSignature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, 👍
); | ||
} | ||
|
||
if (!typeSignature.equals(rhs.getTypeSignature())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This'll NPE if typeSignature
is null — i.e. if it came from a server that is on an older version. I think this is OK, since people are supposed to update Brokers after segment-serving-servers, but I'd still feel better if this code was resilient to null typeSignature
.
Which reminds me: I think we'll also need a trivial change in the cache key to make sure SegmentAnalysis objects from shared caches are not re-used.
Dimension columns will have type `STRING`, `FLOAT`, `DOUBLE`, or `LONG`. | ||
Metric columns will have type `FLOAT`, `DOUBLE`, or `LONG`, or the name of the underlying complex type such as `hyperUnique` in case of COMPLEX metric. | ||
Timestamp column will have type `LONG`. | ||
All columns will contain a `typeSignature` which is the Druid internal representation of the type information for this column, is what is show in [`INFORMATION_SCHEMA.COLUMNS`](../querying/sql.md#columns-table) table in SQL, and is typically the value used to supply Druid with JSON type information at query or ingest time. This value will be `STRING`, `FLOAT`, `DOUBLE`, `LONG`, or `COMPLEX<typeName>` (e.g. `COMPLEX<hyperUnique>`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All columns will contain a `typeSignature` which is the Druid internal representation of the type information for this column, is what is show in [`INFORMATION_SCHEMA.COLUMNS`](../querying/sql.md#columns-table) table in SQL, and is typically the value used to supply Druid with JSON type information at query or ingest time. This value will be `STRING`, `FLOAT`, `DOUBLE`, `LONG`, or `COMPLEX<typeName>` (e.g. `COMPLEX<hyperUnique>`). | |
All columns contain a `typeSignature` that Druid uses to represent the column type information internally. The `typeSignature` is typically the same value used to identify the JSON type information at query or ingest time. One of: STRING`, `FLOAT`, `DOUBLE`, `LONG`, or `COMPLEX<typeName>`, e.g. `COMPLEX<hyperUnique>`. |
|
||
Additionally, columns will have a legacy friendly `type` name. This might match `typeSignature` for some column types (`STRING`, `FLOAT`, `DOUBLE`, or `LONG`) but for COMPLEX columns will only contain the name of the underlying complex type such as `hyperUnique`. | ||
|
||
The timestamp column will always have `typeSignature` and `type` as `LONG`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timestamp column will always have `typeSignature` and `type` as `LONG`. | |
The `typeSignature` and `type` for the timestamp column is always `LONG`. |
Timestamp column will have type `LONG`. | ||
All columns will contain a `typeSignature` which is the Druid internal representation of the type information for this column, is what is show in [`INFORMATION_SCHEMA.COLUMNS`](../querying/sql.md#columns-table) table in SQL, and is typically the value used to supply Druid with JSON type information at query or ingest time. This value will be `STRING`, `FLOAT`, `DOUBLE`, `LONG`, or `COMPLEX<typeName>` (e.g. `COMPLEX<hyperUnique>`). | ||
|
||
Additionally, columns will have a legacy friendly `type` name. This might match `typeSignature` for some column types (`STRING`, `FLOAT`, `DOUBLE`, or `LONG`) but for COMPLEX columns will only contain the name of the underlying complex type such as `hyperUnique`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, columns will have a legacy friendly `type` name. This might match `typeSignature` for some column types (`STRING`, `FLOAT`, `DOUBLE`, or `LONG`) but for COMPLEX columns will only contain the name of the underlying complex type such as `hyperUnique`. | |
Druid retains the legacy friendly `type` name. For some column types, the value may match the `typeSignature` (`STRING`, `FLOAT`, `DOUBLE`, or `LONG`). For `COMPLEX` columns, the `type` only contains the name of the underlying complex type such as `hyperUnique`. | |
New applications should use `typeSignature`, not `type`. |
This pull request introduces 2 alerts when merging bff926e into a5bd0b8 - view on LGTM.com new alerts:
|
This pull request introduces 2 alerts when merging be1c2ba into a5bd0b8 - view on LGTM.com new alerts:
|
@@ -47,6 +50,7 @@ public static ColumnAnalysis error(String reason) | |||
|
|||
@JsonCreator | |||
public ColumnAnalysis( | |||
@JsonProperty("typeSignature") ColumnType typeSignature, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this change cause a problem during upgrade? E.g. an historical is on older version while broker is on newer version so the broker gets old json from the historical when running segment metadata queries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the supported upgrade order is brokers last, but it would just be null, which shouldn't be a problem i think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/apache/druid/pull/11895/files#diff-5aa1494dc5a13e82654e42ce8cdb3cb628c95233f8828312b3ede40dac1b8544R156 - shall this be modified to pick the one if the other is null instead of throwing an error. I am throwing darts in the dark here so please ignore if this is not really a problem :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, if we supported updating the broker before data servers then we might want to consider that, but we don't (so that new query types, filters, aggregators, etc cannot be issued before data servers can understand them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes. you are right. thanks for clarifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small nit. Otherwise LGTM.
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
This pull request introduces 2 alerts when merging 9055810 into 6c196a5 - view on LGTM.com new alerts:
|
Description
#11713 changed the segment metadata query column analysis
type
field, but this might not be very friendly to actual users (e.g. notDruidSchema
) who are calling this query directly because, merging query results of mixed versions of Druid would result in an error due to mismatched types.This PR reverts
type
to its original behavior, and adds a newtypeSignature
which is used byDruidSchema
. Error messaging when mergingColumnAnalysis
has also been improved to include the mismatched types involved in the merge.Key changed/added classes in this PR
ColumnAnalysis
SegmentAnalyzer
This PR has: