Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make dimension column extensible with COMPLEX type #10277

Merged
merged 11 commits into from
Dec 3, 2020

Conversation

himanshug
Copy link
Contributor

@himanshug himanshug commented Aug 13, 2020

WIP tag is added because I am still in the process of testing it more thoroughly, However code in current patch is working in preliminary testing. I will also try and add some UTs for the changes. That said, code is totally reviewable.

Description

This patch adds support for adding COMPLEX type Dimension column via Extensions, similar support for Metric columns exist.

Main changes are to add String getTypeName() in ColumnCapabilities interface and to add DimensionHandlerUtils.registerDimensionHandlerProvider(String typeName, DimensionHandlerProvider) to enable extensions to register custom type specific DimensionHandler, very similar to existing ComplexMetrics.registerSerde(..)

After this patch, Users can have extensions with custom implementations of DimensionHandler, ComplexMetricSerde
and related interfaces to add COMPLEX type Dimension Column.
In addition to above, extension would also make ComplexMetrics.registerSerde(typeName,..) and DimensionHandlerUtils.registerDimensionHandlerProvider(typeName,..) to register custom type specific implementations.

Side-note: I am using this extensibility to add a Map<String, String> type Dimension column for various reasons which will later be added in a separate PR as contrib extension.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • ColumnCapabilities
  • DimensionHandlerUtils

@clintropolis
Copy link
Member

Cool 👍

What do you think about deprecating the ComplexMetric terminology and replacing it with ComplexType (e.g. ComplexMetricSerde -> ComplexTypeSerde, etc) since it isn't strictly associated to metrics? (I have been wanting to do this for a while, just haven't got to it).

@himanshug
Copy link
Contributor Author

@clintropolis thanks, yes they could all be renamed , deserialization is only based on column type , nothing to do with dimension vs metric. however those renaming would be backward incompatible so better done in an independent PR.

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm 🤘

*
* If ValueType is COMPLEX, then the typeName associated with it.
*/
String getTypeName();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be getComplexTypeName to tie it more strongly to complex types? It would also match the aggregator factory changes in #9638 which I've started working on again recently. If you recall this discussion #9638 (comment), I decided to go ahead and rename it to getComplexTypeName there because there are already breaking changes with the new abstract methods, so might as well make it intuitive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

// Note: Things might be simpler if DimensionSchema had a method "getColumnCapabilities()" which could return
// type specific capabilities by itself. However, for various reasons, DimensionSchema currently lives in druid-core
// while ColumnCapabilities lives in druid-processing which makes that approach difficult.
ColumnCapabilitiesImpl capabilities = makeDefaultCapabilitiesFromValueType(type, dimSchema.getTypeName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the conflicts here, I made some semi disruptive changes to push ColumnCapabilities into the DimensionIndexer implementations so they can be more accurate. I think the changes should still be workable with your addition of DimensionHandlerProvider, just the dimension indexer it provides will need to provide the complex column capabilities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be alright, I will update this PR later when I have my MapStringString extension ready/deployed and tested with changes in this patch. thanks for the heads up.

@clintropolis clintropolis mentioned this pull request Aug 18, 2020
5 tasks
Change-Id: I6aad1f268c9304082806e2efb5f8fe9ec03a1aa6
Change-Id: I9707dd644b8d71030b74a8c1d6fff0c0020d960d
Change-Id: I4f900940120512b0e058bdbbf9cc0d3b17a1fea0
Change-Id: I687a2799c105cc4ec68bbbc51f613ee36075541a
Change-Id: I146f95a41b79d20edb1721be13f0e9641f788e0e
@stale
Copy link

stale bot commented Nov 7, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Nov 7, 2020
@himanshug
Copy link
Contributor Author

not stale

@stale
Copy link

stale bot commented Nov 9, 2020

This issue is no longer marked as stale.

@stale stale bot removed the stale label Nov 9, 2020
@himanshug himanshug removed the WIP label Nov 30, 2020
@himanshug
Copy link
Contributor Author

this is ready to review/merge, no longer wip.

@@ -58,6 +58,12 @@ public ColumnBuilder setType(ValueType type)
return this;
}

public ColumnBuilder setTypeName(String typeName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this should be setComplexTypeName probably to match capabilities

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, yeah updated, I missed it.

@himanshug himanshug merged commit 813e187 into apache:master Dec 3, 2020
@jihoonson jihoonson added this to the 0.21.0 milestone Jan 4, 2021
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021
* make dimension column extensible with COMPLEX type

* more changes

Change-Id: I9707dd644b8d71030b74a8c1d6fff0c0020d960d

* processing module changes for build fix

Change-Id: I146f95a41b79d20edb1721be13f0e9641f788e0e

* rename ColumnCapabilities.getTypeName() to getComplexTypeName()

* rename ColumnBuilder.setTypeName(..) -> ColumnBuilder.setComplexTypeName(..)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants