Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add max length support in schema builder #6112

Merged
merged 5 commits into from
Oct 14, 2020
Merged

add max length support in schema builder #6112

merged 5 commits into from
Oct 14, 2020

Conversation

deemoliu
Copy link
Contributor

@deemoliu deemoliu commented Oct 6, 2020

Description

Add one more API to set maxLength in schemaBuilder. #6106

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

If you have tagged this as either backward-incompat or release-notes,
you MUST add text here that you would like to see appear in release notes of the
next release.

If you have a series of commits adding or enabling a feature, then
add this section only in final commit that marks the feature completed.
Refer to earlier release notes to see examples of text

Documentation

If you have introduced a new feature or configuration, please add it to the documentation as well.
See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document

@codecov-io
Copy link

codecov-io commented Oct 6, 2020

Codecov Report

Merging #6112 into master will increase coverage by 6.42%.
The diff coverage is 59.69%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6112      +/-   ##
==========================================
+ Coverage   66.44%   72.87%   +6.42%     
==========================================
  Files        1075     1224     +149     
  Lines       54773    57796    +3023     
  Branches     8168     8524     +356     
==========================================
+ Hits        36396    42116    +5720     
+ Misses      15700    12917    -2783     
- Partials     2677     2763      +86     
Flag Coverage Δ
#integration 45.37% <48.75%> (?)
#unittests 64.01% <38.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ot/broker/broker/AllowAllAccessControlFactory.java 100.00% <ø> (ø)
.../helix/BrokerUserDefinedMessageHandlerFactory.java 52.83% <0.00%> (-13.84%) ⬇️
...ava/org/apache/pinot/client/AbstractResultSet.java 53.33% <0.00%> (-3.81%) ⬇️
.../main/java/org/apache/pinot/client/Connection.java 44.44% <0.00%> (-4.40%) ⬇️
.../org/apache/pinot/client/ResultTableResultSet.java 24.00% <0.00%> (-10.29%) ⬇️
...not/common/assignment/InstancePartitionsUtils.java 78.57% <ø> (+5.40%) ⬆️
.../apache/pinot/common/exception/QueryException.java 90.27% <ø> (+5.55%) ⬆️
...pinot/common/function/AggregationFunctionType.java 98.27% <ø> (-1.73%) ⬇️
.../pinot/common/function/DateTimePatternHandler.java 83.33% <ø> (ø)
...ot/common/function/FunctionDefinitionRegistry.java 88.88% <ø> (+44.44%) ⬆️
... and 973 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1ab421...0487101. Read the comment docs.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor comments

* Add single value dimensionFieldSpec with maxLength and a defaultNullValue
*/
public SchemaBuilder addSingleValueDimension(String dimensionName, DataType dataType, int maxLength,
Object defaultNullValue) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reformat this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thanks @Jackie-Jiang for code review.

* Add multi value dimensionFieldSpec with maxLength and a defaultNullValue
*/
public SchemaBuilder addMultiValueDimension(String dimensionName, DataType dataType, int maxLength,
Object defaultNullValue) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reformat this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thanks for code review.

fieldSpec1.setName("svDimension");
fieldSpec1.setDataType(BOOLEAN);
fieldSpec1.setDefaultNullValue(false);
fieldSpec1.setMaxLength(20000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do fixed length data types (BOOLEAN/int/float/etc) need maxLength? If not, then we should not allow setting max-length for those and only apply to variable length data types (String/Bytes).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The maxLength field only applies to STRING field right now (not even BYTES)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mayankshriv @Jackie-Jiang for review. I will fix this test. Additional to that, do you suggest me to enforce configurable maxLength on String fields only?
Is it possible that Pinot will support maxLength on other variable length data types in the future?

In our use case, we only allow user to set maxLength on String columns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jackie-Jiang How about multi-valued fixed length data types? E.g. Does maxLength applied to multi-value long fields?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deemoliu The maxLength should only be applied to String field. The method names in Schema.java should reflect this as well. i.e., the method name should be addStringMetric().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenboat @Jackie-Jiang thanks for the code review. I added precondition in Schema.java for dimension columns. Regarding metrics column, does METRIC column support String column?

I think it should be supported according to https://docs.pinot.apache.org/configuration-reference/schema#metricfieldspecs

However in the schema.java it weren't been supported.
https://github.com/apache/incubator-pinot/blob/master/pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java#L447

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the doc is not in sync with the code. The code is usually the source of truth. For this PR, I think the update to METRICS column can be skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Then I will remove the code related to "metrics with max length" (since maxlength ony applied to String and metrics columns cannot be string type) and only keep "dimension with max length".

/**
* Add single value dimensionFieldSpec with maxLength and a defaultNullValue
*/
public SchemaBuilder addSingleValueDimension(String dimensionName, DataType dataType, int maxLength,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the maxLength should be applicable only to String type only as discussed earlier. You can add a Precondition Check here to make sure dataType is String.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, added precondition check. thanks for review.

@Jackie-Jiang Jackie-Jiang merged commit 65be8bd into apache:master Oct 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants