0.14.0-incubating release notes #7126

jon-wei · 2019-02-22T03:09:51Z

Apache Druid 0.14.0-incubating contains over 200 new features, performance/stability/documentation improvements, and bug fixes from 54 contributors. Major new features and improvements include:

New web console
Kinesis indexing service
Decommissioning mode for Historicals
Published segment cache on Broker
Bloom filter aggregator and expression
Updated Parquet extension
Force push down option for nested GroupBy queries
Better segment handoff and retention rule handling
Automatically kill MapReduce jobs when Hadoop ingestion tasks are killed
DogStatsD tag support for statsd-emitter
New API for retrieving all lookup specs
New compaction options
More efficient cachingCost segment balancing strategy

The full list of changes is here: https://github.com/apache/incubator-druid/pulls?q=is%3Apr+is%3Amerged+milestone%3A0.14.0

Documentation for this release is at: http://druid.io/docs/0.14.0-incubating/

Highlights

New web console

Druid has a new web console that provides functionality that was previously split between the coordinator and overlord consoles.

The new console allows the user to manage datasources, segments, tasks, data processes (Historicals and MiddleManagers), and coordinator dynamic configuration. The user can also run SQL and native Druid queries within the console.

For more details, please see http://druid.io/docs/0.14.0-incubating/operations/management-uis.html

Added by @vogievetsky in #6923.

Kinesis indexing service

Druid now supports ingestion from Kinesis streams, provided by the new druid-kinesis-indexing-service core extension.

Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/kinesis-ingestion.html for details.

Added by @jsun98 in #6431.

Decommissioning mode for Historicals

Historical processes can now be put into a "decommissioning" mode, where the coordinator will no longer consider the Historical process as a target for segment replication. The coordinator will also move segments off the decommissioning Historical.

This is controlled via Coordinator dynamic configuration. For more details, please see http://druid.io/docs/0.14.0-incubating/configuration/index.html#dynamic-configuration.
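As a rough illustration of what such a dynamic configuration change might look like, the sketch below builds the decommissioning portion of a Coordinator dynamic config payload. The field names `decommissioningNodes` and `decommissioningMaxPercentOfMaxSegmentsToMove`, the host name, and the helper function are assumptions for illustration only; please check them against the dynamic configuration documentation linked above before use.

```python
import json


def decommissioning_config(hosts, max_percent):
    """Build the decommissioning part of a Coordinator dynamic config payload.

    Field names are assumptions; verify them against the linked docs.
    """
    if not 0 <= max_percent <= 100:
        raise ValueError("max_percent must be between 0 and 100")
    return {
        # Historicals ("host:port") that should stop receiving segments.
        "decommissioningNodes": sorted(hosts),
        # Share of segment moves that may be spent draining those Historicals.
        "decommissioningMaxPercentOfMaxSegmentsToMove": max_percent,
    }


# Hypothetical host name, used only for illustration.
payload = decommissioning_config(["historical-2.example.com:8083"], 70)
print(json.dumps(payload, indent=2))
```

The resulting JSON would be merged into the rest of the Coordinator dynamic configuration before being submitted.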

Added by @egor-ryashin in #6349.

Published segment cache on Broker

The Druid Broker now has the ability to maintain a cache of published segments via polling the Coordinator, which can significantly improve response time for metadata queries on the sys.segments system table.

Please see http://druid.io/docs/0.14.0-incubating/querying/sql.html#retrieving-metadata for details.

Added by @surekhasaharan in #6901.

Bloom filter aggregator and expression

A new aggregator for constructing Bloom filters at query time and support for performing Bloom filter checks within Druid expressions have been added to the druid-bloom-filter extension.

Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/bloom-filter.html for details.

Added by @clintropolis in #6904 and #6397.

Updated Parquet extension

The druid-parquet-extensions extension has been moved into the core extension set from the contrib extensions and now supports flattening and int96 values.

Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/parquet.html for details.

Added by @clintropolis in #6360.

Force push down option for nested GroupBy queries

Outer query execution for nested GroupBy queries can now be pushed down to Historical processes; previously, the outer queries would always be executed on the Broker.

Please see #5471 for details.

Added by @samarthjain in #5471.

Better segment handoff and retention rule handling

Segment handoff will now ignore segments that would be dropped by a datasource's retention rules, avoiding ingestion failures caused by issue #5868.

Period load rules will now include the future by default.

A new "Period Drop Before" rule has been added. Please see http://druid.io/docs/0.14.0-incubating/operations/rule-configuration.html#period-drop-before-rule for details.
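For orientation, here is a minimal sketch of how the rules mentioned above could be expressed as JSON. The rule type names (`dropBeforeByPeriod`, `loadByPeriod`), the `includeFuture` and `tieredReplicants` fields, and the `_default_tier` tier name are assumptions taken from memory of the rule configuration docs, and the helper functions are hypothetical; please verify against the linked page.

```python
import json


def period_drop_before_rule(period):
    """Drop segments that fall before (now - period); type name is an assumption."""
    return {"type": "dropBeforeByPeriod", "period": period}


def period_load_rule(period, tiered_replicants, include_future=True):
    """Period load rule; include_future mirrors the new default described above."""
    return {
        "type": "loadByPeriod",
        "period": period,
        "includeFuture": include_future,
        "tieredReplicants": tiered_replicants,
    }


# Example rule chain: keep the last month loaded, drop everything older.
rules = [
    period_load_rule("P1M", {"_default_tier": 2}),
    period_drop_before_rule("P1M"),
]
print(json.dumps(rules, indent=2))
```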

Added by @QiuMM in #6676, #6414, and #6415.

Automatically kill MapReduce jobs when Hadoop ingestion tasks are killed

Druid will now automatically terminate MapReduce jobs created by Hadoop batch ingestion tasks when the ingestion task is killed.

Added by @ankit0811 in #6828.

DogStatsD tag support for statsd-emitter

The statsd-emitter extension now supports DogStatsD-style tags. Please see http://druid.io/docs/0.14.0-incubating/development/extensions-contrib/statsd.html

Added by @deiwin in #6605, with support for constant tags added by @glasser in #6791.

New API for retrieving all lookup specs

A new API for retrieving all lookup specs for all tiers has been added. Please see http://druid.io/docs/0.14.0-incubating/querying/lookups.html#get-all-lookups for details.

Added by @jihoonson in #7025.

New compaction options

Auto-compaction now supports the maxRowsPerSegment option. Please see http://druid.io/docs/0.14.0-incubating/design/coordinator.html#compacting-segments for details.

The compaction task now supports a new segmentGranularity option, deprecating the older keepSegmentGranularity option for controlling the segment granularity of compacted segments. Please see the segmentGranularity table in http://druid.io/docs/0.14.0-incubating/ingestion/compaction.html for more information on these properties.

Added by @jihoonson in #6758 and #6780.

More efficient cachingCost segment balancing strategy

The cachingCost Coordinator segment balancing strategy will now only consider Historical processes for balancing decisions. Previously the strategy would unnecessarily consider active worker tasks as well, which are not targets for segment replication.

Added by @QiuMM in #6879.

New metrics:

New allocation rate metric jvm/heapAlloc/bytes, added by @egor-ryashin in #6710.
New query count metric query/count, added by @QiuMM in #6473.
SQL query metrics sqlQuery/bytes and sqlQuery/time, added by @gaodayue in #6302.
Kafka ingestion lag metrics ingest/kafka/maxLag and ingest/kafka/avgLag, added by @QiuMM in #6587.
Task count metrics task/success/count, task/failed/count, task/running/count, task/pending/count, and task/waiting/count, added by @QiuMM in #6657.

New interfaces for extension developers

RequestLogEvent

It is now possible to control the fields in RequestLogEvent, emitted by EmittingRequestLogger. Please see #6477 for details. Added by @leventov.

Custom TLS certificate checks

An extension point for custom TLS certificate checks has been added. Please see http://druid.io/docs/0.14.0-incubating/operations/tls-support.html#custom-tls-certificate-checks for details. Added by @jon-wei in #6432.

Kafka Indexing Service no longer experimental

The Kafka Indexing Service extension has been moved out of experimental status.

SQL Enhancements

Enhancements to dsql

The dsql command line client now supports CLI history, basic autocomplete, and specifying query timeouts in the query context.

Added in #6929 by @gianm.

Add SQL id, request logs, and metrics

SQL queries now have an ID, and native queries executed as part of a SQL query will have the associated SQL query ID in the native query's request logs. SQL queries will now be logged in the request logs.

Two new metrics, sqlQuery/time and sqlQuery/bytes, are now emitted for SQL queries.

Please see http://druid.io/docs/0.14.0-incubating/configuration/index.html#request-logging and http://druid.io/docs/0.14.0-incubating/querying/sql.html#sql-metrics for details.

Added by @gaodayue in #6302.

More SQL aggregator support

The following aggregators are now supported in SQL:

DataSketches HLL sketch
DataSketches Theta sketch
DataSketches quantiles sketch
Fixed bins histogram
Bloom filter aggregator

Added by @jon-wei in #6951 and @clintropolis in #6502.

Other SQL enhancements

SQL: Add support for queries with project-after-semijoin. #6756
SQL: Support for selecting multi-value dimensions. #6462
SQL: Support AVG on system tables. #601
SQL: Add "POSITION" function. #6596
SQL: Set INFORMATION_SCHEMA catalog name to "druid". #6595
SQL: Fix ordering of sort, sortProject in DruidSemiJoin. #6769

Added by @gianm.

Updating from 0.13.0-incubating and earlier

Kafka ingestion downtime when upgrading

Due to the issue described in #6958, existing Kafka indexing tasks can be terminated unnecessarily during a rolling upgrade of the Overlord. The terminated tasks will be restarted by the Overlord and will function correctly after the initial restart.

Parquet extension changes

The druid-parquet-extensions extension has been moved from contrib to core. When deploying 0.14.0-incubating, please ensure that your extensions-contrib directory does not have any older versions of the Parquet extension.

Additionally, there are now two styles of Parquet parsers in the extension:

parquet-avro: Converts Parquet to Avro, and then parses the Avro representation. This was the existing parser prior to 0.14.0-incubating.
parquet: A new parser that parses the Parquet format directly. Only this new parser supports int96 values.

Prior to 0.14.0-incubating, specifying a parquet type parser would have a task use the Avro-converting parser. In 0.14.0-incubating, to continue using the Avro-converting parser, you will need to update your ingestion specs to use parquet-avro instead.

The inputFormat field in the inputSpec for tasks using Parquet input must also match the choice of parser:

parquet: org.apache.druid.data.input.parquet.DruidParquetInputFormat
parquet-avro: org.apache.druid.data.input.parquet.DruidParquetAvroInputFormat
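The pairing between parser type and inputFormat can be checked mechanically before submitting a task. The following is a minimal sketch; the class names are taken from the Parquet extension documentation linked in this section and should be verified there, and the helper function is hypothetical.

```python
# Parser type -> expected inputFormat class (assumed; verify against the docs).
PARQUET_INPUT_FORMATS = {
    "parquet": "org.apache.druid.data.input.parquet.DruidParquetInputFormat",
    "parquet-avro": "org.apache.druid.data.input.parquet.DruidParquetAvroInputFormat",
}


def parquet_spec_is_consistent(parser_type, input_format):
    """Return True if the inputFormat class matches the chosen parser type."""
    expected = PARQUET_INPUT_FORMATS.get(parser_type)
    if expected is None:
        raise ValueError(f"not a Parquet parser type: {parser_type!r}")
    return input_format == expected


# A pre-0.14.0 spec that used "parquet" to get the Avro-converting parser
# needs both the parser type and the inputFormat switched to the Avro variants.
print(parquet_spec_is_consistent(
    "parquet-avro",
    "org.apache.druid.data.input.parquet.DruidParquetAvroInputFormat",
))
```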

Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/parquet.html for details.

Running Druid with non-2.8.3 Hadoop

If you plan to use Druid 0.14.0-incubating with Hadoop versions other than 2.8.3, you may need to do the following:

Set the Hadoop dependency coordinates to your target version as described in http://druid.io/docs/0.14.0-incubating/operations/other-hadoop.html under Tip #3: Use specific versions of Hadoop libraries.
Rebuild Druid with your target version of Hadoop by changing hadoop.compile.version in the main Druid pom.xml and then following the standard build instructions.

Other Behavior changes

Old task cleanup

Old task entries in the metadata storage will now be cleaned up automatically together with their task logs. Please see http://druid.io/docs/0.14.0-incubating/configuration/index.html#task-logging and #6592 for details.

Automatic processing buffer sizing

The druid.processing.buffer.sizeBytes property has new default behavior if it is not set. Druid will now automatically choose a value for the processing buffer size using the following formula:

processingBufferSize = totalDirectMemory / (numMergeBuffers + numProcessingThreads + 1)
processingBufferSize = min(processingBufferSize, 1GB)

Where:

totalDirectMemory: The direct memory limit for the JVM specified by -XX:MaxDirectMemorySize
numMergeBuffers: The value of druid.processing.numMergeBuffers.
numProcessingThreads: The value of druid.processing.numThreads.

At most, Druid will use 1GB for the automatically chosen processing buffer size. The processing buffer size can still be specified manually.
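To make the formula concrete, here is a small sketch that evaluates it for given settings. It assumes that "1GB" means 1 GiB (1024^3 bytes) and that the division rounds down to whole bytes; neither detail is stated above, and the function name is hypothetical.

```python
GIB = 1024 ** 3  # assumption: "1GB" in the formula is treated as 1 GiB


def auto_processing_buffer_size(total_direct_memory, num_merge_buffers,
                                num_processing_threads):
    """Evaluate the automatic processing buffer size formula, in bytes."""
    # Direct memory is split across merge buffers, processing threads,
    # and one extra slot, as in the formula above.
    per_buffer = total_direct_memory // (
        num_merge_buffers + num_processing_threads + 1
    )
    # The automatically chosen size is capped at 1 GiB.
    return min(per_buffer, GIB)


# 4 GiB of direct memory, 2 merge buffers, 7 processing threads:
# each of the 10 slots gets roughly 0.4 GiB, below the cap.
print(auto_processing_buffer_size(4 * GIB, 2, 7))

# 64 GiB of direct memory with the same settings hits the 1 GiB cap.
print(auto_processing_buffer_size(64 * GIB, 2, 7) == GIB)
```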

Please see #6588 for details.

Retention rules now include the future by default

Please be aware that new retention rules will now include the future by default. Please see #6414 for details.

Property changes

Segment announcing

The druid.announcer.type property used for choosing between Zookeeper or HTTP-based segment management/discovery has been moved to druid.serverview.type. If you were using http prior to 0.14.0-incubating, you will need to update your configs to use the new druid.serverview.type.

Please see the following for details:

fix missing property in JsonTypeInfo of SegmentWriteOutMediumFactory

The druid.peon.defaultSegmentWriteOutMediumFactory.@type property has been fixed. The property is now druid.peon.defaultSegmentWriteOutMediumFactory.type without the "@".

Please see #6656 for details.
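The two property renames in this section can be applied to an existing runtime.properties file with a small helper. This is a minimal sketch, assuming simple `key=value` lines; the function name is hypothetical and the script is not part of Druid.

```python
# Old property name -> new property name, as described in this section.
RENAMED_PROPERTIES = {
    "druid.announcer.type": "druid.serverview.type",
    "druid.peon.defaultSegmentWriteOutMediumFactory.@type":
        "druid.peon.defaultSegmentWriteOutMediumFactory.type",
}


def migrate_properties(text):
    """Rename affected keys in runtime.properties text, keeping values as-is."""
    migrated = []
    for line in text.splitlines():
        stripped = line.strip()
        # Leave blank lines, comments, and lines without "=" untouched.
        if stripped and not stripped.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            new_key = RENAMED_PROPERTIES.get(key.strip())
            if new_key is not None:
                line = f"{new_key}={value}"
        migrated.append(line)
    return "\n".join(migrated)


print(migrate_properties("druid.announcer.type=http\ndruid.port=8083"))
```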

Deprecations

Approximate Histogram aggregator

The ApproximateHistogram aggregator has been deprecated; it is a distribution-dependent algorithm without formal error bounds and has significant accuracy issues.

The DataSketches quantiles aggregator should be used instead for quantile and histogram use cases.

Please see Histogram and Quantiles Aggregators for details.

Cardinality/HyperUnique aggregator

The Cardinality and HyperUnique aggregators have been deprecated in favor of the DataSketches HLL aggregator and Theta Sketch aggregator. These aggregators have better accuracy and performance characteristics.

Please see Count Distinct Aggregators for details.

Query Chunk Period

The chunkPeriod query context configuration is now deprecated, along with the associated query/intervalChunk/time metric. Please see #6591 for details.

`keepSegmentGranularity` for Compaction

The keepSegmentGranularity option for compaction tasks has been deprecated. Please see #6758 and the segmentGranularity table in http://druid.io/docs/0.14.0-incubating/ingestion/compaction.html for more information on these properties.

Interface changes for extension developers

`SegmentId` class

Druid now uses a SegmentId class instead of plain Strings to represent segment IDs. Please see #6370 for details.

Added by @leventov.

`druid-api`, `druid-common`, `java-util` moved to `druid-core`

The druid-api, druid-common, java-util modules have been moved into druid-core. Please update your dependencies accordingly if your project depended on these libraries.

Please see #6443 for details.

Credits

Thanks to everyone who contributed to this release!

@a2l007
@AlexanderSaydakov
@anantmf
@ankit0811
@asdf2014
@awelsh93
@benhopp
@Caroline1000
@clintropolis
@dclim
@deiwin
@DiegoEliasCosta
@drcrallen
@dyf6372
@Dylan1312
@egor-ryashin
@elloooooo
@evans
@FaxianZhao
@gaodayue
@gianm
@glasser
@Guadrado
@hate13
@hoesler
@hpandeycodeit
@janeklb
@jihoonson
@jon-wei
@jorbay-au
@jsun98
@justinborromeo
@kamaci
@leventov
@lxqfy
@mirkojotic
@navkumar
@niketh
@patelh
@pzhdfy
@QiuMM
@rcgarcia74
@richardstartin
@robertervin
@samarthjain
@seoeun25
@Shimi
@surekhasaharan
@taiii
@thomask
@VincentNewkirk
@vogievetsky
@yunwan
@zhaojiandong

jon-wei · 2019-02-23T02:59:01Z

I think this is largely complete now, please let me know if there's anything I should correct or add.

QiuMM · 2019-02-23T15:37:56Z

@jon-wei new metrics in #6587 and #6657.

glasser · 2019-03-01T22:12:06Z

Maybe you're aware of this, but the web-consoles doc page linked from the first entry doesn't seem to exist yet (I don't just mean the link is broken: I mean there's no web-consoles.md in master or 0.14-incubating branches). cc @vogievetsky

Also I think it might be fun to have a screen shot of the web console in the release notes!

jon-wei · 2019-03-01T22:15:22Z

Ah, the new page would be "management-uis", it was changed during PR review but I haven't updated these notes to reflect that yet.

A screen shot sounds like a good idea, thanks!

gianm · 2019-03-11T03:38:21Z

The notes talk about 'Maintenance mode for Historicals', which have been renamed recently (#7154).

sascha-coenen · 2019-03-16T09:27:59Z

In the processing buffer sizing section above, it says:

processingBufferSize = max(processingBufferSize, 1GB)

Shouldn't it rather be a min() operation? I imagine that the buffer size is supposed to be capped at 1 Gig.

vogievetsky · 2019-04-04T05:14:12Z

@glasser quick note: there has also been an entire page of docs added just for the new console: https://github.com/apache/incubator-druid/blob/master/docs/content/operations/druid-console.md
It is linked to from the page in the release notes

trtg · 2019-04-09T07:00:44Z

What is the state of this release? I.e. when it will be downloadable? It's no longer marked as WIP and on the release page it's no longer tagged as a RC, so is it going to be marked as the current stable release soon?

jon-wei · 2019-04-09T21:48:06Z

@sascha-coenen Thanks, it should be min() there.

@trtg 0.14.0 is released now, the vote passed yesterday but we needed to wait ~24 hours for the artifacts to propagate across mirrors

jon-wei added Release Notes WIP labels Feb 22, 2019

jon-wei added this to the 0.14.0 milestone Feb 22, 2019

jon-wei changed the title ~~[WIP] 0.14.0-incubating release notes~~ [DRAFT] 0.14.0-incubating release notes Feb 23, 2019

jon-wei changed the title ~~[DRAFT] 0.14.0-incubating release notes~~ 0.14.0-incubating release notes Mar 16, 2019

jon-wei removed the WIP label Mar 16, 2019

jihoonson closed this as completed May 3, 2019
