Description
Placeholder Feature Request for an upcoming PR.
This proposal brings support for common compression formats already implemented in Druid’s codebase (zstd, gzip, etc.) to Kinesis streams.
Decompression would be exposed via an optional configuration parameter in the Kinesis ioConfig, ‘compressionFormat’. When the parameter is set, records are decompressed at the point of record collection; when it is omitted, records are read as they are today.
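To make the proposed behavior concrete, here is a minimal sketch of what decompressing a record payload at collection time could look like. It is not the forthcoming PR's implementation: the class name `RecordDecompressor` and its method signatures are hypothetical, and it handles only gzip because the JDK ships no zstd codec. The `compressionFormat` argument stands in for the proposed ioConfig parameter, with `null` meaning the parameter was not set.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not part of Druid.
final class RecordDecompressor {

    // Returns the payload unchanged when no format is configured,
    // otherwise decompresses it. Only "gzip" is handled in this sketch.
    static byte[] decompress(byte[] payload, String compressionFormat) {
        if (compressionFormat == null) {
            return payload; // parameter omitted: pass the record through as-is
        }
        if (!"gzip".equalsIgnoreCase(compressionFormat)) {
            throw new IllegalArgumentException("Unsupported format in this sketch: " + compressionFormat);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(payload))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Producer-side counterpart, included so the round trip can be exercised.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

A producer would call something like `gzip(...)` before putting the record on the stream, and the ingestion side would call `decompress(...)` on each record's bytes before handing them to the parser.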
Motivation
Unlike Kafka, Kinesis offers little opportunity for compression out of the box. Because of this, it is a common usage pattern for Kinesis customers to compress and decompress their own data across the wire.
Given that Druid already has internal support for compression in various popular formats (zstd, gzip, etc.), it would be useful for high-throughput customers to be able to send compressed data across the wire and have Druid decompress it on ingestion.
Our team (which runs a fleet of enterprise Druid clusters at petabyte scale) has seen Kinesis costs drop by 50-80% after deploying a custom build of Druid with Kinesis decompression capabilities, with little to no discernible impact on ingestion overhead.
PR forthcoming in a few days, but I wanted to open this feature request for community discussion.