This PR contains the following updates: `v1.5.2` -> `v1.5.5`
## Release Notes

### facebook/zstd (zstd)
#### v1.5.5: Zstandard v1.5.5

**Zstandard v1.5.5 Release Note**
This is a quick fix release. The primary focus is to correct a rare corruption bug in high compression mode, detected by @danlark1. The probability of generating such a scenario by random chance is extremely low. It evaded months of continuous fuzzer tests, due to the number and complexity of simultaneous conditions required to trigger it. Nevertheless, @danlark1 from Google shepherds such a humongous amount of data that he managed to detect a reproduction case (corruptions are detected thanks to the checksum), making it possible for @terrelln to investigate and fix the bug. Thanks!
While the probability might be very small, corruption issues are nonetheless very serious, so an update to this version is highly recommended, especially if you employ high compression modes (levels 16+).
When the issue was detected, there were a number of other improvements and minor fixes already in the making, hence they are also present in this release. Let’s detail the main ones.
**Improved memory usage and speed for the `--patch-from` mode**

`v1.5.5` introduces memory-mapped dictionaries, by @daniellerozenblit, for both POSIX (#3486) and Windows (#3557).

This feature allows `zstd` to memory-map large dictionaries, rather than requiring them to be loaded into memory. This can make a pretty big difference for memory-constrained environments operating patches for large data sets. It's mostly visible under memory pressure, since `mmap` will be able to release less-used memory and continue working. But even when memory is plentiful, there are still measurable memory benefits, as shown in the graph below, especially when the reference turns out to be not completely relevant for the patch.
This feature is automatically enabled for `--patch-from` compression/decompression when the dictionary is larger than the user-set memory limit. It can also be manually enabled or disabled using `--mmap-dict` or `--no-mmap-dict`, respectively.

Additionally, @daniellerozenblit introduces significant speed improvements for `--patch-from`. An I/O optimization in #3486 greatly improves `--patch-from` decompression speed on Linux, typically by +50% on large files (~1 GB). Compression speed is also taken care of, with a dictionary-indexing speed optimization introduced in #3545. It wildly accelerates `--patch-from` compression, typically doubling speed on large files (~1 GB), sometimes even more depending on the exact scenario. This speed improvement comes at a slight regression in compression ratio, and is therefore enabled only on non-ultra compression strategies.
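To make the `--patch-from` workflow concrete, here is a minimal command-line sketch. File names are illustrative, and it assumes a `zstd` binary on the PATH (>= v1.5.5 for the `--mmap-dict`/`--no-mmap-dict` flags).

```shell
# Create an "old" reference file and a slightly different "new" version:
seq 1 50000 > old.txt
{ seq 1 50000; echo "one extra line"; } > new.txt

# Compress new.txt as a patch against old.txt (the reference acts as the dictionary).
# Add --mmap-dict / --no-mmap-dict to force or forbid memory-mapping the reference;
# by default, mmap engages only when the dictionary exceeds the memory limit.
zstd -q -f --patch-from=old.txt new.txt -o new.patch.zst

# Decompression must be given the same reference file:
zstd -q -d -f --patch-from=old.txt new.patch.zst -o new.restored.txt
cmp new.txt new.restored.txt && echo "round-trip OK"
```

The patch file is typically far smaller than compressing `new.txt` alone, since most of its content is already present in the reference.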
**Speed improvements of middle-level compression for specific scenarios**
The row-hash match finder introduced in version 1.5.0 for levels 5-12 has been improved in version 1.5.5, enhancing its speed in specific corner-case scenarios.
The first optimization (#3426) accelerates streaming compression using `ZSTD_compressStream` on small inputs by removing an expensive table-initialization step. This results in remarkable speed increases for very small inputs.

The following scenario measures compression speed of `ZSTD_compressStream` at level 9 for different sample sizes, on a Linux platform running an i7-9700K CPU. *(table: compression speed, v1.5.4 vs. v1.5.5, in MB/s)*

The second optimization (#3552) speeds up compression of incompressible data by a large multiplier. This is achieved by increasing the step size and reducing the frequency of matching when no matches are found, with negligible impact on the compression ratio. It makes mid-level compression essentially inexpensive when processing incompressible data, typically already-compressed data (note: this was already the case for fast compression levels).
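This effect is easy to observe from the command line. A quick sketch (file names illustrative; assumes a `zstd` binary and `/dev/urandom` are available):

```shell
# ~10 MB of incompressible (random) data:
head -c 10000000 /dev/urandom > random.bin

# Mid-level compression of incompressible input: the output stays roughly the
# size of the input, and with v1.5.5 the attempt is also cheap in CPU time.
zstd -q -f -9 random.bin -o random.bin.zst
wc -c random.bin random.bin.zst
```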
The following scenario measures compression speed of `ZSTD_compress` compiled with `gcc-9` for a ~10 MB incompressible sample, on a Linux platform running an i7-9700K CPU. *(table: compression speed, v1.5.4 vs. v1.5.5, in MB/s)*

**Miscellaneous**
There are other welcome speed improvements in this package.
For example, @felixhandte managed to increase the processing speed of small files by carefully reducing the number of system calls (#3479). This can easily translate into +10% speed when processing a lot of small files in batch.
The seekable format received a bit of care. It's now much faster when splitting data into very small blocks (#3544). In an extreme scenario reported by @P-E-Meunier, it improves processing speed by 90x. Even for more "common" settings, such as using 4 KB blocks on some "normally" compressible data like `enwik`, it still provides a healthy 2x processing-speed benefit. Moreover, @dloidolt merged an optimization that reduces the number of I/O `seek()` events during reads (decompression), which is also beneficial for speed.

The release is not limited to speed improvements; several loose ends and corner cases were also fixed in this release. For a more detailed list of changes, please take a look at the changelog.
**Change Log**
- `mmap` large dictionaries to save memory, by @daniellerozenblit
- `--patch-from` mode (~+50%) (#3545) by @daniellerozenblit
- `zstd` no longer crashes when requested to write into a write-protected directory (#3541) by @felixhandte
- `-o` (#3584, @Cyan4973), reported by @georgmu
- `cmake` no longer requires 3.18 as minimum version (#3510) by @kou
- `tests/fullbench` can benchmark multiple files (#3516) by @dloidolt

**Full change list (auto-generated)**
- `f`-variants of `chmod()` and `chown()` by @felixhandte in https://github.com/facebook/zstd/pull/3479
- `setvbuf()` on Null File Pointer by @felixhandte in https://github.com/facebook/zstd/pull/3541
- `-std=c++11` When Default is Older by @felixhandte in https://github.com/facebook/zstd/pull/3574
- `dest` is valid for decompression by @daniellerozenblit in https://github.com/facebook/zstd/pull/3555

**New Contributors**
Full Changelog: facebook/zstd@v1.5.4...v1.5.5
#### v1.5.4: Zstandard v1.5.4

Zstandard `v1.5.4` is a pretty big release, benefiting from one year of work spread over more than 650 commits. It offers significant performance improvements across multiple scenarios, as well as new features (detailed below). There is a crop of little bug fixes too; a few targeting the 32-bit mode are important enough to make this release a recommended upgrade.

**Various Speed improvements**
This release has accumulated a number of scenario-specific improvements that cumulatively benefit a good portion of the installed base in one way or another.
Among the easier ones to describe, the repository has received several contributions for `arm` optimizations, notably from @JunHe77 and @danlark1. And @terrelln has improved decompression speed for non-x64 systems, including `arm`. The combination of this work is visible in the following example, using an M1-Pro (`aarch64` architecture). *(table: decompression speed on `silesia.tar`, v1.5.2 vs. v1.5.4)*
Middle compression levels (5-12) receive some care too, with @terrelln improving the dispatch engine and @danlark1 offering `NEON` optimizations. Exact speedups vary depending on platform, CPU, compiler, and compression level, though one can expect gains ranging from +1% to +10% depending on the scenario. *(table: compression speed on `silesia.tar` at several levels, v1.5.2 vs. v1.5.4)*
The speed of the streaming compression interface has been improved by @embg in scenarios involving large files (where the size is a multiple of the `windowSize` parameter). The improvement is mostly perceptible at high speeds (i.e. ~level 1). In the following sample, the measurement is taken directly at the `ZSTD_compressStream()` function call, using the dedicated benchmark tool `tests/fullbench`. *(table: `ZSTD_compressStream()` at level 1 on `silesia.tar`, v1.5.2 vs. v1.5.4)*
Finally, dictionary compression speed has received a good boost by @embg. The exact outcome varies depending on system and corpus. The following result is achieved by cutting the `enwik8` compression corpus into 1 KB blocks, generating a dictionary from these blocks, and then benchmarking compression speed at level 1. *(table: dictionary compression speed on `enwik8 -B1K`, v1.5.2 vs. v1.5.4)*

There are a few more scenario-specific improvements listed in the changelog section below.

**I/O Performance improvements**
The 1.5.4 release improves the I/O performance of the `zstd` CLI by using system buffers (`macos`) and adding a new asynchronous I/O capability, enabled by default on large files (when threading is available). The user can also explicitly control this capability with the `--[no-]asyncio` flag. These new threads remove the need to block on I/O operations. The impact is mostly noticeable when decompressing large files (>= a few MB), though the exact outcome depends on environment and run conditions.

Decompression speed gets significant gains due to its single-threaded serial nature and the high speeds involved. In some cases we observe up to double the performance (local Mac machines) and a wide +15-45% benefit on Intel Linux servers (see the table for details).
On the compression side of things, we’ve measured up to 5% improvements. The impact is lower because compression is already partially asynchronous via the internal MT mode (see release v1.3.4).
The following table shows the elapsed run time for decompression of `silesia` and `enwik8` on several platforms - some Skylake-era Linux servers and an M1 MacBook Pro. It compares version `v1.5.2` to version `v1.5.4` with asyncio on and off.

| platform | corpus | v1.5.2 | v1.5.4-no-asyncio | v1.5.4 | Improvement |
| -- | -- | -- | -- | -- | -- |
| Xeon D-2191A CentOS8 | enwik8 | 280 MB/s | 280 MB/s | 324 MB/s | +16% |
| Xeon D-2191A CentOS8 | silesia.tar | 303 MB/s | 302 MB/s | 386 MB/s | +27% |
| i7-1165g7 win10 | enwik8 | 270 MB/s | 280 MB/s | 350 MB/s | +27% |
| i7-1165g7 win10 | silesia.tar | 450 MB/s | 440 MB/s | 580 MB/s | +28% |
| i7-9700K Ubuntu20 | enwik8 | 600 MB/s | 604 MB/s | 829 MB/s | +38% |
| i7-9700K Ubuntu20 | silesia.tar | 683 MB/s | 678 MB/s | 991 MB/s | +45% |
| Galaxy S22 | enwik8 | 360 MB/s | 420 MB/s | 515 MB/s | +70% |
| Galaxy S22 | silesia.tar | 310 MB/s | 320 MB/s | 580 MB/s | +85% |
| MBP M1 | enwik8 | 428 MB/s | 734 MB/s | 815 MB/s | +90% |
| MBP M1 | silesia.tar | 465 MB/s | 875 MB/s | 1001 MB/s | +115% |

**Support of externally-defined sequence producers**
`libzstd` can now support external sequence producers via a new advanced registration function, `ZSTD_registerSequenceProducer()` (#3333).

This API allows users to provide their own custom sequence producer, which libzstd invokes to process each block. The produced list of sequences (literals and matches) is then post-processed by libzstd into valid compressed blocks.

This block-level offload API is a more granular complement to the existing frame-level offload API `ZSTD_compressSequences()` (introduced in `v1.5.1`). It offers an easier migration story for applications already integrated with `libzstd`: the user application continues to invoke the same compression functions `ZSTD_compress2()` or `ZSTD_compressStream2()` as usual, and transparently benefits from the specific properties of the external sequence producer. For example, the sequence producer could be tuned to take advantage of known characteristics of the input to offer a better speed/ratio trade-off.

One scenario that becomes possible is to combine this capability with hardware-accelerated match finders, such as the Intel® QuickAssist accelerator (Intel® QAT) provided in server CPUs such as the 4th Gen Intel® Xeon® Scalable processors (previously codenamed Sapphire Rapids). More details will be provided in future communications.
**Change Log**
- perf: +20% faster huffman decompression for targets that can't compile x64 assembly (#3449, @terrelln)
- perf: up to +10% faster streaming compression at levels 1-2 (#3114, @embg)
- perf: +4-13% for levels 5-12 by optimizing function generation (#3295, @terrelln)
- perf: +3-11% compression speed for `arm` target (#3199, #3164, #3145, #3141, #3138, @JunHe77 and #3139, #3160, @danlark1)
- perf: +5-30% faster dictionary compression at levels 1-4 (#3086, #3114, #3152, @embg)
- perf: +10-20% cold dict compression speed by prefetching CDict tables (#3177, @embg)
- perf: +1% faster compression by removing a branch in ZSTD_fast_noDict (#3129, @felixhandte)
- perf: Small compression ratio improvements in high compression mode (#2983, #3391, @Cyan4973 and #3285, #3302, @daniellerozenblit)
- perf: small speed improvement by better detecting `STATIC_BMI2` for `clang` (#3080, @TocarIP)
- perf: Improved streaming performance when `ZSTD_c_stableInBuffer` is set (#2974, @Cyan4973)
- cli: Asynchronous I/O for improved cli speed (#2975, #2985, #3021, #3022, @yoniko)
- cli: Change `zstdless` behavior to align with `zless` (#2909, @binhdvo)
- cli: Keep original file if `-c` or `--stdout` is given (#3052, @dirkmueller)
- cli: Keep original files when result is concatenated into a single output with `-o` (#3450, @Cyan4973)
- cli: Preserve Permissions and Ownership of regular files (#3432, @felixhandte)
- cli: Print zlib/lz4/lzma library versions with `-vv` (#3030, @terrelln)
- cli: Print checksum value for single frame files with `-lv` (#3332, @Cyan4973)
- cli: Print `dictID` when present with `-lv` (#3184, @htnhan)
- cli: when `stderr` is not the console, disable status updates, but preserve final summary (#3458, @Cyan4973)
- cli: support `--best` and `--no-name` in `gzip` compatibility mode (#3059, @dirkmueller)
- cli: support for `posix` high resolution timer `clock_gettime()`, for improved benchmark accuracy (#3423, @Cyan4973)
- cli: improved help/usage (`-h`, `-H`) formatting (#3094, @dirkmueller and #3385, @jonpalmisc)
- cli: Fix handling of bogus numeric values (#3268, @ctkhanhly)
- cli: Fix input consisting of multiple files combined with `stdin` (#3222, @yoniko)
- cli: Fix tiny files passthrough (#3215, @cgbur)
- cli: Fix for `-r` on empty directory (#3027, @brailovich)
- cli: Fix empty string as argument for `--output-dir-*` (#3220, @embg)
- cli: Fix decompression memory usage reported by `-vv --long` (#3042, @u1f35c, and #3232, @zengyijing)
- cli: Fix infinite loop when empty input is passed to trainer (#3081, @terrelln)
- cli: Fix `--adapt` not working when `--no-progress` is also set (#3354, @terrelln)
- api: Support for External Sequence Producer (#3333, @embg)
- api: Support for in-place decompression (#3432, @terrelln)
- api: New `ZSTD_CCtx_setCParams()` function, which sets all parameters defined in a `ZSTD_compressionParameters` structure (#3403, @Cyan4973)
- api: Streaming decompression detects incorrect header ID sooner (#3175, @Cyan4973)
- api: Window size resizing optimization for edge case (#3345, @daniellerozenblit)
- api: More accurate error codes for busy-loop scenarios (#3413, #3455, @Cyan4973)
- api: Fix limit overflow in `compressBound` and `decompressBound` (#3362, #3373, @Cyan4973), reported by @nigeltao
- api: Deprecate several advanced experimental functions: streaming (#3408, @embg), copy (#3196, @mileshu)
- bug: Fix corruption that rarely occurs in 32-bit mode with wlog=25 (#3361, @terrelln)
- bug: Fix for block-splitter (#3033, @Cyan4973)
- bug: Fixes for Sequence Compression API (#3023, #3040, @Cyan4973)
- bug: Fix leaking thread handles on Windows (#3147, @animalize)
- bug: Fix timing issues with cmake/meson builds (#3166, #3167, #3170, @Cyan4973)
- build: Allow user to select legacy level for cmake (#3050, @shadchin)
- build: Enable legacy support by default in cmake (#3079, @niamster)
- build: Meson build script improvements (#3039, #3120, #3122, #3327, #3357, @eli-schwartz and #3276, @neheb)
- build: Add aarch64 to supported architectures for zstd_trace (#3054, @ooosssososos)
- build: support AIX architecture (#3219, @qiongsiwu)
- build: Fix `ZSTD_LIB_MINIFY` build macro, which now reduces static library size by half (#3366, @terrelln)
- build: Fix Windows issues with Multithreading translation layer (#3364, #3380, @yoniko) and ARM64 target (#3320, @cwoffenden)
- build: Fix `cmake` script (#3382, #3392, @terrelln and #3252, @Tachi107 and #3167, @Cyan4973)
- doc: Updated man page, providing more details for `--train` mode (#3112, @Cyan4973)
- doc: Add decompressor errata document (#3092, @terrelln)
- misc: Enable Intel CET (#2992, #2994, @hjl-tools)
- misc: Fix `contrib/` seekable format (#3058, @yhoogstrate and #3346, @daniellerozenblit)
- misc: Improve speed of the one-file library generator (#3241, @wahern and #3005, @cwoffenden)
**PR list (generated by Github)**
- `ip1` into Table by @felixhandte in https://github.com/facebook/zstd/pull/3129
- `wlog` when doing `--long` by @zengyijing in https://github.com/facebook/zstd/pull/3226
- `make clean` list maintenance by adding a `CLEAN` variable by @Cyan4973 in https://github.com/facebook/zstd/pull/3256
- `-E` flag in `sed` by @haampie in https://github.com/facebook/zstd/pull/3245
- `ZSTD_count` call by @JunHe77 in https://github.com/facebook/zstd/pull/3199
- `zstd` CLI accepts bogus values for numeric parameters by @ctkhanhly in https://github.com/facebook/zstd/pull/3268
- `clang` by @MaskRay in https://github.com/facebook/zstd/pull/3273

**Configuration**
📅 Schedule: Branch creation - "every weekend" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Mend Renovate. View repository job log here.