Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modules/zstd: Add ZstdDecoder top level proc #1315

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

lpawelcz
Copy link
Contributor

This PR supersedes #1169
It is a part of #1211.
NOTE: this is based on #1314.
Please ignore all commits with [TEMP] in commit message when reviewing. Those are squashed commits of all previous PRs (links are available in the second line of each commit message)

This PR adds a top level proc for ZSTD Decoder which integrates all its components in order to allow decoding of ZSTD frames. It includes C++ tests which:

  • generate with decodecorpus tool valid ZSTD frames
  • decode the frames with libzstd and ZstdDecoder
  • compare the results

NOTE: currently it is possible to decode frames with simple block types as RAW and RLE. There are however still some issues with decoder implementation as not all tests are passing, hence the commented out tests with generating random frames.

@proppy
Copy link
Member

proppy commented Feb 28, 2024

@lpawelcz can you add some markdown documentation on how to test the implementation against decorecorpus? (even if that just reference the blaze rules to run)

@proppy
Copy link
Member

proppy commented Feb 28, 2024

is the conflict in ir_convert because of #1204? should we rebase?

@proppy
Copy link
Member

proppy commented Feb 28, 2024

NOTE: currently it is possible to decode frames with simple block types as RAW and RLE. There are however still some issues with decoder implementation as not all tests are passing, hence the commented out tests with generating random frames.

Would it make sense to add a README.md w/ a known limitation section in the zstd directory to document this?

this->ParseAndCompareWithZstd(frame.value());
}

//class ZstdDecoderSeededTest : public ZstdDecoderTest,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason this is commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single test case from this test suite generate a random ZSTD frame with decodecorpus utility. It is decoded with libzstd and then the same encoded frame is processed through the simulation of ZSTD Decoder. The output of the simulation is gathered and compared against results from libzstd decoding.
Such test case is repeated 50 times with generating ZSTD frames containing only RAW blocks and 50 times with generating only RLE blocks.
Currently, some of those test cases are failing and we are looking into those. For the time being we commented these tests out so that it would be easily visible if something unexpected will cause the CI to fail in this PR.

@lpawelcz
Copy link
Contributor Author

lpawelcz commented Mar 7, 2024

@lpawelcz can you add some markdown documentation on how to test the implementation against decorecorpus? (even if that just reference the blaze rules to run)

Would it make sense to add a README.md w/ a known limitation section in the zstd directory to document this?

We added a README.md which describes the ZSTD Decoder, its components, how to test against libzstd and what are the known limitations of the decoder in the current state.

is the conflict in ir_convert because of #1204? should we rebase?

Yes, the conflict was caused by changes introduced with regards to #1308, #1202 and #1204. Fixed with rebase

EDIT:
I've also temporarily included changes from #1326 as those are required for fixing physical design flows done with openroad.

data = [
":zstd_dec_verilog.ir",
],
#shard_count = 50,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that intentionally commented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is useful for running ZstdDecoderSeededTest which are now commented out as we work on fixing all cases tested there.

)

cc_test(
name = "zstd_dec_cc_test",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this currently PASS 👍 but with the following warning:

  WARNING: //xls/modules/zstd:zstd_dec_cc_test: Test execution time (19.1s excluding execution overhead) outside of range for MODERATE tests. Consider setting timeout="short" or size="small".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test execution time will surely change after enabling back the ZstdDecoderSeededTest. Thanks for pointing that out. We will make sure to set those attributes correctly.

@@ -0,0 +1,241 @@
// Copyright 2020 The XLS Authors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to fail w/ a timeout when introducing a deliberate failure (i.e: changing MAGIC_NUMBER from u32:0xFD2FB528; to u32:0xFD2FB527;

Target //xls/modules/zstd:zstd_dec_cc_test up-to-date:
  bazel-bin/xls/modules/zstd/zstd_dec_cc_test
INFO: Elapsed time: 300.804s, Critical Path: 300.53s
INFO: 3 processes: 3 linux-sandbox.
INFO: Build completed, 1 test FAILED, 3 total actions
//xls/modules/zstd:zstd_dec_cc_test                                     TIMEOUT in 300.0s
  /usr/local/google/home/proppy/.cache/bazel/_bazel_proppy/fb962cb496438c85ace9ac1a1a0be573/execroot/com_google_xls/bazel-out/k8-fastbuild/testlogs/xls/modules/zstd/zstd_dec_cc_test/test.log

Executed 1 out of 1 test: 1 fails locally.

is that expected? or is there a way to make it fail earlier?
(attached the test.log)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would imagine that in the case of passing invalid ZSTD frame (e.g. wrong MAGIC_NUMBER), decoder should spin in DECODE_MAGIC_NUMBER state trying to detect it in incoming data packets and discarding the oldest byte in the buffer each time it failed. In your case decoder got stuck in ERROR state which requires restarting the decoder and we will be fixing that.

When it comes to failing early we'll have to look closely if this is possible.

# ZSTD decoder

The ZSTD decoder decompresses the correctly formed ZSTD frames and blocks.
It implements the [RFC 8878](https://www.rfc-editor.org/rfc/rfc8878.html) decompression algorithm.
Copy link
Member

@proppy proppy Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add paragraph breaks (extra newline) here and in other paragraphs below?


### Top level Proc
This state machine is responsible for receiving encoded ZSTD frames, buffering the input and passing it to decoder's internal components based on the state of the proc.
The states defined for the processing of ZSTD frame are as follows:
Copy link
Member

@proppy proppy Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* FEED_BLOCK_DECODER
* DECODE_CHECKSUM

After going through initial stages of decoding magic number and frame header, decoder starts the block division process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about inlining the description under of each state bullet point?

xls/modules/zstd/README.md Outdated Show resolved Hide resolved
After transmitting all data required for current block, it loops around to the block header decoding state and when next block header is not found it decodes checksum when it was requested in frame header or finishes ZSTD frame decoding and loops around to magic number decoding.

### ZSTD frame header decoder
This part of the design starts with detecting the ZSTD magic number.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/This part of the design/This module/ ? to match with other description?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frame header decoding is actually implemented as a series of functions that are called in the top level proc. I'm not sure if module would be a good word here.

xls/modules/zstd/README.md Outdated Show resolved Hide resolved
* FEED_BLOCK_DECODER
* DECODE_CHECKSUM

After going through initial stages of decoding magic number and frame header, decoder starts the block division process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it feels like we're also somehow repeating what we say just below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the Top level Proc paragraph we provide only a short description of each stage so that it would be easier to get gist of the flow. After that, each particular stage is described in detail.

xls/modules/zstd/README.md Outdated Show resolved Hide resolved
xls/modules/zstd/README.md Outdated Show resolved Hide resolved
xls/modules/zstd/README.md Outdated Show resolved Hide resolved

## Known Limitations

* **[WIP]** Uses old version of `SequenceExecutor` (up-to-date version is available in [google/xls/pull/1295](https://github.com/google/xls/pull/1295)) due to [reported issues](https://github.com/google/xls/pull/1295#issuecomment-1943857515) with verilog generation
Copy link
Member

@proppy proppy Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hongted is that a workaround we can share externally for #1018?


![](img/ZSTD_decoder.png)

## ZSTD decoder architecture
Copy link
Member

@proppy proppy Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add link/pointed to the corresponding source(s) for each section?

@cdleary cdleary added the app Application level functionality (examples, uses of XLS stack) label Mar 27, 2024
@lpawelcz
Copy link
Contributor Author

https://github.com/google/xls/actions/runs/9486126272/job/26139737436 shows that the test fails with:

=================================================================
=== BUG FOUND!

xls/modules/zstd/frame_header_test.cc:309: Counterexample found for FrameHeaderFuzzTest.ParseMultipleRandomFrameHeaders.
The test fails with input:
argument 0: "\241\001\331\331\017\242\242"

=================================================================

Is that enough information to reproduce the failure? or should we use libzstd to decode and print a structured version of the header?

This is sufficient information for reproducing the failure. We can easily write the example test case that takes the reported vector of bytes as the input and tries to interpret it as the frame header. We do the same thing with the reference zstd library and we compare the results. We check there for both positive and negative outcomes of the frame header decoding.

@lpawelcz
Copy link
Contributor Author

lpawelcz commented Jun 13, 2024

can you leave some failing test that capture the current known limitation (or document here how to make the CI fail so that I can check how it reports failure).

I've reverted 3 temporary commits that:

  • filter out 3 failing test cases for decoding ZSTD frames with RLE blocks
  • disable IR benchmarks and place and route rules due to proc inlining issue (Internal error during proc inlining #1455)
  • disable FuzzTests of the frame header decoder.

We now expect CI failures in related tests in ZSTD-specific CI workflow.

It looks like failures at the build stage (attempt to generate an optimized IR) cause a critical failure that prevents bazel from running the test stage for the rest of the targets. Because of that we can see in the CI that all tests have NO STATUS and one verilog generation target failed. This is rather unfortunate because we are unable to see the results of the whole testing procedure because of this early exit.

@lpawelcz
Copy link
Contributor Author

I've reworked the ZSTD-specific workflow a little bit. Now we first build and test all ZSTD modules and only after that we run the benchmarks.
This way we can see the full tests results in the log before the workflow fails on the IR optimization.

@@ -0,0 +1,46 @@
name: Continuous Integration (ZSTD)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rename to modules-zstd.yml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -41,4 +41,7 @@ jobs:

- name: Bazel Test All (opt)
run: |
bazel test -c opt --noshow_progress --test_output=errors -- //xls/...
# Explicitly build the whole XLS except the benchmarking targets for the ZSTD module
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you state why were are disabling the benchmarking rules w/ a comment? (is that because of #1455 ?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is because we wanted to skip running long place_and_route flows in the main CI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagged all benchmark macros and rules as manual. With this modification we were able to get back to the simpler bazel configuration with only one package and all tests and benchmarks defined in a single BUILD file. I've also reworked the CI workflows to take that change into account.

Unfortunately, there seems to be some kind of an issue with the propagation of the tags attribute for some of the XLS macros (see #1487). I'm working on fixing that in #1488

@proppy
Copy link
Member

proppy commented Jun 18, 2024

Can you rebase and collapse the history?

@lpawelcz
Copy link
Contributor Author

Can you rebase and collapse the history?

Rebased on top of #1488 and cleaned the commit history.
#1488 still needs some improvements from my side before it will be fully functional

@lpawelcz lpawelcz force-pushed the 52186-zstd-top branch 2 times, most recently from 997d04f to 559b8c0 Compare June 19, 2024 09:19
@lpawelcz
Copy link
Contributor Author

Fixed #1488 and rebased on top of it once again. Cleaned the commit history - removed temporary commits that disabled some of the failing test cases.

At this point we expect the following failures in ZSTD-specific CI workflow:

  • failing FuzzTests for the frame header decoder
  • 3 failing test cases for decoding ZSTD frames with only RLE blocks in top level proc C++ tests, generator seeds: 41, 64, 92
  • IR optimization failure due to Internal error during proc inlining #1455 - will not be visible in CI as this is a build rule launched in a separate CI job step which is placed after the step that performs tests with two errors mentioned above

run: |
bazel test -c opt --test_output=errors -- //xls/modules/zstd:all

- name: Build ZSTD verilog targets (opt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious if we want to attempt to run those even of the build fails? (I'd be fine with both)

Copy link
Contributor Author

@lpawelcz lpawelcz Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added continue-on-error: true to specific steps of the workflow

Please do have in mind that now failing tests will be marked as passing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want continue-on-error at the job level, so that we can get a red mark on the steps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganized the workflow. There are multiple jobs defined now, each has continue-on-error: true.

There is:

  • building and running tests of the ZSTD components,
  • running IR benchmarks,
  • verilog generation,
  • synthesis,
  • physical design flow

Testing job fails due to errors in the frame header fuzz test setup and 3 failing test cases for the top level proc. The rest of the jobs fail only due to the #1455.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the miss-understanding. I didn't meant really to split those in different jobs but rather to move the continue-on-error attribute to the job level (rather than keeping it in on the individual steps).

One drawback on the current approach (separate jobs) is that you'll end up recompiling everything from scratch in each of them (and potentially trash out the cache, as all jobs are using the same key)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right continue-on-error seem to be related to matrix/workflow status: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idcontinue-on-error

Copy link
Member

@proppy proppy Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be a well-known limitation of actions: actions/runner#2347

Copy link
Member

@proppy proppy Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about using if: ${{ !cancelled() }} on indivual step that we want to run inconditionally as a workaround: https://docs.github.com/en/actions/learn-github-actions/expressions#always

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe just revert to the original version (which stop the workflow on the first failure); as you prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if: ${{ !cancelled() }} does the trick. We can now have a single job for testing the whole ZSTD module and we are able to see the execution logs for each of the steps.

@lpawelcz lpawelcz force-pushed the 52186-zstd-top branch 2 times, most recently from 6ffde50 to 85e1fc1 Compare June 20, 2024 12:39
@proppy
Copy link
Member

proppy commented Jun 21, 2024

can you collapse the history?

@lpawelcz
Copy link
Contributor Author

can you collapse the history?

@proppy would you like to have a single commit for this PR or 1 commit per each major proc?

@proppy
Copy link
Member

proppy commented Jun 24, 2024

one commit for the whole PR is ok.

@lpawelcz
Copy link
Contributor Author

one commit for the whole PR is ok.

@proppy done

Please note that currently this PR also includes the changes from #1488
Once that PR is merged I will rebase this one back onto current main.

@lpawelcz
Copy link
Contributor Author

@proppy #1488 got merged. Rebased on top of current main.

This dependency was unnecessarily removed in google@5b16a18.
It is required for testing the ZSTD Decoder

Signed-off-by: Pawel Czarnecki <pczarnecki@antmicro.com>
This commit adds an initial implementation of the ZSTD Decoder.
It is capable of decoding simple ZSTD frames containing raw and rle
blocks.

This is a squashed commit that was created from the following changes:

modules/zstd: Add buffer library
modules/zstd: Add Buffer use-case example
modules/zstd: Add library for parsing magic number
modules/zstd: Add library for parsing frame header
dependency_support/libzstd: Make zstd_errors.h public
dependency_support: Add decodecorpus binary
modules/zstd: Add data generator library
modules/zstd: Add zstd frame header tests
modules/zstd: Add common zstd definitions
modules/zstd: Add raw block decoder
modules/zstd: Add rle block decoder
modules/zstd: Add block header parsing library
modules/zstd: Add SequenceExecutorPacket to common definitions
modules/zstd: Add block data muxer library
modules/zstd: Add block demuxer library
modules/zstd: Add block decoder module
modules/zstd/common: Specify decoder output format
examples/ram: Export internal RAM API to other modules
modules/zstd: Add Offset type to common zstd definitions
modules/zstd: Add RamPrinter Proc
modules/zstd: Add SequenceExecutor Proc
modules/zstd: Add repacketizer
modules/zstd: Add ZSTD decoder
modules/zstd: Add ZSTD Decoder documentation
CI: Add custom ZSTD module workflow

Co-authored-by: Maciej Dudek <mdudek@antmicro.com>
Co-authored-by: Pawel Czarnecki <pczarnecki@antmicro.com>
Co-authored-by: Robert Winkler <rwinkler@antmicro.com>
Co-authored-by: Roman Dobrodii <rdobrodii@antmicro.com>
Signed-off-by: Maciej Dudek <mdudek@antmicro.com>
Signed-off-by: Pawel Czarnecki <pczarnecki@antmicro.com>
Signed-off-by: Robert Winkler <rwinkler@antmicro.com>
Signed-off-by: Roman Dobrodii <rdobrodii@antmicro.com>
@lpawelcz
Copy link
Contributor Author

lpawelcz commented Jun 27, 2024

@hzeller, @proppy FYI
5b16a18 removed libzstd dependency that was introduced in #1166. libzstd is required for testing the ZSTD decoder. ef8aa0e adds it back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
app Application level functionality (examples, uses of XLS stack)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants