-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
modules/zstd: Add ZstdDecoder top level proc #1315
base: main
Are you sure you want to change the base?
Conversation
71be328
to
84b31f3
Compare
@lpawelcz can you add some markdown documentation on how to test the implementation against |
is the conflict in |
Would it make sense to add a README.md w/ a known limitation section in the zstd directory to document this? |
xls/modules/zstd/zstd_dec_test.cc
Outdated
this->ParseAndCompareWithZstd(frame.value()); | ||
} | ||
|
||
//class ZstdDecoderSeededTest : public ZstdDecoderTest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason this is commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A single test case from this test suite generate a random ZSTD frame with decodecorpus
utility. It is decoded with libzstd
and then the same encoded frame is processed through the simulation of ZSTD Decoder. The output of the simulation is gathered and compared against results from libzstd
decoding.
Such test case is repeated 50 times with generating ZSTD frames containing only RAW blocks and 50 times with generating only RLE blocks.
Currently, some of those test cases are failing and we are looking into those. For the time being we commented these tests out so that it would be easily visible if something unexpected will cause the CI to fail in this PR.
We added a
Yes, the conflict was caused by changes introduced with regards to #1308, #1202 and #1204. Fixed with rebase EDIT: |
xls/modules/zstd/BUILD
Outdated
data = [ | ||
":zstd_dec_verilog.ir", | ||
], | ||
#shard_count = 50, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that intentionally commented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is useful for running ZstdDecoderSeededTest
which are now commented out as we work on fixing all cases tested there.
) | ||
|
||
cc_test( | ||
name = "zstd_dec_cc_test", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this currently PASS
👍 but with the following warning:
WARNING: //xls/modules/zstd:zstd_dec_cc_test: Test execution time (19.1s excluding execution overhead) outside of range for MODERATE tests. Consider setting timeout="short" or size="small".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test execution time will surely change after enabling back the ZstdDecoderSeededTest
. Thanks for pointing that out. We will make sure to set those attributes correctly.
@@ -0,0 +1,241 @@ | |||
// Copyright 2020 The XLS Authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to fail w/ a timeout when introducing a deliberate failure (i.e: changing MAGIC_NUMBER
from u32:0xFD2FB528;
to u32:0xFD2FB527;
Target //xls/modules/zstd:zstd_dec_cc_test up-to-date:
bazel-bin/xls/modules/zstd/zstd_dec_cc_test
INFO: Elapsed time: 300.804s, Critical Path: 300.53s
INFO: 3 processes: 3 linux-sandbox.
INFO: Build completed, 1 test FAILED, 3 total actions
//xls/modules/zstd:zstd_dec_cc_test TIMEOUT in 300.0s
/usr/local/google/home/proppy/.cache/bazel/_bazel_proppy/fb962cb496438c85ace9ac1a1a0be573/execroot/com_google_xls/bazel-out/k8-fastbuild/testlogs/xls/modules/zstd/zstd_dec_cc_test/test.log
Executed 1 out of 1 test: 1 fails locally.
is that expected? or is there a way to make it fail earlier?
(attached the test.log)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would imagine that in the case of passing invalid ZSTD frame (e.g. wrong MAGIC_NUMBER
), decoder should spin in DECODE_MAGIC_NUMBER
state trying to detect it in incoming data packets and discarding the oldest byte in the buffer each time it failed. In your case decoder got stuck in ERROR
state which requires restarting the decoder and we will be fixing that.
When it comes to failing early we'll have to look closely if this is possible.
# ZSTD decoder | ||
|
||
The ZSTD decoder decompresses the correctly formed ZSTD frames and blocks. | ||
It implements the [RFC 8878](https://www.rfc-editor.org/rfc/rfc8878.html) decompression algorithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add paragraph breaks (extra newline) here and in other paragraphs below?
|
||
### Top level Proc | ||
This state machine is responsible for receiving encoded ZSTD frames, buffering the input and passing it to decoder's internal components based on the state of the proc. | ||
The states defined for the processing of ZSTD frame are as follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
state diagram with https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-diagrams#creating-mermaid-diagrams (instead of image files)?
* FEED_BLOCK_DECODER | ||
* DECODE_CHECKSUM | ||
|
||
After going through initial stages of decoding magic number and frame header, decoder starts the block division process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about inlining the description under of each state bullet point?
After transmitting all data required for current block, it loops around to the block header decoding state and when next block header is not found it decodes checksum when it was requested in frame header or finishes ZSTD frame decoding and loops around to magic number decoding. | ||
|
||
### ZSTD frame header decoder | ||
This part of the design starts with detecting the ZSTD magic number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/This part of the design/This module/ ? to match with other description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frame header decoding is actually implemented as a series of functions that are called in the top level proc. I'm not sure if module
would be a good word here.
* FEED_BLOCK_DECODER | ||
* DECODE_CHECKSUM | ||
|
||
After going through initial stages of decoding magic number and frame header, decoder starts the block division process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it feels like we're also somehow repeating what we say just below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the Top level Proc
paragraph we provide only a short description of each stage so that it would be easier to get gist of the flow. After that, each particular stage is described in detail.
xls/modules/zstd/README.md
Outdated
|
||
## Known Limitations | ||
|
||
* **[WIP]** Uses old version of `SequenceExecutor` (up-to-date version is available in [google/xls/pull/1295](https://github.com/google/xls/pull/1295)) due to [reported issues](https://github.com/google/xls/pull/1295#issuecomment-1943857515) with verilog generation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
![](img/ZSTD_decoder.png) | ||
|
||
## ZSTD decoder architecture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add link/pointed to the corresponding source(s) for each section?
This is sufficient information for reproducing the failure. We can easily write the example test case that takes the reported vector of bytes as the input and tries to interpret it as the frame header. We do the same thing with the reference zstd library and we compare the results. We check there for both positive and negative outcomes of the frame header decoding. |
I've reverted 3 temporary commits that:
We now expect CI failures in related tests in ZSTD-specific CI workflow. It looks like failures at the |
I've reworked the ZSTD-specific workflow a little bit. Now we first build and test all ZSTD modules and only after that we run the benchmarks. |
@@ -0,0 +1,46 @@ | |||
name: Continuous Integration (ZSTD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you rename to modules-zstd.yml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -41,4 +41,7 @@ jobs: | |||
|
|||
- name: Bazel Test All (opt) | |||
run: | | |||
bazel test -c opt --noshow_progress --test_output=errors -- //xls/... | |||
# Explicitly build the whole XLS except the benchmarking targets for the ZSTD module |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you state why were are disabling the benchmarking rules w/ a comment? (is that because of #1455 ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is because we wanted to skip running long place_and_route
flows in the main CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we could tag them as manual
, see https://bazel.build/reference/be/common-definitions#common-attributes:~:text=it%27s%20executed%20remotely.-,manual,-keyword%20will%20exclude
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tagged all benchmark macros and rules as manual
. With this modification we were able to get back to the simpler bazel configuration with only one package and all tests and benchmarks defined in a single BUILD
file. I've also reworked the CI workflows to take that change into account.
Unfortunately, there seems to be some kind of an issue with the propagation of the tags
attribute for some of the XLS macros (see #1487). I'm working on fixing that in #1488
Can you rebase and collapse the history? |
997d04f
to
559b8c0
Compare
Fixed #1488 and rebased on top of it once again. Cleaned the commit history - removed temporary commits that disabled some of the failing test cases. At this point we expect the following failures in ZSTD-specific CI workflow:
|
run: | | ||
bazel test -c opt --test_output=errors -- //xls/modules/zstd:all | ||
|
||
- name: Build ZSTD verilog targets (opt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious if we want to attempt to run those even of the build fails? (I'd be fine with both)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added continue-on-error: true
to specific steps of the workflow
Please do have in mind that now failing tests will be marked as passing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want continue-on-error
at the job level, so that we can get a red mark on the steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reorganized the workflow. There are multiple jobs defined now, each has continue-on-error: true
.
There is:
- building and running tests of the ZSTD components,
- running IR benchmarks,
- verilog generation,
- synthesis,
- physical design flow
Testing job fails due to errors in the frame header fuzz test setup and 3 failing test cases for the top level proc. The rest of the jobs fail only due to the #1455.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the miss-understanding. I didn't meant really to split those in different jobs but rather to move the continue-on-error
attribute to the job level (rather than keeping it in on the individual steps).
One drawback on the current approach (separate jobs) is that you'll end up recompiling everything from scratch in each of them (and potentially trash out the cache, as all jobs are using the same key)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right continue-on-error
seem to be related to matrix/workflow status: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idcontinue-on-error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be a well-known limitation of actions: actions/runner#2347
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about using if: ${{ !cancelled() }}
on indivual step that we want to run inconditionally as a workaround: https://docs.github.com/en/actions/learn-github-actions/expressions#always
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe just revert to the original version (which stop the workflow on the first failure); as you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if: ${{ !cancelled() }}
does the trick. We can now have a single job for testing the whole ZSTD module and we are able to see the execution logs for each of the steps.
6ffde50
to
85e1fc1
Compare
can you collapse the history? |
@proppy would you like to have a single commit for this PR or 1 commit per each major proc? |
one commit for the whole PR is ok. |
This dependency was unnecessarily removed in google@5b16a18. It is required for testing the ZSTD Decoder Signed-off-by: Pawel Czarnecki <pczarnecki@antmicro.com>
This commit adds an initial implementation of the ZSTD Decoder. It is capable of decoding simple ZSTD frames containing raw and rle blocks. This is a squashed commit that was created from the following changes: modules/zstd: Add buffer library modules/zstd: Add Buffer use-case example modules/zstd: Add library for parsing magic number modules/zstd: Add library for parsing frame header dependency_support/libzstd: Make zstd_errors.h public dependency_support: Add decodecorpus binary modules/zstd: Add data generator library modules/zstd: Add zstd frame header tests modules/zstd: Add common zstd definitions modules/zstd: Add raw block decoder modules/zstd: Add rle block decoder modules/zstd: Add block header parsing library modules/zstd: Add SequenceExecutorPacket to common definitions modules/zstd: Add block data muxer library modules/zstd: Add block demuxer library modules/zstd: Add block decoder module modules/zstd/common: Specify decoder output format examples/ram: Export internal RAM API to other modules modules/zstd: Add Offset type to common zstd definitions modules/zstd: Add RamPrinter Proc modules/zstd: Add SequenceExecutor Proc modules/zstd: Add repacketizer modules/zstd: Add ZSTD decoder modules/zstd: Add ZSTD Decoder documentation CI: Add custom ZSTD module workflow Co-authored-by: Maciej Dudek <mdudek@antmicro.com> Co-authored-by: Pawel Czarnecki <pczarnecki@antmicro.com> Co-authored-by: Robert Winkler <rwinkler@antmicro.com> Co-authored-by: Roman Dobrodii <rdobrodii@antmicro.com> Signed-off-by: Maciej Dudek <mdudek@antmicro.com> Signed-off-by: Pawel Czarnecki <pczarnecki@antmicro.com> Signed-off-by: Robert Winkler <rwinkler@antmicro.com> Signed-off-by: Roman Dobrodii <rdobrodii@antmicro.com>
This PR supersedes #1169
It is a part of #1211.
NOTE: this is based on #1314.
Please ignore all commits with
[TEMP]
in commit message when reviewing. Those are squashed commits of all previous PRs (links are available in the second line of each commit message)This PR adds a top level proc for ZSTD Decoder which integrates all its components in order to allow decoding of ZSTD frames. It includes C++ tests which:
decodecorpus
tool valid ZSTD frameslibzstd
andZstdDecoder
NOTE: currently it is possible to decode frames with simple block types as RAW and RLE. There are however still some issues with decoder implementation as not all tests are passing, hence the commented out tests with generating random frames.