Test orientation & distribution format issues #1311

Closed
protolambda opened this issue Jul 23, 2019 · 6 comments
protolambda (Collaborator) commented Jul 23, 2019

This issue is set up as a starting point to address inconveniences clients have experienced with the test distribution and/or format.

Pain points

Main pain points identified so far:

  1. LFS deals with the large files
    1.1) not nice to work with in CI, and a new dependency
    1.2) clones without LFS active are troublesome
    1.3) requires authentication.
    • This shouldn't happen, but it is a relatively new issue; it may be forced by rate limits / settings.
  2. The sizes of the files are very large
    2.1) Need to read the header to filter tests; we do not want to read the whole file into memory
    2.2) Cannot load the full suite of tests into memory
  3. I need format X, because feature Y of the current format is a bad experience for me.
  4. Configuration is not distributed with the tests, and clients do not like having it live only in the specs repo.

File sizes (v0.8.1)

du -ah tests | grep -v "/$" | sort -rh

2.8G	tests
1.4G	tests/operations
654M	tests/epoch_processing
576M	tests/sanity
420M	tests/sanity/blocks
419M	tests/sanity/blocks/sanity_blocks_mainnet.yaml
374M	tests/operations/attestation
372M	tests/operations/attestation/attestation_mainnet.yaml
327M	tests/operations/attester_slashing
326M	tests/operations/attester_slashing/attester_slashing_mainnet.yaml
250M	tests/epoch_processing/justification_and_finalization
249M	tests/operations/deposit
249M	tests/epoch_processing/justification_and_finalization/justification_and_finalization_mainnet.yaml
248M	tests/operations/deposit/deposit_mainnet.yaml
185M	tests/ssz_static/core
185M	tests/ssz_static
171M	tests/operations/proposer_slashing/proposer_slashing_mainnet.yaml
171M	tests/operations/proposer_slashing
158M	tests/ssz_static/core/ssz_mainnet_random.yaml
156M	tests/sanity/slots
156M	tests/operations/voluntary_exit
155M	tests/sanity/slots/sanity_slots_mainnet.yaml
155M	tests/operations/voluntary_exit/voluntary_exit_mainnet.yaml
125M	tests/epoch_processing/final_updates/final_updates_mainnet.yaml
125M	tests/epoch_processing/final_updates
94M	tests/operations/block_header
94M	tests/epoch_processing/slashings
94M	tests/epoch_processing/registry_updates
94M	tests/epoch_processing/crosslinks
93M	tests/operations/block_header/block_header_mainnet.yaml
93M	tests/epoch_processing/slashings/slashings_mainnet.yaml
93M	tests/epoch_processing/registry_updates/registry_updates_mainnet.yaml
93M	tests/epoch_processing/crosslinks/crosslinks_mainnet.yaml
9.9M	tests/ssz_static/core/ssz_minimal_lengthy.yaml
6.6M	tests/ssz_static/core/ssz_minimal_random.yaml
6.3M	tests/ssz_static/core/ssz_minimal_random_chaos.yaml
3.8M	tests/ssz_static/core/ssz_minimal_one.yaml
1.9M	tests/sanity/blocks/sanity_blocks_minimal.yaml
1.7M	tests/operations/transfer/transfer_minimal.yaml
1.7M	tests/operations/transfer
1.5M	tests/operations/attestation/attestation_minimal.yaml
1.2M	tests/operations/attester_slashing/attester_slashing_minimal.yaml
968K	tests/epoch_processing/justification_and_finalization/justification_and_finalization_minimal.yaml
896K	tests/operations/deposit/deposit_minimal.yaml
608K	tests/operations/proposer_slashing/proposer_slashing_minimal.yaml
548K	tests/operations/voluntary_exit/voluntary_exit_minimal.yaml
544K	tests/sanity/slots/sanity_slots_minimal.yaml
520K	tests/genesis
444K	tests/epoch_processing/final_updates/final_updates_minimal.yaml
444K	tests/epoch_processing/crosslinks/crosslinks_minimal.yaml
400K	tests/shuffling
396K	tests/shuffling/core
332K	tests/operations/block_header/block_header_minimal.yaml
328K	tests/epoch_processing/registry_updates/registry_updates_minimal.yaml
324K	tests/epoch_processing/slashings/slashings_minimal.yaml
276K	tests/genesis/validity
272K	tests/genesis/validity/genesis_validity_minimal.yaml
240K	tests/genesis/initialization
236K	tests/genesis/initialization/genesis_initialization_minimal.yaml
212K	tests/ssz_static/core/ssz_minimal_zero.yaml
204K	tests/ssz_static/core/ssz_minimal_max.yaml
200K	tests/ssz_static/core/ssz_minimal_nil.yaml
196K	tests/shuffling/core/shuffling_minimal.yaml
196K	tests/shuffling/core/shuffling_full.yaml
92K	tests/bls
28K	tests/ssz_generic
24K	tests/ssz_generic/uint
24K	tests/bls/sign_msg
20K	tests/bls/sign_msg/sign_msg.yaml
20K	tests/bls/aggregate_sigs
16K	tests/bls/msg_hash_g2_uncompressed
16K	tests/bls/aggregate_sigs/aggregate_sigs.yaml
12K	tests/bls/msg_hash_g2_uncompressed/g2_uncompressed.yaml
12K	tests/bls/msg_hash_g2_compressed
8.0K	tests/ssz_generic/uint/uint_wrong_length.yaml
8.0K	tests/ssz_generic/uint/uint_random.yaml
8.0K	tests/bls/priv_to_pub
8.0K	tests/bls/msg_hash_g2_compressed/g2_compressed.yaml
8.0K	tests/bls/aggregate_pubkeys
4.0K	tests/ssz_generic/uint/uint_bounds.yaml
4.0K	tests/bls/priv_to_pub/priv_to_pub.yaml
4.0K	tests/bls/aggregate_pubkeys/aggregate_pubkeys.yaml

As you can see, the mainnet files are the biggest source of trouble. Without them, the maximum individual file size would be 10 MB (SSZ), or just 2 MB for a single state-transition suite. Compare that to the 419 MB block-processing suite for mainnet.

Solutions in status quo

These are the current solutions, not pretty, but functional:

  1. LFS deals with the large files
    1.1) assuming you already run your own Docker image in CI, LFS should be easy to add to the image (it is even available on Alpine Linux)
    1.2) Non-LFS clones are simply that: we cannot keep these large files in regular Git, and some of them are far too large to even consider diffing.
    1.3) Authentication shouldn't be required for a public repo, and it worked in a CI setting before, so this is a relatively new issue; it may be forced by rate limits / settings. Use the gzipped tarball in CI for now instead.
  2. The sizes of the files are very large
    2.1) The first X lines can be read, cut at the test_cases: line, and parsed (see the sketch after this list). Not pretty, but the alternative of duplicate data / separate headers is not either. If the files were small, it wouldn't be an issue. (A legacy of the early choice for YAML.)
    2.2) Loading a suite fully into memory before processing is bad: even when parsed into state objects, mainnet states are still big, too big to keep a few hundred of them in memory, and each test carries both a pre and a post state. It is simply too much data to deal with at once.
  3. As with cross-client communication, a format that works for everyone is difficult. The status quo is to keep it:
    • Simple to implement
    • As readable as possible
    • Generic enough that every client can deal with it in some way or another, even if they do not like it.
  4. Configuration is always an issue:
    • Many teams / languages
    • Some constants in the spec cannot be overridden easily.
      • Since there are far fewer spec changes after the freeze, keeping up manually is effectively more efficient, although it is "dumb" work.
    • A client needs to deal with their language choice + other configuration anyway
      • Because of that, stronger enforcement of the config may not be worth it currently; it would certainly break the workflow of some teams.
      • Loading a YAML file from the specs repo, and automatically checking that it matches the client config, may be a good temporary solution.
      • Or a script to convert the YAML file into whatever format is preferred.
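
To illustrate point 2.1, a minimal Python sketch of the header-cutting trick, assuming the suite file has a top-level test_cases: key as in the current format (the printed keys are just whatever the header contains):

import yaml  # pip install pyyaml

def read_suite_header(path):
    # Collect lines until the "test_cases:" key; everything above it is the header.
    header_lines = []
    with open(path) as f:
        for line in f:
            if line.startswith("test_cases:"):
                break
            header_lines.append(line)
    # Parse only the header, never loading the (huge) case data into memory.
    return yaml.safe_load("".join(header_lines))

header = read_suite_header("tests/sanity/blocks/sanity_blocks_mainnet.yaml")
print(sorted(header.keys()))  # suite metadata fields only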

Attention points for new solution

  • Need versioning: LFS may be the only Git-compatible solution.
  • Need configuration: but copying from spec to tests repo is error prone + duplicate code
  • Need to consider configuration loading: changing constants per test case, or even per suite, is inefficient and/or not supported by some clients. Compile-time / start-time configuration should be OK. E.g. run the test suite once with the mainnet config loaded, skipping the minimal tests, and then vice versa.
  • Need definition efficiency: so filtering/running tests is easy
  • Need storage efficiency: so I do not have to deal with a 2.8 GB clone.
  • Need to consider phase 1: which pushes up storage of a full state another order of magnitude in some cases.
  • Need well structured categorization: going through many different files is hard and inefficient
  • Keep layered categorization: avoid duplicate code for each suite. Instead, generalize one topic as a "runner", and specifics for case formats in "handlers"
  • Need config separation: may want to only use minimal tests in some settings (e.g. quick sanity check during development)
  • Need to consider pyspec: although it is the most tested + reviewed implementation, the pyspec has its limits as a thoroughly unoptimized Python program. Generation works, but is very slow, especially for mainnet.
  • Need to keep BLS flexibility: currently tests are generated with BLS on, but BLS may be turned off while running them, except where the test format says it is required.
  • Need to consider future forking: the current fork-timeline design may not work, or may result in new issues. Feedback from clients is necessary.
  • Need to consider readability: someone is going to hit a failing test case, so it had better be clear and easy to see what was ingested.

Ideas

List of ideas, in no particular order, to think about:

  • Split suite files (collections of test cases) into separate files
    • IO overhead / more nesting/files
    • Less memory at a time
    • Clear individual test cases
    • Separate header could be good, but also a considerable change from status quo.
  • Compress things
    • Suites could be zip archives of yaml files, each a case
    • Suites could be gzipped yaml files
    • Test cases themselves could be compressed
    • Could use YAML's niche (and likely unsupported in some parsers) anchor/alias cross-referencing to avoid duplicating fields. Not readable either, but designed for exactly this.
    • Could manually reduce state size, by encoding only part of the data
  • Use partials (no proof data, just indices + contents)
    • High complexity
    • unsupported by all clients currently
    • pyspec support may be possible in a week or two, but test generation may need to adapt for it.
    • nice isolation of the fields that change from the fields that do not affect the test.
  • Share configuration by:
    • copying to output
    • moving to separate git repo, and submodule in specs repo + client repos
  • Ignore mainnet
    • Risk of thinking everything is ok with minimal config, but silently not getting mainnet passing.
    • Alleviates all immediate problems in storage size + efficiency
    • Mainnet has considerably different behavior in some cases, due to constants changing. E.g. more than 1 shard per slot being handled.
    • could alleviate Pyspec test generation speed too
  • Build out mainnet tests with Go spec
    • Much higher speed, may generate tests on the fly
    • No BLS support yet
    • In progress: 0.8.1 tests passing
    • More time needed to work on test generation implementation
    • Double work with pyspec
      • Contradicting tests may not help clients much, but could find bugs quicker
    • Possibly nice for new testing integrations
      • Not what Whiteblock was looking to do; this would be static testing, with no networking / dynamic states.
  • Ditch YAML
    • closes early testing legacy
    • possibly more space efficiency
    • need to decide on a replacement. SSZ seems too strict to define many test formats in, although it is supported by everyone.
    • readability concerns

Survey

Please answer in a DM on discord or telegram:

  • I will keep even the most controversial suggestions anonymous
  • Will not judge any answer (promise)
  • Just need ideas/improvements, with straightforward motivation, unfiltered.
  • If everyone is holding back, we won't make progress.
  • Do not need to instigate unnecessary conflict either, hence private initially.

I will then publicly share anonymized, aggregated findings (time TBD), and hopefully we can find a better solution than the status quo.

Please consider answering the following questions (answers may be brief/long):

  1. Do all formats work? If not, which of them are a pain, or is it because of delays/late start elsewhere?
  2. Does the full suite of minimal-configuration tests run reasonably?
  3. Does LFS work for you?
  4. Are you using the gzipped tarball of tests/ in release instead of LFS clone?
  5. Do you cache in CI?
  6. Do you like YAML?
  7. Do you need readability in raw data, or does printing ingested information from the test runner work fine?
  8. Is compression difficult to handle in your test loader? What does it entail for you to handle compressed files during test runs?
  9. Do you run minimal tests? And how often / which scenario?
  10. Do you run mainnet tests? And how often / which scenario?
  11. Would you like to see more configurations?
  12. Would you support SSZ or another binary format for testing?
  13. What are your ideas about duplication of subsets of states in tests? Avoid? Compress?
  14. Pick 2 out of 3: readability, amount, quality
  15. Pick 3 out of 9:
    • phase 0 light client tests
    • phase 1 tests
    • phase 1 light client tests
    • fork choice tests
    • rewards/penalties tests
    • more BLS tests
    • validator tests
    • more coverage focused tests for phase0
    • *your suggestion*
  16. Pick 1, for short term shared effort:
    • active testnetwork monitoring (verify transitions live)
    • SSZ and BLS fuzzer executables, non-libfuzzer
    • benchmarks of transition
  17. Pick 1, for middle term shared effort:
    • fuzzing your own state transition
    • chaintests
    • *your suggestion*
  18. Which current idea(s) do you like?
  19. Any no-nos?
  20. Suggested format to optimize your own use (no answer = indecisive, prefer status quo, okay with it)
  21. Suggested format, if optimizing for everyone (no answer = indecisive, prefer status quo, okay with it)

TLDR

The testing workload and format are a lot to deal with; sharing thoughts + taking a survey to make some progress.


protolambda (Collaborator) commented Jul 25, 2019

Survey results:

Surveyed most teams, thank you for the quick and extensive responses!

  • The test formats themselves (properties, functionality) work. Structure, file format/size, and distribution of the tests are the bigger problem.
  • Minimal config runs well, not fully for some yet.
  • Mainnet runs for some, not for others. Suite file size is a big issue.
  • LFS works and is used by half of the teams. Not everyone caches the files in CI, causing slowdowns (and bandwidth bills). The tarball alternative works well for the other teams. Good idea to use that in CI.
  • YAML is generally accepted, although teams struggle with size in one way or another (some just need test filtering without reading the full suite, or want more performance)
  • Compression can be handled, but many teams prefer zipping the whole thing, over zipping individual suites or cases. Duplication/storage is not so much a concern, memory usage is.
  • Advanced gzip file streaming is too much, but could generally work (see the sketch after this list). Similarly, LevelDB (with Snappy compression) could be used to store cases in, and to fetch them during testing.
  • Minimal tests not running in CI fully for all yet.
  • Mainnet tests are run in CI by some, despite the memory requirements and the slower computation. Others cannot handle the size and/or the mainnet configuration well.
  • Everyone can handle more configurations. Some prefer linking, others load dynamically at runtime. An extra config or two would be very welcome.
  • SSZ for test data is generally well received, as it somewhat removes a dependency: no yaml -> intermediate -> ssz anymore, just load bytes.
    But others prefer to stick with YAML for readability and/or because "it works". Readability would be less of a concern if there were an SSZ test viewer.
  • Pick 2 out of 3 preferences: Quality first, then amount, then readability. Everyone had a vote for quality. Readability was a thing to work around: print parsed formatted data during test runs.
  • Pick 3 out of 9: fork choice tests first. Then very mixed and equally chosen preferences, but all phase 0 things.
  • Short term winner: process test network transitions live with executable spec.
  • Middle term winner: fuzzing the state transition. (Some seem biased towards middle term here, as their tests need to run first)
  • Liked ideas: split suites, copy configs to output, fuzzing
  • Keep configuration format, keep yaml for now.
  • Suggestions: binary format support, possibly from a DB; file-tree-filterable tests; an index file that lists all tests by name and properties (or mirrors the file tree).
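
To illustrate the gzip streaming option above, a minimal Python sketch that stream-decompresses a gzipped YAML suite (the file name is hypothetical):

import gzip
import yaml

# gzip.open in text mode yields a stream; yaml.safe_load reads from it directly,
# so the compressed blob never sits fully in memory (the parsed document still does).
with gzip.open("sanity_blocks_mainnet.yaml.gz", "rt") as f:
    suite = yaml.safe_load(f)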

Plan:

Split tests into a deeply structured file tree, to filter without memory overhead, at the cost of a bit of disk reading.

File path structure:
tests/<config name>/<fork or phase name>/<test runner name>/<test handler name>/<test suite name>/<test case>/<output part>

<config name>/ -- Configs are upper level. Some clients want to run minimal first, and useful for sanity checks during development too.
                  As a top level dir, it is not duplicated, and the used config can be copied right into this directory as reference.
<fork or phase name>/ -- This would be: "phase0", "transferparty", "phase1", etc. Each introduces new tests, but does not copy tests that do not change. 
                         If you like to test phase 1, you run phase 0 tests, with the configuration that includes phase 1 changes. Out of scope for now however.
<test runner name>/   -- The well known bls/shuffling/ssz_static/operations/epoch_processing/etc. Handlers can change the format, but there is a general target to test.
<test handler name>/  -- Specialization within category. All suites in here will have the same test case format.
<test suite name>/    -- Suites are split up. Suite size does not change memory bounds, and makes lookups of particular tests fast to find and load.
<test case>/          -- Cases are split up too. This enables diffing of parts of the test case, tracking changes per part, while still using LFS. Also enables different formats for some parts.
<output part>         -- E.g. "pre.yaml", "deposit.yaml", "post.yaml".
                         - Diffing a pre.yaml and post.yaml gives you all the information for testing, good for readability of the change. Then compare the diff to anything that changes the pre state, e.g. "deposit.yaml"
                         - Allows for custom format for some parts of the test. E.g. something encoded in SSZ.
                         - "pre.ssz", "deposit.ssz", "post.ssz" etc. is the next step: place a copy, but in binary format, right next to legacy yaml.
                            Clients can then shift to ssz inputs for efficiency, while we implement a SSZ viewer. 
                            And when that alleviates the readability concern, we can drop the yaml files for state encoding.
                            This also means that some clients can drop their YAML -> JSON/other -> SSZ work-arounds that had 
                            to be implemented to support the uint64 YAML, hex, etc. that is not idiomatic to their language.
                         - We keep yaml for metadata, and non-SSZ things. (E.g. shuffling and BLS tests)

The test case formats themselves do not change; the properties are just loaded from multiple files, instead of as sub-properties of one file.
For the better: it reduces memory requirements, and makes test case filtering much easier (see the sketch below).
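
As a sketch of what filtering over the proposed tree could look like, assuming the path layout above (directory and part names are illustrative, not a definitive loader):

from pathlib import Path

def iter_cases(root, config="minimal", runner=None, handler=None):
    # tests/<config>/<fork>/<runner>/<handler>/<suite>/<case>/
    for case_dir in Path(root, config).glob("*/*/*/*/*"):
        if not case_dir.is_dir():
            continue
        fork, r, h, suite, case = case_dir.parts[-5:]
        if runner is not None and r != runner:
            continue
        if handler is not None and h != handler:
            continue
        yield case_dir

for case_dir in iter_cases("tests", config="minimal", runner="operations", handler="deposit"):
    pre = case_dir / "pre.yaml"  # load parts lazily, one case at a time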

I support the LevelDB idea too, but versioning is important, and we do not get that with LevelDB.
Instead, I recommend that clients read tests from the gzipped tarball in their CI setting, or write their own tooling to push the file structure into their own LevelDB.
Also, please cache your tarball: it helps performance, and saves us all a good amount of bandwidth (the costs may not be prohibitive, though).

And instead of zipping the tests/ dir, I think we can zip the individual <config name> directories, so people can keep them separate easily if they want to.

Open questions:

  • Marking metadata. I like essential metadata to live as close as possible to the test. So the two options for the current data are:
    • Small files / file markers:
      • an empty bls_required/bls_ignored file (akin to .gitkeep markers); a simple file-system check works: if bls_required.exists(): enable_bls() elif bls_ignored.exists(): disable_bls() else: default_bls_preference() (see the sketch after this list)
      • a post.hash file, for optimistic equality checking of the resulting post state.
    • A meta data file, meta.yaml, which lists such properties.
    • I considered making BLS a directory level, but an extra level for each such property is a bit much, and it requires a filesystem check just like the empty-file marker. It also does not work for data such as the post state root.
  • The tests index file. A YAML file with a tree structure of all tests alleviates the need for (and the possible errors in) walking the full file tree.
    On the other hand, duplication is error prone too, and a recursive ls or tree call works just as well to generate your own index.
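
A minimal sketch of the marker-file option, following the check quoted above (the function name and return values are illustrative):

from pathlib import Path

def bls_setting(case_dir: Path) -> str:
    # Empty marker files, akin to .gitkeep: mere presence is the signal.
    if (case_dir / "bls_required").exists():
        return "required"
    if (case_dir / "bls_ignored").exists():
        return "ignored"
    return "default"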

Test viewer

I have a POC based on the SSZ collapsible tree-view I implemented for https://simpleserialize.com (awesome site by Chainsafe, check it out).
The basic proposed functionality is:

Browse to somedomain.io/v0.8.1/tests/phase0/operations/deposits/common/success_top_up/ and get 3 tree views next to each other, annotated with SSZ types: pre, post and the deposit.

However, their JS types are not fully updated for 0.8.1 yet, so it cannot load the SSZ types that changed.
Diffing the pre.yaml and post.yaml while ingesting pre.ssz and post.ssz should work well enough for debugging for now, though.
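
For instance, a quick way to eyeball the state change while the viewer is not ready, assuming the split pre.yaml / post.yaml files (paths hypothetical):

import difflib

with open("pre.yaml") as a, open("post.yaml") as b:
    # Print only the lines that changed between the pre and post state.
    for line in difflib.unified_diff(a.readlines(), b.readlines(),
                                     fromfile="pre.yaml", tofile="post.yaml"):
        print(line, end="")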

Also, there is too much tooling to implement, so I will prioritize testing and the test-network live verification tool
over deprecating the YAML. But maybe others would like to help with the test viewer? (Chainsafe? @GregTheGreek ?)

Review of attention points

  • versioning: LFS, but do not force it on clients; provide a tarball
  • configuration: copied into the test output
  • config loading: configs switch at the top level of the system, so not much per-test switching
  • filter efficiency: the file tree gives high filter precision, low memory usage
  • storage efficiency: traded off; teams are more concerned with memory. Use the compressed tarball if you like
  • phase 1 and beyond: file size is bounded by the biggest content type, not multiplied by the number of cases. Files should all stay smaller than 5 MB.
  • keep categorization: a deeper file structure + an index file solve this even better
  • config separation: configs are a top-level directory change
  • pyspec performance: generation will not be faster (and possibly slower due to the SSZ output addition), but memory usage is much lower, and there are no big file writes.
  • keep BLS flexibility: see the open question; the solution is generally there.
  • consider future forks: included in the file structure
  • readability: YAML stays for now, as it is more accessible; to be replaced with an SSZ viewer some time in the future.

Config

I understand the concern about not changing the config. What I do want to standardize is how we handle forks in configs.
Overwriting "constants" is very impractical and confusing, and it requires full copies of configs to live next to each other during forks.

Moreover, not many constants change anyway, and forks are considered backwards compatible in some basic form (we still need to sync old data, with old code or not).
Also, we would like to add new fork constants in advance, to test functionality in test settings without configuration overhead.

I propose to (minimally) prefix constants that changed in a fork, and prepare prefixes for constants when we know they are going to change.

Long term, we can rotate forks out when we do not need to sync it anymore (deprecating it essentially), and remove the prefixes from constants that are considered stable.

Prepared change example: P0_MAX_TRANSFERS = 0 (P0 = phase 0), then fork to MAX_TRANSFERS = 16.

Rotation example: PROPOSER_REWARD_QUOTIENT = 8, then fork to XY_PROPOSER_REWARD_QUOTIENT = 16. Then deprecate the old constant as P0_PROPOSER_REWARD_QUOTIENT, and start calling XY_PROPOSER_REWARD_QUOTIENT just PROPOSER_REWARD_QUOTIENT.
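
To make this concrete, a small Python sketch of fork-aware constant lookup over a single flat config, assuming prefixed names as in the examples above (values illustrative):

CONFIG = {
    "P0_MAX_TRANSFERS": 0,             # phase 0 value, prepared for the fork
    "MAX_TRANSFERS": 16,               # value after the fork
    "PROPOSER_REWARD_QUOTIENT": 8,
    "XY_PROPOSER_REWARD_QUOTIENT": 16,
}

def constant(name, fork_prefix=None):
    # Prefer the fork-prefixed constant if it exists; fall back to the plain name.
    if fork_prefix is not None:
        prefixed = f"{fork_prefix}_{name}"
        if prefixed in CONFIG:
            return CONFIG[prefixed]
    return CONFIG[name]

assert constant("MAX_TRANSFERS", fork_prefix="P0") == 0   # before the fork
assert constant("MAX_TRANSFERS") == 16                    # after the fork
assert constant("PROPOSER_REWARD_QUOTIENT", fork_prefix="XY") == 16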

Pro: single config, easy config management (no forks), and code can do the switching however it likes.
Con: prefixes (although clearer) are less pretty.

Simple decision, and gets us to transfer testing without complicated configuration or management changes. Sounds good?

For forks, the timeline idea still holds: a simple key-value mapping declaring the slot for each fork name. We can consider other fork-activation mechanisms later, when we have a good example.

TLDR

Deeper test structure for lower constraints and better filtering. Super small config change to enable forks (and transfer tests without hacks).

mpetrunic (Contributor) commented

> I support the LevelDB idea too, but versioning is important, and we do not get that with LevelDB.

You could commit leveldb files to git?

protolambda (Collaborator) commented

> You could commit leveldb files to git?

@mpetrunic We could, but part of the versioning rationale is to see which tests are new and/or have changed, and to roll back a test easily if necessary (as a client, or as a test maintainer). With raw LevelDB chunks you do not get that. As an implementer, you can always write a little script to put the files into LevelDB, using the file path as the key (see the sketch below). If that works better for you, please go ahead.
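
For example, such a script could look like this in Python, assuming the plyvel LevelDB bindings (package choice and paths illustrative):

from pathlib import Path
import plyvel  # pip install plyvel

db = plyvel.DB("tests.ldb", create_if_missing=True)
for path in Path("tests").rglob("*"):
    if path.is_file():
        # File path as key, raw file bytes as value.
        db.put(str(path).encode(), path.read_bytes())
db.close()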

GregTheGreek (Contributor) commented

> Also, there is too much tooling to implement, so I will prioritize testing and the test-network live verification tool
> over deprecating the YAML. But maybe others would like to help with the test viewer? (Chainsafe? @GregTheGreek ?)

Let me know what you need

protolambda (Collaborator) commented

@GregTheGreek Awesome, thanks. Will implement first things first though (the structure change in test generation). But yes, I could use some help with:

  • a published SSZ types package for the minimal and mainnet configs, primarily BeaconState and the operations. The current simpleserialize.com still copies the types over.
  • some coordination to pull the SSZ tree-view + lazily evaluated tree-parse function I implemented earlier for simpleserialize.com into its own package, so we can share the code between the two sites (and, ideally, future improvements).

If I get to implement the testing change this week, we could get this viewer going some time next week maybe. Let's discuss in discord chat some time.

This is the look of the current POC I implemented yesterday (the beacon state type is not updated yet, so the same changed object is shown 3x to get a feel for the layout):
[screenshot of the POC tree view]
It loads YAML data from a public (CORS-enabled) GitHub media endpoint, but it could be switched to fetching SSZ files and parsing them with ssz-js.

I could put the otherwise-static site in a simple Firebase hosting wrapper to handle routing of dynamic URLs. Then we could link to versioned tests by URL through the viewer site, instead of to the raw SSZ on GitHub.

JustinDrake (Collaborator) commented

Closing 👍

