Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Flatbuffer format #3989

Merged
merged 50 commits into from
Feb 9, 2024
Merged

Conversation

Sharvani2002
Copy link
Contributor

@Sharvani2002 Sharvani2002 commented Jun 20, 2022

Changes the FB data format to be more performant by avoiding extra level of tables at the feature data level.

Remaining work for CI:

  • Fix flatbuffer conversion tests in DevOps CI
  • WASM - it is sporadic - has succeeded and failed on the same change from inside this PR
  • Coverage - going to avoid trying to test the implementation-specific error outputs for now

command-cmake-build.sh Outdated Show resolved Hide resolved
command-run-flat-test.sh Outdated Show resolved Hide resolved
@olgavrou
Copy link
Collaborator

Closing for now, we can revisit this later, thank you :)

The reason for odd pass/fail behaviour across the CI is that MSVC introduces additional code in debug mode (or fails to apply certain optimizations, I did not fully investigate) which leads it to use a move on return of the test object (introduced when we removed the copy constructor).

Because Linux was optimizing that away in all cases, the test runs in CI (or locally) did not pick up the double-free bug which the previous commit (5deef6b) fixed - an issue that was discovered on Windows[Debug].

The issue now is that the move constructor failed to copy the value of p.
note: the added test is not yet working
In the prototype_namespace_t's validator, we need to carefully iterate the example's features because we need to be in the correct extent. There was an indexing issue where we were using the extent-index in a pure namespace-index manner, which caused us to read beyond the end of the vector, leading to spurious data being loaded.
In order to simulate enough of the parse dispatch loop to get the flatbuffer_parser to read multiple examples, we need to "dispatch" the examples after each pass through the parser, othrwise it is going to attempt to write in place over a single example.

When dispatching was introduced in the tests, it was used to swallow an extraneous newline example (which is used and swallowed internally by the example dispatcher to collect examples into multi_ex for, e.g. cb_adf), and without cleanup it leaks.
@lokitoth lokitoth marked this pull request as ready for review February 6, 2024 16:07
* Expands the example data generation to create larger test scenarios
* Fixes the example dispatch logic to handle dispatching streams of multi_ex examples
* Fix support for unlabeled examples
Copy link
Member

@rajan-chari rajan-chari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. The error handling is much better.

* Remove api_status as a CLI flag - this is a liberary affordance; when in CLI mode we should always report errors to the user. Later we may want to consider adding a verbosity selector.
* Remove an unnecessary bitshift in the alignment padding calculation.
* Add explanation of what is going on with the desired_align inputs in the io_buf code.
* finish removing --api_status as a CLI flag
@lokitoth lokitoth merged commit db7f024 into VowpalWabbit:master Feb 9, 2024
116 of 117 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants