[ENH] Use foyer for block cache #2431

Merged: 7 commits merged on Jul 16, 2024

@Ishiihara (Contributor) commented Jul 1, 2024

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • This PR fixes bugs caused by readers and writers sharing the cache. The fix is as follows: 1. fork calls get instead of reading data directly from the read cache; 2. commit returns the newly created blocks; 3. flush takes the newly created blocks as input.
    • Temporarily disables the shuttle test because of a compatibility issue between tokio filesystem calls and the shuttle runtime: tokio filesystem calls panic when they are not run inside a tokio runtime.
  • New functionality
    • This PR introduces a cache with eviction policies, using Foyer (https://github.com/MrCroxx/foyer) as the cache. In the future, Foyer will be used as a hybrid cache.
    • Introduces cache-related configs to be used by different components.
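A minimal sketch of the read/write separation described above, assuming a simplified in-memory model. `Block`, `LruCache`, `BlockManager`, `BlockfileWriter`, `create` and `is_cached` are hypothetical names, `storage` stands in for object storage, and the hand-rolled LRU stands in for Foyer (none of Foyer's real API is used). What it shows: a writer's new blocks reach the shared read cache only through `flush`, after `commit` hands them back, while `fork` goes through the normal `get` path.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical block; the real Arrow-backed block holds much more than key/value pairs.
#[derive(Clone, Debug, PartialEq)]
pub struct Block {
    pub id: u64,
    pub data: Vec<(String, String)>,
}

// Capacity-bounded LRU cache: a hand-rolled stand-in for Foyer, not its API.
pub struct LruCache {
    capacity: usize,
    entries: HashMap<u64, Block>,
    order: VecDeque<u64>, // front = least recently used
}

impl LruCache {
    // Move `id` to the most-recently-used end.
    fn touch(&mut self, id: u64) {
        self.order.retain(|&k| k != id);
        self.order.push_back(id);
    }

    pub fn get(&mut self, id: u64) -> Option<Block> {
        let block = self.entries.get(&id).cloned()?;
        self.touch(id);
        Some(block)
    }

    // Insert, then evict least recently used entries until within capacity.
    pub fn insert(&mut self, block: Block) {
        let id = block.id;
        self.entries.insert(id, block);
        self.touch(id);
        while self.order.len() > self.capacity {
            let evicted = self.order.pop_front().expect("order is non-empty");
            self.entries.remove(&evicted);
        }
    }
}

// Hypothetical manager; `storage` plays the role of object storage.
pub struct BlockManager {
    storage: HashMap<u64, Block>,
    cache: LruCache,
    next_id: u64,
}

impl BlockManager {
    pub fn new(capacity: usize) -> Self {
        let cache = LruCache { capacity, entries: HashMap::new(), order: VecDeque::new() };
        BlockManager { storage: HashMap::new(), cache, next_id: 0 }
    }

    // New blocks get fresh ids but stay out of the shared cache.
    pub fn create(&mut self, data: Vec<(String, String)>) -> Block {
        let id = self.next_id;
        self.next_id += 1;
        Block { id, data }
    }

    // Read path: cache first, then storage; a storage hit populates the cache.
    pub fn get(&mut self, id: u64) -> Option<Block> {
        if let Some(block) = self.cache.get(id) {
            return Some(block);
        }
        let block = self.storage.get(&id).cloned()?;
        self.cache.insert(block.clone());
        Some(block)
    }

    // Fork reads through `get` and returns a private copy under a new id.
    pub fn fork(&mut self, id: u64) -> Option<Block> {
        let source = self.get(id)?;
        Some(self.create(source.data))
    }

    // Flush persists the committed blocks; only now can readers see them.
    pub fn flush(&mut self, blocks: Vec<Block>) {
        for block in blocks {
            self.storage.insert(block.id, block.clone());
            self.cache.insert(block);
        }
    }

    pub fn is_cached(&self, id: u64) -> bool {
        self.cache.entries.contains_key(&id)
    }
}

// Hypothetical writer: new blocks stay private until `commit` hands them back.
#[derive(Default)]
pub struct BlockfileWriter {
    pub new_blocks: Vec<Block>,
}

impl BlockfileWriter {
    pub fn commit(self) -> Vec<Block> {
        self.new_blocks
    }
}
```

With this shape a reader cannot observe an uncommitted block through the cache; the trade-off is that `flush` must be handed the committed blocks explicitly rather than rediscovering them from shared state.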

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?

github-actions bot commented Jul 1, 2024

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@Ishiihara (Contributor, Author) commented Jul 1, 2024

@Ishiihara force-pushed the liquan_block_cache_with_eviction branch 6 times, most recently from aaab13c to b253dcc July 2, 2024 16:27
@Ishiihara marked this pull request as ready for review July 2, 2024 16:46
for i in 0..n {
    let key = format!("key{}", i);
    let read = block.get::<&str, &str>("prefix", &key);
    values_before_flush.push(read.unwrap().to_string());
}
Collaborator commented:

nit - would prefer if the test here collected the values we generate instead of reading them back from the block. That avoids missing bugs where we persist a value incorrectly but consistently.
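A sketch of the reviewer's suggestion, assuming a hypothetical in-memory `Block` with `set`/`get` (the real block's `get` is generic, as in `get::<&str, &str>`, and Arrow-backed). The test records each generated value as it writes it, then checks the block against that independent record instead of against values read back from the block.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the block under test.
#[derive(Default)]
pub struct Block {
    data: HashMap<(String, String), String>,
}

impl Block {
    pub fn set(&mut self, prefix: &str, key: &str, value: &str) {
        self.data
            .insert((prefix.to_string(), key.to_string()), value.to_string());
    }

    pub fn get(&self, prefix: &str, key: &str) -> Option<&str> {
        self.data
            .get(&(prefix.to_string(), key.to_string()))
            .map(String::as_str)
    }
}

// Writes `n` generated values and returns them as the expected values.
pub fn populate(block: &mut Block, n: usize) -> Vec<String> {
    let mut expected = Vec::with_capacity(n);
    for i in 0..n {
        let key = format!("key{}", i);
        let value = format!("value{}", i);
        block.set("prefix", &key, &value);
        // Record what we generated, not what the block hands back.
        expected.push(value);
    }
    expected
}

// Checks every stored value against the independently recorded expectation.
pub fn verify(block: &Block, expected: &[String]) -> bool {
    expected.iter().enumerate().all(|(i, value)| {
        let key = format!("key{}", i);
        block.get("prefix", &key) == Some(value.as_str())
    })
}
```

In the real test the flush and reload would sit between `populate` and `verify`; because `expected` never came from the block, a value persisted incorrectly but consistently would still fail the check.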

@@ -573,6 +629,14 @@ mod tests {
blockfile_provider:
    Arrow:
        max_block_size_bytes: 16384
        block_manager_config:
@HammadB (Collaborator) commented Jul 12, 2024

note - we will have to remember to update the config in staging when we merge this, otherwise staging will crash

    str::FromStr,
    sync::{atomic::AtomicU32, Arc},
};

Collaborator commented:

nit - no blank line between `use` statements

codetheweb and others added 7 commits July 12, 2024 14:45
*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
	 - Makes the block size configurable, leaving the default unchanged.
	 - Adds a blockfile provider config, resolving a TODO
	 - Runs config tests serially, since they can race
 - New functionality
	 - None really
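One std-only way to run config tests serially, sketched under the assumption that they race on process-wide state such as environment variables (the commit does not say what they race on): every such test takes the same static lock for its whole body. `serial_guard`, `run_serially` and `block_size_from_env` are hypothetical helpers; the `serial_test` crate's `#[serial]` attribute is a ready-made alternative.

```rust
use std::sync::{Mutex, MutexGuard};

// One process-wide lock shared by every test that touches global config state.
static CONFIG_TEST_LOCK: Mutex<()> = Mutex::new(());

// Hold the returned guard for the whole test body. A lock poisoned by a
// panicking test is recovered so the remaining tests still run.
pub fn serial_guard() -> MutexGuard<'static, ()> {
    CONFIG_TEST_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

// Runs a test body while holding the lock; the guard drops when this returns.
pub fn run_serially<T>(test_body: impl FnOnce() -> T) -> T {
    let _guard = serial_guard();
    test_body()
}

// Hypothetical config loader: reads a block size from an env var, falling back
// to `default` when the variable is unset or unparsable.
pub fn block_size_from_env(var: &str, default: usize) -> usize {
    std::env::var(var)
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}
```

Each config test would wrap its body in `run_serially(|| { ... })`, so two tests never mutate or read the shared state at the same time, while unrelated tests keep running in parallel.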

*How are these changes tested?*
Existing tests; this is a non-functional change
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

None
*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
- Fixes allowed_ids and disallowed_ids to also account for
updates/deletes/upserts. For example, if the log has an update that does
not change the embedding and the record is in the query list, then today
we never return this record, even if it is in the top k
    - Adds sync points to test_embeddings + increase test timeout
    - Adds another rule in test_embeddings for compaction
    - Suppresses health check warning for filtering too much
- Fixes the case of committing and flushing an empty block (which can
happen due to deletes). The sparse index start key can also get changed to
something other than SparseIndexDelimiter::Start. We decided to flush a
dummy block if the blockfile becomes fully empty, so that our segment
abstraction is uninitialized only until the first compaction; after that
it is always initialized, albeit with an empty block
- Fixes a bug in FTS delete document where we were incorrectly panicking
- Fixes a bug in record segment apply materialization where for deletes
and updates we missed writing the max offset id
- Updates to the metadata segment were only updating the metadata and
were not updating the document
- Don't return an error from the metadata segment if the document is
supplied as None for an update
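One reading of the allowed_ids/disallowed_ids fix, sketched under an assumption the commit does not spell out: ids touched by the log are excluded from the vector-index search and answered by brute force over the log, so a metadata-only update (which carries no embedding) must not be excluded, or the record drops out of both paths. `LogOp`, `LogEntry`, `index_disallowed_ids` and `allowed_for_index` are hypothetical.

```rust
use std::collections::HashSet;

// Hypothetical log operations; `new_embedding` marks whether an update or
// upsert carries a replacement embedding.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LogOp {
    Add,
    Update { new_embedding: bool },
    Upsert { new_embedding: bool },
    Delete,
}

pub struct LogEntry {
    pub id: u32,
    pub op: LogOp,
}

impl LogOp {
    // True when the index's copy of the vector is stale or gone, so the record
    // must be answered from the log rather than from the index.
    fn invalidates_index_vector(self) -> bool {
        match self {
            LogOp::Add | LogOp::Delete => true,
            LogOp::Update { new_embedding } | LogOp::Upsert { new_embedding } => new_embedding,
        }
    }
}

// Ids the vector-index search must skip.
pub fn index_disallowed_ids(log: &[LogEntry]) -> HashSet<u32> {
    log.iter()
        .filter(|e| e.op.invalidates_index_vector())
        .map(|e| e.id)
        .collect()
}

// The user's allowed ids minus those the log invalidated. A metadata-only
// update keeps its id here, so the record can still be returned from the index.
pub fn allowed_for_index(allowed: &[u32], log: &[LogEntry]) -> Vec<u32> {
    let disallowed = index_disallowed_ids(log);
    allowed
        .iter()
        .copied()
        .filter(|id| !disallowed.contains(id))
        .collect()
}
```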

- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

None
*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
	 - Enables S3 retry by default
 - New functionality
	 - None
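Roughly what enabling retries buys, as a generic std-only sketch: a failed request is retried with exponential backoff up to a fixed number of attempts. `RetryPolicy` and its fields are hypothetical; the S3 client's actual retry mode, attempt limit and delays are not described in the commit and are not modeled here.

```rust
use std::thread::sleep;
use std::time::Duration;

// Hypothetical retry policy, not the S3 client's real configuration.
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub base_delay: Duration,
}

impl RetryPolicy {
    // Runs `op` until it succeeds or `max_attempts` attempts have failed,
    // sleeping base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    pub fn run<T, E>(&self, mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
        let mut attempt: u32 = 1;
        loop {
            match op() {
                Ok(value) => return Ok(value),
                Err(err) if attempt >= self.max_attempts => return Err(err),
                Err(_) => {
                    let factor = 2u32.saturating_pow(attempt - 1);
                    sleep(self.base_delay.saturating_mul(factor));
                    attempt += 1;
                }
            }
        }
    }
}
```

Transient failures (throttling, dropped connections) then surface to the caller only after the last attempt fails.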

*How are these changes tested?*
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

None
@Ishiihara force-pushed the liquan_block_cache_with_eviction branch from ce5123c to 3bcf445 July 16, 2024 06:34
@Ishiihara merged commit 4644217 into main Jul 16, 2024
66 checks passed
@Ishiihara self-assigned this Jul 16, 2024
let block = block_manager.get(&delta.id).await.unwrap();
// TODO: enable this assertion after the sizing is fixed
Collaborator commented:

this TODO is not needed?

Labels: None yet
Projects: None yet
Linked issues: None yet

4 participants