Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip expired object while using DBWithTtl #403

Merged
merged 1 commit into from
Apr 4, 2023
Merged

Conversation

ofriedma
Copy link
Contributor

@ofriedma ofriedma commented Feb 15, 2023

Currently even if objects have been expired we still return them until
compaction delete those objects.

This commit adds a skip_expired_data to the ReadOptions, if this option
is true, it Get, MultiGet and iterator will not return objects that have been
expired by the configured ttl.
Currently this feature does not include support of for Merge
operation.
For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys
will not be returned.

Resolves: #394 , #417

@udi-speedb
Copy link
Contributor

Does the partial / hybrid support (Get / MultiGet strict ttl while iterator / merge non-strict ttl) make sense for users?
I understand that compaction is a background process and, therefore, the user has no way of knowing when the keys would be compacted (and consequently removed), but in this PR a user may issue a Get() -> No key, Iter() -> Key returns, Get() -> No key, which will not happen in non-strict ttl.

@ofriedma ofriedma force-pushed the NoRetExpiredObjs branch 3 times, most recently from 3ddffa1 to 1b0add1 Compare February 16, 2023 10:27
@udi-speedb
Copy link
Contributor

Please add the issue's / pr's number at the end of the commit message

examples/speedb_with_ttl_example.cc Show resolved Hide resolved
examples/speedb_with_ttl_example.cc Outdated Show resolved Hide resolved
examples/speedb_with_ttl_example.cc Outdated Show resolved Hide resolved
examples/speedb_with_ttl_example.cc Show resolved Hide resolved
examples/speedb_with_ttl_example.cc Show resolved Hide resolved
examples/Makefile Outdated Show resolved Hide resolved
examples/Makefile Outdated Show resolved Hide resolved
utilities/ttl/db_ttl_impl.cc Outdated Show resolved Hide resolved
utilities/ttl/ttl_test.cc Show resolved Hide resolved
ASSERT_TRUE(values[0].empty());
CloseTtl();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as above, but also:

  1. Please also test the mutli-get versions that accept a single cf and multiple cf-s
    I suggest adding another cf, but have the 2 cf-s have different ttl-s and verifying that each cf complies with its ttl

@ofriedma ofriedma force-pushed the NoRetExpiredObjs branch 2 times, most recently from 88cc532 to 9997ce2 Compare February 22, 2023 11:47
@ofriedma ofriedma force-pushed the NoRetExpiredObjs branch 2 times, most recently from b81f150 to 55f9d83 Compare February 23, 2023 14:51
@udi-speedb
Copy link
Contributor

Please add the PR's number (#403) at the end of the commit message.

examples/.gitignore Outdated Show resolved Hide resolved
examples/speedb_with_ttl_example.cc Outdated Show resolved Hide resolved
examples/speedb_with_ttl_example.cc Show resolved Hide resolved
statuses = db->MultiGet(ropts, keys, &values);
for (const auto& i : statuses) {
if (s.IsNotFound()) {
std::cout << "Key has been expired as expected by MultiGet" << std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to cout the key's contents that has been expired

statuses[i] = StripTS(&(*values)[i]);
} else {
(*values)[i] = "";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest factoring out the duplicate code (used in both Get and here) and add a helper function that accepts value and a cf handle and returns whether or not that value is stale. Basically an IsStale() but with a cf handle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comment

@@ -514,7 +527,25 @@ std::vector<Status> DBWithTTLImpl::MultiGet(
if (!statuses[i].ok()) {
continue;
}
statuses[i] = StripTS(&(*values)[i]);
if (options.skip_expired_data) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move line 521 (is_stale = false) to here, right above the if (options.skip_expired_data). This is where it is actually used and make it clearer that it is re set / calculated (correctly) in each loop iteration

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comment

Options cfopts = GetOptions(column_family);
auto filter = std::static_pointer_cast<TtlCompactionFilterFactory>(
cfopts.compaction_filter_factory);
int32_t ttl = filter->GetTtl();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also suggesting adding a second helper function (that may be used by the 1st I have suggested above) to extract_ttl(cf handle, cf opts)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comment

utilities/ttl/db_ttl_impl.h Show resolved Hide resolved
utilities/ttl/db_ttl_impl.h Outdated Show resolved Hide resolved
utilities/ttl/db_ttl_impl.h Show resolved Hide resolved
@udi-speedb
Copy link
Contributor

udi-speedb commented Mar 1, 2023

@ofriedma - I have submitted all of my comments except for the ttl_test.cc file
I will add them in the next review cycle, after you update the existing files following my comment.

@ofriedma ofriedma requested a review from udi-speedb March 1, 2023 15:02
ofriedma pushed a commit that referenced this pull request Mar 1, 2023
Currently even if objects have been expired we still return them until
compaction delete those objects.

This commit adds a skip_expired_data to the ReadOptions, if this option
is true, it Get, MultiGet and iterator will not return objects that have been
expired by the configured ttl.
Currently this feature does not include support of for Merge
operation.
For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys
will not be returned.

Resolves: #394

Pull Request: #403
udi-speedb
udi-speedb previously approved these changes Mar 5, 2023
@udi-speedb
Copy link
Contributor

@ofriedma - Please add the issue's number to the commit message. Thanks

ofriedma pushed a commit that referenced this pull request Mar 5, 2023
Currently even if objects have been expired we still return them until
compaction delete those objects.

This commit adds a skip_expired_data to the ReadOptions, if this option
is true, it Get, MultiGet and iterator will not return objects that have been
expired by the configured ttl.
Currently this feature does not include support of for Merge
operation.
For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys
will not be returned.

Resolves: #394

Pull Request: #403
ofriedma pushed a commit that referenced this pull request Mar 5, 2023
Add support for db_bench to use DBWithTTL::Open() just like db_stress.
Add support for using skip_expired_data

Resolves: #417
Pull Request: #403
@ofriedma ofriedma changed the title Skip expired object when using DBWithTtl Skip expired object while using DBWithTtl Mar 5, 2023
@udi-speedb
Copy link
Contributor

What about ttl support in db_stress?
Does it exist? If not, is it possible to add it?

DEFINE_int32(ttl, -1,
"Opens the db with this ttl value if this is not -1. "
"Carefully specify a large value such that verifications on "
"deleted values don't fail");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • If ttl <= 0 the ttl is useless. So, better document this and also validate (below, where applicable)
  • How will the user know how to "Carefully specify a large value"? What is a large value?
  • You should sanitize (validate) these 2 flags. For example, it doesn't make sense to skip expired data if ttl <= 0.

Copy link
Contributor Author

@ofriedma ofriedma Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I wanted to keep the same documentation as the db_stress (this is a copy of line by line).
I will try to change it to be better understandable here and I will sanitise those.

}
db->cfh.resize(1);
db->num_created = 1;
db->num_hot = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Why are you not supporting ttl / db with ttl when there are multiple column families?
  • Why no ttl db when in read only?
  • If it is indeed the case, you should add it to the sanitization part. IMO invalid combinations should cause db_bench to abort and not run as the user has explicitly requested for specific behaviour and db_bench is not testing the desired behaviour.
  • And, if this code stays, you should refactor somehow the duplicate code in this case and the one below (the else block)

{(int32_t)FLAGS_ttl_seconds});
if (s.ok()) {
db->db = dbttl;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Open fails (!s.ok()) we exit below, but, for better readability IMO, I would execute the 3 lines below only if s.ok().

ofriedma pushed a commit to ofriedma/speedb that referenced this pull request Mar 15, 2023
Currently even if objects have been expired we still return them until
compaction delete those objects.

This commit adds a skip_expired_data to the ReadOptions, if this option
is true, it Get, MultiGet and iterator will not return objects that have been
expired by the configured ttl.
Currently this feature does not include support of for Merge
operation.
For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys
will not be returned.

Resolves: speedb-io#394

Pull Request: speedb-io#403
ofriedma pushed a commit that referenced this pull request Mar 21, 2023
Currently even if objects have been expired we still return them until
compaction delete those objects.

This commit adds a skip_expired_data to the ReadOptions, if this option
is true, it Get, MultiGet and iterator will not return objects that have been
expired by the configured ttl.
Currently this feature does not include support of for Merge
operation.
For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys
will not be returned.

Resolves: #394

Pull Request: #403
ofriedma pushed a commit that referenced this pull request Mar 21, 2023
Add support for db_bench to use DBWithTTL::Open() just like db_stress.
Add support for using skip_expired_data
Fix bugs in db_stress

Resolves: #417
Pull Request: #403
@ofriedma ofriedma requested a review from ayulas March 21, 2023 17:32
@udi-speedb
Copy link
Contributor

@ofriedma - When you have made up your mind about the ttl support / changes in db_stress please update (following our discussion yesterday). Thanks

@ofriedma
Copy link
Contributor Author

ofriedma commented Mar 22, 2023

@udi-speedb about your assumption, it was correct, small ttl is causing failure of the test, which means the test has not been suitable to handle expiration and deletion by db_stress, probably never, I'm getting not found error whether with or without skip_expired_data when use small ttl (--ttl=2).

@udi-speedb what do you think we should do in that case?

@udi-speedb
Copy link
Contributor

@udi-speedb about your assumption, it was correct, small ttl is causing failure of the test, which means the test has not been suitable to handle expiration and deletion by db_stress, probably never, I'm getting not found error whether with or without skip_expired_data when use small ttl (--ttl=2).

@udi-speedb what do you think we should do in that case?

Thanks for checking this.
I see 2 options:

  1. Leave db_stress as is. It wasn't designed to actually test the ttl feature and, I agree that it was probably never tested by FB with ttl. That means we are not really testing our updated TTL feature except via unit tests.
  2. Invest in adding ttl debug support. Probably requires being able to know which keys have expired (either via compaction or simply by being too old and not returned via the Get / iterator's retrieval) expired keys. Also requires careful sync between the db_stress code that compares the queried values to the expired values (potential races with compaction).

@ofriedma
Copy link
Contributor Author

The pull request is ready to be reviewed, will squash it once the review will over.

Thank you

@udi-speedb
Copy link
Contributor

If db_stress will fail if a user is really trying to run it with a reasonable ttl, why not just not touch it at all?

@@ -682,6 +682,12 @@ void StressTest::OperateDb(ThreadState* thread) {
read_opts.async_io = FLAGS_async_io;
read_opts.adaptive_readahead = FLAGS_adaptive_readahead;
read_opts.readahead_size = FLAGS_readahead_size;
if (gflags::GetCommandLineFlagInfoOrDie("ttl").is_default &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does changing the ttl to 0 (non default) mean with respect to the skip_expired_data flag and should it be included in this check and error reporting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add a check for the ttl with skip_expired_data to be greater than 0

@@ -514,7 +527,25 @@ std::vector<Status> DBWithTTLImpl::MultiGet(
if (!statuses[i].ok()) {
continue;
}
statuses[i] = StripTS(&(*values)[i]);
if (options.skip_expired_data) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comment

statuses[i] = StripTS(&(*values)[i]);
} else {
(*values)[i] = "";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comment

Options cfopts = GetOptions(column_family);
auto filter = std::static_pointer_cast<TtlCompactionFilterFactory>(
cfopts.compaction_filter_factory);
int32_t ttl = filter->GetTtl();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comment

@@ -86,6 +86,10 @@ class DBWithTTLImpl : public DBWithTTL {

static bool IsStale(const Slice& value, int32_t ttl, SystemClock* clock);

// IsStale for strict ttl
bool IsStale(const Slice& value, ColumnFamilyHandle* column_family,
const ReadOptions& options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2 IsStale methods are confusing IMO. I suggest having a common IsStale that accepts a time to which to compare, and does all the common functionality and 2 IsStaleXXX where XXX indicates in what manner is it stale.

std::string value;
ASSERT_OK(db_ttl_->Put(WriteOptions(), key_1, put_value));
ropts.snapshot = db_ttl_->GetSnapshot();
ASSERT_NE(ropts.snapshot, nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest using the gtest:

using ::testing::NotNull;
ASSERT_THAT(ptr, NotNull());

Which also supports smart pointers and provides descriptive errors

env_->Sleep(2);
ASSERT_OK(db_ttl_->CompactRange(CompactRangeOptions(), nullptr, nullptr));
// TODO prevent from ttl compaction to delete keys referenced by snapshot
// ASSERT_OK(db_ttl_->Get(ropts, key_1, &value));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test currently a placeholder for a time in which we properly support ttl compaction?
If it is, it might be preferable to have it disables and definitely document that before the test

: opts.env->GetSystemClock().get();
return IsStale(value, ttl, clock);
} else {
int64_t snapshot_time = options.snapshot->GetUnixTime();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ttl <= 0 you may just avoid doing anything.
And, is it in any way evn-specific (Unix Time)? Should it run correctly on Windows?

Options opts = GetOptions(column_family);
auto filter = std::static_pointer_cast<TtlCompactionFilterFactory>(
opts.compaction_filter_factory);
int32_t ttl = filter->GetTtl();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be preferable to just return at the top when ttl <= 0 and save any time spent below in any case

std::string key_1 = "a";
std::string put_value = "1";
ASSERT_OK(DBWithTTL::Open(options, dbname_, &db_ttl_, 1));
auto ropts = ReadOptions();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is skip_expired_data left false on purpose?
Do you expect the test's results to depend on this?
Regardless, worth documenting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@udi-speedb
I've added documentation and comments, change the name of the strict ttl added a condition to ttl <= 0.
The last thing I need to test is the windows environment will run make check on top of Windows tomorrow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Please check with @maxb-io the status of building on Windows. He is very familiar with this subject.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran ttl_test on windows, ran without issues

@ofriedma ofriedma requested a review from udi-speedb April 2, 2023 16:15
udi-speedb
udi-speedb previously approved these changes Apr 2, 2023
ofriedma pushed a commit that referenced this pull request Apr 3, 2023
Currently even if objects have been expired we still return them until
compaction delete those objects.

This commit adds a skip_expired_data to the ReadOptions, if this option
is true, it Get, MultiGet and iterator will not return objects that have been
expired by the configured ttl.
Currently this feature does not include support of for Merge
operation.
For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys
will not be returned.

Resolves: #394, #417

Pull Request: #403
@ofriedma
Copy link
Contributor Author

ofriedma commented Apr 3, 2023

@udi-speedb for a reason i can't understand the squash caused your review to be cancelled, the code hasn't changed.

@ofriedma ofriedma requested a review from udi-speedb April 3, 2023 14:12
Currently even if objects have been expired we still return them until
compaction delete those objects.

This commit adds a skip_expired_data to the ReadOptions, if this option
is true, it Get, MultiGet and iterator will not return objects that have been
expired by the configured ttl.
Currently this feature does not include support of for Merge
operation.
For Read Only and iterator pinned data the skip_expired_data will be enforced and expired keys
will not be returned.

Resolves: #394, #417

Pull Request: #403
@ofriedma ofriedma removed the request for review from ayulas April 3, 2023 14:45
@ofriedma
Copy link
Contributor Author

ofriedma commented Apr 3, 2023

@udi-speedb please approve

@Yuval-Ariel Yuval-Ariel merged commit df25119 into main Apr 4, 2023
Yuval-Ariel pushed a commit that referenced this pull request May 2, 2023
Currently even if objects have been expired we still return them until
compaction delete those objects.

This commit adds a skip_expired_data to the ReadOptions, if this option
is true, it Get, MultiGet and iterator will not return objects that have been
expired by the configured ttl.
Currently this feature does not include support for Merge operations.
For Read Only and iterator pinned data the skip_expired_data will be
enforced and expired keys will not be returned.

Resolves: #394, #417

Pull Request: #403
Yuval-Ariel pushed a commit that referenced this pull request May 4, 2023
Currently even if objects have been expired we still return them until
compaction delete those objects.

This commit adds a skip_expired_data to the ReadOptions, if this option
is true, it Get, MultiGet and iterator will not return objects that have been
expired by the configured ttl.
Currently this feature does not include support for Merge operations.
For Read Only and iterator pinned data the skip_expired_data will be
enforced and expired keys will not be returned.

Resolves: #394, #417

Pull Request: #403
@Yuval-Ariel Yuval-Ariel deleted the NoRetExpiredObjs branch May 11, 2023 08:26
udi-speedb pushed a commit that referenced this pull request Nov 13, 2023
Currently even if objects have been expired we still return them until
compaction delete those objects.

This commit adds a skip_expired_data to the ReadOptions, if this option
is true, it Get, MultiGet and iterator will not return objects that have been
expired by the configured ttl.
Currently this feature does not include support for Merge operations.
For Read Only and iterator pinned data the skip_expired_data will be
enforced and expired keys will not be returned.

Resolves: #394, #417

Pull Request: #403
udi-speedb pushed a commit that referenced this pull request Nov 15, 2023
Currently even if objects have been expired we still return them until
compaction delete those objects.

This commit adds a skip_expired_data to the ReadOptions, if this option
is true, it Get, MultiGet and iterator will not return objects that have been
expired by the configured ttl.
Currently this feature does not include support for Merge operations.
For Read Only and iterator pinned data the skip_expired_data will be
enforced and expired keys will not be returned.

Resolves: #394, #417

Pull Request: #403
udi-speedb pushed a commit that referenced this pull request Dec 4, 2023
Currently even if objects have been expired we still return them until
compaction delete those objects.

This commit adds a skip_expired_data to the ReadOptions, if this option
is true, it Get, MultiGet and iterator will not return objects that have been
expired by the configured ttl.
Currently this feature does not include support for Merge operations.
For Read Only and iterator pinned data the skip_expired_data will be
enforced and expired keys will not be returned.

Resolves: #394, #417

Pull Request: #403
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TTL - Don't return expired objects
3 participants