-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed signature support to handle Delta-persistent objects #271
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The "logic error in PersistentRegistry" warning gets printed in Cascade whenever an RPC update changes only a non-signed field, and not the signed field. This seems to indicate that the sign() method is being called on a version that does not exist in the signed field's log, and is trying to add a signature to this nonexistent version. I'm trying to reproduce this bug in Derecho's signed_log_test so I can fix it here.
It turns out the error only shows up when the signed persistent field is also a DeltaSupport field, meaning it's possible for it to create a version with no data (because the delta is empty). Regular non-Delta signed fields always create a version with their current state even when a non-signed field originates the makeVersion call, so mixing regular signed and non-signed fields doesn't create a log with gaps.
As identified by the new signed_log_test, Delta-supporting signed fields create a problem for the sign() method because they can end up saving no data to the log when makeVersion is called. In particular, if makeVersion is called when only non-signed fields were updated, the non-signed fields will get a log entry at that version, but the Delta-suporting signed field will just advance its current version without creating a log entry (since there is no Delta data). Then when PersistentRegistry::sign() is called, getMinimumVersionAfter() will return the new version that is in the non-signed log as cur_nonempty_version, but updateSignature won't sign any data for this version because it doesn't exist in the signed field's log. This results in the "logic error" message because no data was signed for cur_nonempty_version. The most direct solution is to simply write a new version of getMinimumVersionAfter() that's just for sign(), called getNextSignedVersion(), which only calls getNextVersionOf() on fields that have signatures enabled. The getNextVersionOf() method already correctly skips ahead to the next version with nonempty log data for signed fields, and if no non-signed fields get to return their versions, the overall minimum must be a version that actually has data in the log of a signed field. Fixing the same problem in handle_verify_request also involved fixing a longstanding performance flaw. Instead of doing a public-key verification on the signature posted to the SST by another replica, we should just compare that signature to the signature in the local replica's log on the same version - they should be bytewise identical since both replicas have the same signing key. With this change made, we should also ensure that we only do the equality check if both replicas actually have a signature for the requested version; a version that is in the persistent log but not in any signed fields doesn't need to be verified. Finally, this commit removes noexcept(true) from the PersistLog constructor because the from_pem_private method (called to determine the signature length) can throw an exception if the PEM file is not found.
Now that the signed_num can be different from the persisted_num (smaller, if the current persisted_num has no signed data), it no longer makes sense to use the single return value from Replicated::persist as the new version that is "signed and persisted" - a single persist request may result in a different persisted_num and signed_num. PersistenceManager was the only part of the code calling the Replicated::persist method, so I refactored it into separate "persist" and "sign" methods that PersistenceManager calls in a loop, replacing the sign-and-persist loop that used to be within Replicated::persist. This allows PersistenceManger to learn both the latest persisted version and the latest signed version, and update persisted_num and signed_num separately. While doing this, I re-arranged replicated.hpp and added some comments to more clearly document that some of the "public" methods are in fact only used by internal Derecho components and should not be relied on as a stable API. Also updated signed_log_test to correctly detect the end of the test, now that the last persisted version might not equal the last signed version - if the last update to MixedFieldObject is to the unsigned field, the version resulting from that update will never be signed.
I noticed that a few internal files are still including other Derecho headers with the "system" (angle-brackets) syntax instead of the "local" (quotes) syntax. Library files should always include other library files with the "local" syntax. Putting this in a separate commit since it's unrelated to my other work on sign() and persist().
Following a discussion with Weijia where we confirmed that the signatures system was the only part of Derecho that used getMinimumLatestVersion, I decided to change this method to return the current "in-memory" version of the Persistent fields as reported by the currMetaHeader version, rather than the latest version of the Persistent fields that corresponds to a log entry. I renamed the method to getCurrentVersion to reflect this new meaning, and added a getCurrentVersion method to the PersistLog API that would get the meta-header version. The original getLatestVersion still exists and has the same meaning, since the latest version in a log entry is still a piece of information someone might want to know.
Since the sign() method, like the persist() method, can end up signing a later version than the one requested (due to batching), it's possible for a later persistence request to call sign() only to find that an earlier persistence request already signed the latest version - in other words, there's nothing for PersistentRegistry::sign() to do because the latest version is already signed. In this case, PersistentRegistry::sign() would return lastSignedVersion without placing any data into the signature_buffer, because the main while loop would do nothing. This would cause PersistenceManager to get an empty buffer as the "latest signature" and then place that empty buffer in the SST's signature field. I added a special case to PersistentRegistry::sign() to make it return the cached latest signature if the method is a no-op. I also changed handle_persist_request to check if signed_version has actually advanced before updating the SST, to avoid redundantly re-copying the latest version's signature to the SST if it was already placed there by a previous persistence request. Also changed handle_verify_request to check the local node's version of signed_num in the SST to determine if there's a local signature to check against when verifying another node's signature. This is more reliable than trying to retrieve the local signature and testing for failure, because get_signature can "succeed" but return a signature of all zeroes if the desired log entry has been created but not yet signed.
Now that the signatures are verifying correctly, I'll stop dumping the entire hex value of each signature into the log. Even at the trace level that's a bit much for anything other than finding this particular bug.
This seems to work and produces no warnings, which means we no longer need the "version" parameter from PersistenceManager as even a "hint" to the Replicated::sign() method - it can just use getCurrentVersion().
The "version" argument to the various persist() methods actually didn't do anything because FilePersistLog's persist() ignored the argument and always persisted the latest version. After discussion with Weijia, I rewrote FilePersistLog::persist() to actually use the argument to limit the range of data it persists, but also to make the argument optional and revert to the old behavior if it is not provided. This means callers that want the batching behavior don't have to guess at the correct "latest" version to provide as an argument (they can just provide std::nullopt), and callers that want only a certain version persisted will not end up persisting a later version than they wanted. Most of the "code changes" in FilePersistLog::persist() come from me manually inlining the NEXT_LOG_ENTRY, NEXT_LOG_ENTRY_PERS, and NEXT_DATA macros so that I could figure out what they were doing. The substantial change is to replace all references to m_currMetaHeader in those macros with shadow_header, which is a modified copy of m_currMetaHeader. If the latest_version argument is provided, shadow_header is modified to "point" to an earlier log entry, namely the one corresponding to latest_version. If latest_version is not provided, shadow_header equals m_currMetaHeader and the function should behave exactly as before. Either way, shadow_header becomes the next m_persMetaHeader after the data is flushed, so the m_persMetaHeader accurately reflects which log entries were persisted.
Interleaving persistence and verification requests in a single queue processed by a single thread makes it harder for the thread to skip ahead and discard obsolete persistence requests when the persisted number advances due to batching. There's also no good reason for these requests to be handled by the same thread, since they are independent of one another (and verification has very little to do with perisistence).
The handle_verify_request method should also return early if the requested version has already been verified, like handle_persist_request.
When I refactored persist() to have an optional version parameter (instead of a required one that was ignored), I changed the instances where it was called with "the latest version" to call it with nullopt instead, but I missed this one.
Technically these includes are redundant since the bottom of the header file includes the _impl file, but this makes VSCode's parser happy and we do the same thing in group_impl.hpp.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
My original implementation of signed persistent logs did not properly handle Delta-supporting Persistent fields in a few edge cases. Specifically, if a single Derecho Replicated object contains both a non-signed Persistent field and a signed Persistent field that has DeltaSupport enabled, it's possible for PersistentRegistry::sign() to produce the warning "Logic error: Version X was returned by getMinimumVersionAfter(), but it did not exist in any field" and then a subsequent handle_verify_request to produce the warning "Verification of version X failed!" because it attempts to verify an empty signature.
This happens when makeVersion is called on the Replicated object when only the non-signed Persistent field was updated, which means the non-signed field creates a log entry but the Delta-supporting signed field just advances its current version without creating a log entry (since there is no Delta data). Then in PersistentRegistry::sign(), getMinimumVersionAfter() will return the new version that is in the non-signed log as cur_nonempty_version, but updateSignature won't sign any data for this version because it doesn't exist in the signed field's log. Making things even worse, getMinimumLatestVersion() can't be used to properly detect the "current version" in this situation, because the getLatestVersion() method only returns the latest version that is in a log entry, not the current version recorded in the meta-header file (which is what's updated when a Delta-supporting object advances its current version).
I fixed this with several changes: