-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add critical section to protect s.currentRev #10975
Conversation
Codecov Report
@@ Coverage Diff @@
## master #10975 +/- ##
=========================================
Coverage ? 64.02%
=========================================
Files ? 401
Lines ? 37542
Branches ? 0
=========================================
Hits ? 24037
Misses ? 11877
Partials ? 1628
Continue to review full report at Codecov.
|
This one is tricky. I did not find much description on this function's behavior. So not sure if / when we should lock I assume the returned hash and revision should be consistent with each other. This means we need to lock @gyuho Is there documentation on this API? Is this API going to be deprecated in favor of HashByRev()? |
@jingyih |
I think we should get revision while generating the hash in the backend. The current way of returning store revision at the end of the function call (whether protected by revMu or not) is racy. Writes to store could happen after Hash() but before function returning. @gyuho Any comment? |
@jingyih @lzhfromustc |
If we want to block writes in Hash() function, we need to hold WRITE lock from the beginning of the function call, which is very expensive. Given that this is only used in functional test, maybe it is OK. Alternatively we can open a db read transaction and compute hash and store revision from it - very similar to the approach in HashByRev(). |
@jingyih Agree. In our functional tests, we just explicitly stop all clients, so there would be no race. Since it's meant for internal usage, let's keep it as it is now. |
mvcc/kvstore.go
Outdated
@@ -171,6 +171,8 @@ func (s *store) Hash() (hash uint32, revision int64, err error) { | |||
h, err := s.b.Hash(DefaultIgnores) | |||
|
|||
hashSec.Observe(time.Since(start).Seconds()) | |||
s.revMu.RLock() | |||
defer s.revMu.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, adding lock here is not going to help much. For now, can we just add a TODO here as reminder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read your discussion but I am not familiar with these packages, so it's quite possible that my understanding is wrong.
Do you think the following is appropriate?
// TODO: hash and revision could be inconsistent, due to absence of s.revMu.Lock()
// at the beginning of function, which is costly
func (s *store) Hash() (hash uint32, revision int64, err error) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either adding s.revMu.RLock()
or s.mu.Lock()
can fix it. s.revMu.RLock()
is likely better. There is a less costly way of fixing this, which is to get revision from the opened backend read transaction in s.b.Hash(). Probably just say:
"TODO: hash and revision could be inconsistent, one possible fix is to add s.revMu.RLock() at the beginning of function, which is costly".
BTW could you also update the commit title to use format |
@jingyih Got it. I will commit the change and squash these commits shortly. |
@lzhfromustc friendly ping:) |
@jingyih Sorry!!! I moved back to my university and completely forgot this. |
mvcc/kvstore.go
Outdated
@@ -164,6 +164,7 @@ func (s *store) compactBarrier(ctx context.Context, ch chan struct{}) { | |||
close(ch) | |||
} | |||
|
|||
// TODO: hash and revision could be inconsistent, one possible fix is to add s.revMu.RLock() at the beginning of function, which is costly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://travis-ci.com/etcd-io/etcd/jobs/229792898#L564
goword is suggesting to use 'Hash'.
Sorry for all the back and forth. I misread earlier. Also, could you change the commit title to use format Thanks for being patient:) |
@jingyih No problem at all. Actually this informs me how format is checked in big open-source projects. |
The commit title needs to be something like |
…e of currentRev and suggest feasible fix
My bad! I mended it now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Functional test failure is unrelated.
Description
Among 14 read/write operations to s.currentRev, 10 of them are protected by
s.revMu.RLock()
ors.mu.Lock()
, 3 of them are synchronized by using channelrevc
, but there is one read operation not protect by any Mutex, RWMutex or channel, which is in functionstore.Hash()
at line 174 of file etcd/mvcc/kvstore.go:etcd/mvcc/kvstore.go
Lines 167 to 175 in 9a2af73
What I did
Add
s.revMu.RLock()
anddefer s.revMu.RUnlock()
to protect s.currentRevWhy I did this
In the comment of s.revMu, it is said that s.revMu protects currentRev:
Is it possible for this patch to introduce other bugs?
I think we only need to consider whether this patch will introduce double lock bug or not (double RLock can also produce deadlock bug if there is another goroutine).
I detected this bug by my static checker, so I can't provide runtime information to validate my patch. However, the buggy function is
store.Hash()
, and there is a test function that directly usesstore.Hash()
without usings.revMu.RLock()
. So I think this patch is safe to use.