Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add critical section to protect s.currentRev #10975

Merged
merged 1 commit into from
Sep 6, 2019
Merged

Add critical section to protect s.currentRev #10975

merged 1 commit into from
Sep 6, 2019

Conversation

lzhfromustc
Copy link
Contributor

Description
Among 14 read/write operations to s.currentRev, 10 of them are protected by s.revMu.RLock() or s.mu.Lock(), 3 of them are synchronized by using channel revc, but there is one read operation not protect by any Mutex, RWMutex or channel, which is in function store.Hash() at line 174 of file etcd/mvcc/kvstore.go:

etcd/mvcc/kvstore.go

Lines 167 to 175 in 9a2af73

func (s *store) Hash() (hash uint32, revision int64, err error) {
start := time.Now()
s.b.ForceCommit()
h, err := s.b.Hash(DefaultIgnores)
hashSec.Observe(time.Since(start).Seconds())
return h, s.currentRev, err
}

What I did
Add s.revMu.RLock() and defer s.revMu.RUnlock() to protect s.currentRev

Why I did this
In the comment of s.revMu, it is said that s.revMu protects currentRev:

	// revMuLock protects currentRev and compactMainRev.
	// Locked at end of write txn and released after write txn unlock lock.
	// Locked before locking read txn and released after locking.
	revMu sync.RWMutex

Is it possible for this patch to introduce other bugs?
I think we only need to consider whether this patch will introduce double lock bug or not (double RLock can also produce deadlock bug if there is another goroutine).
I detected this bug by my static checker, so I can't provide runtime information to validate my patch. However, the buggy function is store.Hash(), and there is a test function that directly uses store.Hash() without using s.revMu.RLock(). So I think this patch is safe to use.

@codecov-io
Copy link

codecov-io commented Aug 5, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@af19d01). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #10975   +/-   ##
=========================================
  Coverage          ?   64.02%           
=========================================
  Files             ?      401           
  Lines             ?    37542           
  Branches          ?        0           
=========================================
  Hits              ?    24037           
  Misses            ?    11877           
  Partials          ?     1628
Impacted Files Coverage Δ
mvcc/kvstore.go 85.57% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update af19d01...84316c6. Read the comment docs.

@jingyih
Copy link
Contributor

jingyih commented Aug 5, 2019

This one is tricky. I did not find much description on this function's behavior. So not sure if / when we should lock revMu.

I assume the returned hash and revision should be consistent with each other. This means we need to lock revMu and read currentRev before using it to compute hash. But then this becomes basically identical to HashByRev().

@gyuho Is there documentation on this API? Is this API going to be deprecated in favor of HashByRev()?

@gyuho
Copy link
Contributor

gyuho commented Aug 5, 2019

@jingyih Hash is used for functional testing. And HashByRev is used for peer-to-peer corruption check.

@jingyih
Copy link
Contributor

jingyih commented Aug 5, 2019

I think we should get revision while generating the hash in the backend. The current way of returning store revision at the end of the function call (whether protected by revMu or not) is racy. Writes to store could happen after Hash() but before function returning.

@gyuho Any comment?

@gyuho
Copy link
Contributor

gyuho commented Aug 6, 2019

@jingyih @lzhfromustc
Then acquiring read-lock at the beginning of Hash should be enough to block currentRev writes from *storeTxnWrite.

@jingyih
Copy link
Contributor

jingyih commented Aug 7, 2019

If we want to block writes in Hash() function, we need to hold WRITE lock from the beginning of the function call, which is very expensive. Given that this is only used in functional test, maybe it is OK.

Alternatively we can open a db read transaction and compute hash and store revision from it - very similar to the approach in HashByRev().

@gyuho
Copy link
Contributor

gyuho commented Aug 7, 2019

@jingyih Agree. In our functional tests, we just explicitly stop all clients, so there would be no race. Since it's meant for internal usage, let's keep it as it is now.

mvcc/kvstore.go Outdated
@@ -171,6 +171,8 @@ func (s *store) Hash() (hash uint32, revision int64, err error) {
h, err := s.b.Hash(DefaultIgnores)

hashSec.Observe(time.Since(start).Seconds())
s.revMu.RLock()
defer s.revMu.RUnlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, adding lock here is not going to help much. For now, can we just add a TODO here as reminder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read your discussion but I am not familiar with these packages, so it's quite possible that my understanding is wrong.
Do you think the following is appropriate?

// TODO: hash and revision could be inconsistent, due to absence of s.revMu.Lock() 
//  at the beginning of function, which is costly
func (s *store) Hash() (hash uint32, revision int64, err error) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either adding s.revMu.RLock() or s.mu.Lock() can fix it. s.revMu.RLock() is likely better. There is a less costly way of fixing this, which is to get revision from the opened backend read transaction in s.b.Hash(). Probably just say:

"TODO: hash and revision could be inconsistent, one possible fix is to add s.revMu.RLock() at the beginning of function, which is costly".

@jingyih
Copy link
Contributor

jingyih commented Aug 16, 2019

BTW could you also update the commit title to use format <package>:<description>

@lzhfromustc
Copy link
Contributor Author

@jingyih Got it. I will commit the change and squash these commits shortly.

@jingyih
Copy link
Contributor

jingyih commented Aug 30, 2019

@lzhfromustc friendly ping:)

@lzhfromustc
Copy link
Contributor Author

@jingyih Sorry!!! I moved back to my university and completely forgot this.
Now I squashed my commit. It should be OK now.

mvcc/kvstore.go Outdated
@@ -164,6 +164,7 @@ func (s *store) compactBarrier(ctx context.Context, ch chan struct{}) {
close(ch)
}

// TODO: hash and revision could be inconsistent, one possible fix is to add s.revMu.RLock() at the beginning of function, which is costly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://travis-ci.com/etcd-io/etcd/jobs/229792898#L564

goword is suggesting to use 'Hash'.

@jingyih
Copy link
Contributor

jingyih commented Sep 6, 2019

Sorry for all the back and forth. I misread earlier. goword checker actually demands the starting of the description to match the function name, which means it does not want the comment we are adding to start with word "TODO". We might want to move the comment inside the function.

Also, could you change the commit title to use format <package>:<description>.

Thanks for being patient:)

@lzhfromustc
Copy link
Contributor Author

@jingyih No problem at all. Actually this informs me how format is checked in big open-source projects.
I think this time the format is OK, since the fmt checker has been received

@jingyih
Copy link
Contributor

jingyih commented Sep 6, 2019

The commit title needs to be something like mvcc: add TODO to protect currentRev in Hash

@lzhfromustc
Copy link
Contributor Author

My bad! I mended it now

Copy link
Contributor

@jingyih jingyih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Functional test failure is unrelated.

@jingyih jingyih merged commit 88d998b into etcd-io:master Sep 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants