Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

domain: close slow query channel after closing session pool #7847

Merged
merged 5 commits into from
Oct 11, 2018

Conversation

tiancaiamao
Copy link
Contributor

What problem does this PR solve?

If slow query channel is closed before session pool, some session's goroutine may still
writing to the channel. Writing to a closed channel would cause TiDB panic.

image

What is changed and how it works?

  • Change the close order, close slow query channel after closing session pool.
  • Drain the channel data before close the channel.
  • Assume that after domain.Close() is called, there will be no more session executing.

Check List

Related changes

  • Need to cherry-pick to the release branch

If slow query channel is closed before session pool, some session's goroutine may still
writing to the channel. Writing to a closed channel would cause TiDB panic.
@tiancaiamao tiancaiamao added the type/bug-fix This PR fixes a bug. label Oct 9, 2018
@tiancaiamao
Copy link
Contributor Author

PTAL @crazycs520 @winkyao @coocood

do.sysSessionPool.Close()
do.slowQuery.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it enough to put “do.slowQuery.Close()” after line462?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do.slowQuery.Close() will make goroutine exit and WaitGroup count -1.
If this line is moved to line 462,do.wg.Wait() would wait forever.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think it's useless. After this function is executed, "updateStatsWorker" may still be called by another goroutine. This problem still exists.
Maybe we can replace "ctx"(the updateStatsWorker's argument) with the session created in the sysSessionPool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean after session pool is closed, updateStatsWorker will be still executing ? @zimulala

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ye. Because the session used in statistic bootstrap is not get from session pool. @tiancaiamao

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current session pool implementation will block waiting resource to be put back, so after the pool is closed, there won't be system session running. @crazycs520 @zimulala

done := false
for !done {
select {
case <-q.ch:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to drain the channel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Copy link
Contributor

@winkyao winkyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


mu struct {
sync.RWMutex
closed bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about use atomic instead of mutex.

Copy link
Contributor Author

@tiancaiamao tiancaiamao Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atomic can't protect multiple operations. @crazycs520

modify atomic flag
close(ch)

check atomic flag
ch <-

Take this order for example:

  1. check atomic flag = success
  2. modify atomic flag = closed
  3. close(ch)
  4. <-ch panic!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great~

Copy link
Contributor

@crazycs520 crazycs520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@crazycs520
Copy link
Contributor

@zimulala PTAL

@crazycs520 crazycs520 added the status/LGT2 Indicates that a PR has LGTM 2. label Oct 11, 2018
@tiancaiamao
Copy link
Contributor Author

/run-all-tests

Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/LGT2 Indicates that a PR has LGTM 2. type/bug-fix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants