Skip to content

Commit

Permalink
Rollup merge of rust-lang#65719 - pitdicker:refactor_sync_once, r=Ama…
Browse files Browse the repository at this point in the history
…nieu

Refactor sync::Once

`std::sync::Once` contains some tricky code to park and unpark waiting threads. [once_cell](https://github.com/matklad/once_cell) has very similar code copied from here. I tried to add more comments and refactor the code to make it more readable (at least in my opinion). My PR to `once_cell` was rejected, because it is an advantage to remain close to the implementation in std, and because I made a mess of the atomic orderings. So now a PR here, with similar changes to `std::sync::Once`!

The initial goal was to see if there is some way to detect reentrant initialization instead of deadlocking. No luck there yet, but you first have to understand and document the complexities of the existing code :smile:.

*Maybe not this entire PR will be acceptable, but I hope at least some of the commits can be useful.*

Individual commits:

#### Rename state to state_and_queue
Just a more accurate description, although a bit wordy. It helped me on a first read through the code, where before `state` was use to encode a pointer in to nodes of a linked list.

#### Simplify loop conditions in RUNNING and add comments
In the relevant loop there are two things to be careful about:
- make sure not to enqueue the current thread only while still RUNNING, otherwise we will never be woken up (the status may have changed while trying to enqueue this thread).
- pick up if another thread just replaced the head of the linked list.

Because the first check was part of the condition of the while loop, the rest of the parking code also had to live in that loop. It took me a while to get the subtlety here, and it should now be clearer.

Also call out that we really have to wait until signaled, otherwise we leave a dangling reference.

#### Don't mutate waiter nodes
Previously while waking up other threads the managing thread would `take()` out the `Thread` struct and use that to unpark the other thread. It is just as easy to clone it, just 24 bytes. This way `Waiter.thread` does not need an `Option`, `Waiter.next` does not need to be a mutable pointer, and there is less data that needs to be synchronised by later atomic operations.

#### Turn Finish into WaiterQueue
In my opinion these changes make it just a bit more clear what is going on with the thread parking stuff.

#### Move thread parking to a seperate function
Maybe controversial, but with this last commit all the thread parking stuff has a reasonably clean seperation from the state changes in `Once`. This is arguably the trickier part of `Once`, compared to the loop in `call_inner`. It may make it easier to reuse parts of this code (see rust-lang/rfcs#2788 (comment)). Not sure if that ever becomes a reality though.

#### Reduce the amount of comments in call_inner
With the changes from the previous commits, the code pretty much speaks for itself, and the amount of comments is hurting readability a bit.

#### Use more precise atomic orderings
Now the hard one. This is the one change that is not anything but a pure refactor or change of comments.

I have a dislike for using `SeqCst` everywhere, because it hides what the atomics are supposed to do. the rationale was:
> This cold path uses SeqCst consistently because the performance difference really does not matter there, and SeqCst minimizes the chances of something going wrong.

But in my opinion, having the proper orderings and some explanation helps to understand what is going on. My rationale for the used orderings (also included as comment):

When running `Once` we deal with multiple atomics: `Once.state_and_queue` and an unknown number of `Waiter.signaled`.
* `state_and_queue` is used (1) as a state flag, (2) for synchronizing the data that is the result of the `Once`, and (3) for synchronizing `Waiter` nodes.
    - At the end of the `call_inner` function we have to make sure the result of the `Once` is acquired. So every load which can be the only one to load COMPLETED must have at least Acquire ordering, which means all three of them.
    - `WaiterQueue::Drop` is the only place that may store COMPLETED, and must do so with Release ordering to make result available.
    - `wait` inserts `Waiter` nodes as a pointer in `state_and_queue`, and needs to make the nodes available with Release ordering. The load in its `compare_and_swap` can be Relaxed because it only has to compare the atomic, not to read other data.
    - `WaiterQueue::Drop` must see the `Waiter` nodes, so it must load `state_and_queue` with Acquire ordering.
    - There is just one store where `state_and_queue` is used only as a state flag, without having to synchronize data: switching the state from INCOMPLETE to RUNNING in `call_inner`. This store can be Relaxed, but the read has to be Acquire because of the requirements mentioned above.
* `Waiter.signaled` is both used as a flag, and to protect a field with interior mutability in `Waiter`. `Waiter.thread` is changed in `WaiterQueue::Drop` which then sets `signaled` with Release ordering. After `wait` loads `signaled` with Acquire and sees it is true, it needs to see the changes to drop the `Waiter` struct correctly.
* There is one place where the two atomics `Once.state_and_queue` and `Waiter.signaled` come together, and might be reordered by the compiler or processor. Because both use Aquire ordering such a reordering is not allowed, so no need for SeqCst.

cc @matklad
  • Loading branch information
JohnTitor committed Nov 10, 2019
2 parents ac162c6 + b05e200 commit e88aa39
Showing 1 changed file with 166 additions and 125 deletions.
Loading

0 comments on commit e88aa39

Please sign in to comment.