Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add memory barrier to Mutex#unlock on aarch64 #14272

Merged
merged 2 commits into from
Feb 10, 2024

Commits on Jan 29, 2024

  1. Fix Mutex on aarch64

    This solution is the same as the one used in crystal-lang#13050.
    
    The following code is expected to output `1000000` preceded by the time
    it took to perform it:
    
    ```
    mutex = Mutex.new
    numbers = Array(Int32).new(initial_capacity: 1_000_000)
    done = Channel(Nil).new
    concurrency = 20
    iterations = 1_000_000 // concurrency
    concurrency.times do
      spawn do
        iterations.times { mutex.synchronize { numbers << 0 } }
      ensure
        done.send nil
      end
    end
    
    start = Time.monotonic
    concurrency.times { done.receive }
    print Time.monotonic - start
    print ' '
    sleep 100.milliseconds # Wait just a bit longer to be sure the discrepancy isn't due to a *different* race condition
    pp numbers.size
    ```
    
    Before this commit, on an Apple M1 CPU, the array size would be anywhere
    from 880k-970k, but I never observed it reach 1M. Here is a sample:
    
    ```
    $ repeat 20 (CRYSTAL_WORKERS=10 ./mutex_check)
    00:00:00.119271625 881352
    00:00:00.111249083 936709
    00:00:00.102355208 946428
    00:00:00.116415166 926724
    00:00:00.127152583 899899
    00:00:00.097160792 964577
    00:00:00.120564958 930859
    00:00:00.122803000 917583
    00:00:00.093986834 954112
    00:00:00.079212333 967772
    00:00:00.093168208 953491
    00:00:00.102553834 962147
    00:00:00.091601625 967304
    00:00:00.108157208 954855
    00:00:00.080879666 944870
    00:00:00.114638042 930429
    00:00:00.093617083 956496
    00:00:00.112108959 940205
    00:00:00.092837875 944993
    00:00:00.097882625 916220
    ```
    
    This indicates that some of the mutex locks were getting through when
    they should not have been. With this commit, using the exact same
    parameters (built with `--release -Dpreview_mt` and run with
    `CRYSTAL_WORKERS=10` to spread out across all 10 cores) these are the
    results I'm seeing:
    
    ```
    00:00:00.078898166 1000000
    00:00:00.072308084 1000000
    00:00:00.047157000 1000000
    00:00:00.088043834 1000000
    00:00:00.060784625 1000000
    00:00:00.067710250 1000000
    00:00:00.081070750 1000000
    00:00:00.065572208 1000000
    00:00:00.065006958 1000000
    00:00:00.061041541 1000000
    00:00:00.059648291 1000000
    00:00:00.078100125 1000000
    00:00:00.050676250 1000000
    00:00:00.049395875 1000000
    00:00:00.069352334 1000000
    00:00:00.063897833 1000000
    00:00:00.067534333 1000000
    00:00:00.070290833 1000000
    00:00:00.067361500 1000000
    00:00:00.078021833 1000000
    ```
    
    Note that it's not only correct, but also significantly faster.
    jgaskins committed Jan 29, 2024
    Configuration menu
    Copy the full SHA
    4d68581 View commit details
    Browse the repository at this point in the history

Commits on Jan 30, 2024

  1. Configuration menu
    Copy the full SHA
    d71150c View commit details
    Browse the repository at this point in the history