Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmarks based on CPU counters #426

Merged
merged 3 commits into from
Feb 27, 2024
Merged

Benchmarks based on CPU counters #426

merged 3 commits into from
Feb 27, 2024

Conversation

Shimuuar
Copy link
Contributor

@Shimuuar Shimuuar commented Dec 5, 2021

We don't run benchmarks with any sort of regularity. In part because it's difficult to get repeatable result, it require dedicated PC and all measurement must be done on same hardware. So tracking performance regressions is hard. I run into this problem while working on size hints. Another way is to use CPU counters to count number of instructions. It's fast and hopefully reproducible and could possibly run on CI.

This PR is prototype. It sort of works, very rough on edges and right now isn't very deterministic. Apparently GC could kick in and skew results. I need to study this more and to figure out how to ensure repeatable measurements.

If everything works out I want to split it into separate package so far it's more convenient to develop it as part of vector.

P.S. -O1 could be up to 3× slower than -O2 in our benchmarks but sometimes it's on par.

@Shimuuar Shimuuar force-pushed the bench-papi branch 3 times, most recently from 21006c3 to 762ec6d Compare January 8, 2024 17:23
@Shimuuar Shimuuar marked this pull request as ready for review January 9, 2024 08:28
@Shimuuar
Copy link
Contributor Author

Shimuuar commented Jan 9, 2024

Now this PR contains benchmarks which use tasty-papi. Since they require libpapi and only work on linux they are protected by cabal flag and won't be built unless requested specifically. I played with benchmarks a bit and it seems that comparing performance for several GHC version is quite doable.

Sadly it's not possible to run them on github CI.

@Shimuuar
Copy link
Contributor Author

@lehins, @Bodigrim I'm going to merge this PR tomorrow if you're OK with it

@Bodigrim
Copy link
Contributor

I don't have a Linux machine to test, but otherwise have no objections. Looks interesting.

@lehins
Copy link
Contributor

lehins commented Jan 14, 2024

I've tried it on my setup with OpenSUSE and I get bunch of errors. Looks like CPU counters are not readily available on all Linux distros without extra configuration.

I am a little skeptical about adding such benchmark directly to vector that is so hard to execute. We can't build it on CI, nor can we easily run it locally. Maybe it would be better to have it as a separate package?

All                 
  listRank:          FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  rootfix:           FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  leaffix:           FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  awshcc:            FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  hybcc:             FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  quickhull:         FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  spectral:          FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  tridiag:           FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  mutableSet:        FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  findIndexR:        FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  findIndexR_naïve:  FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  findIndexR_manual: FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  minimumOn:         FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  maximumOn:         FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
                    
14 out of 14 tests failed (0.02s)
Benchmark algorithms-papi: ERROR
Completed 3 action(s).

@Shimuuar
Copy link
Contributor Author

First of all we can (edc9083) and do build these benchmarks on CI. We just can't actually run them

I think splitting benchmarks means it will become even more complicated to run. Probably to the point where only I can run them. CPU counters are security sensitive with "specter" and all. So problems with access will happen. Hopefully they could be mitigated with better documentation.

But I never encountered such problems (nixos, fedora37) so I can;t debug them What is papi_avail output?

@lehins
Copy link
Contributor

lehins commented Jan 16, 2024

We just can't actually run them

Same case was for me. I was able to build them, just not run them. Which makes me question usability of the suite for your average contributor.

I think splitting benchmarks means it will become even more complicated to run.

I really don't think making a core Haskell package depend on some third party C library that is only usable on some small subset of operating systems is a good idea, even if it is not buildable by default. Splitting it out into a separate package is the only thing that comes to mind as a decent solution to this. Also, I don't understand why would it be more complicated to run? Benchmarks are only useful for people who are making changes to vector, so all it would take is cloning the repo and building it. We don't need to release those benchmarks on hackage.

Just to be clear, I am not against adding the benchmarks to the repo, because I do see value in such benchmarks. I am just against adding them to the main package. I'd rather see something like vector-papi in the repo that has a good documentation on what it takes to run those benchmarks.

Probably to the point where only I can run them.

That is already the case. Out of three maintainers you are the only one who can run them.

I can;t debug

I can't debug them either since I've never used papi. Since you asked, I'll post the relevant output, but honestly, I don't have enough time to debug this issue any further.

$ papi_avail
Available PAPI preset and user defined events plus hardware information.
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------

================================================================================
  PAPI Preset Events
================================================================================
    Name        Code    Avail Deriv Description (Note)
...
--------------------------------------------------------------------------------
Of 108 possible events, 0 are available, of which 0 are derived.

No events detected!  Check papi_component_avail to find out why.
$ papi_component_avail
Available components and hardware information.
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------

Compiled-in components:
Name:   perf_event              Linux perf_event CPU counters
   \-> Disabled: Error libpfm4 no default PMU found
Name:   perf_event_uncore       Linux perf_event CPU uncore and northbridge
   \-> Disabled: No uncore PMUs or events found
Name:   sysdetect               System info detection component

Active components:
Name:   sysdetect               System info detection component
                                Native: 0, Preset: 0, Counters: 0


--------------------------------------------------------------------------------

@Shimuuar
Copy link
Contributor Author

Or maybe I don't quite understand what exactly are you proposing. One problem is benchmarks are not accessible from outside of vector package. So we have to either copy them, or split all benchmarks into separate package

Since you asked, I'll post the relevant output
Thanks! I'll look into it.

@lehins
Copy link
Contributor

lehins commented Jan 16, 2024

One problem is benchmarks are not accessible from outside of vector package.

We could export them as a public sub-library.

Since they are require libpapi and only works on linux they're protected by
cabal flag and have to be enabled explicitly

PAPI benchmark couldn't be built on GHC 8.0
@Shimuuar
Copy link
Contributor Author

This is third attempt. PAPI based benchmarks are placed into separate package. They still require specifying cabal flag in order to not break build for anyone who don't want to run them.

I also added cabal check for vector-stream

Copy link
Contributor

@lehins lehins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful!

@Shimuuar Shimuuar merged commit 197eeb4 into master Feb 27, 2024
22 checks passed
@Shimuuar Shimuuar deleted the bench-papi branch February 27, 2024 06:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants