Benchmarks based on CPU counters #426

Shimuuar · 2021-12-05T18:01:34Z

We don't run benchmarks with any sort of regularity. In part because it's difficult to get repeatable result, it require dedicated PC and all measurement must be done on same hardware. So tracking performance regressions is hard. I run into this problem while working on size hints. Another way is to use CPU counters to count number of instructions. It's fast and hopefully reproducible and could possibly run on CI.

This PR is prototype. It sort of works, very rough on edges and right now isn't very deterministic. Apparently GC could kick in and skew results. I need to study this more and to figure out how to ensure repeatable measurements.

If everything works out I want to split it into separate package so far it's more convenient to develop it as part of vector.

P.S. -O1 could be up to 3× slower than -O2 in our benchmarks but sometimes it's on par.

Shimuuar · 2024-01-09T08:34:03Z

Now this PR contains benchmarks which use tasty-papi. Since they require libpapi and only work on linux they are protected by cabal flag and won't be built unless requested specifically. I played with benchmarks a bit and it seems that comparing performance for several GHC version is quite doable.

Sadly it's not possible to run them on github CI.

Shimuuar · 2024-01-13T11:07:47Z

@lehins, @Bodigrim I'm going to merge this PR tomorrow if you're OK with it

Bodigrim · 2024-01-13T11:11:00Z

I don't have a Linux machine to test, but otherwise have no objections. Looks interesting.

lehins · 2024-01-14T10:05:28Z

I've tried it on my setup with OpenSUSE and I get bunch of errors. Looks like CPU counters are not readily available on all Linux distros without extra configuration.

I am a little skeptical about adding such benchmark directly to vector that is so hard to execute. We can't build it on CI, nor can we easily run it locally. Maybe it would be better to have it as a separate package?

All                 
  listRank:          FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  rootfix:           FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  leaffix:           FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  awshcc:            FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  hybcc:             FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  quickhull:         FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  spectral:          FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  tridiag:           FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  mutableSet:        FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  findIndexR:        FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  findIndexR_naïve:  FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  findIndexR_manual: FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  minimumOn:         FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  maximumOn:         FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
                    
14 out of 14 tests failed (0.02s)
Benchmark algorithms-papi: ERROR
Completed 3 action(s).

Shimuuar · 2024-01-14T17:02:02Z

First of all we can (edc9083) and do build these benchmarks on CI. We just can't actually run them

I think splitting benchmarks means it will become even more complicated to run. Probably to the point where only I can run them. CPU counters are security sensitive with "specter" and all. So problems with access will happen. Hopefully they could be mitigated with better documentation.

But I never encountered such problems (nixos, fedora37) so I can;t debug them What is papi_avail output?

lehins · 2024-01-16T08:55:00Z

We just can't actually run them

Same case was for me. I was able to build them, just not run them. Which makes me question usability of the suite for your average contributor.

I think splitting benchmarks means it will become even more complicated to run.

I really don't think making a core Haskell package depend on some third party C library that is only usable on some small subset of operating systems is a good idea, even if it is not buildable by default. Splitting it out into a separate package is the only thing that comes to mind as a decent solution to this. Also, I don't understand why would it be more complicated to run? Benchmarks are only useful for people who are making changes to vector, so all it would take is cloning the repo and building it. We don't need to release those benchmarks on hackage.

Just to be clear, I am not against adding the benchmarks to the repo, because I do see value in such benchmarks. I am just against adding them to the main package. I'd rather see something like vector-papi in the repo that has a good documentation on what it takes to run those benchmarks.

Probably to the point where only I can run them.

That is already the case. Out of three maintainers you are the only one who can run them.

I can;t debug

I can't debug them either since I've never used papi. Since you asked, I'll post the relevant output, but honestly, I don't have enough time to debug this issue any further.

$ papi_avail
Available PAPI preset and user defined events plus hardware information.
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------

================================================================================
  PAPI Preset Events
================================================================================
    Name        Code    Avail Deriv Description (Note)
...
--------------------------------------------------------------------------------
Of 108 possible events, 0 are available, of which 0 are derived.

No events detected!  Check papi_component_avail to find out why.

$ papi_component_avail
Available components and hardware information.
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------

Compiled-in components:
Name:   perf_event              Linux perf_event CPU counters
   \-> Disabled: Error libpfm4 no default PMU found
Name:   perf_event_uncore       Linux perf_event CPU uncore and northbridge
   \-> Disabled: No uncore PMUs or events found
Name:   sysdetect               System info detection component

Active components:
Name:   sysdetect               System info detection component
                                Native: 0, Preset: 0, Counters: 0


--------------------------------------------------------------------------------

Shimuuar · 2024-01-16T09:18:40Z

Or maybe I don't quite understand what exactly are you proposing. One problem is benchmarks are not accessible from outside of vector package. So we have to either copy them, or split all benchmarks into separate package

Since you asked, I'll post the relevant output
Thanks! I'll look into it.

lehins · 2024-01-16T09:24:31Z

One problem is benchmarks are not accessible from outside of vector package.

We could export them as a public sub-library.

Since they are require libpapi and only works on linux they're protected by cabal flag and have to be enabled explicitly PAPI benchmark couldn't be built on GHC 8.0

Shimuuar · 2024-02-26T08:38:21Z

This is third attempt. PAPI based benchmarks are placed into separate package. They still require specifying cabal flag in order to not break build for anyone who don't want to run them.

I also added cabal check for vector-stream

lehins

Beautiful!

Shimuuar force-pushed the bench-papi branch 3 times, most recently from 21006c3 to 762ec6d Compare January 8, 2024 17:23

Shimuuar marked this pull request as ready for review January 9, 2024 08:28

Shimuuar force-pushed the bench-papi branch from 762ec6d to 90d58d7 Compare January 13, 2024 11:00

Shimuuar mentioned this pull request Jan 28, 2024

Expose tests and benchmarks as public sublibrary #481

Merged

Shimuuar added 3 commits February 26, 2024 11:08

Add PAPI-based benchmarks

0981497

Since they are require libpapi and only works on linux they're protected by cabal flag and have to be enabled explicitly PAPI benchmark couldn't be built on GHC 8.0

Test on CI that PAPI benchmark is buildable

5437d41

check vector-stream as well

c7d8173

Shimuuar force-pushed the bench-papi branch from 90d58d7 to c7d8173 Compare February 26, 2024 08:12

lehins approved these changes Feb 26, 2024

View reviewed changes

Bodigrim approved these changes Feb 26, 2024

View reviewed changes

Shimuuar merged commit 197eeb4 into master Feb 27, 2024
22 checks passed

Shimuuar deleted the bench-papi branch February 27, 2024 06:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Benchmarks based on CPU counters #426

Benchmarks based on CPU counters #426

Shimuuar commented Dec 5, 2021

Shimuuar commented Jan 9, 2024

Shimuuar commented Jan 13, 2024

Bodigrim commented Jan 13, 2024

lehins commented Jan 14, 2024

Shimuuar commented Jan 14, 2024

lehins commented Jan 16, 2024

Shimuuar commented Jan 16, 2024

lehins commented Jan 16, 2024

Shimuuar commented Feb 26, 2024

lehins left a comment

Benchmarks based on CPU counters #426

Benchmarks based on CPU counters #426

Conversation

Shimuuar commented Dec 5, 2021

Shimuuar commented Jan 9, 2024

Shimuuar commented Jan 13, 2024

Bodigrim commented Jan 13, 2024

lehins commented Jan 14, 2024

Shimuuar commented Jan 14, 2024

lehins commented Jan 16, 2024

Shimuuar commented Jan 16, 2024

lehins commented Jan 16, 2024

Shimuuar commented Feb 26, 2024

lehins left a comment

Choose a reason for hiding this comment