
Run_tests_parallel uses only one core #956

Open · Trzs opened this issue Jan 12, 2024 · 5 comments
Trzs (Contributor) commented Jan 12, 2024

On Perlmutter and friends, run_tests_parallel runs tests in parallel, but all tests are run on just one core.

I created a small reproducer that narrows it down to certain module imports.

main script:

import subprocess
from multiprocessing import Pool

# Run ten copies of dummy.py concurrently.
commands = [["libtbx.python", "dummy.py"]] * 10

pool = Pool(processes=10)
for cmd in commands:
    # Pass the callable and its argument separately so the command
    # is launched inside a pool worker rather than in the parent.
    pool.apply_async(subprocess.run, (cmd,))
pool.close()
pool.join()

dummy.py (the imports themselves are not important; they just show that ordinary Python imports run fine)

from boost_adaptbx import boost
#import boost_adaptbx.boost.python as bp
#import boost_python_meta_ext
#import boost_tuple_ext

import inspect
import os
import re
import sys
import warnings
import numpy as np

from libtbx import cpp_function_name

# Busy loop so that per-process CPU usage is visible in top/htop.
x = 0
for i in range(10**8):
    x += 1.3*i

With the three imports commented out as shown, the dummy scripts run on 10 cores. Uncommenting any one of them drops everything down to a single core.
Wrapping the offending import in os.sched_getaffinity and os.sched_setaffinity calls helps, but is not a real solution.
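
For reference, a minimal sketch of that save/restore workaround (using the same module as in the reproducer above):

import os

# Remember the affinity mask before the offending import ...
allowed = os.sched_getaffinity(0)
import boost_python_meta_ext  # this import clobbers the mask
# ... and restore it afterwards so the process may use all cores again.
os.sched_setaffinity(0, allowed)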

Trzs (Contributor) commented Jan 12, 2024

Importing the modules somehow changes the affinity:

In [1]: def get_affinity():
   ...:   for line in open('/proc/self/status'):
   ...:     if 'Cpu' in line:
   ...:       print(line)
   ...:   return
   ...:

In [2]: get_affinity()
 Cpus_allowed:	ffffffff,ffffffff,ffffffff,ffffffff

 Cpus_allowed_list:	0-127


In [3]: import boost_python_meta_ext

In [4]: get_affinity()
 Cpus_allowed:	00000000,00000000,00000000,00000001

 Cpus_allowed_list:	0
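
The same information is also available directly from Python, without reading /proc:

import os

# os.sched_getaffinity returns the set of CPUs the process may run on,
# here {0, 1, ..., 127} before the import and {0} after it.
print(os.sched_getaffinity(0))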

Trzs (Contributor) commented Jan 12, 2024

Tracing the process with strace libtbx.python dummy.py > trace.log 2>&1 confirms that something is changing the affinity:

[...]
sched_getaffinity(334914, 16, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]) = 16
[...]
sched_setaffinity(334914, 16, [0])      = 0
[...]
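
Side note for anyone reproducing this: the trace is much easier to read if strace is restricted to the relevant system calls, e.g. strace -f -e trace=sched_getaffinity,sched_setaffinity libtbx.python dummy.py.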

bkpoon (Member) commented Jan 16, 2024

Can you list your packages? I copied your get_affinity test into a file, and in a freshly created environment with cctbx-base on one of our servers I do not see the change in affinity.

test.py

def get_affinity():
  for line in open('/proc/self/status'):
    if 'Cpu' in line:
      print(line)
  return

if __name__ == '__main__':
  get_affinity()

  import boost_python_meta_ext

  get_affinity()
[bkpoon@anaconda:tmp] conda create -n py39 cctbx-base python=3.9
[bkpoon@anaconda:tmp] conda activate py39
(py39) [bkpoon@anaconda:tmp] python test.py
Cpus_allowed:   ffff,ffffffff,ffffffff,ffffffff,ffffffff

Cpus_allowed_list:      0-143

Cpus_allowed:   ffff,ffffffff,ffffffff,ffffffff,ffffffff

Cpus_allowed_list:      0-143

Trzs (Contributor) commented Jan 17, 2024

It seems this behavior is caused by OMP_PLACES and OMP_PROC_BIND: unsetting both restores the expected behavior. They had been set for the sake of Kokkos.

More info:
pytorch/pytorch#49971
OpenMathLib/OpenBLAS#2238

The core issue seems to be a bug that is triggered when OMP_PLACES is set to threads. As far as I know I am not using OpenBLAS, but the same bug may well exist in some other library.
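
A quick way to check whether these variables are the trigger (a sketch, assuming the OpenMP runtime reads the environment when the extension module is first loaded, so the variables must be set before the import):

import os

# Set the suspect variables before the OpenMP runtime initializes,
# then check whether the affinity mask collapses to a single CPU.
os.environ["OMP_PLACES"] = "threads"
os.environ["OMP_PROC_BIND"] = "true"

import boost_python_meta_ext

print(os.sched_getaffinity(0))  # expected {0} if the binding bug is triggered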

Trzs (Contributor) commented Jan 23, 2024

Current workaround that keeps the Kokkos warnings suppressed: export OMP_PLACES=threads and export OMP_PROC_BIND=false. With OMP_PROC_BIND=false the OpenMP runtime performs no thread pinning, so the OMP_PLACES setting no longer restricts the affinity mask.

Interaction of these settings with MPI is still an open question.
