Summary
This version brings bug fixes and updates to our v2.3.0 release.
New features
- [HG info]
- Add support for CSV and JSON output formats
- [HG/NA Perf Test]
- Enable sizes to be passed using k/m/g qualifiers
- [NA OFI]
- Add `tcp_rxm` alias for `tcp;ofi_rxm`
- Find CXI `svc_id` or `vni` if `auth_key` components have zeros (e.g., `auth_key=0:0`)
  - Add VNI index for `SLINGSHOT_VNIS` discovery as extra auth_key parameter
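
The [HG/NA Perf Test] entry above does not spell out how the k/m/g qualifiers are interpreted. A minimal sketch of how such a size string might be parsed, assuming binary multiples (k = 1024) and a hypothetical helper named `parse_size` that is not part of Mercury:

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not a Mercury API): parse "4k", "16m", "1g" or a
 * plain byte count. Assumes binary multiples (k = 1024); the perf tests may
 * define the qualifiers differently. Returns 0 on success, -1 on error. */
int
parse_size(const char *str, uint64_t *size_p)
{
    char *end = NULL;
    unsigned long long value;
    uint64_t multiplier = 1;

    /* Require a leading digit so that signs and whitespace are rejected */
    if (str == NULL || size_p == NULL || !isdigit((unsigned char) str[0]))
        return -1;

    errno = 0;
    value = strtoull(str, &end, 10);
    if (errno != 0 || end == str)
        return -1;

    /* Optional single-letter qualifier, case-insensitive */
    switch (tolower((unsigned char) *end)) {
        case 'k':
            multiplier = 1024ULL;
            end++;
            break;
        case 'm':
            multiplier = 1024ULL * 1024ULL;
            end++;
            break;
        case 'g':
            multiplier = 1024ULL * 1024ULL * 1024ULL;
            end++;
            break;
        case '\0':
            break;
        default:
            return -1;
    }

    /* Reject trailing characters such as the "b" in "4kb" */
    if (*end != '\0')
        return -1;

    /* Reject values that would overflow 64 bits once scaled */
    if (value > UINT64_MAX / multiplier)
        return -1;

    *size_p = (uint64_t) value * multiplier;
    return 0;
}
```

Under these assumptions, `parse_size("4k", &n)` stores 4096 in `n`, while a string with an unrecognized suffix such as `4kb` is rejected.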
Bug fixes
- [HG/NA]
- Fix potential race when checking secondary completion queue
- [HG]
- Prevent multiple threads from entering `HG_Core_progress()`
  - Add `HG_ALLOW_MULTI_PROGRESS` CMake option to control behavior (`ON` by default)
  - Disable `NA_HAS_MULTI_PROGRESS` if `HG_ALLOW_MULTI_PROGRESS` is `ON`
- Fix expected operation count for handle to be atomic
  - Expected operation count can change if extra RPC payload must be transferred
- Let poll events remain private to HG poll wait
  - Prevent a race when multiple threads call progress and `HG_ALLOW_MULTI_PROGRESS` is `OFF`
- Separate internal list from user created list of handles
  - Address an issue where `HG_Context_unpost()` would unnecessarily wait
- [HG Core]
- Cache disabled response info in proc info
- Add `HG_Core_registered_disable(d)_response()` routines
- Refactor and optimize self RPC code path
- Add additional logging of refcount/expected op count
- Fixes for self RPCs with no response
- [HG Util]
- Prevent locking in `hg_request_wait()`
  - Concurrent progress in multi-threaded scenarios on the same context could complete another thread's request and leave a thread blocked in progress
- [HG Perf]
- Fix tests to be run in parallel with any communicator size
- [HG Test]
- Ensure affinity of class thread is set
- Add concurrent multi RPC test
- Add multi-progress test
- Add multi-progress test with handle creation
- Refactoring of unit test cleanup
- [NA]
- Fix memory leak on `NA_Get_protocol_info()`
- [NA OFI]
- Fix `na_ofi_get_protocol_info()` not returning `opx` protocol
  - Refactor `na_ofi_getinfo()` to account for `NA_OFI_PROV_NULL` type
  - Ensure there are no duplicated entries
- Refactor parsing of init info strings and fix OPX parsing
- Simplify parsing of some address strings
- Bump default CQ size to have a maximum depth of 128k entries
- Remove sockets as the only provider on macOS
- Remove send after send tagged msg ordering
- Ensure that `rx_ctx_bits` are not set if SEP is not used
- Set CXI domain ops w/ slingshot 2.2 to prevent potential memory corruption
- [NA Perf]
- Prevent tests from being run as parallel tests
- [CMake]
- Pass `INSTALL_NAME_DIR` through target properties
  - This fixes an issue seen on macOS where libraries would not be found using `@rpath`
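
The [HG] entry above about preventing multiple threads from entering `HG_Core_progress()` does not describe the mechanism used. As a rough illustration of one way to admit a single thread at a time into a progress routine, the sketch below uses a C11 atomic flag. The `progress_ctx` type and the `progress_enter()` / `progress_leave()` functions are hypothetical names chosen for this example; this is not Mercury's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical context type, for illustration only (not a Mercury type). */
struct progress_ctx {
    atomic_bool progressing; /* true while one thread is inside progress */
};

void
progress_ctx_init(struct progress_ctx *ctx)
{
    atomic_init(&ctx->progressing, false);
}

/* Try to become the only thread making progress on this context.
 * Returns true if the caller now owns the progress section, false if
 * another thread is already inside it. */
bool
progress_enter(struct progress_ctx *ctx)
{
    bool expected = false;

    /* Atomically flip false -> true; fails if the flag was already set */
    return atomic_compare_exchange_strong(&ctx->progressing, &expected, true);
}

/* Release the progress section so that another thread may enter. */
void
progress_leave(struct progress_ctx *ctx)
{
    atomic_store(&ctx->progressing, false);
}
```

In this sketch a thread that fails to enter simply gets `false` back and can retry or wait. A real implementation would also need to decide how a thread that was turned away learns that its own operation has completed.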
⚠️ Known Issues
- [NA OFI]
- [tcp/verbs;ofi_rxm] Using more than 256 peers requires `FI_UNIVERSE_SIZE` to be set.
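
`FI_UNIVERSE_SIZE` is usually exported in the job environment before launch. Where that is not convenient, a possible alternative is to set it from the application before the NA/HG class is initialized. The sketch below assumes the variable is read when the provider is first initialized; `set_universe_size()` is a hypothetical helper and the peer count is an example value, not a recommendation.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not a Mercury or libfabric API): export
 * FI_UNIVERSE_SIZE for the expected number of peers. A value already
 * present in the environment is left untouched.
 * Returns 0 on success, -1 on error. */
int
set_universe_size(unsigned int expected_peers)
{
    char value[16];
    int n = snprintf(value, sizeof(value), "%u", expected_peers);

    if (n < 0 || (size_t) n >= sizeof(value))
        return -1;

    /* Last argument 0: do not overwrite a value set by the user */
    return setenv("FI_UNIVERSE_SIZE", value, 0);
}
```

A call such as `set_universe_size(1024)` would have to happen before any initialization call that brings up the OFI plugin. It has no effect if the variable is already set in the environment.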