fix(simd.h): AVX-512 round function (AcademySoftwareFoundation#4119)
This PR fixes the vfloat16 `round` function. The intrinsic `_mm512_roundscale_ps`
was called with the wrong immediate, which caused a test failure on a Zen 4 CPU.

```
/var/tmp/portage/media-libs/openimageio-2.5.5.0-r1/work/OpenImageIO-2.5.5.0/src/libutil/simd_test.cpp:1579:
FAILED: round(F) == mkvec<VEC>(std::round(F[0]), std::round(F[1]), std::round(F[2]), std::round(F[3]))
	values were '-1.5 0 1.5 4 -1.5 0 1.5 4 -1.5 0 1.5 4 -1.5 0 1.5 4' and '-2 0 2 4 -2 0 2 4 -2 0 2 4 -2 0 2 4'
``` 

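In the old code, the immediate in `_mm512_roundscale_ps (a, (1<<4) | 3)` (binary `0001 0011`) decoded as follows:
```
[0001] - imm8[7:4]: number of fraction bits to preserve (M = 1)
[0] - imm8[3]: use MXCSR exception mask (precision exception not suppressed)
[0] - imm8[2]: select rounding mode from imm8[1:0]
[11] - imm8[1:0]: truncate mode
```
This truncated each value toward zero to a multiple of 0.5 instead of rounding it to the nearest integer.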
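
The decoding above can be checked without AVX-512 hardware. The following is a minimal scalar sketch, not code from OpenImageIO: `RoundScaleImm`, `decode_roundscale_imm`, and `emulate_roundscale` are hypothetical names, and the model assumes the documented per-lane behavior `2^-M * round(x * 2^M)` while ignoring NaN handling, exceptions, and the MXCSR-selected rounding mode.

```cpp
#include <cassert>
#include <cmath>

// Fields of the 8-bit immediate taken by VRNDSCALEPS / _mm512_roundscale_ps.
struct RoundScaleImm {
    int  fraction_bits;       // imm8[7:4]: M, number of fraction bits to keep
    bool suppress_precision;  // imm8[3]:   1 = suppress the precision exception
    bool use_mxcsr_mode;      // imm8[2]:   1 = take rounding mode from MXCSR.RC
    int  rounding_mode;       // imm8[1:0]: 0 nearest-even, 1 down, 2 up, 3 truncate
};

// Hypothetical helper: split the immediate into its fields.
inline RoundScaleImm decode_roundscale_imm(int imm8)
{
    assert(imm8 >= 0 && imm8 <= 0xFF);
    return { (imm8 >> 4) & 0xF, (imm8 & 0x8) != 0, (imm8 & 0x4) != 0, imm8 & 0x3 };
}

// Hypothetical scalar model of one lane: 2^-M * round(x * 2^M).
// Simplification: an MXCSR-selected mode is treated as nearest-even.
inline float emulate_roundscale(float x, int imm8)
{
    const RoundScaleImm f = decode_roundscale_imm(imm8);
    const float scaled = std::ldexp(x, f.fraction_bits);
    float r;
    switch (f.use_mxcsr_mode ? 0 : f.rounding_mode) {
        case 1:  r = std::floor(scaled); break;
        case 2:  r = std::ceil(scaled);  break;
        case 3:  r = std::trunc(scaled); break;
        default: r = std::nearbyint(scaled); break;  // ties to even in the default FP environment
    }
    return std::ldexp(r, -f.fraction_bits);
}
```

With the old immediate `(1<<4) | 3`, illustrative inputs such as 1.9 and 4.1 come out as 1.5 and 4 under this model, which is consistent with the values in the failure log. With the immediate `0x08` (M = 0, nearest mode, precision exception suppressed), -1.5 and 1.5 become -2 and 2.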

References:
* https://www.felixcloutier.com/x86/vrndscalepd#fig-5-29
* https://stackoverflow.com/questions/50854991/instrinsic-mm512-round-ps-is-missing-for-avx512


Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
AngryLoki authored and lgritz committed Jan 21, 2024
1 parent 1f68a59 commit 3376cd7
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/include/OpenImageIO/simd.h
@@ -10195,7 +10195,7 @@ OIIO_FORCEINLINE vfloat16 floor (const vfloat16& a)
 OIIO_FORCEINLINE vfloat16 round (const vfloat16& a)
 {
 #if OIIO_SIMD_AVX >= 512
-    return _mm512_roundscale_ps (a, (1<<4) | 3); // scale=1, round to nearest smaller mag int
+    return _mm512_roundscale_ps (a, (_MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC));
 #else
     return vfloat16(round(a.lo()), round(a.hi()));
 #endif
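
One behavior of the new immediate may be worth checking when extending the tests: `_MM_FROUND_TO_NEAREST_INT` selects round-to-nearest with ties to even, whereas `std::round` rounds ties away from zero. The sketch below uses `std::nearbyint` as a scalar stand-in for the nearest-even mode, assuming the default floating-point environment. The helper names are hypothetical and not part of OpenImageIO.

```cpp
#include <cassert>
#include <cmath>

// Scalar stand-in for round-to-nearest, ties-to-even.
// Assumes the default rounding mode (FE_TONEAREST) is in effect.
inline float round_ties_even(float x)
{
    return std::nearbyint(x);
}

// True when ties-to-even and std::round (ties away from zero) agree on x.
inline bool agrees_with_std_round(float x)
{
    return round_ties_even(x) == std::round(x);
}
```

Under this model the two agree on -1.5 and 1.5 (both modes give -2 and 2), but differ on ties such as 0.5 and 2.5, where ties-to-even yields 0 and 2 while `std::round` yields 1 and 3.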