Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] OIIO 2.5.7.0 fails to build on Fedora only on aarch64 #4111

Closed
hobbes1069 opened this issue Jan 15, 2024 · 27 comments · Fixed by #4143
Closed

[BUG] OIIO 2.5.7.0 fails to build on Fedora only on aarch64 #4111

hobbes1069 opened this issue Jan 15, 2024 · 27 comments · Fixed by #4143

Comments

@hobbes1069
Copy link
Contributor

Describe the bug

/builddir/build/BUILD/OpenImageIO-2.5.7.0/src/include/OpenImageIO/simd.h:4796:32: error: cannot convert ‘int16x8_t’ to ‘OpenImageIO_v2_5::simd::vint4::simd_t’ {aka ‘int32x4_t’} in initialization
 4796 |     simd_t val16 = vcombine_s16(vqmovn_s32(clamped), vdup_n_s16(0));
      |                    ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                |
      |                                int16x8_t

Full log:
https://kojipkgs.fedoraproject.org//work/tasks/1777/111791777/build.log

Perhaps this is do to the new SIMD updates? I never built 2.5.6.0 in the official repos but all test builds completed.

Platform information:

  • OIIO branch/version: 2.5.7.0
  • OS: Fedora Rawhide
  • C++ compiler: gcc 13.2.1
  • Any non-default build flags when you build OIIO: See full logs
@mfvescovi
Copy link
Contributor

Hi Richard!

/builddir/build/BUILD/OpenImageIO-2.5.7.0/src/include/OpenImageIO/simd.h:4796:32: error: cannot convert ‘int16x8_t’ to ‘OpenImageIO_v2_5::simd::vint4::simd_t’ {aka ‘int32x4_t’} in initialization
 4796 |     simd_t val16 = vcombine_s16(vqmovn_s32(clamped), vdup_n_s16(0));
      |                    ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                |
      |                                int16x8_t

I'll give you more feedback on Debian later this week (maybe this evening CET) since I'm uploading v2.5.7.0 to Debian experimental prior to the SONAME transition and it'll be built on all release architectures, so even arm64.

@mfvescovi
Copy link
Contributor

I'll give you more feedback on Debian later this week (maybe this evening CET) since I'm uploading v2.5.7.0 to Debian experimental prior to the SONAME transition and it'll be built on all release architectures, so even arm64.

Here: ARM64 build

Hope this helps somehow.

@hobbes1069
Copy link
Contributor Author

Unfortuantely I couldn't find anything that helped. For now I have reverted to 2.5.6.0.

@darix
Copy link

darix commented Jan 24, 2024

Same issue for openSUSE Tumbleweed

https://build.opensuse.org/package/live_build_log/graphics/OpenImageIO/openSUSE_Factory_ARM/aarch64

@hobbes1069
Copy link
Contributor Author

The original build expired so here's a new link showing the error for 2.5.8.0:
https://kojipkgs.fedoraproject.org//work/tasks/1376/112731376/build.log

@darix
Copy link

darix commented Feb 5, 2024

openSUSE package updated to 2.5.8.0. same link as above.

@darix
Copy link

darix commented Feb 7, 2024

can we maybe revert the neon part from
in the mean time? d58e4aa3e3f4fb42aa1593ae379

@lgritz
Copy link
Collaborator

lgritz commented Feb 7, 2024

Thanks for your patience. I will look at this issue today.

@lgritz
Copy link
Collaborator

lgritz commented Feb 7, 2024

Proposed fix here: #4143

Can you try this patch out on your end before I merge it?

The problem is that even though we now have an ARM based Mac in our CI suite, it looks like the flavor of Apple Clang it uses was perfectly happy to accept the code we had and just understand that it was fine to bitcast between those types, whereas it was rejected on the ARM Linux variants (which we don't have available to test on).

Reading documentation on the intrinsics, I believe I was able to repair it, but won't be sure until you give it a try.

The functions get tested, so I know they were correct execution all along, it's just a question of the compiler being strict about the type casting.

@darix
Copy link

darix commented Feb 7, 2024

package with the patch is building here: https://build.opensuse.org/package/show/home:darix:branches:graphics/OpenImageIO

you can find multiple distributions and architectures there.

it seems 2.5.8.0 broke the testsuite for us. even updating the images to the latest version does not help.

for the record 2.5.7.0 still passed https://build.opensuse.org/package/show/openSUSE:Factory/OpenImageIO

@lgritz
Copy link
Collaborator

lgritz commented Feb 7, 2024

If I understand the failing aarch64 logs for the build in progress that you point to above, I believe that it is failing because the Imath/OpenEXR you have installed is 2.2, whereas OpenImageIO 2.5 requires a minimum OpenEXR 2.4. (Current OpenEXR is 3.2, so we're talking about extraordinarily old.)

At least in that log, it's not getting anywhere near far enough in the build for the NEON patch to be relevant. It fails just trying to find dependencies.

@darix
Copy link

darix commented Feb 7, 2024

I will force newer BuildRequires.

@lgritz
Copy link
Collaborator

lgritz commented Feb 7, 2024

For reference, OpenEXR 2.4 was released in 2019.

@lgritz
Copy link
Collaborator

lgritz commented Feb 7, 2024

@hobbes1069 Your error was definitely on the NEON part. Please do let me know (if you can check) whether the patch from this PR gets you unstuck.

@hobbes1069
Copy link
Contributor Author

@lgritz yes, seems to be all good now, just built 2.5.7.0.
https://kojipkgs.fedoraproject.org//work/tasks/6094/113136094/build.log

@lgritz
Copy link
Collaborator

lgritz commented Feb 7, 2024

Outstanding, thanks @hobbes1069. Would you mind please clicking approve on #4143, since basically fixing your build break is the whole reason for that patch? Thanks.

@darix
Copy link

darix commented Feb 7, 2024

@lgritz
Copy link
Collaborator

lgritz commented Feb 7, 2024

@darix Those look suspiciously like the same failures as here https://github.com/AcademySoftwareFoundation/OpenImageIO/actions/runs/7811654989/job/21307148511
which is our "bleeding edge" in which we test against the top-of-tree of several dependencies. The current (unreleased) top commit in openexr appears to be responsible for that failure, and it's a bug the openexr team is trying to fix before they make their release.

Are you by any chance using an exr that's either taken right from their master, or incorporates that broken patch from the top of their master?

@darix
Copy link

darix commented Feb 7, 2024

@lgritz
Copy link
Collaborator

lgritz commented Feb 7, 2024

Yes, that openexr-CVE-2023-5841.patch is the one with the bug, as far as I can tell.

@darix
Copy link

darix commented Feb 7, 2024

oki. will keep that in mind.

@lgritz
Copy link
Collaborator

lgritz commented Feb 8, 2024

Sigh. CVEs.

As I understand it, this CVE was addressing a potential memory clobber in a utility that is only used for OpenEXR's own testing. It not an exploitable problem in any application that uses openexr because it's not in code that any real app would call.

But because everybody drops everything when the magic word "CVE" is uttered, lest an unaddressed CVE be a stain on the project and jeopardize its inclusion in distros, the "fixes" are rushed out without proper care or time to test. This is yet another instance of a CVE that reports something that isn't really a problem and the rush to fix caused a much worse one.

(The situation was also compounded by an unrelated logistical problem, wherein the openexr team did not receive the email that reported the bug and only found out about the problem when the CVE was published, which was done without anybody checking in with the project authors to verify that we received the email or ask why we hadn't responded.)

@darix
Copy link

darix commented Feb 8, 2024

I brought up the state of the openexr patch to the suse security team so we can remedy the situation. :)

now that we have a fix for OpenImageIO ... it is back to the coal mines and trying to hammer the cmake mess for MaterialX/OpenUSD into shape that one can finally install it properly.

Thank you for your support and keep up the great work! :)

@lgritz
Copy link
Collaborator

lgritz commented Feb 8, 2024

So sorry about the openexr issue. We are working hard to try to fix it ASAP. We held up our release while we work it out, not realizing that the bad commit had been cherry picked into your patches.

@darix
Copy link

darix commented Feb 8, 2024

honestly compared to MaterialX/OpenUSD this is harmless :)

@darix
Copy link

darix commented Feb 8, 2024

all architectures are passing now for me

@hobbes1069
Copy link
Contributor Author

Fixed via #4143.

lgritz added a commit that referenced this issue Feb 8, 2024
Primarily, recent changes (PR #4071) to vint4::store for the NEON case
appear to have some type mismatches, which apple clang on ARM-based Mac
(including our CI) seems ok with, but which is generating type errors on
other ARM Linux platforms.

I think the types were weird here, so I tightened it up to get the types
right for temporary variables in that function. That's the primary fix
here.

Secondarily, I modified simd.h and the CMake setup so that build option
USE_SIMD=0 will disable NEON in the same way that it disables SSE. (I
realized that USE_SIMD=0 was not disabling NEON, so there was no way for
a NEON platform to completely disable SIMD if they needed to.)

Fixes #4111

Signed-off-by: Larry Gritz <lg@larrygritz.com>
lgritz added a commit to lgritz/OpenImageIO that referenced this issue Feb 8, 2024
Primarily, recent changes (PR AcademySoftwareFoundation#4071) to vint4::store for the NEON case
appear to have some type mismatches, which apple clang on ARM-based Mac
(including our CI) seems ok with, but which is generating type errors on
other ARM Linux platforms.

I think the types were weird here, so I tightened it up to get the types
right for temporary variables in that function. That's the primary fix
here.

Secondarily, I modified simd.h and the CMake setup so that build option
USE_SIMD=0 will disable NEON in the same way that it disables SSE. (I
realized that USE_SIMD=0 was not disabling NEON, so there was no way for
a NEON platform to completely disable SIMD if they needed to.)

Fixes AcademySoftwareFoundation#4111

Signed-off-by: Larry Gritz <lg@larrygritz.com>
1div0 pushed a commit to 1div0/OpenImageIO that referenced this issue Feb 24, 2024
Primarily, recent changes (PR AcademySoftwareFoundation#4071) to vint4::store for the NEON case
appear to have some type mismatches, which apple clang on ARM-based Mac
(including our CI) seems ok with, but which is generating type errors on
other ARM Linux platforms.

I think the types were weird here, so I tightened it up to get the types
right for temporary variables in that function. That's the primary fix
here.

Secondarily, I modified simd.h and the CMake setup so that build option
USE_SIMD=0 will disable NEON in the same way that it disables SSE. (I
realized that USE_SIMD=0 was not disabling NEON, so there was no way for
a NEON platform to completely disable SIMD if they needed to.)

Fixes AcademySoftwareFoundation#4111

Signed-off-by: Larry Gritz <lg@larrygritz.com>
Signed-off-by: Peter Kovář <peter.kovar@reflexion.tv>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants