Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow using SIMD intrinsics for SSE2, SSE4, AVX2 and NEON code #8209

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

homm
Copy link
Member

@homm homm commented Jul 6, 2024

Current SIMD state

There is a separate Pillow-SIMD package on PyPI with modified Pillow code, which includes some rewritten routines using SSE4 and AVX2 compiler intrinsics. Any version of Pillow-SIMD has strictly the same functionality, behavior, and bugs as the corresponding Pillow version. There are many problems with this approach, including:

  1. Pillow-SIMD versions are often outdated. This is because any Pillow-SIMD release requires some work to merge SIMD-accelerated code into every Pillow version, which involves resolving potential conflicts and testing all implementations.
  2. SIMD-accelerated code is unconditionally platform-specific. Pillow-SIMD can be compiled and run only on x86. This means that it is very hard to add Pillow-SIMD as a dependency in cross-platform applications.
  3. Pillow-SIMD is a different package, although it shares the same namespace (PIL), which means only one package (Pillow or Pillow-SIMD) should be installed at a time in one environment. It can be tricky to avoid Pillow installation if it is a dependency for some third-party libraries.
  4. Pillow-SIMD itself can't be a dependency for third-party libraries, since it's not cross-platform.
  5. Pillow-SIMD doesn't provide precompiled binaries since there are two types of SIMD-acceleration implemented: SSE4 and AVX2. Pillow-SIMD compiled with AVX2 is a bit faster, but can't be run on non-AVX2 CPUs (including Rosetta 2 on Apple silicon).

Proposal

I'd like to move SIMD-accelerated code to the upstream with some enhancements for cross-platform compilation. How it affects described problems:

  1. Every released Pillow version will contain SIMD-accelerated code, so no additional actions will be required.
  2. Pillow can be compiled on any platform, with or without SIMD-acceleration, if available. No separate package will be required.
  3. Like the previous point, no separate package will be required.
  4. Like the previous point, no separate package will be required.
  5. Default precompiled binaries for Pillow will be provided with minimal available SIMD-acceleration for every platform: SSE2 for x86_64 and NEON for ARM aarch64 (please note, currently there is no SSE2 or NEON accelerated code). For advanced users who want acceleration, compilation documentation will be provided.

Challenges

Testing

The main challenge is testing. If some code is contributed to the upstream, it should be run during tests. There could be the following versions of the code:

  • Native code: Will be tested on platforms without acceleration support: 32-bit x86, s390x, and ppc64le. Are there any tests for any of these platforms?
  • SSE2: The most common, since it is the default for x86_64.
  • SSE4: Not currently tested. Should be added explicitly.
  • AVX2: Not currently tested. Should be added explicitly.
  • NEON: Mandatory extension for ARM aarch64, so it will be tested in many ARM configurations.

Detecting Current Acceleration

There should be a way to detect the current acceleration implementation. I think some property for the core object should be enough.

Criticism

The proposed solution doesn't provide first-class SIMD-acceleration for Pillow. When you are using any application or installing other libraries, like NumPy, there is only one compiled version available for a platform, and it contains all types of available acceleration. The right accelerated functions are dynamically called at runtime instead of recompiling for particular CPU capabilities during installation. However, such an approach requires much more effort, and this is not the aim of the current proposal.

@homm homm marked this pull request as draft July 6, 2024 11:34
@homm
Copy link
Member Author

homm commented Jul 8, 2024

@python-pillow/pillow-team What do you think?

@nulano
Copy link
Contributor

nulano commented Jul 15, 2024

I don't see why testing x86_64 would be an issue - we already have special cases for PYTHONOPTIMIZE and could do the same for SSE4 / AVX2:

- { python-version: "3.11", PYTHONOPTIMIZE: 1, REVERSE: "--reverse" }
- { python-version: "3.10", PYTHONOPTIMIZE: 2 }

I think the only x86 (32-bit) job we have is on AppVeyor (using SysWOW64):

Pillow/.appveyor.yml

Lines 21 to 23 in 2152a17

- PYTHON: C:/Python312
ARCHITECTURE: x86
APPVEYOR_BUILD_WORKER_IMAGE: Visual Studio 2022
However, it seems to have selected sse2: https://ci.appveyor.com/project/Python-pillow/pillow/builds/50165788/job/sv60thgf3ygibbv4?fullLog=true#L3667

s390x and ppc64le are tested using QEMU:

ubuntu-24.04-noble-ppc64le,
ubuntu-24.04-noble-s390x,


Detecting Current Acceleration

There should be a way to detect the current acceleration implementation. I think some property for the core object should be enough.

I anticipate that we will get questions about this, so I would propose we add this to PIL.features and the report summary PIL.features.pilinfo.

I see you've already done this 👍


The right accelerated functions are dynamically called at runtime instead of recompiling for particular CPU capabilities during installation.

I think we should ideally aim for implementing this at some point in the future, but starting with compile-time selection for now would be a good first step.

@@ -128,6 +128,7 @@ def get_supported_codecs() -> list[str]:
"libjpeg_turbo": ("PIL._imaging", "HAVE_LIBJPEGTURBO", "libjpeg_turbo_version"),
"libimagequant": ("PIL._imaging", "HAVE_LIBIMAGEQUANT", "imagequant_version"),
"xcb": ("PIL._imaging", "HAVE_XCB", None),
"acceleration": ("PIL._imaging", "acceleration", "acceleration"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this violates the type hint for check_feature by returning a string instead of a bool.

@homm homm force-pushed the simd-init branch 3 times, most recently from bd65e1c to 7fa8fb8 Compare July 16, 2024 19:11
.github/workflows/test.yml Outdated Show resolved Hide resolved
.github/workflows/test.yml Outdated Show resolved Hide resolved
@homm homm requested a review from hugovk July 17, 2024 11:36
@homm homm marked this pull request as ready for review July 28, 2024 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants