Build fails on Mac; Linker missing Metal and Objc libraries #8222

Closed
FabianSchuetze opened this issue Jan 19, 2024 · 11 comments · Fixed by #8230

@FabianSchuetze

🐛 Describe the bug

When I build the library from source according to the instructions I get the following linker errors:

Undefined symbols for architecture arm64:
  "_OBJC_CLASS_$_MTLCompileOptions", referenced from:
      objc-class-ref in nms_kernel.mm.o
      objc-class-ref in ps_roi_align_kernel.mm.o
      objc-class-ref in ps_roi_pool_kernel.mm.o
      objc-class-ref in roi_align_kernel.mm.o
      objc-class-ref in roi_pool_kernel.mm.o
  "_OBJC_CLASS_$_NSString", referenced from:
      objc-class-ref in nms_kernel.mm.o
      objc-class-ref in ps_roi_align_kernel.mm.o
      objc-class-ref in ps_roi_pool_kernel.mm.o
      objc-class-ref in roi_align_kernel.mm.o
      objc-class-ref in roi_pool_kernel.mm.o
  "_objc_autorelease", referenced from:
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in nms_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in ps_roi_align_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in ps_roi_pool_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in roi_align_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in roi_pool_kernel.mm.o
  "_objc_autoreleasePoolPop", referenced from:
      ____ZN6vision3ops12_GLOBAL__N_110nms_kernelERKN2at6TensorES5_d_block_invoke in nms_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_127ps_roi_align_forward_kernelERKN2at6TensorES5_dxxx_block_invoke in ps_roi_align_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_128ps_roi_align_backward_kernelERKN2at6TensorES5_S5_dxxxxxxx_block_invoke in ps_roi_align_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_126ps_roi_pool_forward_kernelERKN2at6TensorES5_dxx_block_invoke in ps_roi_pool_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_127ps_roi_pool_backward_kernelERKN2at6TensorES5_S5_dxxxxxx_block_invoke in ps_roi_pool_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_124roi_align_forward_kernelERKN2at6TensorES5_dxxxb_block_invoke in roi_align_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_125roi_align_backward_kernelERKN2at6TensorES5_dxxxxxxxb_block_invoke in roi_align_kernel.mm.o
      ...
  "_objc_autoreleasePoolPush", referenced from:
      ____ZN6vision3ops12_GLOBAL__N_110nms_kernelERKN2at6TensorES5_d_block_invoke in nms_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_127ps_roi_align_forward_kernelERKN2at6TensorES5_dxxx_block_invoke in ps_roi_align_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_128ps_roi_align_backward_kernelERKN2at6TensorES5_S5_dxxxxxxx_block_invoke in ps_roi_align_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_126ps_roi_pool_forward_kernelERKN2at6TensorES5_dxx_block_invoke in ps_roi_pool_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_127ps_roi_pool_backward_kernelERKN2at6TensorES5_S5_dxxxxxx_block_invoke in ps_roi_pool_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_124roi_align_forward_kernelERKN2at6TensorES5_dxxxb_block_invoke in roi_align_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_125roi_align_backward_kernelERKN2at6TensorES5_dxxxxxxxb_block_invoke in roi_align_kernel.mm.o
      ...
  "_objc_opt_new", referenced from:
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in nms_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in ps_roi_align_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in ps_roi_pool_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in roi_align_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in roi_pool_kernel.mm.o
ld: symbol(s) not found for architecture arm64

I use CMake to set up the build like so:

 /opt/homebrew/bin/cmake -DCMAKE_PREFIX_PATH=`python3 -c 'import torch;print(torch.utils.cmake_prefix_path)'` -DWITH_MPS=1

Torchvision builds when I turn MPS off.

What can I do to build the project from source with MPS successfully?

Versions

Collecting environment information...
PyTorch version: 2.1.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.4.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: version 3.27.1
Libc version: N/A

Python version: 3.9.6 (default, May 7 2023, 23:32:44) [Clang 14.0.3 (clang-1403.0.22.14.1)] (64-bit runtime)
Python platform: macOS-13.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M2 Max

Versions of relevant libraries:
[pip3] numpy==1.25.2
[pip3] torch==2.1.0
[pip3] torchdata==0.7.0
[pip3] torchtext==0.16.0
[pip3] torchvision==0.16.0
[conda] Could not collect

@NicolasHug
Member

CC @qqaatw this seems to be an MPS-related issue - is this something you can help with? Thanks!

@qqaatw
Contributor

qqaatw commented Jan 20, 2024

I'll take a look at this. @FabianSchuetze, which commands did you use to build from source? Do you only want to build the C++ library, or the Python bindings as well?

@FabianSchuetze
Author

FabianSchuetze commented Jan 20, 2024

Thanks a lot, @qqaatw. I appreciate that.

The command I ran was:

 /opt/homebrew/bin/cmake  -DCMAKE_PREFIX_PATH=`python3 -c 'import torch;print(torch.utils.cmake_prefix_path)'` -DWITH_MPS=1 -DWITH_JPEG=0 -DWITH_PNG=0 ..

which resulted in the following log:

CMake Warning at /Users/user1004/builds/Boc1aCWwx/0/user/external-contrib/model-conversion/venv/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /Users/user1004/builds/Boc1aCWwx/0/user/external-contrib/model-conversion/venv/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:24 (find_package)


-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/vision/build
(venv) user1004@2023-007-002 build % rm CMakeCache.txt 
(venv) user1004@2023-007-002 build % /opt/homebrew/bin/cmake  -DCMAKE_PREFIX_PATH=`python3 -c 'import torch;print(torch.utils.cmake_prefix_path)'` -DWITH_MPS=1 -DWITH_JPEG=0 -DWITH_PNG=0 ..
-- The C compiler identification is AppleClang 14.0.3.14030022
-- The CXX compiler identification is AppleClang 14.0.3.14030022
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The OBJC compiler identification is AppleClang 14.0.3.14030022
-- The OBJCXX compiler identification is AppleClang 14.0.3.14030022
-- Detecting OBJC compiler ABI info
-- Detecting OBJC compiler ABI info - done
-- Check for working OBJC compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting OBJCXX compiler ABI info
-- Detecting OBJCXX compiler ABI info - done
-- Check for working OBJCXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
CMake Warning at /Users/user1004/builds/Boc1aCWwx/0/user/external-contrib/model-conversion/venv/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /Users/user1004/builds/Boc1aCWwx/0/user/external-contrib/model-conversion/venv/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:24 (find_package)


-- Found Torch: /Users/user1004/builds/Boc1aCWwx/0/user/external-contrib/model-conversion/venv/lib/python3.9/site-packages/torch/lib/libtorch.dylib  
-- Configuring done (0.6s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/vision/build

And then building results in the following linker errors:

Undefined symbols for architecture arm64:
  "_OBJC_CLASS_$_MTLCompileOptions", referenced from:
      objc-class-ref in nms_kernel.mm.o
      objc-class-ref in ps_roi_align_kernel.mm.o
      objc-class-ref in ps_roi_pool_kernel.mm.o
      objc-class-ref in roi_align_kernel.mm.o
      objc-class-ref in roi_pool_kernel.mm.o
  "_OBJC_CLASS_$_NSString", referenced from:
      objc-class-ref in nms_kernel.mm.o
      objc-class-ref in ps_roi_align_kernel.mm.o
      objc-class-ref in ps_roi_pool_kernel.mm.o
      objc-class-ref in roi_align_kernel.mm.o
      objc-class-ref in roi_pool_kernel.mm.o
  "_objc_autorelease", referenced from:
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in nms_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in ps_roi_align_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in ps_roi_pool_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in roi_align_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in roi_pool_kernel.mm.o
  "_objc_autoreleasePoolPop", referenced from:
      ____ZN6vision3ops12_GLOBAL__N_110nms_kernelERKN2at6TensorES5_d_block_invoke in nms_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_127ps_roi_align_forward_kernelERKN2at6TensorES5_dxxx_block_invoke in ps_roi_align_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_128ps_roi_align_backward_kernelERKN2at6TensorES5_S5_dxxxxxxx_block_invoke in ps_roi_align_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_126ps_roi_pool_forward_kernelERKN2at6TensorES5_dxx_block_invoke in ps_roi_pool_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_127ps_roi_pool_backward_kernelERKN2at6TensorES5_S5_dxxxxxx_block_invoke in ps_roi_pool_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_124roi_align_forward_kernelERKN2at6TensorES5_dxxxb_block_invoke in roi_align_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_125roi_align_backward_kernelERKN2at6TensorES5_dxxxxxxxb_block_invoke in roi_align_kernel.mm.o
      ...
  "_objc_autoreleasePoolPush", referenced from:
      ____ZN6vision3ops12_GLOBAL__N_110nms_kernelERKN2at6TensorES5_d_block_invoke in nms_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_127ps_roi_align_forward_kernelERKN2at6TensorES5_dxxx_block_invoke in ps_roi_align_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_128ps_roi_align_backward_kernelERKN2at6TensorES5_S5_dxxxxxxx_block_invoke in ps_roi_align_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_126ps_roi_pool_forward_kernelERKN2at6TensorES5_dxx_block_invoke in ps_roi_pool_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_127ps_roi_pool_backward_kernelERKN2at6TensorES5_S5_dxxxxxx_block_invoke in ps_roi_pool_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_124roi_align_forward_kernelERKN2at6TensorES5_dxxxb_block_invoke in roi_align_kernel.mm.o
      ____ZN6vision3ops12_GLOBAL__N_125roi_align_backward_kernelERKN2at6TensorES5_dxxxxxxxb_block_invoke in roi_align_kernel.mm.o
      ...
  "_objc_opt_new", referenced from:
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in nms_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in ps_roi_align_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in ps_roi_pool_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in roi_align_kernel.mm.o
      vision::ops::mps::compileVisionOpsLibrary(id<MTLDevice>) in roi_pool_kernel.mm.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [libtorchvision.dylib] Error 1
make[1]: *** [CMakeFiles/torchvision.dir/all] Error 2
make: *** [all] Error 2

I do not need the Python bindings. If I turn MPS off (-DWITH_MPS=0), linking succeeds. Does that provide all the necessary info?

@qqaatw
Contributor

qqaatw commented Jan 20, 2024

@FabianSchuetze Yes, thank you for providing the context.

@NicolasHug We have two ways to fix this:

  1. Find and link the necessary libraries in torchvision's CMakeLists.txt.
  2. Find the necessary libraries in PyTorch's TorchConfig.cmake and link them in torchvision's CMakeLists.txt.

I can do it either way; which one would you prefer?

To give more context, CUDA uses the second approach.
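
For illustration, the first option could look roughly like the sketch below in torchvision's CMakeLists.txt. This is not the merged fix, only a minimal example under stated assumptions: the target name `torchvision` is taken from the make output in this thread, `WITH_MPS` is the option already passed on the command line, and the variable names `METAL_FRAMEWORK` and `FOUNDATION_FRAMEWORK` are hypothetical.

```cmake
# Sketch only: link the Apple frameworks that the MPS kernels need.
# Assumes the library target is `torchvision` (see libtorchvision.dylib
# in the make output) and that WITH_MPS is an existing option.
if(WITH_MPS)
  # Metal provides MTLCompileOptions; Foundation provides NSString.
  # The thread attributes the objc_* runtime symbols to the same
  # missing libraries.
  find_library(METAL_FRAMEWORK Metal)            # hypothetical variable name
  find_library(FOUNDATION_FRAMEWORK Foundation)  # hypothetical variable name

  if(NOT METAL_FRAMEWORK OR NOT FOUNDATION_FRAMEWORK)
    message(FATAL_ERROR "WITH_MPS requires the Metal and Foundation frameworks")
  endif()

  # Use the same signature (keyword or plain) as the project's existing
  # target_link_libraries call; CMake rejects mixing the two on one target.
  target_link_libraries(torchvision PRIVATE
    ${METAL_FRAMEWORK}
    ${FOUNDATION_FRAMEWORK})
endif()
```

With this approach the dependency stays local to torchvision, so other PyTorch extensions are unaffected.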

@FabianSchuetze
Author

Thanks, @qqaatw, for the help.

@NicolasHug
Member

Thanks a ton for looking into this, @qqaatw! It seems to me that we should go with 2., but let me ask for confirmation from @malfet here.

@malfet
Contributor

malfet commented Jan 22, 2024

I think the first approach is better because:

  • The second approach would force all PyTorch extensions to link with MPS (as is done for CUDA), while the first gives extension authors the freedom to choose whether to integrate with MPS or not.
  • All PyTorch Mac builds include MPS integration, while some Linux builds do and some do not, so it's easier to just link at the framework level.
  • The missing _objc_autoreleasePoolPush symbols mean that ObjC language integration was not properly enabled (which has little to do with MPS as such).

Q: How does it work now in CI/CD? Or do we build without MPS by default?

@NicolasHug
Member

Q: How does it work now in CI/CD? Or do we build without MPS by default?

We build and test the MPS backend, but it gets built through setup.py instead of relying on cmake:

vision/setup.py

Lines 208 to 209 in 315f315

elif torch.backends.mps.is_available() or force_mps:
    sources += source_mps

@malfet
Contributor

malfet commented Jan 22, 2024

We build and test the MPS backend, but it gets built through setup.py instead of relying on cmake:

vision/setup.py

Lines 208 to 209 in 315f315

elif torch.backends.mps.is_available() or force_mps:
    sources += source_mps

In that case, I think it makes even more sense for TorchVision's build system to define the dependencies it needs for the features it implements, as they can differ from PyTorch's dependencies.

@qqaatw
Contributor

qqaatw commented Jan 22, 2024

Hello @malfet,

I agree with your points here. But I'm curious why we want to force all extensions to link with CUDA but not with MPS when MPS is built. The issue here is essentially that the Foundation and Metal libraries are not linked.

malfet pushed a commit that referenced this issue Jan 30, 2024
Fixes #8222

I think we don't have tests for the CMake build. It built successfully on my Mac.

@FabianSchuetze
Author

Thanks for working on this issue.

facebook-github-bot pushed a commit that referenced this issue Mar 20, 2024
Summary:
Fixes #8222

I think we don't have tests for the CMake build. It built successfully on my Mac.

Reviewed By: vmoens

Differential Revision: D55062801

fbshipit-source-id: 9b2cd66104b5c2a8493c957518521c0b416c2823