
[PyTorch] Update to 2.1 #1426

Merged
merged 26 commits into bytedeco:master on Nov 10, 2023

Conversation

HGuillemet
Collaborator

@HGuillemet commented Oct 16, 2023

Included in this PR:

  • Update to PyTorch 2.1
  • Rationalize the function pointer classes: group all of them in the functions package
  • Add Tensor.item_bool and Tensor.item_byte
  • Add Tensor.data_ptr_byte and Tensor.data_ptr_bool
  • Remove useless classes that are not part of the libtorch API
  • Add CUDACachingAllocator
  • Map most of the missing data loader API, for Example<Tensor,Tensor> and Example<Tensor,NoTarget>
  • Make all methods taking ArrayRef arguments of primitive types accept Java primitive arrays or variadics
  • Make register_module generic, with its return type being the class of the registered module (see the sketch below)
  • Other minor improvements
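
A minimal sketch of how some of these changes look from the Java side. The item_bool accessor, the array/variadic ArrayRef arguments, and the generic register_module come from the list above; the LinearImpl constructor and the exact overloads shown are assumptions, so check the generated classes:

```java
import org.bytedeco.pytorch.*;
import org.bytedeco.pytorch.Module;

public class Net extends Module {
    final LinearImpl fc;

    Net() {
        // register_module is now generic: it returns the concrete class of
        // the registered module (LinearImpl here), so no cast is needed.
        fc = register_module("fc", new LinearImpl(4, 2));
    }

    static boolean demo(Tensor t) {
        // ArrayRef parameters of primitive types now also accept
        // Java arrays or variadics, e.g. a variadic long list here:
        Tensor r = t.reshape(2, 2);
        // New scalar accessor added by this PR:
        return r.eq(r).all().item_bool();
    }
}
```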

@saudet
Member

saudet commented Oct 20, 2023

@sbrunk Could you review this pull request?

@HGuillemet
Collaborator Author

I still plan to add another improvement: support for stateless datasets, dataloaders, etc.
Hopefully coming later today or tomorrow.
Please hold off on starting a review.

@saudet requested a review from sbrunk on October 20, 2023 at 22:44
@sbrunk
Contributor

sbrunk commented Oct 22, 2023

@saudet: shouldn't we embed the libtorch binaries now that they link with CUDA 12, or do we continue building our own version with limited compute capabilities?

If the upstream libtorch binaries don't cause any issues, I'm in favor of switching. I think it has a number of advantages:

  1. Less native building -> less strain on limited CI resources
  2. It can reduce friction regarding PTX JIT compilation (see [Pytorch] New version of the presets #1360)
  3. Somewhat relaxed driver version constraints, due to (currently) targeting CUDA 12.1 instead of always the latest CUDA version

I will do some tests, like running the test suite of Storch with this branch and the upstream binaries, to see if I run into any issues.

One thing that might be interesting to look into is MPS acceleration on macOS, since soon most Macs will run on ARM with GPU support.
I'm not sure if the upstream binaries have MPS acceleration enabled; I know the Python wheels do. I'm not even sure if they provide ARM builds at all for pure libtorch (and I don't have an M1/M2 machine to test). I think right now we don't have support for ARM builds either, right?

@HGuillemet
Collaborator Author

The libtorch binary for Mac that can be downloaded from the PyTorch main page is still x86_64 only.
I think we could build from source for Mac and use the libtorch binaries for CUDA (and ROCm).

@saudet
Member

saudet commented Oct 22, 2023

There's no need to switch to anything; the presets already support LibTorch:
https://github.com/bytedeco/javacpp-presets/tree/master/pytorch#documentation

@HGuillemet
Collaborator Author

Right, but what exactly does the user gain from using our binary compared to libtorch, now that libtorch links with CUDA 12? If nothing significant, it seems logical to embed libtorch, which supports more hardware, and for the reasons listed by @sbrunk.

One problem I can think of is if the cuda presets are updated to a CUDA version for which libtorch is not available, or the other way around.
Anything else?

@saudet
Member

saudet commented Oct 22, 2023

Like I've mentioned many times before, LibTorch links statically with MKL, so it can't be used together with other libraries that use BLAS and LAPACK, like GSL, OpenCV, NumPy, SciPy, Smile, etc.

@HGuillemet
Collaborator Author

OK. Maybe we can use only libtorch_cuda from the provided binaries and compile libtorch_cpu ourselves?

I'm done with what I planned to add to this PR.
You can start reviewing. Comments or RFEs welcome.

@HGuillemet
Collaborator Author

The remaining error during build on Windows is:

D:\a\javacpp-presets\javacpp-presets\pytorch\target\native\org\bytedeco\pytorch\windows-x86_64\jnitorch.cpp(98904): error C2526: 'JavaCPP_org_bytedeco_pytorch_functions_GatheredContextSupplier_allocate_callback': C linkage function cannot return C++ class 'std::shared_ptr<c10::GatheredContext>'

This is related to issue bytedeco/javacpp#720.
@saudet Any idea if this can be fixed? With some @Convention, maybe?

Otherwise we can skip the single function needing this function pointer: CUDAAllocator.recordHistory.
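
For context, the failing callback is generated for a JavaCPP FunctionPointer whose call() returns a std::shared_ptr by value. A hypothetical sketch of that mapping (the class shape is inferred from the mangled name in the error, not copied from the presets) shows why MSVC complains: the generated extern "C" callback would have to return std::shared_ptr<c10::GatheredContext>, a C++ class type:

```java
import org.bytedeco.javacpp.FunctionPointer;
import org.bytedeco.javacpp.annotation.SharedPtr;
import org.bytedeco.pytorch.GatheredContext;

// Standard JavaCPP function-pointer idiom (assumed shape of the class in
// the functions package): call() is bridged through a generated C-linkage
// callback, which MSVC 2019 rejects when it returns a C++ class (C2526).
public class GatheredContextSupplier extends FunctionPointer {
    public GatheredContextSupplier() { allocate(); }
    private native void allocate();
    public native @SharedPtr GatheredContext call();
}
```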

@saudet
Member

saudet commented Oct 24, 2023

The remaining error during build on Windows is:

D:\a\javacpp-presets\javacpp-presets\pytorch\target\native\org\bytedeco\pytorch\windows-x86_64\jnitorch.cpp(98904): error C2526: 'JavaCPP_org_bytedeco_pytorch_functions_GatheredContextSupplier_allocate_callback': C linkage function cannot return C++ class 'std::shared_ptr<c10::GatheredContext>'

This is related to issue bytedeco/javacpp#720.
@saudet Any idea if this can be fixed? With some @Convention, maybe?

Sounds like a bug in MSVC 2019:
https://stackoverflow.com/questions/57429064/c-linkage-function-cannot-return-c-class-in-visual-studio-2019

Otherwise we can skip the single function needing this function pointer: CUDAAllocator.recordHistory.

If it doesn't look like an important function, we can do that, sure.

@saudet
Member

saudet commented Oct 25, 2023

OK. Maybe we can use only libtorch_cuda from the provided binaries and compile libtorch_cpu ourselves?

Where can we get libtorch_cuda for CUDA 12.3?

@HGuillemet
Collaborator Author

According to https://developer.nvidia.com/blog/cuda-toolkit-12-0-released-for-general-availability/, we should be able to run a binary compiled for 12.1 with 12.3?

@saudet
Member

saudet commented Oct 25, 2023 via email

@HGuillemet
Collaborator Author

Have you seen the "compatibility" paragraph at the link above? It seems to be new since CUDA 11.

@saudet
Member

saudet commented Oct 25, 2023

Well, try it and tell me if it works! I don't think it does. NVIDIA says many things, but reality is often different.

@sbrunk
Contributor

sbrunk commented Oct 25, 2023

I'm trying to test this locally, but I'm running into issues.

[ERROR] Failed to execute JavaCPP Builder: Could not parse "c10/cuda/impl/cuda_cmake_macros.h": File does not exist

It's a fresh pytorch clone from the cppbuild, and there is a cuda_cmake_macros.h.in. Any idea what I might be missing?

@HGuillemet
Collaborator Author

Have you run the cppbuild?

@sbrunk
Contributor

sbrunk commented Oct 25, 2023

Have you run the cppbuild?

Yes, sorry, I should have mentioned that. I'm running mvn install --projects pytorch -Djavacpp.platform=linux-x86_64 -Djavacpp.platform.extension=-gpu and the native build is working fine.

Update: I re-checked the native build and I think I did not put CUDA in the right place. Sorry for the noise.

@sbrunk
Contributor

sbrunk commented Oct 27, 2023

I managed to build it locally now, and the Storch tests are passing on 2.1 (with minimal changes). :)

Testing with CUDA is causing issues though. My current assumption is that we might need to extend the CUDA arch list to support Ampere GPUs. See sbrunk/storch#62

@sbrunk
Contributor

sbrunk commented Oct 28, 2023

I did some more testing. All tests were done on an Ampere GPU with compute capability 8.6, on Ubuntu 22.04, with the latest NVIDIA driver currently available in the official package sources (535.104.12), which means CUDA driver version 12.2.

  • Snapshot from current master (targeting PyTorch 2.0.1 and CUDA 12.3), with cuda-redist-12.3-8.9 or locally installed CUDA 12.3 -> PTX error
  • Locally built libtorch from this branch. (targeting PyTorch 2.1.0 and CUDA 12.3) -> PTX error
    • Added compute capabilities 8.0 and 8.6 to TORCH_CUDA_ARCH_LIST -> PTX error
  • Switched to CUDA 12.1 for the local build:
    • cuda-redist-12.3-8.9 -> works
    • Locally installed CUDA versions 12.1 and 12.3 -> works
    • Upstream libtorch (libtorch-cxx11-abi-shared-with-deps-2.1.0+cu121) when setting pathsFirst and lib path -> works

To make sure that the correct native libraries are chosen, I always checked the symlinks in $HOME/.javacpp/cache and deleted the cache folder between runs.

So it looks like building against 12.1 makes it work again, and the result is compatible with 12.3 at runtime. I still need to verify this for 2.0.1.
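
When chasing driver/runtime mismatches like these, it can help to print what the JVM process actually sees. A sketch using the JavaCPP cuda presets (cudaRuntimeGetVersion and cudaDriverGetVersion are standard CUDA runtime API calls as mapped by the presets; the exact artifact setup is assumed):

```java
import static org.bytedeco.cuda.global.cudart.*;

public class CudaVersions {
    public static void main(String[] args) {
        int[] runtime = new int[1], driver = new int[1];
        // Versions are encoded as 1000 * major + 10 * minor,
        // e.g. 12010 for CUDA 12.1.
        cudaRuntimeGetVersion(runtime);
        cudaDriverGetVersion(driver);
        // A 535.x driver reports 12020 (CUDA 12.2), which is why binaries
        // built for CUDA 12.3 could hit PTX errors here.
        System.out.println("runtime=" + runtime[0] + ", driver=" + driver[0]);
    }
}
```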

@HGuillemet
Collaborator Author

HGuillemet commented Oct 28, 2023

Thanks for reporting all these tests.

535.104.12 which means CUDA driver version 12.2

That explains the whole thing.
So they released CUDA 12.3, but the 545 driver that is compatible with it is still in beta...

Hopefully the upstream libtorch_cuda and libtorch_cuda_linalg will work with the JavaCPP builds of the other libraries, and we can get rid of these PTX nightmares.

@sbrunk mentioned this pull request on Oct 29, 2023
@saudet
Member

saudet commented Oct 30, 2023

It looks like either the GitHub runners or NVCC from CUDA 12.3 is a bit faster. In any case, we have an extra build hour that we could use for the 8.0 arch.

@sbrunk
Contributor

sbrunk commented Oct 31, 2023

With both 41a97d1 and a3102e5 it seems to be working with my local build (built with a locally installed CUDA 12.3 and linked against cuda-redist, also 12.3, at runtime). 🚀

It would be great if someone else could verify this.

@HGuillemet
Collaborator Author

I tried to replace libtorch_cuda in the JavaCPP pytorch presets with the "official" libtorch_cuda binary, but that binary is linked against a libcudart provided in the archive, so we would need to patch the ELF and DLL files to link against the JavaCPP cuda presets.
Doable, but a bit more complex.

Maybe there is another option: copying the cubins from one library file to another, but I don't know how to do that.

@saudet You might as well replace +PTX with 9.0, or drop it if that helps, since there is almost no 9.0 hardware out there for now.

@saudet
Member

saudet commented Nov 1, 2023

@sbrunk Apart from that, anything else that needs to be fixed?

@saudet
Member

saudet commented Nov 3, 2023

7,812 additions, 5,691 deletions not shown because the diff is too large. Please use a local Git client to view these changes.

A lot of those "changes" are simply because you're changing the parsing order. If there is no good reason to change the order, please revert it to how it was before. It makes it hard to see the real differences.

@HGuillemet
Collaborator Author

torch_include.h is generated by a script from the output of g++ -H. This ensures we don't miss any new includes and that the includes are parsed in the order g++ processes them. If the order changed since 2.0.1, it is probably because of upstream changes (or maybe I fixed a bug in my script).
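
To illustrate the approach (a hypothetical sketch, not the actual script in the repo): g++ -H writes each header it opens to stderr, prefixed with dots marking include depth, so keeping the first occurrence of each path preserves g++'s processing order:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.LinkedHashSet;
import java.util.Set;

public class GenTorchIncludes {
    public static void main(String[] args) throws IOException {
        // Reads captured `g++ -H` stderr from stdin; lines look like
        // ". /usr/include/torch/csrc/api/include/torch/torch.h"
        Set<String> seen = new LinkedHashSet<>();  // keeps first-seen order
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(".")) {
                    seen.add(line.replaceFirst("^\\.+\\s+", ""));
                }
            }
        }
        for (String path : seen) {
            System.out.println("#include \"" + path + "\"");
        }
    }
}
```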

@saudet
Member

saudet commented Nov 3, 2023

Where is that script?

@HGuillemet
Collaborator Author

Just added it to the repo.

@saudet
Member

saudet commented Nov 3, 2023

Ok, so can you please run that against master and make sure nothing changes? If something changes, please fix it so that nothing changes.

@saudet merged commit 5507552 into bytedeco:master on Nov 10, 2023
6 checks passed