
[Pytorch] New version of the presets #1360

Merged · 71 commits · Jul 25, 2023

Conversation
@HGuillemet (Collaborator) commented May 22, 2023

Reorganization of the Pytorch presets.

Main improvements are

  • Handling of the virtual inheritance between torch::nn::Module and its subclasses. This avoids segmentation faults when a function accepting a torch::nn::Module, or a Module instance method, is called from a subclass instance.
  • Transparent and more rigorous handling of shared_ptr. This concerns Module and its subclasses, Tensor, and a few less important classes. As a consequence, modules are now deallocated normally (they were never deallocated in previous versions of the presets).
  • Module holders, the C++ counterpart of the transparent handling of shared_ptr, are no longer mapped.
  • Friend functions that were previously ignored are now mapped, giving access to additional operator overloads, such as those on iterators.
  • Module is now virtualized, meaning that if you override its virtual member functions in Java (train, is_training, to, zero_grad, save, load, pretty_print, is_serializable), your Java code will get called by libtorch.
  • Added support for Sequential, AnyModule and AnyValue.
  • Hopefully, better coverage of the API. Almost everything that is available in C++ with:
#include <torch/torch.h>
#include <ATen/native/TensorShape.h>
#include <torch/csrc/jit/runtime/custom_operator.h>
#include <torch/csrc/jit/serialization/storage_context.h>
#include <torch/csrc/jit/serialization/import.h>
#include <ATen/cudnn/Descriptors.h>
#include <ATen/cudnn/Types.h>
#include <c10/cuda/CUDAGuard.h>

should be mapped, barring a number of exceptions and missing template instantiations.
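To illustrate the virtualization of Module mentioned above: a Java subclass can now intercept calls made from the C++ side. A minimal sketch, assuming the generated signature of train mirrors the C++ one (the LoggingModule class itself is hypothetical):

```java
import org.bytedeco.pytorch.Module;

// Hypothetical subclass: because Module is virtualized, this override is
// visible to libtorch, so calling train() from the C++ side (e.g. through
// a parent module) lands in this Java method.
class LoggingModule extends Module {
    @Override
    public void train(boolean on) {
        System.out.println("train(" + on + ") called from native code");
        super.train(on);  // keep the default behavior
    }
}
```

Compiling and loading this class requires the pytorch presets and the native libtorch libraries on the classpath.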

The CUDA versions of the library now include PTX code for compute capability 7.0 and binary code for 5.0 and 6.0. This means you need at least a Maxwell card, and that on first launch, if you have a Volta or more recent card, the library will compile the PTX code of all libtorch kernels into binaries for your card. This can take 10 to 20 minutes. The binaries are stored in a cache for future launches. If your application still lags at startup after the first launch, your cache size is probably too low. Try increasing it by setting the environment variable CUDA_CACHE_MAXSIZE to 2000000.
If this is not acceptable for your needs and you cannot afford the delay at first launch, or if you want to benefit from newer compute capabilities, you can install libtorch from pytorch.org and set the org.bytedeco.javacpp.pathsFirst system property to true.
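As a sketch, the pathsFirst property can be set on the command line with -Dorg.bytedeco.javacpp.pathsFirst=true, or programmatically before any JavaCPP-loaded class is touched (the class name below is made up for the example):

```java
public class PreferSystemLibtorch {
    public static void main(String[] args) {
        // Must run before any JavaCPP class is loaded, so that the loader
        // searches system library paths (e.g. an installed libtorch) first.
        System.setProperty("org.bytedeco.javacpp.pathsFirst", "true");

        // CUDA_CACHE_MAXSIZE, by contrast, is read by the CUDA driver and
        // must be set in the environment before the JVM starts, e.g.:
        //   export CUDA_CACHE_MAXSIZE=2000000
        System.out.println(System.getProperty("org.bytedeco.javacpp.pathsFirst"));
    }
}
```

prints "true".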

Please give it a test and report here any problem, comment, missing API or other RFE.

@saudet (Member) commented May 23, 2023

Could you undo unnecessary changes in indenting?

Add missing ATen/ops/from_blob.h
Add missing ATen/ops/tensor.h
@HGuillemet (Collaborator, Author)

Could you undo unnecessary changes in indenting?

I reindented using rules that are, I think, closer to the ones you use.

Also added 2 missing includes.

@saudet (Member) commented May 23, 2023

Could you undo unnecessary changes in indenting?

I reindented using rules that are, I think, closer to the ones you use.

Also added 2 missing includes.

I mean, could you please not change the lines that you haven't touched? It makes it harder to review.

@HGuillemet (Collaborator, Author) commented May 23, 2023

You won't be able to review by comparing line by line with the previous version. Too much has changed.
I grouped the rules by categories, and added comments. I suggest that you simply read them and see if you have comments or if I got something wrong.

But please hold on, another commit is coming. I realized some includes are still missing, and I need to commit the gen directory so that the CI checks pass.

Edit: in fact, for a thorough review, it might be easier to check the differences in the generated files, after filtering out insignificant changes.

Added parsing of files included from additional includes (Descriptors.h, Types.h...)
Remove _object
Remove _compute_enum_name
Remove _CopyBytesFunctionRegisterer
@HGuillemet (Collaborator, Author)

I removed the includes that depend on CUDA includes.
But, if needed, I guess there is a way to patch the include list depending on the availability of CUDA, like you did for libraries.

@saudet (Member) commented May 24, 2023

The build is still failing on Windows. Please fix this!

@HGuillemet (Collaborator, Author)

PyTorch tries to dlopen nvfuser_codegen at startup on my machine and issues a warning because it does not find it.
I added this library to the list of preloads; no warning anymore. Was that the right thing to do?

@saudet (Member) commented May 27, 2023 via email

@HGuillemet (Collaborator, Author) commented May 27, 2023

The answer is that I didn't remove anything. I started this before your update for PyTorch 2.0 and I didn't merge your changes.

Well, I did merge 2.0.0, but not 2.0.1.

@sbrunk (Contributor) commented Jul 22, 2023

I just built this locally to get a better estimate of how much I'll have to change when upgrading.

One thing I noticed is that sqrt for Tensor in global.torch is missing:

// aten::sqrt(Tensor self) -> Tensor
- @Namespace("at") public static native @ByVal Tensor sqrt(@Const @ByRef Tensor self);

Instead we have the complex variants:

@Namespace("c10_complex_math") public static native @ByVal @Name("sqrt<float>") FloatComplex sqrt(@Const @ByRef FloatComplex x);

@Namespace("c10_complex_math") public static native @ByVal @Name("sqrt<double>") DoubleComplex sqrt(@Const @ByRef DoubleComplex x);

Probably due to this mapping:

        infoMap.put(new Info("c10_complex_math::sqrt<float>").javaNames("sqrt"))
               .put(new Info("c10_complex_math::sqrt<double>").javaNames("sqrt"))

Is it possible to keep the original sqrt mapping somehow?

@saudet (Member) commented Jul 22, 2023

I don't think it's due to those, but if it works by adding one more for at::sqrt() as well, let's just do that?

@HGuillemet (Collaborator, Author)

complex_math.h is parsed before ATen/ops/sqrt.h, where at::sqrt is defined.
So when ATen/ops/sqrt.h is parsed, because of this loop, the qualified name of sqrt is resolved to the first matching cppName that has an Info — c10_complex_math::sqrt in this case.

I added an explicit Info for at::sqrt to prevent this, and I added the other missing complex math operators for good measure.
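The resulting preset configuration amounts to something like the following sketch (the exact Info entries in the presets may differ):

```java
// Keep at::sqrt mapped under its own name, while still renaming the
// complex variants. Without the first entry, "sqrt" resolves to
// c10_complex_math::sqrt because complex_math.h is parsed first.
infoMap.put(new Info("at::sqrt"))
       .put(new Info("c10_complex_math::sqrt<float>").javaNames("sqrt"))
       .put(new Info("c10_complex_math::sqrt<double>").javaNames("sqrt"));
```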

@sbrunk (Contributor) commented Jul 23, 2023

sqrt is working again. Thanks! I got Storch compiling now with minimal changes (just a few type names, additional parameters needed, etc.).

I'm seeing a bunch of crashes now when running our tests, especially in IndexingSlicingJoiningOpsSuite. I think they are starting to become a nice regression test suite for the native bindings. :)

I've tried to isolate the crashing ops. Most seem to take lists of tensors in some form, like stack or cat:

import org.bytedeco.pytorch.*;
global.torch.stack(
  new TensorArrayRef(
    new TensorVector(
      AbstractTensor.create(0),
      AbstractTensor.create(1)
    )
  ),
  0
).print();
[info] # C  [libtorch_cpu.so+0x1de6e4f]  at::_ops::stack::call(c10::ArrayRef<at::Tensor>, long)+0x9f

The same as above, cat instead of stack:

[info] # C  [libtorch_cpu.so+0x1cc0a2c]  c10::detail::MultiDispatchKeySet::operator()(c10::IListRef<at::Tensor>)+0xcc

Calling stack variants like dstack, column_stack and hstack also causes the crash. The other ops we're testing are all working fine.

@saudet (Member) commented Jul 23, 2023

Should we wait until that is fixed before merging this pull request?

@sbrunk (Contributor) commented Jul 23, 2023

From my side it's fine merging now and then fixing this in a separate PR.

I'd actually like to try with the CI snapshots to rule out that the issue is caused by me building locally.

@HGuillemet (Collaborator, Author)

I can reproduce your crash, so it's not related to your local build.
@saudet, please give me some minutes before merging in case I quickly find out the reason for this.

@HGuillemet (Collaborator, Author)

The ArrayRef constructor taking a Vector isn't mapped anymore, so the "pointer-cast" constructor is called instead in your case, causing the crash.
I'll try to re-add it, but you can also use the constructor taking a Tensor and a length instead.

@sbrunk (Contributor) commented Jul 23, 2023

So you mean something like this, relying on the fact that vectors are stored contiguously?

var vector = new TensorVector(AbstractTensor.create(0), AbstractTensor.create(1));
global.torch.stack(
  new TensorArrayRef(vector.front(), vector.size()),
  0
).print();

Seems to be working, I got all tests passing again. :)

@HGuillemet (Collaborator, Author)

Or something like:

Tensor t = new Tensor(2);
t.put(AbstractTensor.create(0));
t.position(1).put(AbstractTensor.create(1));
torch.stack(new TensorArrayRef(t.position(0), 2)).print();

sbrunk added a commit to sbrunk/storch that referenced this pull request Jul 23, 2023
@saudet saudet merged commit d370dbc into bytedeco:master Jul 25, 2023
6 checks passed
@HGuillemet HGuillemet deleted the hg_pytorch branch July 25, 2023 13:30
@sbrunk (Contributor) commented Jul 25, 2023

This is great! Thanks for these improvements @HGuillemet!
