Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JNI improvements #886

Open
wants to merge 73 commits into
base: main
Choose a base branch
from
Open

JNI improvements #886

wants to merge 73 commits into from

Conversation

javagl
Copy link
Contributor

@javagl javagl commented Mar 26, 2024

Addresses #880

I know, the title is too generic. It would be better to clearly lay out a work plan and create a clean sequence of self-contained, specific PRs. But there are just too many (unrelated) things to do. "You can't cross a chasm in two small jumps". The first pass has to be tackled somewhat holistically. The result may not be as 'good' as I'd like it to be, but I hope that it will be 'better' along many dimensions. If this is not considered to be a viable approach, then that's OK. Then I'll just create my own repo with JNI bindings for KTX, with blackjack and error handling.


The preliminary task list that I tried to create in the linked issue is now replicated here, and I'll try to update it and extend it as we move on:

  • Wrap up Properly convert Java char to C char in inputSwizzle #876
  • Wrap up Expose supercompression functions via JNI #879
  • Expose CreateFromMemory
  • Improve the use of constants
    • There should be documentation for these constants, and stringFor ... methods to convert them to strings
    • Updates:
      • Aligned the names of all constants with the native version

      • Aligned the names of constants with the other bindings (see JNI improvements #886 (comment) )

      • Added documentation and string functions for all constants, except for...

        • KtxInternalformat
        • VkFormat

        (See Synchronizing auto-generated files with bindings #887 )

      • NOTES:

        • The ktxTexture2.supercompressionScheme does not appear in the native documentation
        • The constants for KtxPackAstcEncoderMode are not documented on the native side
  • Fix the KtxTestLibraryLoader to also work on Windows
  • The JNI class- and field initializations should only be done once. Right now, things like env->GetFieldID(ktx_texture_class, "instance", "J"); are called in each function. The jfieldID should be obtained once initially.
  • (continuously) Error checks, error checks, error checks ... (no JVM crashes, period!)
    • Updates (I did a first pass for this, but will keep an eye on places where checks have to be added):
      • Passing null to any method should throw a NullPointerException.
      • Trying to use a texture after destroy() was called should throw an IllegalStateException.
      • When a JNI function causes an error (like an out-of-memory error), then the native functions have to return immediately, because there is a pending exception. This is not done in all places yet...
      • Java methods that do not return error codes (like the create... methods) now generally throw a KtxException when the implementation caused an error code that was not KTX_SUCCESS
  • (continuously) Comments, comments, comments...
    • Updates:
      • I have added basic JavaDocs, and enabled the creation of the JavaDocs in the POM.
  • Add bindings for ktxErrorString

Not (yet) supposed to be addressed here:

  • Try to find a clean solution for JAR release to Maven Central #624

  • There are a few cases where we might have to think about how to handle the absence of unsigned types in Java. For example, when calling createInfo.setBaseWidth(-123); // INVALID!, then this value is implicitly converted into a ktx_uint32_t.

  • Update the documentation of deflateZstd and deflateZLIB
    EDIT: This was now done as part of Expose supercompression functions via JNI #879, but only for the native side and for the Java bindings (other bindings are still missing the doc update or the functions)

    • As pointed out in a comment, this documentation should include the information that 'the data pointer is updated'. This will go into the native documentation first, but it will also have to be passed through to the python bindings, the Java binding, and maybe to the JS binding. But... the deflateZLIB function is still missing in the Python binding (and the deflate* functions are just about to be added to the Java binding), so ... this may have to be tracked and addressed separately
  • Improve thoroughness of deflate* tests by comparing the deflated values

    • As pointed out in another comment, the unit test for the deflate* functions should compare the result of inflating the data again to the original input. The inflate* methods are not yet exposed (anywhere), so this may have to be deferred for now
  • Add test for input swizzling in ASTC

    • The PR for fixing the swizzling in JNI includes a test for the swizzling for Basis. It does not include one for ASTC. These are different code paths, and should both be tested. But right now, I'm not sure what could be checked for the ASTC result (see this PR comment for context).
  • Consider consistent formatting. I know, it's a detail, but just auto-formatting everything with the Eclipse default could already be an improvement. I do have strong preferences for my own projects, but here, the main point is that the formatting should be consistent...

  • The ktxHash... functionalities are not yet mapped through from the native side.

    • Consider following the new JS binding here. It adds findKeyValue, addKVPair and deleteKVPair methods to the ktxTexture2 object rather than exposing ktxHashList.

@javagl
Copy link
Contributor Author

javagl commented Mar 26, 2024

I locally started a few updates, including some basic error handling, so that things like KtxTexture2.create(null, ...) will no longer crash the JVM, but throw a NullPointerException instead. (Many more error checks to add...).

One thing that could/should be done next: The KtxTexture class has a destroy() method, documented like this:

    /**
     * Destroy the KTX texture and free memory image resources
     *
     * NOTE: If you try to use a {@link KTXTexture} after it's destroyed, that will cause a segmentation
     * fault (the texture pointer is set to NULL).
     */
    public native void destroy();

Now, the goal is: No segmentation faults. And we'll probably not be able to completely avoid this destroy() method. So the question is how to handle that.

I think that trying to use a KtxTexture after destroy() was called should also throw an exception - maybe an IllegalStateException. I'll probably draft this (with the type of the exception being easy to change). But it will affect basically each and every native function, which has to check whether the thiz parameter has been destroyed...

@javagl
Copy link
Contributor Author

javagl commented Mar 26, 2024

There currently are two functions in the KtxTexture... class that receive "data" in the form of byte arrays:

  • public native int setImageFromMemory(int level, int layer, int faceSlice, byte[] src);
  • public static native KtxTexture2 createFromMemory(ByteBuffer byteBuffer, int createFlags);

(The latter has been added as part of this PR)

Many libraries that are using JNI expect direct ByteBuffer instances as the input for all functions, for a variety of reasons. But I'm strongly in favor of additionally offering the conveniece of a byte[] array. This does come at a cost for the implementor. The different cases of arrays and direct (or non-direct...) buffer have to be handled, but ... this is doable (some utility functions to fetch and release the data are shown in the draft for createFromMemory).

On the Java side, the ByteBuffer-based signature is more versatile. Even if someone has a byte[] array, this can trivially be wrapped into a ByteBuffer at the call site:

byte array[] = ...;
t = KtxTexture2.createFromMemory(ByteBuffer.wrap(array), ...);

But as a middle-ground, I'd suggest to...

  • introduce an additional setImageFromMemory(..., ByteBuffer src) method (keeping the one that accepted the byte[] array, for backward compatibility and convenience)
  • Add a createFromMemory(byte array[]...) method, for consistency with the two flavors of setImageFromMemory

An aside: The documentation of the setImageFromMemory currently says

* Set the image data - the passed data should be not modified after passing it! The bytes
* will be kept in memory until {@link KTXTexture#destroy} is called.

This statement is not true, and will be updated accordingly.


Remotely related: There currently is a function public native byte[] writeToMemory(); which returns a byte[] array. This probably makes sense, but the function itself has to be reviewed and tested.

@MarkCallow
Copy link
Collaborator

* Set the image data - the passed data should be not modified after passing it! The bytes
* will be kept in memory until {@link KTXTexture#destroy} is called.

What does "passed data" refer to here? Is it the data at the pointer location that was passed to setImageFromMemory? Not that it matters since you say it isn't true. Indeed the data is copied into the ktxTexture object.

@javagl
Copy link
Contributor Author

javagl commented Mar 27, 2024

What does "passed data" refer to here? Is it the data at the pointer location that was passed to setImageFromMemory? Not that it matters since you say it isn't true. Indeed the data is copied into the ktxTexture object.

There are multiple "layers" here, and this revolves exactly about the question where data is copied. Imagine the following Java pseudo code, similar to what could appear in the KTX JNI bindings:

byte input[] = new byte[10];

// Create a texture from the input
KtxTexture2 texture = KtxTexture2.createFrom(input);

// XXX Will this affect the result?
input[5] = 123;

// Encode the input data that was given earlier
texture.encode();

The comment suggested that this input[5] = 123 would be "visible" for the 'encode' function that is called later. This would mean that the native library would store "a pointer" to the Java byte[] array. Something like this is actually possible, but with many, many, caveats - and here, it certainly is not the case.

@javagl
Copy link
Contributor Author

javagl commented Mar 27, 2024

A detail @ShukantPal : Currently, the bindings are set up for Java 11. I don't see a strong reason for that. Unless there is a reason, I'd suggest lowering the version requirement to "the lowest version that works" (and I think that 8 should do it).

@MarkCallow
Copy link
Collaborator

@javagl, PR #874 extends the Javascript binding to provide full libktx functionality. As part of it I went over the naming. I made the names consistent with the libktx names and I ended up removing all the ktx and KTX_ prefixes. This is because the module that contains the binding is typically called ktx. Please take a look. Perhaps it can inform the naming here too. If we change any names that exist in the current binding we should provide the old names as aliases.

I suggest asking @ShukantPal to review this as he is the author of the original binding.

The bullet point list at the top is a good guide. Obviously there are items to be completed. I think a short call would be good but I want to first familiarize myself with the code. I'll let you know when I've done that and we can then plan the call.

@javagl
Copy link
Contributor Author

javagl commented Jun 6, 2024

Thanks @MarkCallow , I'll try to allocate some time looking over the changes from the linked PR. This PR should be a step in the direction of aligning the JNI bindings with KTX itself. We can decide later whether some missing parts (like bindings for the ktxHash... family of functions) will become part of this PR or a dedicated, new one.

Perhaps it can inform the naming here too. If we change any names that exist in the current binding we should provide the old names as aliases.

I'm in favor of aligning names insofar that they should be recognizable as their native counterpart. There are some conventions in the respective target languages, though. For example, I wouldn't use something else than a UPPER_SNAKE_CASE for constant names or something else than CamelCase for class names in Java, but the details can still be sorted out. (E.g. the removal of the KTX_ prefix mentioned above)

@MarkCallow
Copy link
Collaborator

@javagl, I've been reviewing the discussion here and some of the code including the commits you specifically called out in comment of April 18th. The code I've seen looks very good.

I added a note to the description suggesting following the JS binding handling of metadata queries.

Added stringFor-functions that take the respective enum constants, and return their string representation (e.g. 332510f ). Note that this only returns the string representation, and not the elaborate description that some of the functions in the native layer return

Can't you call the native function(s) from the stringFor function to get the elaborate description. I'd particularly like to have the full descriptions of the error codes. Are there any beyond ktx_error_code_e where the native side has a convert-to-string function?

I am happy you have replaced the crashes with exceptions.

I have actively been using the resulting library in a small dummy application where you can drag-and-drop a PNG file, see a preview of the PNG and its KTX version, drag around some sliders for the compression/quality (and see a live preview). So I'm confident that this "mostly works". But there are still a few things to wrap up.

Any plan to release this? Does the preview show you both pre- and post-compression images so you can easily compare them?

Some of the JNI implementations still have to be reviewed to ensure that they don't perform operations with pending exceptions, and maybe run with -verbose:jni to spot further possible pitfalls that already existed in the code

Which implementations?

There should be a version of setImageFromMemory that does not accept a byte[] array, but a ByteBuffer as the last parameter (mentioned in #886 (comment))

I agree a version that accepts ByteBuffer is needed.

The open points of the bullet point list from the first post

Yes.

But maybe this can already go through a first review pass, based on the current state.

There is certainly enough here to review. I want to spent a little more time familiarizing myself with the code before we arrange that call.

Note that all GitHub Actions Windows builds are currently broken due a bad GHA runner image. Many tests, including the Java binding tests fail with segmentation violations.

@javagl
Copy link
Contributor Author

javagl commented Jun 11, 2024

Thanks for the detailed review.


I added a note to the #886 (comment) suggesting following the JS binding handling of metadata queries.

I haven't looked at the detals of the hash-related functions. If it's "only" about exposing 3 functions, then it might be added here, to be on par with other bindings, but if it's too much, it might make more sense to carve it out into a follow-up PR.


Can't you call the native function(s) from the stringFor function to get the elaborate description. I'd particularly like to have the full descriptions of the error codes. Are there any beyond ktx_error_code_e where the native side has a convert-to-string function?

There are two levels here:

  1. A function of obtaining a string version of the constant itself
  2. A function for the actual description of the meaning of the constant

Until now, I only addressed 1.: Each "enum" has a stringFor function. This is particularly useful for the KtxErrorCode, where you could have
System.out.println("Error " + errorCode); // Prints '16' - what is that?
vs
System.out.println("Error " + KtxErrorCode.stringFor(errorCode)); // Prints 'UNSUPPORTED_TEXTURE_TYPE'
But similarly applies to all other "enum" types.

Now, I saw that there is the ktxErrorString function in the native layer. (And I think that this the only function of this kind in the native layer, but will have to check again to be sure). And one could make a case for exposing that via JNI, to be able to do something like
System.out.println("Error " + KtxErrorCode.errorString(errorCode)); // Prints "Texture type not supported."
I think that this does not add soooo much value beyond UNSUPPORTED_TEXTURE_TYPE, but can add this, if desired.


I have actively been using the resulting library in a small dummy application

Any plan to release this? Does the preview show you both pre- and post-compression images so you can easily compare them?

Until now, this was mainly for internal experiments and testing. I think that it might eventually have some aspects that are "useful", and that it could be worth being published. We'll probably have to sort out some of the "Maven Release"-related questions before this can be done, but maybe there can be an "early sneak preview" release.

The goal was to have some sort of "interactive preview". (As 'interactive' as it can be, when encoding an image takes 5 seconds...). I did create a short screengrab of that a while ago (yeah... as a GIF... but), maybe it shows the overall idea:

Khronos KTX Sneak Preview 00002


Some of the JNI implementations still have to be reviewed to ensure that they don't perform operations with pending exceptions, and maybe run with -verbose:jni to spot further possible pitfalls that already existed in the code
Which implementations?

I did try to keep an eye on that for every function that I "touched", but did not systematically go through all function from top to bottom. It boils down to the following pattern:

When any JNI function that is called with env->... causes an exception, then it is not allowed to call any other JNI function. Instead, the function must return immediately (and leave the handling of the pending exception to the Java layer).

(The only functions that may be called with a pending exception are the ones like env->ExceptionCheck() or certain env->Release... functions. Details in the specification).

To check this manually, I'd have to read every function and see whether it exposes this pattern (calling a function when an exception might be pending). It can be automated to some extent, with the ‑Xcheck:jni verbose, but this only covers the case where such an exception actually appears....


I still have to look closer at the other PR, but will try to at least pick up the work here (syncing with master, addressing some of the smaller points mentioned above...) in the next days.

@MarkCallow
Copy link
Collaborator

There is certainly enough here to review. I want to spent a little more time familiarizing myself with the code before we arrange that call.

I am now back to familiarizing myself with the more of this code. Sorry for the looooong delay. Can we arrange a call towards the end of next week, FaceTime or Skype? I want to get this wrapped up.

Regarding constant naming. both the JS and Python bindings remove KTX_ and all other parts of a name that are reflected in the class or enum name. (The sole slight exception I've spotted is e.g. KTX_PACK_ASTC_BLOCK_DIMENSION_4x4, where dimension is in the enum name (JS) or class name (Python) but due to the numerical tail an alphabetic character is needed. Python is using D4x4. JS is using d4x4. I will change the latter since it is not yet released. I wonder though if it should be D_4x4 or DIMENSION_4x4.) Java should follow the same model for constants.

Another aspect of naming is the enum or class name. In the JS binding I have used the enum or struct name from the C API minus the ktx or KTX_ prefix. enums are therefore lower_snake_case and structs, classes in JS, are CamelCase. Python has used KtxCamelCase for everything. We need to discuss and regularize how to treat these names.

Please merge the latest main into this branch.

You will almost certainly get a failure in the first Travis-CI job of a build which is a Mac build. This is due to now using vcpkg for dependencies for the load test apps. The dependencies will not be cached in the initial build. The resulting extended log length from building them seems to be hitting a Travis-CI limit.

@javagl
Copy link
Contributor Author

javagl commented Aug 17, 2024

I have updated with the main state, and added a few exception checks that had been pointed out by running it with Xcheck:jni.


For the constants: I'm OK with either pattern.

  • the full name KTX_<whatever>_<value> or
  • omitting KTX, as in <whatever>_<value>, or
  • the short one, just <value>

As long as it is consistent, I don't have a strong opinion. It can either be aligned with the C name (i.e. the long form), or the short name that is used in other bindings.

Java does not differentiate between "struct" and "class". Java does have enum, but this is currently not used. From the perspective of the user, the difference between a class (with public static final int CONSTANTS = ...) and an enum is negligible. For the names of classes, there is a strong convention to use CamelCase, and we should not deviate from that.


Can we arrange a call towards the end of next week, FaceTime or Skype? I want to get this wrapped up.

If you don't mind, I'll send you a mail to arrange the details.

@MarkCallow
Copy link
Collaborator

If you don't mind, I'll send you a mail to arrange the details.

Certainly. I did not intend to arrange that through this public forum.

@javagl
Copy link
Contributor Author

javagl commented Aug 21, 2024

The ktxErrorString function that was mentioned in an earlier comment ( #886 (comment) ) should also be part of the Java API. It should be simple to add, and can become part of this PR as well.

@javagl
Copy link
Contributor Author

javagl commented Aug 24, 2024

The last sequence of commits are...

  • "Update ... names" commits: These are changing the names of constants, to omit any redundant prefixes. For example, KtxPackAstcBlockDimension.KTX_PACK_ASTC_BLOCK_DIMENSION_D4x4 is now just called KtxPackAstcBlockDimension.D4x4, which is aligned with the naming in the JavaScript- and Python bindings
  • Added a binding for ktxErrorString. It's a judgement call where and how to add this. In the native layer, it's a "free-floating" function that does not belong anywhere. On the Java side, I thought that it would make sense to offer it as KtxErrorCode.createString(int).
  • A minor fix for ktxErrorString on the native side

The last one appears to be related to the (open) question about how to handle the absence of unsigned types in Java. But strictly speaking, it isn't: Even in the native layer, something like
const char* s = ktxErrorString(-1234);
would have caused ... Undefined Behavior, I guess.

Visual Studio even points this out:

Khronos Ktx Error enum

So I changed that check to

    if (error < 0 || error > KTX_ERROR_MAX_ENUM)
        return "Unrecognized error code";
    ...

@javagl
Copy link
Contributor Author

javagl commented Aug 24, 2024

I think that the build failure should already have been sorted out - right now, it marks some actions as 'failed' because they have been cancelled (maybe because I edited the first post while the build was running?).

In doubt, I could push ... "some change" ... to trigger a re-run (because I apparently cannot request a re-run manually).

Apart from that, this PR should be 'Ready for review' (which might be short, because intermediate states have already been reviewed and discussed).

Miscellaneous:

  • At some point, the change log may have to include a hint that the Java bindings have been ~"incompatibly refactored". I think this is not a big deal, because there probably haven't been so many people been using the previous state (and those who did hopefully appreciate the changes, even though some of them may be "breaking" in the strictest sense)
  • The open points from the first comment of this PR (Maven release, formatting, ....) will be promoted into dedicated issues where appropriate

@javagl javagl marked this pull request as ready for review August 24, 2024 17:07
@MarkCallow
Copy link
Collaborator

In GitHub Actions when a job in a workflow fails the remaining jobs are cancelled. The VS2019 job in the Windows workflow is failing. I just tried re-running this job but it still fails in the same place: problems creating the Javadoc. I will try rerunning the cancelled jobs. Some of the Linux build jobs are failing for a similar reason. I did not try to re-run them. I'm don't know yet why some macOS jobs are failing.

When you add more commits to a PR, existing queued builds are cancelled, running builds may be cancelled. A new build is started for the new commit when an existing job is cancelled or finishes. All this means in this case that the errors remain in the code of the most recent commit because a build was done using it.

@MarkCallow
Copy link
Collaborator

The failures in the Emscripten jobs look to be due to an updated clang in the latest Emscripten Docker image. Research is needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants