Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/joml #3811

Merged
merged 8 commits into from
Feb 16, 2020
Merged

Feature/joml #3811

merged 8 commits into from
Feb 16, 2020

Conversation

pollend
Copy link
Member

@pollend pollend commented Dec 23, 2019

This PR ports the Terasology over to JOML vectors instead of termath. I think I break some of the shading effects with this port because god rays don't work quite correctly. some of the matrices are not functioning how I'm expecting them to or methods with similar parameters and breaking. I would like to add more test to verify the camera more but this is a good first step. I moved the display width and height into lwjldisplay so it should allow more better testing.

  • LwjglDisplayDevice.java - Moved display width and height in LWjgl display for testing.
  • JomlUtil.java - Utility to convert from termath < -- > joml
  • Material.java - additional joml methods we're added to allow for either joml or termath

A lot of small changes with nodes that translates between termath and joml.

problems

  • mul method in termath takes two vectors multiplies them and sets the original vector. mul in joml takes the current vector multiplies it by the first and uses an optional 3rd to set the vector.

  • for quaternion angle will offset it by (2pi - angle) if it's greater then 0 and this seems to break camera rotation

  • god rays don't seem to function correctly

  • for floatbuffers terasology uses matrices in row - major and not column major so getTransposed it used instead.

@GooeyHub
Copy link
Member

Hooray Jenkins reported success with all tests good!

1 similar comment
@GooeyHub
Copy link
Member

Hooray Jenkins reported success with all tests good!

@GooeyHub
Copy link
Member

Hooray Jenkins reported success with all tests good!

@GooeyHub
Copy link
Member

Hooray Jenkins reported success with all tests good!

@GooeyHub
Copy link
Member

Hooray Jenkins reported success with all tests good!

@Cervator Cervator added Api Topic: Architecture Requests, Issues and Changes related to software architecture, programming patterns, etc. Category: Performance Requests, Issues and Changes targeting performance Topic: Rendering Requests, Issues and Changes related to lighting, meshes, camera, etc. labels Dec 23, 2019
@GooeyHub
Copy link
Member

Hooray Jenkins reported success with all tests good!

@PAndaContron
Copy link
Member

I tested the performance of this branch and the current develop branch with FlightRecorder. I created a world with default settings in each branch and tested walking and looking around. Although with something this light, the benefits aren't directly seen in the CPU or heap usage, methods relating to vectors definitely run faster. Most notably, in the current develop branch, the method Region3i.encompasses had a sample percentage of 10.6%, indicating that it was running for about 1/10 of the game's total execution, while in this branch, it's down to 2.25%. Most other vector-related methods are down as well, some more so than others. Although I didn't test it very rigorously (unit tests might be helpful), it definitely appears at first glance that this helps with performance.

Files

@httpdigest
Copy link

httpdigest commented Dec 25, 2019

Adding to my above review comments: To get the absolute best performance out of JOML, especially with its matrices implementations, it's important to understand an underlying optimization which is unique to JOML: Matrix methods have various overloads and internal implementations of operations, such as multiplying two matrices, that make use of certain matrix properties (identity, orthogonality, affinity, perspective or orthogonal projection, translation-only) when those properties are known. So, when calling Matrix4f.mul(Matrix4f) what actually happens in the most cases is not a full 4x4 matrix multiplication, because most of the time this is not necessary. For this to work, though, calling the matrix element setters (mNN()) directly should be avoided as much as possible and instead calling other matrix methods such as lookAt(), scale(), rotate(), perspective(), ... etc. should be preferred. In doing so, important matrix properties are preserved over multiple method calls to allo JOML to always route to the most optimized internal methods for all operations on matrices. This actually does have significant performance improvements even when not doing a full 4x4 multiplication but a 4x3 (four columns, three rows) for affine matrices.
The next step in performance optimization is to completely avoid calling mul() and instead use the post-multiplying methods translate(), rotate(), scale(), lookAt(), perspective(), ... as much as possible, which do internal optimized matrix multiplications.

EDIT (2019-12-26): Also, while Terasology has chosen to use the DirectX approach of row-vector matrices (with a matrix vector multiplication of v * M) and not the OpenGL/Vulkan standard of column-vector matrices (with M * v multiplication), there is still no need to swap rows for columns and transposing or actually accessing individual matrix elements in a JOML matrix.
All operations, such as matrix-matrix or matrix-vector multiplications are agnostic to whether the image of a basis vector transformation is stored in the columns or in the rows of a matrix.
Simply follow the convention of projection * view * model (* vector) and use get(buffer) instead of getTransposed(buffer).
Also, this will allow JOML to take advantage of the optimizations when handling matrices with known properties.

@pollend
Copy link
Member Author

pollend commented Dec 25, 2019

I think at the moment I'm just looking to have the same functionality between the old termath and this move over. Will have to keep that in mind when we come back to these classes and make sure that we use these methods. There are other secondary libraries such as the bullet wrapper that will need to be considered as we progress. This is a step by step process ensuring that I don't break anything too badly.

@httpdigest
Copy link

Sure. If you need anything in JOML to ensure feature parity, then please do not hesitate to open an issue at JOML.

@GooeyHub
Copy link
Member

GooeyHub commented Jan 1, 2020

Hooray Jenkins reported success with all tests good!

@httpdigest
Copy link

Hi, I had a quick look at what it would take to also transform https://github.com/MovingBlocks/TeraBullet away from javax.vecmath towards using JOML. I think in order to reduce the amount of errors in manually transforming the vector calls from vecmath to JOML, an automatic approach (such as via https://github.com/javaparser/javaparser) would be better here, since the whole thing is a very mechanic task.
Most notably changing any vecmath a.mul(b, c) or a.add(b, c) calls to their JOML equivalents b.mul(c, a) and b.add(c, a).
This could be accomplished with a simple javaparser AST transformer visitor instead of doing this manually (with IDE caller identification support).

@Qwertygiy
Copy link
Contributor

Qwertygiy commented Jan 3, 2020

Here are some Flight Recorder readings I took of launching Terasology, creating a world with the Core Gameplay preset and the seed "Test", pausing for a few minutes, and then looking and wandering around, in this branch and my biomework branch.

Of particular note is the BaseVector3i init() method, which jumps up from being sampled around 0.25% of activity, to being sampled over 6% of activity in this branch.

Terasology_jfr.zip

@pollend
Copy link
Member Author

pollend commented Jan 3, 2020

@httpdigest there is another repository with a native bullet wrapper using JNI, but there seems to be some performance problems I'll have to look but in the meantime I think I'll look into fixing up terabullet. There is quite a bit of overhead with querying blocks and transitioning vector3 to vecmath.

@httpdigest
Copy link

@pollend alright, by the way I am evaluating Java 9's java.lang.Math.fma() and it yields a good 20-30% performance increase in all methods I am using it. The JIT is consistently producing nice x86 VFMADD231SS for it instead of VADDSS and VMULSS. All three instructions, VFMADD231SS, VADDSS and VMULSS have a latency of 4-5 cycles each on all somewhat recent x86 architectures, so using VFMADD231SS alone halves the total latency across a single instruction dependency chain.
The next JOML 1.9.21 release will make use of it across the board, of course with a non-performance-degrading fallback to simple manual a * b + c when java.lang.Math.fma() is not available.
So, from a performance perspective it would make sense to use JDK9+ for Terasology.

@Cervator Cervator added this to the v2.3.0 milestone Feb 16, 2020
@Cervator
Copy link
Member

Tested this out locally, couldn't spot any obvious critical issues, at least related to this PR :-)

Merged and I've submitted #3832 to help us keep track over some related bits until this saga is ready to close. Thank you @pollend and others chiming in! Plus @httpdigest for making it possible in the first place, +1 users for your lib 👍

Cervator added a commit that referenced this pull request Feb 16, 2020
@Cervator Cervator merged commit 22b41d4 into MovingBlocks:develop Feb 16, 2020
@pollend pollend deleted the feature/joml branch February 17, 2020 01:47
@skaldarnar skaldarnar removed the Api label May 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Category: Performance Requests, Issues and Changes targeting performance Topic: Architecture Requests, Issues and Changes related to software architecture, programming patterns, etc. Topic: Rendering Requests, Issues and Changes related to lighting, meshes, camera, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants