Adding SDF LevelSet #187

Merged: 46 commits into master from sdf on Sep 1, 2022

Conversation

@elalish (Owner) commented Aug 18, 2022

Fixes #129

I've implemented my own take on Marching Cubes for making manifold meshes from signed-distance functions. It's a form of Marching Tetrahedra on a body-centered cubic grid, which ensures all tetrahedra are identical, maximum quality, and symmetric. It also ensures all output is manifold without needing to deal with edge cases.

I've implemented this for the GPU (of course it works on the CPU as well) because the computation can get pretty intense. The interface is intended to mimic Thrust with its use of functors.

These new tests also unearthed an existing and very tricky bug, whose fix is also included here.
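
For a sense of the intended call shape, here is a sketch only: LevelSet's parameter list, Mesh, and Box are assumptions for illustration, and the real interface is documented in sdf.h.

#include <glm/glm.hpp>

// A functor in the Thrust style: __host__ __device__ so the same code can
// run on either the CPU or the GPU. Sign convention assumed here: positive
// inside, negative outside.
struct Sphere {
  float radius;
  __host__ __device__ float operator()(glm::vec3 p) const {
    return radius - glm::length(p);
  }
};

// Hypothetical usage: sample the function on a grid over `bounds` at cell
// size `edgeLength` and extract the zero level set as a manifold mesh.
Mesh MakeSphere() {
  Box bounds({-1.1f, -1.1f, -1.1f}, {1.1f, 1.1f, 1.1f});
  return LevelSet(Sphere{1.0f}, bounds, /*edgeLength=*/0.05f);
}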

@elalish self-assigned this Aug 18, 2022
@elalish requested a review from pca006132 August 18, 2022 20:16
@fire (Contributor) commented Aug 18, 2022

So if I have a random glb, would it be possible to test this?

Do I need to generate SDFs in some form? What form is that?

@elalish (Owner, Author) commented Aug 19, 2022

> So if I have a random glb, would it be possible to test this?
>
> Do I need to generate SDFs in some form? What form is that?

There's documentation in sdf.h for the SDF class. Any function that takes a vec3 and returns a float will do. Turning an arbitrary GLB into an SDF is pretty non-trivial - that's not really what it's for. It's about creating complex surfaces from mathematical functions. Try out the sample.
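
For instance, something in the spirit of the gyroid sample (a CPU-side sketch, not the sample's exact code):

#include <cmath>
#include <glm/glm.hpp>

// Implicit gyroid surface. Strictly an implicit function rather than a true
// signed distance, but level-set extraction only needs the sign and a
// reasonable local estimate of distance.
struct Gyroid {
  float operator()(glm::vec3 p) const {
    return std::cos(p.x) * std::sin(p.y) +
           std::cos(p.y) * std::sin(p.z) +
           std::cos(p.z) * std::sin(p.x);
  }
};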

@pca006132 (Collaborator) left a comment:

Some comments about readability and the use of atomics; I haven't yet looked into the details of the algorithmic part.

#else
std::atomic<Uint64>& tar = reinterpret_cast<std::atomic<Uint64>&>(target);
Uint64 old_val = tar.load();
tar.compare_exchange_weak(compare, val);
@pca006132 (Collaborator):
compare_exchange may fail. It fails when there is an interleaved CAS (the later one fails), and compare_exchange_weak can additionally fail spuriously, e.g. when the cache line is touched, because weak CAS can be implemented more efficiently on some architectures, such as ARM with its exclusive monitor.

Generally, when you just want to update a value atomically, you have to do it in a loop, repeating the CAS until the return value is true (updated successfully).
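
For illustration, the retry idiom looks like this (a sketch reusing the names from the snippet above, and assuming Uint64 is an alias for uint64_t):

#include <atomic>
#include <cstdint>

using Uint64 = uint64_t;  // assumed alias

// Emulates CUDA-style atomicCAS on the CPU: swap in `val` only if the
// current value equals `compare`, and return the old value. The loop
// absorbs the spurious failures compare_exchange_weak is allowed to have.
Uint64 AtomicCAS(Uint64& target, Uint64 compare, Uint64 val) {
  std::atomic<Uint64>& tar = reinterpret_cast<std::atomic<Uint64>&>(target);
  Uint64 old_val = compare;
  // On failure, old_val is reloaded with the observed value: if it still
  // equals `compare`, the failure was spurious, so retry.
  while (!tar.compare_exchange_weak(old_val, val) && old_val == compare) {
  }
  return old_val;  // equals `compare` exactly when the swap took place
}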

@elalish (Owner, Author):
Thanks, I'm still a little fuzzy on atomics. I've been impressed by how fast they are in CUDA, even when hit pretty often. I'll change that on the CPU side.

@pca006132 (Collaborator):
Yeah, it seems the implementation of atomics in CUDA is different from the CPU's. The slow part of an atomic is the cache update across cores; I guess this works a bit differently on a GPU.

@elalish (Owner, Author):
Thank you, I've just read about this now. It sounds like weak means I can't rely on return == compare to guarantee a swap has taken place. I could probably tweak it to use the boolean return instead, but I think there's no big problem going with strong.
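
A sketch of that strong variant, reusing the includes and alias from the retry sketch above:

// A strong-CAS version of AtomicCAS (hypothetical): no spurious failures,
// so a single call suffices and the boolean reports success directly.
Uint64 AtomicCASStrong(Uint64& target, Uint64 compare, Uint64 val) {
  std::atomic<Uint64>& tar = reinterpret_cast<std::atomic<Uint64>&>(target);
  Uint64 expected = compare;
  if (tar.compare_exchange_strong(expected, val)) return compare;
  return expected;  // the value that blocked the swap
}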

}

__host__ __device__ Uint64 SpreadBits3(Uint64 v) {
v = v & 0x1fffff;
@pca006132 (Collaborator):
These magic numbers look really magical; it would be nice if you could explain how/where you got them.

@elalish (Owner, Author):
I have similar ones in the collider; Morton codes are pretty standard and various versions of the magic numbers can be found in algorithm reference books.
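
For reference, the standard 64-bit three-way bit spread that these masks come from looks like this (a sketch using the widely published constants; the PR's __host__ __device__ version is presumably equivalent):

#include <cstdint>

using Uint64 = uint64_t;  // assumed alias

// Spreads the low 21 bits of v three positions apart, leaving room to
// interleave two more coordinates. Each step doubles the gap between bit
// groups; each mask keeps exactly the bits that belong after that step.
Uint64 SpreadBits3(Uint64 v) {
  v = v & 0x1fffff;                          // keep 21 input bits
  v = (v | v << 32) & 0x1f00000000ffffull;   // split into 5 + 16 bits
  v = (v | v << 16) & 0x1f0000ff0000ffull;   // groups of at most 8 bits
  v = (v | v << 8) & 0x100f00f00f00f00full;  // groups of 4 bits
  v = (v | v << 4) & 0x10c30c30c30c30c3ull;  // groups of 2 bits
  v = (v | v << 2) & 0x1249249249249249ull;  // one bit every 3rd position
  return v;
}

// Illustrative helper (not necessarily the PR's API): interleaving three
// spread coordinates yields a Morton code, which keeps spatially nearby
// grid cells nearby in sorted code order.
Uint64 EncodeMorton3(uint32_t x, uint32_t y, uint32_t z) {
  return SpreadBits3(x) | (SpreadBits3(y) << 1) | (SpreadBits3(z) << 2);
}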

uint32_t idx = vert.key & (Size() - 1);
while (1) {
Uint64& key = table_[idx].key;
const Uint64 found = AtomicCAS(key, kOpen, vert.key);
@pca006132 (Collaborator):
Regarding the implementation of a concurrent hashtable, I'm a bit worried about the performance of having atomic operations in a loop. Atomic operations are a lot slower than typical instructions, and the cost is still much higher even when only one thread is running.

If writes are not common, I think a lock might give you better performance. If writes are common, I think I can have a look and see if there is a way to reduce the use of atomic operations.

@elalish (Owner, Author):
As usual I was focused more on the GPU, where locks aren't much of an option. I'm totally open to recommendations though.

@pca006132 (Collaborator):
You can implement a spin lock with atomics.
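
For example, a minimal test-and-set spin lock (a generic sketch, not code from this PR):

#include <atomic>

// lock() spins until it observes the flag clear; the acquire/release pair
// orders the critical section against other threads.
class SpinLock {
  std::atomic_flag flag = ATOMIC_FLAG_INIT;

 public:
  void lock() {
    while (flag.test_and_set(std::memory_order_acquire)) {
      // busy-wait; a production version might pause or yield here
    }
  }
  void unlock() { flag.clear(std::memory_order_release); }
};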

@elalish (Owner, Author):
My impression was that atomics are mostly slow when concurrent access is actually happening; do they have much performance impact otherwise? I would expect very little concurrency on these: they will only happen during a hash collision (which is already rare) that also happens to be concurrent. I made the step large so that nearby threads don't tend to step on each other.

@pca006132 (Collaborator):
Yes, they have a performance impact even when only one thread is running these atomic instructions.

@elalish (Owner, Author):
That's good to know, thanks. I'll leave it this way for now, but you're welcome to optimize it - that will be a good learning experience for me.

(Two review threads on src/sdf/include/sdf.h were marked outdated and resolved.)
inline __host__ __device__ void operator()(Uint64 mortonCode) {
const glm::ivec4 gridIndex = DecodeMorton(mortonCode);

if (glm::any(glm::greaterThan(glm::ivec3(gridIndex), gridSize))) return;
@pca006132 (Collaborator):
Also, I just noticed that you have some glm::any(glm::some_comparison(...)) patterns. Maybe we can define some helper functions to make them easier to read, like anyGreaterThan or something?
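
For example (hypothetical helper, not added in this PR):

#include <glm/glm.hpp>

// Wraps the glm::any(glm::comparison(...)) pattern behind one name.
inline bool anyGreaterThan(glm::ivec3 a, glm::ivec3 b) {
  return glm::any(glm::greaterThan(a, b));
}

// The quoted line would then read:
// if (anyGreaterThan(glm::ivec3(gridIndex), gridSize)) return;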

@elalish (Owner, Author):
That's potentially a lot of permutations; I think I'm going to leave that for now.

(Another review thread on src/sdf/include/sdf.h was marked outdated and resolved.)
gridVert.distance = BoundedSDF(gridIndex);

bool keep = false;
float minDist2 = 0.25 * 0.25;
@pca006132 (Collaborator):
What is minDist2 used for?

@elalish (Owner, Author):
Ah yes, that's dead code now. I had a cool idea, but realized it would create edge cases that could break manifoldness.

tet[2] = thisVert.Inside();
int edges[6] = {base.edgeVerts[0], -1, -1, -1, -1, -1};
for (const int i : {0, 1, 2}) {
edges[1] = base.edgeVerts[i + 1];
@pca006132 (Collaborator):
This part is relatively complicated; maybe we can reorder some statements and avoid reusing the array to make it more readable? For example:

for (const int i : {0, 1, 2}) {
  thisIndex = leadIndex;
  thisIndex[Prev3(i)] -= 1;
  GridVert nextVert = gridVerts[MortonCode(thisIndex)];
  tet[3] = nextVert.Inside();

  int edges1[6] = {
    base.edgeVerts[0],
    base.edgeVerts[i+1],
    nextVert.edgeVerts[Next3(i) + 4],
    nextVert.edgeVerts[Prev3(i) + 1],
    thisVert.edgeVerts[i + 4],
    base.edgeVerts[Prev3(i) + 4]
  };
  thisVert = nextVert;
  if (!skipTet && thisVert.key != kOpen) CreateTris(tet, edges1);
  skipTet = thisVert.key == kOpen;
  tet[2] = tet[3];
  int edges2[6] = ...
}

@pca006132 (Collaborator):
(I personally think that it is easy to get lost reading many array element operations, especially with some order dependencies with other variables)

@elalish (Owner, Author):
Okay, this has been cleaned up a lot; thanks for the suggestion. I ended up using it and a few other things too.

@elalish mentioned this pull request Aug 27, 2022
@codecov-commenter commented:

Codecov Report

Merging #187 (94a3c69) into master (08598c0) will increase coverage by 0.02%.
The diff coverage is 97.35%.

@@            Coverage Diff             @@
##           master     #187      +/-   ##
==========================================
+ Coverage   98.15%   98.18%   +0.02%     
==========================================
  Files          30       33       +3     
  Lines        2602     2919     +317     
==========================================
+ Hits         2554     2866     +312     
- Misses         48       53       +5     
| Impacted Files | Coverage Δ |
| --- | --- |
| src/collider/src/collider.cpp | 100.00% <ø> (ø) |
| src/manifold/src/impl.cpp | 100.00% <ø> (ø) |
| src/utilities/include/structs.h | 100.00% <ø> (ø) |
| src/utilities/include/utils.h | 100.00% <ø> (ø) |
| src/sdf/include/sdf.h | 96.10% <96.10%> (ø) |
| samples/src/gyroid_module.cpp | 100.00% <100.00%> (ø) |
| src/manifold/src/boolean3.cpp | 100.00% <100.00%> (ø) |
| src/manifold/src/boolean_result.cpp | 99.57% <100.00%> (+0.06%) ⬆️ |
| src/sdf/src/sdf.cpp | 100.00% <100.00%> (ø) |
| src/utilities/include/par.h | 100.00% <100.00%> (ø) |

... and 17 more


@elalish requested a review from pca006132 August 30, 2022 20:03
@elalish (Owner, Author) commented Aug 30, 2022

I've simplified the API; I think this is pretty much ready to merge.

@elalish changed the title from "Adding SDF class" to "Adding SDF LevelSet" Sep 1, 2022
@elalish merged commit 11e58ea into master Sep 1, 2022
@elalish deleted the sdf branch September 1, 2022 16:04
cartesian-theatrics pushed a commit to SovereignShop/manifold that referenced this pull request Mar 11, 2024
* polymorphic device functions working

* starting to flesh out algorithm

* getting closer

* fleshed out vert computation

* more progress

* BuildTris

* fixed some syntax

* updating to cpp

* added VecDc

* SDF compiles

* SDF runs

* fixed some bugs

* manifold output

* fixed out of memory bug

* flattened sides

* back to egg-crate

* works with CUDA too

* moved SDF

* Revert "moved SDF"

This reverts commit ba2348f.

* added gyroid sample

* added sdf_test

* added SDF tests

* tests pass

* updated gyroid_module

* fixed gyroid module

* added docs

* addressing feedback

* fix cuda detection

* clean up detect cuda

* fixed CUDA failure

* consistent execution policy in Boolean

* remove SDF wrapper class

* simplified tet logic

* use NeighborInside()

* optimized SDF

* moved RemoveUnreferencedVerts

* put printf statements behind MANIFOLD_DEBUG

* enable hashtable resize

* added test for hashtable resizing