Optimizations #106

pca006132 · 2022-05-09T16:40:03Z

Optimizations as mentioned in #105.

elalish

Looking great, thanks! Would you mind including a link to your performance spreadsheet in this PR (ideally updated once it's finished)? A 30% improvement on CUDA, especially for the larger test cases, is quite impressive indeed!

elalish · 2022-05-10T05:17:24Z

manifold/src/face_op.cpp

@@ -50,12 +53,10 @@ void Manifold::Impl::Face2Tri(const VecDH<int>& faceEdge,
    ALWAYS_ASSERT(numEdge >= 3, topologyErr, "face has less than three edges.");
    const glm::vec3 normal = faceNormal[face];

-    std::map<int, int> vertBary;
-    for (int j = firstEdge; j < lastEdge; ++j)
-      vertBary[halfedge[j].startVert] = halfedgeBary.H()[j];


Good call, I bet this could be a big part of why the triangulation step has been scaling worse than it should.
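For readers following along, here is a minimal sketch of the kind of replacement this comment is praising: for faces with only a handful of edges, a linear scan over a small fixed array avoids the heap allocations of `std::map`. `SmallVertMap`, `Set`, and `Find` are hypothetical names for illustration, not the code in this PR.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Hypothetical fixed-capacity vertex -> value lookup for faces with at most
// N edges. Lookups are linear scans, which is cheap for N of 3 or 4 and
// needs no heap allocation, unlike std::map.
template <int N>
class SmallVertMap {
 public:
  // Stores value for vert, overwriting any earlier entry for the same
  // vertex (mirroring the behavior of std::map::operator[]).
  void Set(int vert, int value) {
    for (int i = 0; i < size_; ++i) {
      if (entries_[i].first == vert) {
        entries_[i].second = value;
        return;
      }
    }
    assert(size_ < N);
    entries_[size_++] = std::make_pair(vert, value);
  }

  // Returns the value stored for vert, or -1 if vert was never set.
  int Find(int vert) const {
    for (int i = 0; i < size_; ++i) {
      if (entries_[i].first == vert) return entries_[i].second;
    }
    return -1;
  }

 private:
  std::array<std::pair<int, int>, N> entries_{};
  int size_ = 0;
};
```

Larger faces would still need a general container; the sketch only covers the small-face fast path.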


elalish · 2022-05-10T05:20:41Z

manifold/src/impl.cpp

  // Stable sort is required here so that halfedges from the same face are
  // paired together (the triangles were created in face order). In some
  // degenerate situations the triangulator can add the same internal edge in
  // two different faces, causing this edge to not be 2-manifold. We detect this
  // and fix it by swapping one of the identical edges, so it is important that
  // we have the edges paired according to their face.
-  std::stable_sort(edge.begin(), edge.end());
+  thrust::stable_sort(edge.beginD(), edge.endD());


Is this new? I vaguely recall using std:: because thrust didn't have a stable_sort.

From the git blame of https://github.com/NVIDIA/thrust/blame/fa54f2c6f1217237953f27ddf67f901b6b34fbdd/testing/stable_sort.cu, it seems that stable_sort has been in thrust for over 10 years. Perhaps it is related to the cudatoolkit version?

cool, I must have remembered wrong.
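To illustrate why stability matters in the code comment above, here is a small CPU-only sketch using `std::stable_sort` as a stand-in; `thrust::stable_sort` gives the same guarantee that equivalent elements keep their relative order. `EdgeRec` and `SortEdgesStable` are hypothetical names, and the real edge type in the library may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical edge record: the sort key is the vertex pair (v0, v1), and
// face is payload that the comparison deliberately ignores.
struct EdgeRec {
  int v0;
  int v1;
  int face;
};

// Sorts by (v0, v1) only. Because the sort is stable, edges with identical
// vertex pairs keep the order in which they were created (face order).
std::vector<EdgeRec> SortEdgesStable(std::vector<EdgeRec> edges) {
  std::stable_sort(edges.begin(), edges.end(),
                   [](const EdgeRec& a, const EdgeRec& b) {
                     return a.v0 != b.v0 ? a.v0 < b.v0 : a.v1 < b.v1;
                   });
  return edges;
}
```

With an unstable sort, the duplicates of an edge could come out in any order, so the pairing by face described in the comment would no longer be guaranteed.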

utilities/include/sparse.h

elalish

Really love how much simpler this is! And perfTest for CUDA is indeed running ~30% faster for me:

nTri = 512, time = 0.00647049 sec
nTri = 2048, time = 0.00899614 sec
nTri = 8192, time = 0.0170416 sec
nTri = 32768, time = 0.0450551 sec
nTri = 131072, time = 0.144635 sec
nTri = 524288, time = 0.524747 sec
nTri = 2097152, time = 1.89367 sec
nTri = 8388608, time = 7.68228 sec

elalish · 2022-05-10T15:40:13Z

collider/src/collider.cpp

@@ -266,7 +266,7 @@ Collider::Collider(const VecDH<Box>& leafBB,
 */
 template <typename T>
 SparseIndices Collider::Collisions(const VecDH<T>& querriesIn) const {
-  int maxOverlaps = 1 << 20;
+  int maxOverlaps = querriesIn.size() * 4;



elalish · 2022-05-10T15:43:11Z

manifold/src/impl.cpp

cool, I must have remembered wrong.

elalish

Thanks for the cleanup!

elalish · 2022-05-11T04:33:53Z

collider/src/collider.cpp

@@ -282,7 +282,11 @@ SparseIndices Collider::Collisions(const VecDH<T>& querriesIn) const {
      break;
    else {  // if not enough memory was allocated, guess how much will be needed
      int lastQuery = querryTri.Get(0).H().back();
-      maxOverlaps *= 2;
+      float ratio = static_cast<float>(querriesIn.size()) / lastQuery;
+      if (ratio > 1000) // do not trust the ratio if it is too large


Optimizations

use std::move to avoid copying

a39f4fd

elalish reviewed May 10, 2022


pca006132 added 2 commits May 10, 2022 22:56

avoid using std::map for 3/4 edge cases

1f1b608

collider: reduced initial max size

02ce2b4

pca006132 force-pushed the optimizations branch 2 times, most recently from ad1e096 to 02ce2b4 May 10, 2022 14:58

pca006132 and others added 2 commits May 10, 2022 23:05

fixed sprase indices resize

42b46f0

Merge branch 'master' into optimizations

c5c5335

elalish approved these changes May 10, 2022


pca006132 added 2 commits May 11, 2022 00:08

collider: resize faster

f940fe9

reduce code duplication

25e7bf7

elalish approved these changes May 11, 2022


elalish merged commit 48d51b6 into elalish:master May 11, 2022

pca006132 deleted the optimizations branch May 12, 2022 08:22

cartesian-theatrics pushed a commit to SovereignShop/manifold that referenced this pull request Mar 11, 2024

Merge pull request elalish#106 from pca006132/optimizations

14c10f0

Optimizations
