Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manifold::Cube is slow. #710

Closed
briansturgill opened this issue Jan 17, 2024 · 11 comments · Fixed by #717
Closed

Manifold::Cube is slow. #710

briansturgill opened this issue Jan 17, 2024 · 11 comments · Fixed by #717

Comments

@briansturgill
Copy link
Contributor

Cube is nearly half the speed of a function I wrote in python calling square/extrude.
I can do a PR if you want.

 1.292 100,000 creations of M with cube member function
 0.858 100,000 creations of M with mesh from cube
 0.719 100,000 creations of M with python function
import time
import manifold3d as m
from random import random
import numpy as np

ts = time.time()
for i in range(100000):
    c = m.Manifold.cube((10.0, 10.0, 10.0))
    c.num_vert()
te = time.time()
tdiff = te - ts
print(f"{tdiff: 5.3f}", "100,000 creations of M with cube member function")

mesh = c.to_mesh()
mesh = m.Mesh(
    tri_verts=np.array(mesh.tri_verts.tolist().copy(), np.int32),
    vert_properties=np.array(mesh.vert_properties.tolist().copy(), np.float32),
)
ts = time.time()
for i in range(100000):
    c = m.Manifold(mesh)
    c.num_vert()
te = time.time()
tdiff = te - ts
print(f"{tdiff: 5.3f}", "100,000 creations of M with mesh from cube")


def cube(dim):
    x, y, z = dim
    c = m.CrossSection.square([x, y])
    return m.Manifold.extrude(c, z)


ts = time.time()
for i in range(100000):
    c = cube((10.0, 10.0, 10.0))
    c.num_vert()
te = time.time()
tdiff = te - ts
print(f"{tdiff: 5.3f}", "100,000 creations of M with python function")
@briansturgill
Copy link
Contributor Author

AsOriginal() seems only to be used in Cube and Cylinder.

/**
 * This function condenses all coplanar faces in the relation, and
 * collapses those edges. In the process the relation to ancestor meshes is lost * and this new Manifold is marked an original. Properties are preserved, so if
 * they do not match across an edge, that edge will be kept.
 */
Manifold Manifold::AsOriginal() const {
  auto newImpl = std::make_shared<Impl>(*GetCsgLeafNode().GetImpl());
  newImpl->meshRelation_.originalID = ReserveIDs(1);
  newImpl->InitializeOriginal();
  newImpl->CreateFaces();
  newImpl->SimplifyTopology();
  newImpl->Finish();
  return Manifold(std::make_shared<CsgLeafNode>(newImpl));
}

I am mystified as to why any of these things need to be done?

Interestingly Cylinder calls it only if a translate is done.

@elalish
Copy link
Owner

elalish commented Jan 17, 2024

Yes, this didn't get much optimization attention yet. The reason for AsOriginal() is that the input verts and triangles are not sorted (laziness on my part), but become sorted during construction. The faceID refers to the original, good if the mesh was actually input, but meaningless here. AsOriginal() simplifies that, but you're right that it would be better to avoid it entirely by getting everything in the right order ahead of time, or possibly just calling a subset of AsOriginal().

@pca006132
Copy link
Collaborator

Yes, we did not pay much attention to short running functions. In addition to sorting the vertices and avoid the AsOriginal() call, we should probably try to cache these constructor calls and make a singleton for them, and make the size a transformation node in the csg tree. For things like sphere and cylinder though, it will be harder as the number of vertices can be different for different sizes, the cache should probably consider that.

@pca006132
Copy link
Collaborator

OK did some benchmark, this patch is the fastest:

diff --git a/src/manifold/src/constructors.cpp b/src/manifold/src/constructors.cpp
index dce705a..da0096c 100644
--- a/src/manifold/src/constructors.cpp
+++ b/src/manifold/src/constructors.cpp
@@ -153,10 +153,10 @@ Manifold Manifold::Cube(glm::vec3 size, bool center) {
       glm::length(size) == 0.) {
     return Invalid();
   }
-  auto cube = Manifold(std::make_shared<Impl>(Impl::Shape::Cube));
-  cube = cube.Scale(size);
-  if (center) cube = cube.Translate(-size / 2.0f);
-  return cube.AsOriginal();
+  // return Manifold::Extrude(CrossSection::Square({size.x, size.y}), size.z).Translate(center?(-size/2.0f) : glm::vec3(0));
+  glm::mat4x3 m = glm::scale(size);
+  if (center) m = glm::translate(-size / 2.0f) * glm::mat4(m);
+  return Manifold(std::make_shared<Impl>(Manifold::Impl::Shape::Cube, m));
 }
 
 /**
diff --git a/src/manifold/src/impl.cpp b/src/manifold/src/impl.cpp
index 8480320..cb491d1 100644
--- a/src/manifold/src/impl.cpp
+++ b/src/manifold/src/impl.cpp
@@ -534,7 +534,7 @@ Manifold::Impl::Impl(const Mesh& mesh, const MeshRelationD& relation,
  * Create either a unit tetrahedron, cube or octahedron. The cube is in the
  * first octant, while the others are symmetric about the origin.
  */
-Manifold::Impl::Impl(Shape shape) {
+Manifold::Impl::Impl(Shape shape, const glm::mat4x3 m) {
   std::vector<glm::vec3> vertPos;
   std::vector<glm::ivec3> triVerts;
   switch (shape) {
@@ -547,19 +547,19 @@ Manifold::Impl::Impl(Shape shape) {
       break;
     case Shape::Cube:
       vertPos = {{0.0f, 0.0f, 0.0f},  //
-                 {1.0f, 0.0f, 0.0f},  //
-                 {1.0f, 1.0f, 0.0f},  //
-                 {0.0f, 1.0f, 0.0f},  //
                  {0.0f, 0.0f, 1.0f},  //
+                 {0.0f, 1.0f, 0.0f},  //
+                 {0.0f, 1.0f, 1.0f},  //
+                 {1.0f, 0.0f, 0.0f},  //
                  {1.0f, 0.0f, 1.0f},  //
-                 {1.0f, 1.0f, 1.0f},  //
-                 {0.0f, 1.0f, 1.0f}};
-      triVerts = {{0, 2, 1}, {0, 3, 2},  //
-                  {4, 5, 6}, {4, 6, 7},  //
-                  {0, 1, 5}, {0, 5, 4},  //
-                  {1, 2, 6}, {1, 6, 5},  //
-                  {2, 3, 7}, {2, 7, 6},  //
-                  {3, 0, 4}, {3, 4, 7}};
+                 {1.0f, 1.0f, 0.0f},  //
+                 {1.0f, 1.0f, 1.0f}};
+      triVerts = {{1, 0, 4}, {2, 4, 0},  //
+                  {1, 3, 0}, {3, 1, 5},  //
+                  {3, 2, 0}, {3, 7, 2},  //
+                  {5, 4, 6}, {5, 1, 4},  //
+                  {6, 4, 2}, {7, 6, 2},  //
+                  {7, 3, 5}, {7, 5, 6}};
       break;
     case Shape::Octahedron:
       vertPos = {{1.0f, 0.0f, 0.0f},   //
@@ -575,6 +575,7 @@ Manifold::Impl::Impl(Shape shape) {
       break;
   }
   vertPos_ = vertPos;
+  for (auto& v : vertPos_) v = m * glm::vec4(v, 1.0f);
   CreateHalfedges(triVerts);
   Finish();
   meshRelation_.originalID = ReserveIDs(1);
diff --git a/src/manifold/src/impl.h b/src/manifold/src/impl.h
index 36e37fc..c5dd8ec 100644
--- a/src/manifold/src/impl.h
+++ b/src/manifold/src/impl.h
@@ -59,7 +59,7 @@ struct Manifold::Impl {
 
   Impl() {}
   enum class Shape { Tetrahedron, Cube, Octahedron };
-  Impl(Shape);
+  Impl(Shape, const glm::mat4x3 = glm::mat4x3(1));
 
   Impl(const MeshGL&, std::vector<float> propertyTolerance = {});
   Impl(const Mesh&, const MeshRelationD& relation,

Timing:

 0.274 100,000 creations of M with cube member function
 0.489 100,000 creations of M with mesh from cube
 0.368 100,000 creations of M with python function

If we use extrude, the timing is roughly

 0.354 100,000 creations of M with cube member function
 0.489 100,000 creations of M with mesh from cube
 0.368 100,000 creations of M with python function

If we use translate/scale instead of directly applying the matrix, we need to calculate the spectral norm which is slow and result in the following timing:

 0.361 100,000 creations of M with cube member function
 0.489 100,000 creations of M with mesh from cube
 0.368 100,000 creations of M with python function

@pca006132
Copy link
Collaborator

so there is one optimization we can do for primitives, apart from caching, is to allow transformation without calculating the spectral norm.

@pca006132
Copy link
Collaborator

@briansturgill I think your reply is deleted. Not sure what do you mean by batch extrude and what is too slow, can you elaborate?

@briansturgill
Copy link
Contributor Author

briansturgill commented Jan 24, 2024

@pca006132 Yeah... I had a mistake in the benchmark.
The only take away you might care about at this point is that Manifold::Extrude
does a fair bit of unnecessary work for the most common case of no scale/rotate.

@pca006132
Copy link
Collaborator

Do you have a measurement for the overhead of that? e.g. remove the related code and see the timing difference. it is hard to reason about the overhead without actually benchmarking it.

@pca006132
Copy link
Collaborator

I did a benchmark (extrude square), the most expensive parts are Impl::Finish and Impl::CreateFaces, which is expected as they are fairly involved. I think unless you have a very high nDivision count, the loop will not have a high overhead, but for extrusion without scale/rotate, you will not set nDivision count anyway.

For complicated polygons, the triangulator is the most expensive part.

@briansturgill
Copy link
Contributor Author

briansturgill commented Jan 24, 2024

@pca006132 "Do you have a measurement...".
I believed I did, but the benchmark had a problem. Thus, my deleted message.
I am working towards one, but it will be a few days. [I need to do it in C++.]

@elalish
Copy link
Owner

elalish commented Jan 26, 2024

Happy to see optimizations and such, but just as a general note on this library - it is always true that copying a Manifold is cheaper than constructing one. This is especially true for input meshes, but as you found, also for cubes and such. I figure it's better for perf-focused users to get used to doing cube = Manifold::Cube() and then doing things like cube + cube.Translate(...) + cube.Translate(...) rather than calling Manifold::Cube() over and over. It's a better pattern to follow, particularly when you switch from a cube to a shape from a file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants