Speeding up interpolation operations #128

Open
jbadger95 opened this issue Jun 20, 2023 · 1 comment
jbadger95 commented Jun 20, 2023

Interpolation routines (e.g. hpface, dhpface) account for the bulk of the computation in update_gdof and update_Ddof, which can be expensive (although they are less of a bottleneck with the OMP version). These routines could be accelerated significantly (>10x, see below) with relatively straightforward optimizations. This isn't high priority for me right now, but I'm opening the issue to outline potential improvements in case anyone wants to implement them before I get to it. (@ac1512, you might be interested in similar optimizations for your anisotropic refinement work, since IIRC the projections are rather expensive in that case.)

Here are a few ways in which they could be optimized:

  • The loops over the assembled projection matrix are not ordered optimally (the indices need to be swapped).
  • Test functions are currently pulled back in the innermost loop, so the pullback is applied to every test function again for each trial function. The pulled-back test and trial functions coincide, so they could instead be computed once and stored; avoiding the redundant computation alone should give a significant speedup. The best way to do this would be to apply the pullback to all shape functions at once (inside the quadrature point loop) using optimized BLAS3 routines (DGEMM).
  • Integration could likely be accelerated by forming a matrix of size (number of pulled-back shape functions) x (number of quadrature points), so that each column holds the values of the pulled-back shape functions at one quadrature point, and then simply multiplying the matrix by its transpose. Calling optimized BLAS3 routines for the assembly instead of explicitly forming the product (as we are doing now) will likely be much faster.
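
To make the second and third points concrete, here is a rough NumPy sketch (not the actual hpface code). It assumes an H1-type pullback, where each reference gradient is multiplied by the transposed inverse Jacobian, and it assumes the quadrature weights already include the Jacobian determinant. All names (`dphi_ref`, `jac_inv`, `w`, `gram_naive`, `gram_blas3`) are made up for illustration. For double-precision arrays, NumPy's `@` is normally backed by BLAS, so the matrix products here stand in for the DGEMM calls the Fortran code would make.

```python
import numpy as np

def gram_naive(dphi_ref, jac_inv, w):
    """Redundant version: pull back test and trial functions inside the i/j loops."""
    nq, d, n = dphi_ref.shape
    G = np.zeros((n, n))
    for q in range(nq):
        for i in range(n):
            for j in range(n):
                # Both pullbacks are recomputed for every (i, j) pair.
                gi = jac_inv[q].T @ dphi_ref[q, :, i]
                gj = jac_inv[q].T @ dphi_ref[q, :, j]
                G[i, j] += w[q] * (gi @ gj)
    return G

def gram_blas3(dphi_ref, jac_inv, w):
    """Pull back all shape functions once per quadrature point, then one matrix product."""
    nq, d, n = dphi_ref.shape
    # Row i of B holds the pulled-back gradient components of shape function i
    # at every quadrature point (d consecutive columns per point).
    B = np.empty((n, nq * d))
    for q in range(nq):
        pulled = jac_inv[q].T @ dphi_ref[q]      # (d, n): all shape functions at once
        B[:, q * d:(q + 1) * d] = pulled.T
    w_rep = np.repeat(w, d)                      # one weight per column of B
    return (B * w_rep) @ B.T                     # weighted product with the transpose

# Synthetic data standing in for a face with 9 shape functions (assumed sizes).
rng = np.random.default_rng(0)
nq, d, n = 16, 2, 9
dphi_ref = rng.standard_normal((nq, d, n))       # reference gradients
jac_inv = rng.standard_normal((nq, d, d))        # inverse Jacobians per point
w = rng.random(nq)                               # quadrature weights

G_naive = gram_naive(dphi_ref, jac_inv, w)
G_fast = gram_blas3(dphi_ref, jac_inv, w)
max_diff = np.max(np.abs(G_naive - G_fast))
```

The two functions compute the same matrix; the second does O(n) pullbacks per quadrature point instead of O(n^2) and leaves the accumulation to a single matrix-matrix product. Since the result is symmetric, a rank-k update routine such as DSYRK could be an alternative to DGEMM when the weights are non-negative, though that is a suggestion beyond what is described above.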

For a p=3 hexahedral mesh, the assembly (integration) in hpface takes ~10-40x longer than the matrix inversion, indicating that a >10x improvement can likely be achieved.

@stefanhenneking

Thanks for suggesting this! These optimizations will have a good impact on the PBI routines.
