Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lanczos Solver #2416

Open
wants to merge 25 commits into
base: branch-24.10
Choose a base branch
from
Open

Conversation

aamijar
Copy link
Contributor

@aamijar aamijar commented Aug 20, 2024

Lanczos Solver for Sparse Eigen Decomposition

We propose a new lanczos solver in raft that fixes the issues present in the previous solver raft::sparse::solver::detail::computeSmallestEigenvectors.

Specifically we address the following issues:

  1. Numerical Stability for both float32 and float64 datatypes
  2. Efficiency and Speed of Convergence

This new implementation is taken from the cupy library cupyx.scipy.sparse.linalg.eigsh where the thick-restart and full reorthogonalzation methods are used.

Additionally this PR exposes a python api for raft lanczos solver with an interface similar to scipy.sparse.linalg.eigsh and cupyx.scipy.sparse.linalg.eigsh.

from pylibraft.solver import eigsh

Copy link

copy-pr-bot bot commented Aug 20, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the cpp label Aug 20, 2024
@aamijar aamijar mentioned this pull request Aug 20, 2024
@aamijar aamijar self-assigned this Aug 21, 2024
@aamijar aamijar added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 21, 2024
@aamijar aamijar marked this pull request as ready for review August 21, 2024 23:24
@aamijar aamijar requested review from a team as code owners August 21, 2024 23:24
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is coming along great, @aamijar. Mostly it needs some cleanup and polishing, but otherwise should be ready to merge once my comments are resolved.

cpp/include/raft/sparse/solver/detail/lanczos.cuh Outdated Show resolved Hide resolved
cpp/include/raft/sparse/solver/detail/lanczos.cuh Outdated Show resolved Hide resolved
cpp/include/raft/sparse/solver/detail/lanczos.cuh Outdated Show resolved Hide resolved
cpp/include/raft/sparse/solver/detail/lanczos.cuh Outdated Show resolved Hide resolved
cpp/src/raft_runtime/solver/lanczos_solver.cuh Outdated Show resolved Hide resolved
cpp/include/raft_runtime/solver/lanczos.hpp Outdated Show resolved Hide resolved
cpp/include/raft/sparse/solver/lanczos.cuh Outdated Show resolved Hide resolved
cpp/include/raft/sparse/solver/lanczos.cuh Show resolved Hide resolved
cpp/include/raft_runtime/solver/lanczos.hpp Outdated Show resolved Hide resolved
cpp/include/raft/sparse/solver/lanczos.cuh Show resolved Hide resolved
python/pylibraft/pylibraft/solver/__init__.py Outdated Show resolved Hide resolved
cpp/include/raft/sparse/solver/detail/lanczos.cuh Outdated Show resolved Hide resolved
template <typename IndexTypeT, typename ValueTypeT>
auto lanczos_compute_smallest_eigenvectors(
raft::resources const& handle,
raft::spectral::matrix::sparse_matrix_t<IndexTypeT, ValueTypeT> const& A,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the newer raft::device_csr_matrix API be used for this new public function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the device_csr_matrix_view in the public API, not in the detail namespace

Suggested change
raft::spectral::matrix::sparse_matrix_t<IndexTypeT, ValueTypeT> const& A,
raft::device_csr_matrix_view<ValueTypeT, IndexTypeT, IndexTypeT, IndexTypeT> A,

cpp/test/sparse/solver/lanczos.cu Outdated Show resolved Hide resolved
cpp/test/sparse/solver/lanczos.cu Show resolved Hide resolved
@aamijar aamijar requested a review from lowener September 9, 2024 18:21
Copy link
Contributor

@lowener lowener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a test for the python API of Lanczos also be added?


namespace raft::sparse::solver {

template <typename IndexTypeT, typename ValueTypeT>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexTypeT template is unused in this structure so it should be removed.

}

/**
* @brief Find the smallest eigenpairs using lanczos solver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring should be on the top-level function which will be called directly, not in the detail namespace

template <typename IndexTypeT, typename ValueTypeT>
auto lanczos_compute_smallest_eigenvectors(
raft::resources const& handle,
raft::spectral::matrix::sparse_matrix_t<IndexTypeT, ValueTypeT> const& A,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the device_csr_matrix_view in the public API, not in the detail namespace

Suggested change
raft::spectral::matrix::sparse_matrix_t<IndexTypeT, ValueTypeT> const& A,
raft::device_csr_matrix_view<ValueTypeT, IndexTypeT, IndexTypeT, IndexTypeT> A,

cpp/include/raft/sparse/solver/lanczos.cuh Show resolved Hide resolved
@@ -1396,4 +1438,658 @@ int computeLargestEigenvectors(
return status;
}

template <typename T>
RAFT_KERNEL kernel_subtract_and_scale(T* u, T* vec, T* scalar, int n)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kernel_clamp_down_vector<<<numBlocks, blockSize, 0, stream>>>(
u.data_handle(), static_cast<ValueTypeT>(1e-7), n);

kernel_clamp_down<<<1, 1, 0, stream>>>(&beta(0, i), static_cast<ValueTypeT>(1e-6));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't dereference a device memory address on host side

handle,
v0_vector_const,
V_0_view,
[device_scalar = v0nrm_scalar.data_handle()] __device__(auto y) { return y / *device_scalar; });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can v0nrm and it's copy operations be skipped this way?

Suggested change
[device_scalar = v0nrm_scalar.data_handle()] __device__(auto y) { return y / *device_scalar; });
[device_scalar = output1.data_handle()] __device__(auto y) { return y / *device_scalar; });


raft::device_vector<ValueTypeT, uint32_t> output1 =
raft::make_device_vector<ValueTypeT, uint32_t>(handle, 1);
raft::device_matrix_view<const ValueTypeT> input1 =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference with v0_view? Can't it be used here?

raft::linalg::unary_op(handle,
u_vector_const,
V_0_view,
[device_scalar = unrm_scalar.data_handle()] __device__(auto y) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrm can be skipped

Suggested change
[device_scalar = unrm_scalar.data_handle()] __device__(auto y) {
[device_scalar = output.data_handle()] __device__(auto y) {

raft::sqrt_op());
raft::copy(&res, output2.data_handle(), 1, stream);

RAFT_LOG_TRACE("Iteration %f: residual (tolerance) %d", iter, res);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To use a copied value on host, the stream would need to be synchronized before. Since that sync can slow this function down it would be better to check the log level, and only copy and sync if necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

3 participants