Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KMeans Init Sparsity Support #2815

Merged

Conversation

md-shafiul-alam
Copy link
Contributor

@md-shafiul-alam md-shafiul-alam commented Jun 7, 2024

Add sparsity support to KMeans Init and fix a few bugs in daal sparse kmeans++ init, onedal kmeans++ init, and kmeans infer. Specific changes planned or made in this PR.

  • Fix distance calculation for sparse data in daal KMeans++
  • Allow oneDAL Kmeans++ init to take n_trials same as daal and scikit-learn
  • Fix difference between daal Kmeans++ dense and sparse results
  • Implement KMeans init sparse support for CPU (just calling daal implementation - cpu)
  • Fix oneDAL KMeans sparse infer on GPU
  • Update Kmeans infer for sparse data to allow result options same as dense

I have verified that

  • Daal kmeans init results are same for sparse and dense data
  • oneDAL kmeans init results are same for sparse and dense data
  • oneDAL kmeans init results are same on cpu and gpu
    • Not same for dense data unless we compute initial centroids for dense GPU using cpu implementation

Copy link
Contributor

@samir-nasibli samir-nasibli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @md-shafiul-alam for thework done!

I see that only one test suit here updated. It worth to increase or maybe updates tests for better coverage of new fixes and features.

Copy link
Contributor

@Alexandr-Solovev Alexandr-Solovev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM, but we have to discuss optional results, due to looks like its not necessary with the latest comments, and it makes sense to remove/update it. Also, it would be nice if you add tests with direct comparison between results on the same data, but with different formats(dense , sparse) with some tolerance to track that it works fine

cpp/oneapi/dal/algo/kmeans/infer_types.cpp Show resolved Hide resolved
cpp/oneapi/dal/algo/kmeans/test/badarg.cpp Outdated Show resolved Hide resolved
@md-shafiul-alam
Copy link
Contributor Author

/azp run CI

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@md-shafiul-alam
Copy link
Contributor Author

/intelci: run

@Alexandr-Solovev
Copy link
Contributor

/intelci: run

2 similar comments
@md-shafiul-alam
Copy link
Contributor Author

/intelci: run

@md-shafiul-alam
Copy link
Contributor Author

/intelci: run

@md-shafiul-alam
Copy link
Contributor Author

/azp run CI

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@Vika-F Vika-F left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@md-shafiul-alam
Copy link
Contributor Author

/intelci: run

@md-shafiul-alam md-shafiul-alam merged commit 08df465 into oneapi-src:main Aug 7, 2024
16 of 17 checks passed
@md-shafiul-alam
Copy link
Contributor Author

The failures are not related.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants