-
Notifications
You must be signed in to change notification settings - Fork 646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding KMeans PyTorch Implementation to cfa model #998
adding KMeans PyTorch Implementation to cfa model #998
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating this. I wanted to compare the scikit-learn and torch implementations, but had some trouble. Can you share the steps that you mentioned here in this link
#956 (comment)
@@ -78,6 +78,66 @@ def get_feature_extractor(backbone: str, return_nodes: list[str]) -> GraphModule | |||
return feature_extractor | |||
|
|||
|
|||
#Kmeans clustering algorithm implementation in PyTorch framework | |||
class KMeans_torch: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class KMeans_torch: | |
class KMeans: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KMeans
could potentially be used by other algorithms in the future. Therefore, it would be good to create a file, named src/anomalib/models/components/cluster/kmeans.py
and move this implementation there.
@@ -78,6 +78,66 @@ def get_feature_extractor(backbone: str, return_nodes: list[str]) -> GraphModule | |||
return feature_extractor | |||
|
|||
|
|||
#Kmeans clustering algorithm implementation in PyTorch framework | |||
class KMeans_torch: | |||
def __init__(self, n_clusters, max_iter=10): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We try to use type hints as much as possible, which will be useful for our new CLI that automatically handle the type of the variables.
def __init__(self, n_clusters, max_iter=10): | |
def __init__(self, n_clusters: int, max_iter:int = 10): |
|
||
#thise line returns labels and centoids of the results, | ||
#alternative to Sklearn's cluster_centers_ & labels_ attributes | ||
return self.cluster_assignments, self.centroids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to return self
, similar to the scikit-learn
implementation?
Dear Samet, thank you for putting such effort into review and feedback, of course, all of your requested changes are possible and I will start working on them and commit changes ASAP, I will also provide you with the full code/notebook I used for comparison between the two implementations. in the meantime, I potentially looking to address some other TO-DO issues either were the other two issues in cfa model or issues in other models as well, if that's possible and eligible of course. Thank you. |
Dear @samet-akcay, please forgive me for the late reply, due to several engagements in the last few days, here is my full approach to comparing the Kmeans Sklean and PyTorch implementations through silhouette score, which is a metric used to calculate the goodness of a clustering technique. Its value ranges from -1 to 1. It contains the modifications you asked for, this is a preview, I will commit the changes as required and I may add a running notebook with the following comparison code, if it's not necessary, I will delete it. Thank you
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
@samet-akcay, I wanted to ask if this to-do task, As far as I know, |
@samet-akcay, if it's already been done, I may contribute to this task, |
@ashwinvaidya17, can you help here to show how @aadhamm can utilize the new feature extractor for the cfa model |
You can have a look here to see how it is done. You can adapt your code to call this class. anomalib/src/anomalib/models/csflow/torch_model.py Lines 475 to 476 in 4fe6c74
If you still have any questions then feel free to ask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the efforts I have a few comments.
- Can you remove the jupyter notebook from the components folder. If you want to confirm the outputs then you can move this code into a file and turn it into a unit test.
- Can you also address the issues raised by codacy.
- The spinx parser is configured to follow the Google's docstring format. Can you update the docstrings to conform to this format? Here is an example https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
Parameters: | ||
X: A tensor of shape (N, D) containing the input data. | ||
N is the number of data points | ||
D is the dimensionality of the data points. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure these are the right parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to more than one source, N is the number of samples and D is the number of features, D is s used to initialize the centroids randomly with the same dimensionality as the input data, but not used actually afterward in the code which created issues related to codacy tests.
I resolved this later using your suggestion.
Parameters: | ||
n_clusters: The number of clusters to create. | ||
max_iter: The maximum number of iterations to run the algorithm for. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We follow the google docstring format. This should be Args
. Also, can you also add the types to the variables in the docs as well.
N is the number of data points | ||
D is the dimensionality of the data points. | ||
""" | ||
N, D = X.shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe batch_size, _ = inputs.shape
?
mask = self.labels_ == j | ||
if mask.any(): | ||
self.cluster_centers_[j] = X[mask].mean(dim=0) | ||
#thise line returns labels and centoids of the results, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small typo "this line"
""" | ||
Assigns each data point in X to its closest centroid. | ||
|
||
Parameters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be Args
as well
@ashwinvaidya17 Thank you for your detailed review, I will work on all the required changes and update you as soon as possible and yes, you are right, there are variables that could be reduced as it will work best for codacy tests. |
… 34)' (syntax-error)
@samet-akcay @ashwinvaidya17 Hello everyone, I have made the required changes and addressed issues by codacy. If you could take a look and give me your feedback I will appreciate it a lot. Thank you. |
@samet-akcay @ashwinvaidya17 Hello, everyone i have done the required changes and all the checks have passed, if you could take a look and see if it is ready to merge or not yet. Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not getting back with this. Thanks for all the efforts.
Description
Provide an implementation to the KMeans clustering algorithm through the PyTorch framework
Fixes # (956)
Changes
Checklist