-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: create standalone IVF training API in python #2553
Conversation
…ng with IVF training
f"Vector column {c} must be FixedSizeListArray " | ||
f"1-dimensional FixedShapeTensorArray, got {field.type}" | ||
) | ||
if not pa.types.is_floating(field.type.value_type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can support uint8
as well
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2553 +/- ##
==========================================
+ Coverage 79.95% 80.01% +0.05%
==========================================
Files 207 209 +2
Lines 59500 59821 +321
Branches 59500 59821 +321
==========================================
+ Hits 47576 47865 +289
- Misses 9122 9131 +9
- Partials 2802 2825 +23
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work here :)
self.distance_type = distance_type | ||
"""The distance type used to train the IVF model""" | ||
|
||
def num_partitions(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this should be a property:
def num_partitions(self) -> int: | |
@property | |
def num_partitions(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to a property
|
||
K-means is an iterative algorithm that can be computationally expensive. The | ||
accelerator argument can be used to offload the computation to a hardware | ||
accelerator such as a GPU or TPU. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a progress bar for this? Should we mention the LANCE_LOG
variable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is a tqdm progress bar if an accelerator is used. I started to add one if an accelerator is not used but it ended up being a fair amount of code. Will open a different PR.
from .dependencies import torch | ||
|
||
|
||
class IvfModel: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary for now, but it would be very cool to be able to get this from an existing index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
I'm going to merge this on green. I have a few PRs built on top of this one already. |
This is the start of breaking IVF / PQ index training up into four standalone methods with check pointing in between each method.