Minimal Build for On-Device Training #16326
…baijumeswani/training-minimal-build
I'm not very familiar with the flatbuffer format, so I just have a few general comments.
Minors.
I suggest renaming "requires_grad" to "requires_grad_params" so that the name is more descriptive.
Since it is part of the schema, if we want to make that change, we should probably do it now rather than bumping the version later. Any thoughts?
Besides that, LGTM.
a few comments, looks good overall
Thank you for the valuable feedback @edgchen1 @pengwa @skottmckay @askhade 😄
🛠️ Changes in this pull request:
This pull request introduces two significant changes to the project:
Changing the on-device training checkpoint format: The current implementation stores the on-device training checkpoint as a sequence of tensors in multiple files inside a checkpoint folder, which can be inefficient in terms of storage and performance. In this PR, I have modified the checkpoint format to use a flatbuffer table and save the checkpoint to a single file, providing a more compact and efficient representation. The changes around this are twofold:
Adding support for the onnxruntime minimal build: To support scenarios where binary size is a constraint, I made changes to ensure that the training build works with the minimal build.
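To make the single-file idea concrete, here is a minimal sketch of packing several named tensors into one binary stream instead of one file per tensor. It is not the flatbuffer schema from this PR and does not use onnxruntime: the `MAGIC` tag, the record layout, and the `save_checkpoint` / `load_checkpoint` helpers are all hypothetical, and the per-tensor `requires_grad` flag is only an assumption about how such a field might be stored.

```python
import io
import struct

MAGIC = b"CKPT"  # hypothetical file identifier, not the one used by onnxruntime


def save_checkpoint(stream, tensors):
    """Write {name: (requires_grad, [float, ...])} into a single binary stream."""
    stream.write(MAGIC)
    stream.write(struct.pack("<I", len(tensors)))
    for name, (requires_grad, values) in tensors.items():
        encoded = name.encode("utf-8")
        # Record header: name length, requires_grad flag, element count.
        stream.write(struct.pack("<I?I", len(encoded), requires_grad, len(values)))
        stream.write(encoded)
        # Tensor data as little-endian float32.
        stream.write(struct.pack(f"<{len(values)}f", *values))


def load_checkpoint(stream):
    """Read back the mapping written by save_checkpoint."""
    if stream.read(4) != MAGIC:
        raise ValueError("not a checkpoint stream")
    (count,) = struct.unpack("<I", stream.read(4))
    tensors = {}
    for _ in range(count):
        name_len, requires_grad, n = struct.unpack("<I?I", stream.read(9))
        name = stream.read(name_len).decode("utf-8")
        values = list(struct.unpack(f"<{n}f", stream.read(4 * n)))
        tensors[name] = (requires_grad, values)
    return tensors


# Round trip through an in-memory buffer standing in for the checkpoint file.
state = {
    "fc1.weight": (True, [0.5, -1.25, 2.0]),
    "fc1.bias": (False, [0.0]),
}
buffer = io.BytesIO()
save_checkpoint(buffer, state)
buffer.seek(0)
restored = load_checkpoint(buffer)
```

A flatbuffer table serves the same purpose as this hand-rolled layout, with the added benefit of a versioned schema, which is why field names such as "requires_grad" are hard to change once the format ships.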
🔍 Open Issues: