
RFC: On-device training with TensorFlow Lite #390

Open
wants to merge 1 commit into master

Conversation

@miaout17 (Contributor) commented Jun 7, 2021

We're sharing this RFC to reflect our latest thinking on implementing on-device training in TensorFlow Lite.
We haven't set a timeline for closing comments; we want to surface the RFC early for transparency and to gather feedback.

Status: Draft
Author(s): Yu-Cheng Ling (ycling@google.com), Haoliang Zhang (haoliang@google.com), Jaesung Chung (jaesung@google.com)
Sponsor: Jared Duke (jdduke@google.com)
Updated: 2021-06-04

Introduction

TensorFlow Lite is TensorFlow's solution for on-device machine learning.
Initially it focused only on inference use cases, but we have increasingly heard
from users about the need for on-device training. This proposal lays out
a concrete plan and roadmap for supporting training in TensorFlow Lite.

@jijoongmoon commented Jun 9, 2021

Thanks for sharing the RFC. I wonder whether it is possible to change the model architecture on device at runtime. The first on-device training use case that comes to mind is transfer learning, where we need to add new classes as the user wants; in that case we need to change the model architecture (say, the unit size of a dense layer). Is that possible with the current setup? I also wonder whether there are optimization techniques that make on-device training realistic, since we would likely need significant optimizations in terms of memory and computation. Could you introduce some of them?

@bhack (Contributor) commented Jun 13, 2021

I suggest taking a look at Continual Learning on the Edge with TensorFlow Lite

And https://arxiv.org/abs/2105.13127

@bhack (Contributor) commented Jun 13, 2021

/cc @vlomonaco

@bhack (Contributor) commented Jun 14, 2021

Another interesting scenario to evaluate is training in the context of Edge federated learning:

https://github.com/tensorflow/federated/issues/749
https://arxiv.org/abs/2104.03042
https://arxiv.org/abs/1909.11875
https://www.sciencedirect.com/science/article/pii/S266729522100009X

@vlomonaco

Thanks @bhack for the tag! @lrzpellegrini, the main author of "Continual Learning at the Edge: Real-Time Training on Smartphone Devices" will take a look and provide some feedback.

@bhack (Contributor) commented Jun 14, 2021

/cc @gdemos01 @akhilmathurs

@miaout17 (Contributor, Author)

Replying to @jijoongmoon:

> I wonder whether it is possible to change the model architecture on device at runtime. The first on-device training use case that comes to mind is transfer learning, where we need to add new classes as the user wants; in that case we need to change the model architecture (say, the unit size of a dense layer). Is that possible with the current setup?

Great question.

When doing transfer learning on a classifier, changing the number of classes does not require changing the model "structure" (adding/removing ops); changing the shape of the weight tensor is sufficient. This proposal can handle that use case with no problem.

> I also wonder whether there are optimization techniques that make on-device training realistic, since we would likely need significant optimizations in terms of memory and computation. Could you introduce some of them?

For sure. We're focusing on making it generally work first. Once we reach that point, we can do more benchmarking and profiling to figure out what is most important to optimize, and work on that.

@miaout17 (Contributor, Author)

Thanks @bhack.
@vlomonaco @lrzpellegrini thanks for taking a look and please feel free to comment.

@lc0 (Contributor) commented Jun 14, 2021

> Replying to @jijoongmoon:
>
> > I wonder whether it is possible to change the model architecture on device at runtime. The first on-device training use case that comes to mind is transfer learning, where we need to add new classes as the user wants; in that case we need to change the model architecture (say, the unit size of a dense layer). Is that possible with the current setup?
>
> Great question.
>
> When doing transfer learning on a classifier, changing the number of classes does not require changing the model "structure" (adding/removing ops); changing the shape of the weight tensor is sufficient. This proposal can handle that use case with no problem.

@miaout17 can you elaborate on how such a shape change process would work? I do not see such a use case in the current proposal. Thanks!

@vlomonaco

Hi @miaout17, I had a more in-depth look. This direction looks promising and we are excited to finally see on-device training on the TFLite radar. I think these features would be great for many transfer learning problems. However, for Continual Learning (CL), flexibility is all that matters.

  • Can the model architecture, optimizer, and loss function be changed over time?

It would be difficult to implement a CL approach without those features, apart from basic experience replay. @lrzpellegrini will provide more details.

@lrzpellegrini

Hi there, I had a look at the RFC. It seems to me that it moves in a very good direction.

I'm not aware of the current capabilities of TFLite, as I've only had the chance to use it in a very high-level way, but I really appreciate that the focus of the RFC is on the ability to transfer whole tf.functions to the final model. This can really boost the ability to learn on-device without forcing the programmer to delve too deeply into the low-level side of mobile implementations.

As a comparison, while implementing the CORe app described in "Continual Learning at the Edge: Real-Time Training on Smartphone Devices" I had to manually translate the Python version of our continual learning algorithm into C++ so that it could be used alongside the Caffe deep learning library. In that scenario even simple things like moving data and accessing tensors (weights, inputs, ...) add a lot of complexity, and with that comes an absurd overhead on the programming side, so I really appreciate this tf.function-based approach 👍.

As Vincenzo pointed out, the main issues are on the flexibility side. In the simple scenario of limited on-device fine-tuning, a simple fit-based approach seems the best solution. However, this would really limit the capabilities of the framework: I suspect a fit-based approach would only allow for a very simple instance replay mechanism, which may be insufficient when working with continual learning algorithms.

On the other hand, supporting continual learning algorithms may require some flexibility regarding:

  • The ability to easily manipulate (read, write, store, load) tensors linked to weights, activations, gradients, etcetera.
  • The ability to change the model architecture (that is, not limited to changing the number of outputs of a certain layer). Some algorithms also require the ability to dynamically add new layers to the existing model, or even to add new detached/undetached models.
  • The ability to change the optimizer, loss, learning-rate schedulers, and other training-related components.
  • The ability to selectively freeze and unfreeze certain parts of the model.

Of course, not all CL algorithms need all these capabilities.

Consider that CL is a very varied field, but most algorithms leverage an instance replay mechanism (implemented by inserting/replacing new instances into the dataset) plus some simple regularization/distillation/bias-normalization algorithm (which mostly requires flexibility on the tensor manipulation side). More recent algorithms really push the idea of manipulating the architecture of the model, but I guess that supporting this behavior would be the most problematic part.

Alas, I don't have a clear understanding of the capabilities for translating tf.functions from Python to TFLite models, so I'm not able to fully grasp the complexity required to achieve this kind of flexibility.

@bhack (Contributor) commented Jun 17, 2021

I think that federated and continual learning are more relevant to the on-device/edge use case because, in this context, it is still hard to achieve few-shot/zero-shot learning with "general purpose" (recent) very large-scale models. At least until we figure out how knowledge "hard distillation" for these models could be achieved efficiently on constrained devices.

@miaout17 (Contributor, Author)

Replying to @lc0:

> Can you elaborate on how such a shape change process would work?

For example:

  • Imagine you have a classifier where the last layer is a simple fully connected layer (e.g. tf.nn.relu(tf.matmul(x, weight) + bias)).
  • We can define a def set_classes_num(classes_num) TF function which re-initializes the weight and bias variables to a different size. For example, if the number of hidden units before the last layer is 1024, the weight can have shape [1024, classes_num] and the bias can have shape [classes_num]. The function can re-initialize the weight and bias to random values close to 0, after which the last layer is ready to be retrained.

We're building low-level features to make it possible to describe these semantics. We could then consider wrapping them in an easier-to-use API to make this friendlier for developers.

Let me know if this makes sense. I'm happy to try to write this out as more concrete pseudocode as well.
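In that spirit, here is a rough sketch of what such a set_classes_num function could look like. The class name, shapes, and the use of shape-polymorphic tf.Variables are illustrative assumptions, and whether every construct here converts cleanly to TFLite would still need to be verified:

```python
import tensorflow as tf

class TransferLearningHead(tf.Module):
  """Illustrative last layer whose number of classes can be changed on device."""

  def __init__(self, hidden_units=1024, initial_classes=10):
    self.hidden_units = hidden_units
    # Declaring a partially-defined shape lets the variables later be
    # re-assigned with a different number of classes.
    self.weight = tf.Variable(
        tf.random.normal([hidden_units, initial_classes], stddev=0.01),
        shape=[hidden_units, None])
    self.bias = tf.Variable(tf.zeros([initial_classes]), shape=[None])

  @tf.function(input_signature=[tf.TensorSpec([], tf.int32)])
  def set_classes_num(self, classes_num):
    # Re-initialize the head for a new number of classes; the feature
    # extractor (not shown) is untouched and can stay frozen.
    self.weight.assign(
        tf.random.normal([self.hidden_units, classes_num], stddev=0.01))
    self.bias.assign(tf.zeros([classes_num]))

  @tf.function(input_signature=[tf.TensorSpec([None, 1024], tf.float32)])
  def head(self, features):
    # 1024 matches the default hidden_units above.
    return tf.nn.relu(tf.matmul(features, self.weight) + self.bias)
```

On device, set_classes_num would presumably be exposed as one of the model's signatures and invoked once before retraining the head.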

@miaout17 (Contributor, Author)

Replying to @vlomonaco and @lrzpellegrini

Thanks for the feedback!

For clarification: it sounds like continual learning can automatically modify the model structure without human intervention. Is my rough understanding correct?

This seems more advanced than what we're currently targeting. Trying to break down the requirements:

> Ability to easily manipulate (read, write, store, load) tensors linked to weights, activations, gradients, etcetera.

I think this should be doable (by wrapping the required logic into TF functions).

> Ability to change the model architecture (that is, not limited to changing the number of outputs of a certain layer). Some algorithms also require the ability to dynamically add new layers to the existing model, or even to add new detached/undetached models.
> Ability to change the optimizer, loss, learning-rate schedulers, and other training-related components.

We haven't tried these yet. However, I think in theory:

  • A TFLite model is like a TF function. There is no easy way to change it (e.g. adding a layer) after the TFLite model is created.
  • However, I think it's possible to model some of these behaviors with control flow (e.g. if a value is true, skip a layer or switch to another optimizer algorithm).
  • In the future, we could also explore on-device generation/modification of TFLite models, but that would be an even more advanced route.

> Ability to selectively freeze and unfreeze certain parts of the model.

This should be doable with control flow (e.g. skipping some gradient computations and variable updates when a boolean value is true).
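As a rough illustration of that control-flow idea (just a sketch, not part of the RFC; the tiny model, the plain SGD update, and the boolean flag are all illustrative assumptions), freezing a part of the model can be expressed by masking its gradient before the update:

```python
import tensorflow as tf

class TinyModel(tf.Module):
  """Illustrative two-layer model with a selectively freezable backbone."""

  def __init__(self):
    self.backbone_w = tf.Variable(tf.random.normal([4, 8], stddev=0.1))
    self.head_w = tf.Variable(tf.random.normal([8, 2], stddev=0.1))

  @tf.function(input_signature=[
      tf.TensorSpec([None, 4], tf.float32),
      tf.TensorSpec([None], tf.int32),
      tf.TensorSpec([], tf.bool)])
  def train_step(self, x, y, freeze_backbone):
    with tf.GradientTape() as tape:
      hidden = tf.nn.relu(tf.matmul(x, self.backbone_w))
      logits = tf.matmul(hidden, self.head_w)
      loss = tf.reduce_mean(
          tf.nn.sparse_softmax_cross_entropy_with_logits(
              labels=y, logits=logits))
    backbone_grad, head_grad = tape.gradient(
        loss, [self.backbone_w, self.head_w])
    # Zero the backbone gradient when the flag is set, so the update below
    # becomes a no-op for the frozen part of the model.
    backbone_grad = backbone_grad * (1.0 - tf.cast(freeze_backbone, tf.float32))
    lr = 0.01  # plain SGD, just for illustration
    self.backbone_w.assign_sub(lr * backbone_grad)
    self.head_w.assign_sub(lr * head_grad)
    return loss
```

Switching between optimizers or loss functions could in principle be modeled the same way, with a flag selecting between branches, though as noted above we haven't tried that yet.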

@danieljanes

Thanks for sharing this; I'm excited to see progress here. As one of the authors of the Flower federated learning framework, I can say that on-device training support is one of the biggest challenges for cross-device federated learning right now.

After reading the RFC I was wondering how setting/changing hyperparameters would work on-device. Would we just add additional arguments (like epochs) to e.g. the train method

  @tf.function
  def train(self, inputs, labels, epochs):
    self.model.fit(inputs, labels, epochs=epochs)

and then call train(train_input, train_labels, epochs=3)?
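One shape I could imagine (purely a guess on my side; the signature, the manual SGD step, and driving epochs from the caller are all assumptions, not anything from the RFC) is to pass per-step hyperparameters such as the learning rate as scalar inputs of the exported function, and let the application loop over epochs:

```python
import tensorflow as tf

class Trainer(tf.Module):
  def __init__(self, model):
    self.model = model  # a small tf.keras model, built elsewhere

  @tf.function(input_signature=[
      tf.TensorSpec([None, 28, 28], tf.float32),   # inputs
      tf.TensorSpec([None, 10], tf.float32),       # one-hot labels
      tf.TensorSpec([], tf.float32)])              # learning rate
  def train(self, inputs, labels, learning_rate):
    with tf.GradientTape() as tape:
      logits = self.model(inputs, training=True)
      loss = tf.reduce_mean(
          tf.keras.losses.categorical_crossentropy(
              labels, logits, from_logits=True))
    grads = tape.gradient(loss, self.model.trainable_variables)
    for grad, var in zip(grads, self.model.trainable_variables):
      var.assign_sub(learning_rate * grad)  # plain SGD step
    return {"loss": loss}

# Epochs (and other outer-loop hyperparameters) would then be driven by the
# caller, e.g. by invoking the exported "train" signature once per epoch.
```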

@bhack (Contributor) commented Jun 27, 2021

Regarding changing the model in training mode, check:

https://discuss.tensorflow.org/t/how-to-implement-layerdrop-in-tensorflow-transformers/2396

@gdemos01

/cc @vassilisvas, the co-author of Continual Learning on the Edge with TensorFlow Lite and the leader of the Learning Agents & Robots MRG. This is an interesting conversation to keep our eyes on and maybe contribute to.

@martinkersner

Thank you for bringing on-device training to TFLite!

Based on this proposal, I am not sure where you plan to manage the training loop. Are you thinking of (1) keeping it inside TFLite, or (2) letting the developer decide how the training loop will be structured on device?

As @danieljanes pointed out, the API doesn't show how the actual training step or training phase would be controlled. Moreover, the optimizer and loss do not seem to be accessible from the saved model. How would the train method know which ones to use?
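If I understand the tf.function-based approach correctly, one possible answer (purely my guess, with made-up names and a made-up model, not anything confirmed by the RFC) is that the loss and optimizer get captured when the train function is authored, so they are baked into the exported graph rather than being separately accessible objects in the TFLite file:

```python
import tensorflow as tf

class Model(tf.Module):
  def __init__(self):
    self.dense = tf.keras.layers.Dense(10)
    self.dense.build((None, 784))
    # The loss and optimizer are ordinary Python attributes here; they get
    # traced into the graph of train() at export time, rather than living
    # as separate, swappable objects in the converted model.
    self.loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    self.optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

  @tf.function(input_signature=[
      tf.TensorSpec([None, 784], tf.float32),
      tf.TensorSpec([None], tf.int32)])
  def train(self, x, y):
    with tf.GradientTape() as tape:
      logits = self.dense(x)
      loss = self.loss_fn(y, logits)
    grads = tape.gradient(loss, self.dense.trainable_variables)
    self.optimizer.apply_gradients(zip(grads, self.dense.trainable_variables))
    return {"loss": loss}
```

Under that reading, switching to a different loss or optimizer after conversion would require exporting multiple train signatures or re-converting the model.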

@yingding commented Nov 6, 2021

I have a similar question to @martinkersner regarding the training loop, in the context of federated ML with TFLite. It would be fantastic to let the developer decide how to train and how to structure the training loop on device. That would open up the possibility of forwarding the gradients from the training loop to a further orchestration layer, enabling both centralized and decentralized federated ML.

I can understand the benefit of keeping the training loop and its structure inside TFLite, so that it can be distributed uniformly across all platforms. If the training loop were opened up to different platforms, you might need an additional library extension for Android, IoT, and so on. But with such extension libraries controlling the training loop, you could reduce the dependencies on different platforms and speed up the development cycle for TFLite, since each extension library can have its own deployment cycle.
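To make that concrete, here is a sketch of the kind of split I have in mind (entirely illustrative; the names and the simple linear model are assumptions, and this is not an existing TFLite or federated-learning API): one exported function computes and returns gradients without applying them, and another applies an update supplied by the caller or by an aggregation server:

```python
import tensorflow as tf

class FederatedClientModel(tf.Module):
  """Illustrative split of a train step for federated-style orchestration."""

  def __init__(self):
    self.w = tf.Variable(tf.zeros([784, 10]))
    self.b = tf.Variable(tf.zeros([10]))

  @tf.function(input_signature=[
      tf.TensorSpec([None, 784], tf.float32),
      tf.TensorSpec([None], tf.int32)])
  def compute_gradients(self, x, y):
    # Returns gradients instead of applying them, so the surrounding app (or
    # a federated learning client) can apply them locally, aggregate them,
    # or send them to a server.
    with tf.GradientTape() as tape:
      logits = tf.matmul(x, self.w) + self.b
      loss = tf.reduce_mean(
          tf.nn.sparse_softmax_cross_entropy_with_logits(
              labels=y, logits=logits))
    grad_w, grad_b = tape.gradient(loss, [self.w, self.b])
    return {"loss": loss, "grad_w": grad_w, "grad_b": grad_b}

  @tf.function(input_signature=[
      tf.TensorSpec([784, 10], tf.float32),
      tf.TensorSpec([10], tf.float32),
      tf.TensorSpec([], tf.float32)])
  def apply_update(self, grad_w, grad_b, learning_rate):
    # Applies an update supplied by the caller (e.g. an aggregated gradient).
    self.w.assign_sub(learning_rate * grad_w)
    self.b.assign_sub(learning_rate * grad_b)
    return {"applied": tf.constant(True)}
```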

@bhack (Contributor) commented Nov 6, 2021

There was already some research work at ICML 2021 on jointly addressing federated and continual learning, with a TF reference implementation:

https://github.com/wyjeong/FedWeIT

It would be nice to open this research subdomain to edge devices with TFLite.

@bhack (Contributor) commented Nov 10, 2021

@yingding

https://www.tensorflow.org/lite/examples/on_device_training/overview
This went live yesterday (Nov 9) during the ML Community Day stream.
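For anyone catching up, the pattern in the linked overview is roughly the following (paraphrased from memory, so treat the overview itself as the authoritative version; the tiny model here is only a stand-in): tf.functions are exported as named SavedModel signatures, converted with resource-variable support, and then invoked through signature runners:

```python
import tensorflow as tf

class M(tf.Module):
  """Tiny stand-in for the overview's model, with train and infer functions."""

  def __init__(self):
    self.w = tf.Variable(tf.zeros([4, 2]))

  @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32),
                                tf.TensorSpec([None, 2], tf.float32)])
  def train(self, x, y):
    with tf.GradientTape() as tape:
      loss = tf.reduce_mean(tf.square(tf.matmul(x, self.w) - y))
    grad = tape.gradient(loss, self.w)
    self.w.assign_sub(0.1 * grad)
    return {"loss": loss}

  @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
  def infer(self, x):
    return {"output": tf.matmul(x, self.w)}

m = M()
tf.saved_model.save(
    m, "/tmp/on_device_training_model",
    signatures={"train": m.train.get_concrete_function(),
                "infer": m.infer.get_concrete_function()})

converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/on_device_training_model")
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,   # built-in TFLite ops
    tf.lite.OpsSet.SELECT_TF_OPS,     # fall back to TF ops where needed
]
converter.experimental_enable_resource_variables = True
tflite_model = converter.convert()

# Each exported tf.function becomes a callable signature on device
# (shown here with the Python interpreter):
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
train = interpreter.get_signature_runner("train")
infer = interpreter.get_signature_runner("infer")
```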

@bhack (Contributor) commented Nov 13, 2021

Another interesting use case, even if ImageNet is probably too large a dataset for many TFLite edge computing platforms, is this recent DeepMind paper, One Pass ImageNet:

https://arxiv.org/abs/2111.01956

@ematejska (Contributor)

Is this ready for community feedback? Are you ready to take this through review?
