
Compilation failed due to large activation tensors in model. #439

Closed
xsacha opened this issue Aug 11, 2021 · 13 comments
Assignees
Labels
comp:compiler Compiler related issues comp:model Model related issues type:support Support question or issue

Comments


xsacha commented Aug 11, 2021

Is there any explanation of what this error means? Is there a workaround?
The model is only 800KB, so I'm unsure how any part of it could be too large. It's an INT8 TFLite model that was created using this script with the input size set to (1, 3, 768, 1024) [NCHW].

$ edgetpu_compiler -s tf_model/facedetector.tflite 
Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.
ERROR: Restored original execution plan after delegate application failure.
Compilation failed: Compilation failed due to large activation tensors in model.
Compilation child process completed within timeout period.
Compilation failed! 

Edit: I found that targeting an older runtime version (via the compiler's --min_runtime_version option) allows it to compile successfully. The default was 14; versions 12 and above fail, but 11 works.

Although successful, it then fails to map any OPs to the TPU.

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 0
Number of operations that will run on CPU: 282

Operator                       Count      Status

CONV_2D                        134        Operation is working on an unsupported data type
GATHER                         1          Operation is working on an unsupported data type
RESIZE_NEAREST_NEIGHBOR        2          Operation is working on an unsupported data type
RELU                           3          Operation is working on an unsupported data type
RESHAPE                        9          Operation is working on an unsupported data type
PAD                            23         Operation is working on an unsupported data type
SPLIT                          10         Operation is working on an unsupported data type
ADD                            22         Operation is working on an unsupported data type
SOFTMAX                        1          Operation is working on an unsupported data type
QUANTIZE                       4          Operation is otherwise supported, but not mapped due to some unspecified limitation
TRANSPOSE                      57         Operation is working on an unsupported data type
CONCATENATION                  16         Operation is working on an unsupported data type
Compilation child process completed within timeout period.
Compilation succeeded! 
@manoj7410

@xsacha Can you share your CPU .tflite model here ?

@manoj7410 manoj7410 self-assigned this Aug 11, 2021
@manoj7410 manoj7410 added the comp:model Model related issues label Aug 11, 2021

xsacha commented Aug 11, 2021

facedetector.zip
Here is the tflite

@manoj7410

@xsacha Did you try to reduce the Tensor size of (1, 3, 768, 1024) ?


xsacha commented Aug 11, 2021

I haven't attempted it yet. The original model runs at any resolution. Is there a limitation on the TPU?
Edit: I tried 640x480 instead, and when quantised it still fails on runtime version 14. If it's not quantised it compiles, but then obviously also fails to map any ops since it's fp32.
The result is exactly the same as in the original post, unfortunately.

Edit #2: For the second issue I get (on runtimes 10 and 11), where none of the OPs map to the TPU: the problem appears to be the conversion from UINT8 (input) to INT8. From what I've read, this is a TensorFlow 2.x change that isn't supported here. I found a rather involved workaround here: https://towardsdatascience.com/hacking-google-coral-edge-tpu-motion-blur-and-lanczos-resize-9b60ebfaa552 that rewrites the tensor types to uint8 so that they can run on the TPU.
That method did not work for my model, though, as it then complains about 'PAD'.
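For what it's worth, the uint8/int8 mismatch is only a representation change: TFLite's affine quantization, real = scale * (q - zero_point), yields identical real values if both the stored value and the zero point are shifted by 128. A minimal pure-Python sketch of that relationship (my illustration only; `uint8_to_int8` is a hypothetical helper, and the linked hack actually rewrites tensor types inside the flatbuffer rather than converting live values):

```python
def uint8_to_int8(q_u8, zero_point_u8):
    """Re-express a uint8-quantized value and its zero point as int8.

    Shifting both by 128 preserves real = scale * (q - zero_point),
    so the represented real value is unchanged.
    """
    return q_u8 - 128, zero_point_u8 - 128

scale = 0.5
q_u8, zp_u8 = 200, 128
q_i8, zp_i8 = uint8_to_int8(q_u8, zp_u8)
# Both encodings represent the same real value:
assert scale * (q_u8 - zp_u8) == scale * (q_i8 - zp_i8)
```

This is why a pure type-rewriting hack can work at all: the arithmetic the TPU performs is unchanged, only the integer encoding shifts.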

@hjonnala hjonnala added comp:compiler Compiler related issues type:support Support question or issue labels Aug 11, 2021

xsacha commented Aug 15, 2021

I've made several more attempts at this: using the v1 converter, using a much smaller input size, manually allowing only uint8 ops, and so on, and I still can't get it working.

Q1. Does every op have to be uint8 to run on the TPU? I notice the first thing the TFLite model does is quantize to signed int8.
Q2. Does Tensorflow 2 not create models compatible with Edge TPU?
Q3. Do I have to use Channels-Last format to be compatible with Edge TPU? My model shape is 3xHxW.

I feel there's a lack of documentation around this issue. I also haven't been able to find any tutorial online that can successfully convert any of my models even though all the OPs are supported.

@manoj7410

manoj7410 commented Aug 16, 2021

Q1. Does every op have to be uint8 to run on the TPU? I notice the first thing the TFlite does is quantize to signed int8.

Yes. Every operation has to be int8 quantized in order to run on the TPU. Please see the model requirements at https://coral.ai/docs/edgetpu/models-intro/#model-requirements
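Concretely, "int8 quantized" means every tensor value is stored as an 8-bit integer related to its real value by an affine transform. A small sketch of that mapping (my illustration, not Coral code):

```python
def quantize(x, scale, zero_point):
    """Map a real value to int8: q = round(x / scale) + zero_point,
    clamped to the int8 range [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    """Recover the (approximate) real value: x = scale * (q - zero_point)."""
    return scale * (q - zero_point)

assert quantize(12.0, 0.5, 0) == 24
assert dequantize(24, 0.5, 0) == 12.0
```

The compiler requires every op in the graph to carry scale/zero-point parameters like these; any op left in float falls back to the CPU.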

Q2. Does Tensorflow 2 not create models compatible with Edge TPU?

TF2 models are compatible with the Edge TPU.

Q3. Do I have to use Channels-Last format to be compatible with Edge TPU? My model shape is 3xHxW.

Please see the documentation https://coral.ai/docs/edgetpu/models-intro/#compatibility-overview for the model compatibility information.

I feel there's a lack of documentation around this issue. I also haven't been able to find any tutorial online that can successfully convert any of my models even though all the OPs are supported.

You can find many tutorials around this issue at https://github.com/google-coral/tutorials. We are always eager to improve our documentation and make it smoother for TPU users.


xsacha commented Aug 16, 2021

@manoj7410
On this page https://coral.ai/docs/edgetpu/faq/ it says:

To create a compatible model with post-training quantization, you must use TensorFlow 1.15 and set both the input and output type to uint8; currently, you cannot use TensorFlow 2.0 because it supports only float input/output.

Is this out of date? I am using post-training quantization in TF 2.5 right now and wondering if that is the issue. I can set uint8 as the input and output type, but the actual ops have to be int8. Because of this error (large activation tensors in model) I cannot use the runtimes that support QUANTIZE (13+), so I changed the model to use int8 input and output. From what you have said this should work, but it still produces the same error.

Also, I couldn't find on any of those pages, including the model-compatibility link above, whether a channels-first input would fail or not; that is, an input shape of 1x3x480x640.

Edit: I also uploaded several models to the Coral tutorials to quantize and convert for the Edge TPU, and they all failed with this same issue.

@manoj7410

Please see the section : "Can I use TensorFlow 2.0 to create my model?" at https://coral.ai/docs/edgetpu/faq/.
I'll also get this documentation checked by the team.

@manoj7410

@xsacha The channels-first approach might fail, as tflite and the edgetpu_compiler don't fully support all the ops.


xsacha commented Aug 17, 2021

I used a channels-last version instead (the same model failed on all ops as channels-first) and got the following:

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 36
Number of operations that will run on CPU: 61

Operator                       Count      Status

LEAKY_RELU                     11         Operation not supported
PAD                            5          Mapped to Edge TPU
CONCATENATION                  1          Operation is otherwise supported, but not mapped due to some unspecified limitation
CONCATENATION                  6          More than one subgraph is not supported
RESHAPE                        9          More than one subgraph is not supported
ADD                            2          More than one subgraph is not supported
CONV_2D                        17         Mapped to Edge TPU
CONV_2D                        26         More than one subgraph is not supported
QUANTIZE                       1          Mapped to Edge TPU
QUANTIZE                       1          Operation is otherwise supported, but not mapped due to some unspecified limitation
RESIZE_NEAREST_NEIGHBOR        2          More than one subgraph is not supported
DEPTHWISE_CONV_2D              13         Mapped to Edge TPU
RELU                           3          More than one subgraph is not supported

It looks like it isn't possible, then, to convert a model from PyTorch/ONNX to run on the Edge TPU, since those frameworks only provide channels-first ops (channels-last exists only as a memory layout).
There should probably be some big red text in the documentation stating this, because I spent a lot of time figuring it out.
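For reference, the layout difference itself is just an axis permutation: reordering NCHW axes as (0, 2, 3, 1) gives NHWC (numpy's np.transpose(x, (0, 2, 3, 1)) does the same). A pure-Python sketch on nested lists, purely to illustrate the data layout; converting a whole graph's ops still requires real tooling:

```python
def nchw_to_nhwc(t):
    """Transpose a nested-list tensor from NCHW to NHWC (axis order 0, 2, 3, 1)."""
    n_, c_, h_, w_ = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    return [[[[t[n][c][h][w] for c in range(c_)]   # channels become innermost
              for w in range(w_)]
             for h in range(h_)]
            for n in range(n_)]

# A 1x2x1x3 NCHW tensor becomes a 1x1x3x2 NHWC tensor:
t = [[[[1, 2, 3]], [[4, 5, 6]]]]
assert nchw_to_nhwc(t) == [[[[1, 4], [2, 5], [3, 6]]]]
```

Transposing the input tensor alone isn't enough, though: every convolution and reshape in the graph assumes one layout, which is why an exporter-level conversion is needed rather than a wrapper op.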

@manoj7410

@xsacha Can you take a look at this comment: #419 (comment)?


xsacha commented Aug 17, 2021

Thanks, that looks promising. From what I can see it goes via OpenVINO to perform the conversion, and that extra step might introduce other limitations, but it opens up a better possibility than retraining all the models!

@xsacha xsacha closed this as completed Aug 17, 2021

xsacha commented Aug 18, 2021

As an update, I managed to convert one of my models that was hitting this error using the method you linked.
Another model failed due to a rounding difference in OpenVINO.
