
Error: Using Coral USB Accelerator in Docker (ValueError: Failed to load delegate from libedgetpu.so.1) #3

Closed
jjimin opened this issue Oct 30, 2019 · 10 comments


jjimin commented Oct 30, 2019

System information

  • Ubuntu 16.04
  • Docker CE 19.03.3 (container using the image tensorflow/tensorflow:nightly-devel-gpu-py3)
    • In the container:
      • Python 3.5.2

I am trying to get started with my USB Accelerator using the classify_image.py source code in a Docker container.
My Dockerfile for this project is like this:

FROM tensorflow/tensorflow:nightly-devel-gpu-py3

WORKDIR /home
ENV HOME /home
VOLUME /data
EXPOSE 8888
RUN cd ~
RUN apt-get update
RUN apt-get install -y git nano python-pip python-dev pkg-config wget usbutils

RUN echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" \
		| tee /etc/apt/sources.list.d/coral-edgetpu.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
RUN apt-get update
RUN apt-get install -y libedgetpu1-std

RUN wget https://dl.google.com/coral/python/tflite_runtime-1.14.0-cp35-cp35m-linux_x86_64.whl
RUN pip3 install tflite_runtime-1.14.0-cp35-cp35m-linux_x86_64.whl

RUN mkdir coral && cd coral
RUN git clone https://github.com/google-coral/tflite.git

And I made the container with this command:
docker run -it -v /dev/bus/usb:/dev/bus/usb --gpus all coral-usb:0.1 /bin/bash

In the container, I followed the manual in 'Get started with the USB Accelerator'.

python3 classify_image.py \
--model models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite \
--labels models/inat_bird_labels.txt \
--input images/parrot.jpg

And after running the code above, I got some errors like this:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tflite_runtime/interpreter.py", line 165, in load_delegate
    delegate = Delegate(library, options)
  File "/usr/local/lib/python3.5/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
    raise ValueError(capture.message)
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "classify_image.py", line 118, in <module>
    main()
  File "classify_image.py", line 95, in main
    interpreter = make_interpreter(args.model)
  File "classify_image.py", line 69, in make_interpreter
    {'device': device[0]} if device else {})
  File "/usr/local/lib/python3.5/dist-packages/tflite_runtime/interpreter.py", line 168, in load_delegate
    library, str(e)))
ValueError: Failed to load delegate from libedgetpu.so.1

How could I solve this problem?
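One way to narrow this error down before anything else is to check whether the shared library can be dlopen'ed at all, separately from any USB-access problem, since the same ValueError covers both causes. A minimal diagnostic sketch (the helper name is illustrative, not from the Coral docs):

```python
import ctypes

def can_dlopen(libname):
    """Return True if the shared library can be loaded via dlopen."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

if __name__ == "__main__":
    if can_dlopen("libedgetpu.so.1"):
        print("libedgetpu.so.1 loads; the failure is likely USB access/permissions")
    else:
        print("libedgetpu.so.1 not found; (re)install libedgetpu1-std")
```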


Namburger commented Oct 30, 2019

@jjimin
A few standard diagnostics for ValueError: Failed to load delegate from libedgetpu.so.1:

  1. Could you check if libedgetpu is properly installed with
    $ ll /usr/lib/{GNU-TYPE}/libedge*?
    For reference:
$ ll /usr/lib/x86_64-linux-gnu/libedgetpu*
lrwxrwxrwx 1 root root   43 Oct 17 15:03 /usr/lib/x86_64-linux-gnu/libedgetpu.so.1 -> /usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0
-rwxr-xr-x 1 root root 930K Oct 17 15:03 /usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0
  2. We normally get this error if the Linux user isn't in the plugdev group, which is an easy fix: either run with sudo or add the user to the plugdev group. However, since you're in a Docker container you should already be running as root, so this cause can probably be eliminated for now.

  3. I know this is silly, but is your accelerator plugged in?

  4. Lastly, have you been able to run this demo on your host machine without using a Docker container? We've been seeing some hiccups on Ubuntu 16.04 with the tensorflow api (see here, but that is a different error message). Can you also try the edgetpu api for inferencing?


Namburger commented Oct 30, 2019

@jjimin UPDATE:
I ran into your exact issue and was able to fix it with a slight modification to the Dockerfile (for a dependency issue). It appears that the Docker container just didn't have access to USB devices. Here is some info:

  • Ubuntu 18.04.3 LTS

  • Docker CE 19.03.4 with same exact image

  • Dockerfile:

FROM tensorflow/tensorflow:nightly-devel-gpu-py3
  
WORKDIR /home
ENV HOME /home
VOLUME /data
EXPOSE 8888
RUN cd ~
RUN add-apt-repository -y ppa:ubuntu-toolchain-r/test
RUN apt-get update
RUN apt-get install -y git nano python-pip python-dev pkg-config wget usbutils gcc-4.9
RUN apt-get upgrade -y libstdc++6

RUN echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" \
    | tee /etc/apt/sources.list.d/coral-edgetpu.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
RUN apt-get update
RUN apt-get install -y libedgetpu1-std

RUN wget https://dl.google.com/coral/python/tflite_runtime-1.14.0-cp35-cp35m-linux_x86_64.whl
RUN pip3 install tflite_runtime-1.14.0-cp35-cp35m-linux_x86_64.whl

RUN mkdir coral && cd coral
RUN git clone https://github.com/google-coral/tflite.git
  • Build:
$ docker build -t "coral-usb:0.1" .
  • Run image (the --privileged flag was probably what you were missing):
$ docker run -it --privileged -v /dev/bus/usb:/dev/bus/usb coral-usb:0.1 /bin/bash
  • Example demo run:
root@c495f381807a:~/tflite# cd ~/tflite/python/examples/classification/
root@c495f381807a:~/tflite/python/examples/classification# ls
README.md  classify.py	classify_image.py  install_requirements.sh
root@c495f381807a:~/tflite/python/examples/classification# bash install_requirements.sh 
Requirement already satisfied: numpy in /usr/local/lib/python3.5/dist-packages (1.15.4)
Requirement already satisfied: Pillow in /usr/local/lib/python3.5/dist-packages (5.3.0)
You are using pip version 18.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   189  100   189    0     0    236      0 --:--:-- --:--:-- --:--:--   236
100 3988k  100 3988k    0     0  2085k      0  0:00:01  0:00:01 --:--:-- 4692k
100   181  100   181    0     0    540      0 --:--:-- --:--:-- --:--:--  176k
100 3448k  100 3448k    0     0  3624k      0 --:--:-- --:--:-- --:--:-- 3624k
100   158  100   158    0     0    826      0 --:--:-- --:--:-- --:--:--  154k
100 40895  100 40895    0     0   117k      0 --:--:-- --:--:-- --:--:--  117k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   148  100   148    0     0    201      0 --:--:-- --:--:-- --:--:--   201
100 3068k  100 3068k    0     0  1796k      0  0:00:01  0:00:01 --:--:-- 3368k
root@c495f381807a:~/tflite/python/examples/classification# python3 classify_image.py \
> --model models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite \
> --labels models/inat_bird_labels.txt \
> --input images/parrot.jpg
INFO: Initialized TensorFlow Lite runtime.
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
12.6ms
4.0ms
4.1ms
4.0ms
4.0ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.76172

Hope this helps!

P.S. Unrelated, but out of curiosity, any reason for using the gpu tensorflow image?


jjrugui commented Jan 17, 2020

@Namburger how did you verify that the container didn't have access to USB devices? I'm currently testing it in a container (on an rpi4) in which the device is recognized (I can see the Coral USB with lsusb), but I get the same error ValueError: Failed to load delegate from libedgetpu.so.1.

I'll make an independent issue if I can't figure it out in the next couple of days.


Namburger commented Jan 17, 2020

@jjrugui
As I mentioned, I ran into the exact same issue and was able to fix it with the --privileged flag, which suggests that Docker did not have access to the USB devices. I don't think you should open an independent issue, since that would make it harder to reference. Note that running our edgetpu library in virtualized containers is not actually officially supported; I was just giving some pointers since it works for me. Have you tried running it under this command?

$ docker run -it --privileged -v /dev/bus/usb:/dev/bus/usb coral-usb:0.1 /bin/bash


jjrugui commented Jan 20, 2020

Hi @Namburger , thanks for your quick reply.

I'll be debugging it later this evening (CET), so I'll be able to provide more info. I'm using Docker Compose to build the container, and I'm mounting the volume /dev/bus/usb as well as running it with --privileged.

Thanks again for your quick response :) I'll post an update and/or the fix if I find it later today.


jjrugui commented Jan 22, 2020

@Namburger thanks for your input. I was able to run on the TPU inside a container by using the edgetpu API instead of tflite_runtime, and it works without a problem, as you stated in #2 (comment).

@Namburger

@jjrugui hi, make sure you are also using an updated version of the tflite_runtime library; this will most likely solve your tflite runtime API issue. The new package should now be:
https://dl.google.com/coral/python/tflite_runtime-2.1.0-cp35-cp35m-linux_x86_64.whl


yumemio commented Mar 5, 2024

I'm way too late to the party, but here is what I have discovered about this ValueError issue in a Docker environment (note that I used a Raspberry Pi 4 8GB in my experiments):

  • You need to add the --privileged flag when running the container.
    • This is surely not best practice: finer-grained capability control is preferable to granting blanket root privileges to the container, but I haven't tested which capability is needed to fix the error...
  • You need to run the inference script as root inside the container.
  • The first inference attempt after the system boots always fails. Add the --restart always flag to automatically re-run the script (assuming you've set the CMD/ENTRYPOINT directive properly), and all is well.
    • This issue only affects Docker environments; outside of the container, the same script works fine right after boot.
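For the finer-grained alternative mentioned above, Docker Compose supports device_cgroup_rules, which can grant access to USB character devices without full --privileged. This is an untested sketch of that idea, not a verified fix for this error:

```yaml
services:
  ai:
    # Untested sketch: allow USB character devices (major number 189)
    # instead of granting full --privileged access.
    # "c 189:* rmw" = read/mknod/write on any USB bus character device.
    device_cgroup_rules:
      - "c 189:* rmw"
    devices:
      - "/dev/bus/usb:/dev/bus/usb"
```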

Environment:

libedgetpu1-max=16.0
python3-pycoral=2.0.0
python3-tflite-runtime=2.5.0.post1

Hopefully this may help someone in the future! 😄


hvn2 commented Apr 18, 2024

> (quoting @yumemio's comment above in full)

Can you explain (ideally with an example) how to use the --restart always flag?
I have the problem that the Coral USB fails to load the delegate from libedgetpu.so.1 after the Raspberry Pi reboots. But if I compose down and compose up the container again, it works.


yumemio commented Apr 19, 2024

@hvn2 Thanks for reaching out! Sorry, I forgot to mention that --restart always is a Docker CLI flag (doc) (docker run --restart always mycontainer mycmd). Other useful options are also mentioned in the linked page.

If you're using Docker Compose, you can set an equivalent option restart: always (doc) to the service that runs your Python script. For example:

services:
  ai:
    # ...
    # Restart the docker on failure, and after the system boot
    restart: always
    # Mount Coral TPU
    devices:
      - "/dev/bus/usb:/dev/bus/usb"
    # Needs privileged access to the host OS
    privileged: true

then run docker compose up to start the service. The first time it will throw an error, but Docker will immediately restart the container and re-attempt to initialize the TPU, and this time you won't see the ValueError.
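If restarting the whole container feels heavy-handed, the same effect can be approximated inside the script itself with a retry loop. This is a generic sketch, not Coral-specific code; the load_delegate call in the comment is shown only as hypothetical usage:

```python
import time

def load_with_retry(loader, attempts=3, delay=2.0):
    """Call loader() until it succeeds, sleeping between failed attempts.

    Mimics what `restart: always` achieves at the container level: the
    first delegate load after boot may raise ValueError, and a plain
    retry is often enough.
    """
    last_err = None
    for _ in range(attempts):
        try:
            return loader()
        except ValueError as err:
            last_err = err
            time.sleep(delay)
    raise last_err

# Hypothetical usage (requires tflite_runtime; shown for illustration only):
# from tflite_runtime.interpreter import load_delegate
# delegate = load_with_retry(lambda: load_delegate("libedgetpu.so.1"))
```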

We've recently open-sourced a project that utilizes Coral TPU & Raspberry Pi & Docker, which you might be interested in as a reference implementation:

(Comments are written in Japanese; you can use machine-translation if necessary!)

Cheers!
