cannot build containers, python_install.sh fails (Orin AGX 64GB, JetPack 6.1) #654
Comments
Hi @benswift, I had updated jetson-containers for initial JetPack 6.1 support a few days ago - can you try doing a git pull in your jetson-containers repo? Then jetson-containers should start reporting your board correctly as JetPack 6.1, and the python build should go through. It takes a while to rebuild everything here, but PyTorch and Transformers are fine, as diffusers should be. |
Thanks for the quick response. I am using the latest (53eb0e6), yeah. |
I see the problem with the version detection: 36.4.0 is too new and isn't in the version mapping dict in l4t_version.py. So I made this change:

jane@ubuntu:~/Code/jetson-containers$ git diff --ignore-space-change
diff --git a/jetson_containers/l4t_version.py b/jetson_containers/l4t_version.py
index 22b7e264..896a859f 100644
--- a/jetson_containers/l4t_version.py
+++ b/jetson_containers/l4t_version.py
@@ -84,6 +84,7 @@ def get_jetpack_version(l4t_version=get_l4t_version(), default='5.1'):
NVIDIA_JETPACK = {
# -------- JP6 --------
+ "36.4.0": "6.1",
"36.3.0": "6.0 GA",
"36.2.0": "6.0 DP",
 "36.0.0": "6.0 EA",

and got a bit further, but still erroring out with python issues, this time about not being able to resolve a version for twine and pkginfo.
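To make the failure mode concrete, here is a minimal sketch of the lookup the diff patches. The dict entries and the `default='5.1'` come from the hunk above; the lookup body itself is my assumption about how `l4t_version.py` uses the dict.

```python
# Sketch of the L4T -> JetPack lookup patched above. Entries and the
# default of '5.1' are from the diff hunk; the function body is an
# assumption about how the mapping is consumed.
NVIDIA_JETPACK = {
    # -------- JP6 --------
    "36.4.0": "6.1",      # the entry the patch adds
    "36.3.0": "6.0 GA",
    "36.2.0": "6.0 DP",
    "36.0.0": "6.0 EA",
}

def get_jetpack_version(l4t_version, default="5.1"):
    """Map an L4T release string to a JetPack version, falling back
    to the default when the release is newer than the table knows."""
    return NVIDIA_JETPACK.get(l4t_version, default)

# Before the patch, "36.4.0" is absent, so the board falls back to the
# default and gets misreported as JetPack 5.1.
print(get_jetpack_version("36.4.0"))  # 6.1
print(get_jetpack_version("37.0.0"))  # 5.1 (unknown release falls back)
```

This also explains the `JETPACK_VERSION=5.1` misdetection mentioned at the bottom of the thread.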
|
One more update, just in case others are playing along at home: I had to run python3 -m pip install --upgrade pip pkginfo --index-url https://pypi.org/simple (note the extra --index-url pointing at regular pypi.org). After that it gets much further, and I'm now erroring out elsewhere.
@dusty-nv any ideas on what to do with packages that aren't in the Jetson index? |
Sorry @benswift, that change to l4t_version.py I thought I had merged into master, but it was still hiding in dev - that has now been merged, and you can see the recognition of JP 6.1 here:
@tokk-nv had reported a similar thing about |
@benswift: thanks for this - I have been struggling with the same since yesterday, though I am running into issues even after making the changes that you suggested.

@dusty-nv: I had built a container for VLM using ROS Iron (nano_llm:iron-r36.3.0-cu122), please refer to #622. This has been running without issues on an Orin NX which is still on JetPack 6.0. I tried to run the same container on an Orin AGX Developer Kit which is running 36.4.0 and JetPack 6.1, and I started getting errors such as these:

Current thread 0x0000ffff99cc8da0 (most recent call first):

I then tried to build a new container with CUDA 12.6, but the build process stops when building the OpenCV container, with the errors below. It appears to be the same python-related issue - I look forward to your feedback.

traceback (most recent call last):
An error occurred while building with CMake.
error: subprocess-exited-with-error
× Building wheel for opencv-contrib-python (pyproject.toml) did not run successfully.
note: This error originates from a subprocess, and is likely not a problem with pip. |
@benswift as of yesterday morning opencv was building, and the wheels are up on my pip server, which the containers should install instead of needing to build them (like it appears to have done in your case). I'm not sure what the actual compilation error is from that log, but I will try again (I was also trying to build ROS for that). MLC and NanoLLM I haven't gotten to yet; NanoLLM has a lot of dependencies that I am in the process of working through. Does PyTorch in the r36.3 containers still work for you on r36.4, or is it just MLC? |
@dusty-nv ah, I didn't think to look in the dev branch. Do you have any guidance (or are there any docs somewhere) on which packages it might be ok to pull from "regular" pypi.org and which ones have to come from the jetson.webredirect.org one? @Fibo27 I actually had the exact same error trying to build the opencv container. I ended up just pip installing it instead. |
@dusty-nv whoops - we were writing comments at the same time. I'll re-pull the latest and check the opencv thing ASAP - will be able to do that in a couple of hours - and report back. |
@benswift if you browse the previously-built wheels for CUDA 12.2, those are the ones that end up getting specially-built with CUDA support. That pip server blacklists those packages from pypi, but mirrors the others (so, for example, you can still install regular packages through it). |
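The routing described here can be sketched as a tiny decision function. The package names and the blacklist set below are illustrative, not the server's actual configuration:

```python
# Illustrative sketch of the pip-server routing described above: packages
# that need Jetson-specific CUDA builds are "blacklisted" from the pypi
# mirror so pip must take the locally built wheel, while everything else
# falls through to the mirror. This set is hypothetical, not the
# server's real blacklist.
CUDA_BUILT = {"torch", "torchvision", "opencv-python", "opencv-contrib-python"}

def resolve_source(package: str) -> str:
    """Decide which side of the index a package request is served from."""
    if package.lower() in CUDA_BUILT:
        return "jetson CUDA wheel"   # specially-built for JetPack/CUDA
    return "pypi mirror"             # plain mirror of pypi.org

print(resolve_source("torch"))     # jetson CUDA wheel
print(resolve_source("requests"))  # pypi mirror
```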
Ok, I just tried to build opencv from the latest |
@benswift don't do that unless you actually want to remove all your docker images 🤣 You can disable the build cache though, I think that's what Johnny means. It's weird you get that twine error, will cherry-pick your patch 👍 |
yeap, haha - sometimes I have conflicts, so I clean everything and rebuild, and it works |
The only "error" that I get is this one: |
@dusty-nv one more update in case you're wondering. I can build the xformers container, but the torch_tensorrt one (which I'd like to use, though it isn't essential for me right now) doesn't work.
|
OK, there was an additional fix needed for OpenCV (see f1b346b). Also checked in the changes to pip pkginfo and optimum. torch_tensorrt I haven't touched in a while, as it was challenging to keep building; instead I end up using torch2trt a lot (which, coincidentally, I believe also needs updating for TensorRT 10, because I also need to finish updating jetson-inference for TRT10). Sorry, you picked a good week to start out haha - a lot of JetPack updates don't need a whole rebuild, but this one was a pretty big version bump from CUDA 12.2 to 12.6, cuDNN 9.3, and TRT 10. It's a big stack and can take a couple weeks for it all to settle down. |
All good mate, I have built and maintained enough software in my life to know that everything is a fragile house of cards and it’s a miracle anything ever works at all. Thanks for your support, and I will try those things that you suggested. |
Try torch_tensorrt now. It was pointing to a very old version; now it is compatible with PyTorch 2.4, CUDA 12.4, TensorRT 10.1, Python 3.12 (#657). torch2trt is also compatible with TensorRT 10 as of the last commit |
@dusty-nv you can see it: pytorch/TensorRT@main...lluo/jp6.1 |
OK thanks, yea - it's still not building; I will wait until it's updated and merged for CUDA 12.6. Indeed torch2trt built and is working though! |
super mate! |
@dusty-nv: I started building the nano_llm container with ROS and I could build the following packages - this is of course big progress since yesterday. However I get the following error:

sudo docker run -t --rm --runtime=nvidia --network=host
exec /ros_entrypoint.sh: exec format error |
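An `exec format error` usually means the container's entrypoint binary targets a different CPU architecture than the host kernel (e.g. an x86_64 image on an aarch64 Jetson). As a quick local check - my suggestion, not something from this thread - you can read a binary's ELF header and compare it with the host:

```python
# Hypothetical helper (not part of jetson-containers): report the CPU
# architecture an ELF binary was compiled for, to compare against the
# host. An aarch64 host executing an x86_64 entrypoint (or vice versa)
# produces exactly the "exec format error" seen above.
import platform
import struct

ELF_MACHINES = {0x03: "x86", 0x28: "arm", 0x3E: "x86_64", 0xB7: "aarch64"}

def elf_arch(path: str) -> str:
    """Return the target architecture recorded in an ELF file header."""
    with open(path, "rb") as f:
        header = f.read(20)
    if header[:4] != b"\x7fELF":
        return "not an ELF binary"
    (machine,) = struct.unpack_from("<H", header, 18)  # e_machine field
    return ELF_MACHINES.get(machine, f"unknown (0x{machine:x})")

print("host  :", platform.machine())
print("binary:", elf_arch("/bin/sh"))  # should match the host on a native system
```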
hey @dusty-nv I tried again today with the latest master (1f5dee9). The pkginfo thing has been fixed, so that part now builds. The optimum one still fails, though:
Anyway, I can make do with what I've got for now. Re: optimum you mentioned above you checked in some changes - are they in master or only on dev? Do you think that tensorrt (via optimum) would give me some easy speedups for a Stable Diffusion pipeline on the AGX Orin? This is all for an interactive art installation where higher throughput for the pipeline would be handy. |
Hey @dusty-nv I actually spoke too soon above - I was still using my local change where I had commented out the optimum dep. Again, my goal is to build a container with diffusers, transformers, and torch2trt (or torch_tensorrt, or some other way of using TensorRT-accelerated pytorch). As a first step, I ensure I'm on a clean checkout:
and I try and build the transformers container:
So |
What is your command/prompt? |
Do you mean how I am invoking the build command?
Or do you mean what's my shell environment or something? Here's the output of
|
I'm on
|
Thanks for your help, btw. I can confirm that I just tried literally the same command:
and I still get the error about optimum
|
I'm on master too, wtf haha - it is strange, optimum has ARM wheels. If not, I will change the optimum dockerfile to build it manually |
Yes, it is strange - I have still never gotten this error, and I had changed how it was installed. @benswift, just comment it out in the transformers dockerfile |
wow, you are here, hero |
Thanks all. I'm open to building stuff manually if there's any doco (doesn't have to be super polished, just enough for me to fill-in-the-gaps). Or @dusty-nv is there anything else you think I should try (dev branch, run further diagnostics, etc)? |
If you have changed nothing,
|
yep, will do |
@dusty-nv Now I understand the error: we use a newer version of transformers, 4.45 (huggingface/optimum@049b00f), for which optimum has not yet released a compatible version (see their tags/releases); therefore it does not install (which is why we couldn't recreate this problem). We need your opinion here: keep optimum as-is and update it later, create a build pinning a matching pair like transformers 4.44 with a compatible optimum, find another solution, or simply wait for the optimum v1.23.0 release.

ERROR:
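The resolution failure can be reproduced in miniature. The "released optimum caps transformers below 4.45" constraint is taken from the discussion above; the comparison helper below is illustrative, not pip's real resolver:

```python
# Illustrative sketch of the conflict: the container installs
# transformers 4.45, but the released optimum requires transformers
# below 4.45, so pip cannot find any installable optimum version.
def version_tuple(v: str) -> tuple:
    """Parse a dotted version string into a comparable tuple of ints."""
    return tuple(int(x) for x in v.split("."))

def satisfies_cap(installed: str, exclusive_cap: str) -> bool:
    """True if `installed` satisfies a '< exclusive_cap' requirement."""
    return version_tuple(installed) < version_tuple(exclusive_cap)

# optimum (at the time) effectively required transformers < 4.45:
print(satisfies_cap("4.44.2", "4.45"))  # True  -> installable
print(satisfies_cap("4.45.0", "4.45"))  # False -> resolution fails
```

Once an optimum release raised that cap (as noted below, for transformers 4.45.2), the same check passes and the install goes through.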
Solved, optimum is out for transformers 4.45.2 |
Oh, that's good news. Do I have to wait for the jetson.webredirect.org pip cache to be updated? I'm not sure if that's a nightly cronjob or has to be triggered manually by an admin. |
It is updated manually for the CUDA index. But optimum is on the root one: http://jetson.webredirect.org/root/pypi/optimum/stable |
Yea, optimum and transformers don't need CUDA/NVCC compilation directly (all the CUDA stuff they use is in dependencies), so I don't need to host wheels for them; the 'root index' is just a mirror of pypi. It's interesting that even though we don't pin the transformers version in the dockerfile, it was still failing to resolve optimum... |
I've got a brand new Orin AGX developer kit (64GB) and used the SDK Manager to download and flash the latest Ubuntu (22.04) and JetPack SDK (6.1).
I'm trying to build a container (I need diffusers and transformers for my application code), however when I run the build command from the README (actually a truncated one, just trying to build pytorch without transformers and ros) it bombs out in the "install python" step:
Get:1 http://ports.ubuntu.com/ubuntu-ports jammy InRelease [270 kB]
Get:2 http://ports.ubuntu.com/ubuntu-ports jammy-updates InRelease [128 kB]
Get:3 http://ports.ubuntu.com/ubuntu-ports jammy-backports InRelease [127 kB]
Get:4 http://ports.ubuntu.com/ubuntu-ports jammy-security InRelease [129 kB]
Get:5 http://ports.ubuntu.com/ubuntu-ports jammy/restricted arm64 Packages [24.2 kB]
Get:6 http://ports.ubuntu.com/ubuntu-ports jammy/universe arm64 Packages [17.2 MB]
Get:7 http://ports.ubuntu.com/ubuntu-ports jammy/main arm64 Packages [1,758 kB]
Get:8 http://ports.ubuntu.com/ubuntu-ports jammy/multiverse arm64 Packages [224 kB]
Get:9 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main arm64 Packages [2,265 kB]
Get:10 http://ports.ubuntu.com/ubuntu-ports jammy-updates/universe arm64 Packages [1,390 kB]
Get:11 http://ports.ubuntu.com/ubuntu-ports jammy-updates/restricted arm64 Packages [2,475 kB]
Get:12 http://ports.ubuntu.com/ubuntu-ports jammy-updates/multiverse arm64 Packages [29.5 kB]
Get:13 http://ports.ubuntu.com/ubuntu-ports jammy-backports/main arm64 Packages [80.9 kB]
Get:14 http://ports.ubuntu.com/ubuntu-ports jammy-backports/universe arm64 Packages [31.8 kB]
Get:15 http://ports.ubuntu.com/ubuntu-ports jammy-security/multiverse arm64 Packages [24.1 kB]
Get:16 http://ports.ubuntu.com/ubuntu-ports jammy-security/universe arm64 Packages [1,107 kB]
Get:17 http://ports.ubuntu.com/ubuntu-ports jammy-security/main arm64 Packages [1,995 kB]
Get:18 http://ports.ubuntu.com/ubuntu-ports jammy-security/restricted arm64 Packages [2,405 kB]
Fetched 31.7 MB in 10s (3,107 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
python3.10 is already the newest version (3.10.12-1~22.04.6).
python3.10 set to manually installed.
The following additional packages will be installed:
libexpat1-dev libpython3.10-dev zlib1g-dev
The following NEW packages will be installed:
libexpat1-dev libpython3.10-dev python3.10-dev zlib1g-dev
0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 5,464 kB of archives.
After this operation, 21.0 MB of additional disk space will be used.
Get:1 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main arm64 libexpat1-dev arm64 2.4.7-1ubuntu0.4 [129 kB]
Get:2 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main arm64 zlib1g-dev arm64 1:1.2.11.dfsg-2ubuntu9.2 [163 kB]
Get:3 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main arm64 libpython3.10-dev arm64 3.10.12-1~22.04.6 [4,664 kB]
Get:4 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main arm64 python3.10-dev arm64 3.10.12-1~22.04.6 [508 kB]
debconf: delaying package configuration, since apt-utils is not installed
Fetched 5,464 kB in 3s (1,710 kB/s)
Selecting previously unselected package libexpat1-dev:arm64.
(Reading database ... 29364 files and directories currently installed.)
Preparing to unpack .../libexpat1-dev_2.4.7-1ubuntu0.4_arm64.deb ...
Unpacking libexpat1-dev:arm64 (2.4.7-1ubuntu0.4) ...
Selecting previously unselected package zlib1g-dev:arm64.
Preparing to unpack .../zlib1g-dev_1%3a1.2.11.dfsg-2ubuntu9.2_arm64.deb ...
Unpacking zlib1g-dev:arm64 (1:1.2.11.dfsg-2ubuntu9.2) ...
Selecting previously unselected package libpython3.10-dev:arm64.
Preparing to unpack .../libpython3.10-dev_3.10.12-1~22.04.6_arm64.deb ...
Unpacking libpython3.10-dev:arm64 (3.10.12-1~22.04.6) ...
Selecting previously unselected package python3.10-dev.
Preparing to unpack .../python3.10-dev_3.10.12-1~22.04.6_arm64.deb ...
Unpacking python3.10-dev (3.10.12-1~22.04.6) ...
Setting up libexpat1-dev:arm64 (2.4.7-1ubuntu0.4) ...
Setting up zlib1g-dev:arm64 (1:1.2.11.dfsg-2ubuntu9.2) ...
Setting up libpython3.10-dev:arm64 (3.10.12-1~22.04.6) ...
Setting up python3.10-dev (3.10.12-1~22.04.6) ...
/usr/bin/python3.10
Looking in indexes: http://jetson.webredirect.org/jp5/cu126
ERROR: Could not find a version that satisfies the requirement pip (from versions: none)
ERROR: No matching distribution found for pip
python3.6
/tmp/install_python.sh: line 27: python3.6: command not found
curl: (23) Failure writing output to destination
The command '/bin/sh -c /tmp/install_python.sh' returned a non-zero code: 127
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/jane/Code/jetson-containers/jetson_containers/build.py", line 112, in <module>
build_container(args.name, args.packages, args.base, args.build_flags, args.build_args, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api)
File "/home/jane/Code/jetson-containers/jetson_containers/container.py", line 147, in build_container
status = subprocess.run(cmd.replace(NEWLINE, ' '), executable='/bin/bash', shell=True, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'DOCKER_BUILDKIT=0 docker build --network=host --tag my_container:r36.4.0-python --file /home/jane/Code/jetson-containers/packages/build/python/Dockerfile --build-arg BASE_IMAGE=my_container:r36.4.0-cudnn --build-arg PYTHON_VERSION_ARG="3.10" /home/jane/Code/jetson-containers/packages/build/python 2>&1 | tee /home/jane/Code/jetson-containers/logs/20241002_212941/build/my_container_r36.4.0-python.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 127.
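The tail of that traceback shows the pattern jetson-containers uses to launch the build: pipe docker's output through `tee` into a log file, use bash's `${PIPESTATUS[0]}` so the pipeline exits with docker's status rather than tee's, and let `subprocess.run(check=True)` raise on failure. A minimal reproduction, with `false` standing in for the failing docker build:

```python
# Minimal reproduction of the invocation pattern in the traceback above:
# `false` stands in for the failing `docker build`. Without the
# ${PIPESTATUS[0]} trick, the pipeline would exit with tee's status (0)
# and the build failure would be silently swallowed.
import subprocess

cmd = "false | tee /dev/null; exit ${PIPESTATUS[0]}"
try:
    subprocess.run(cmd, executable="/bin/bash", shell=True, check=True)
except subprocess.CalledProcessError as e:
    print(f"build failed with exit status {e.returncode}")  # exit status 1
```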
It's failing in packages/build/python/install.sh, in the "install pip" part of the script. I tried hacking around and fixing the python issue, and got a bit further, but I don't know how the jetson+python setup works well enough to fix it for sure, so I thought I'd ask here.

Is this jetson-containers tool still the recommended way to build this stuff? I'm using current-gen hardware with the latest SDK (afaict), so it seems like it should work? Happy to be pointed in other directions, though - I must confess I found it a bit confusing to know what's the latest/official(ish) way to do this (pull images from nvcr.io? build them with this tool? follow these docs, or perhaps others on the NVIDIA website?).

One other thing I noticed: jtop (and apt show nvidia-jetpack) reports that the board is running JetPack 6.1, but the jetson-containers output detects it as JETPACK_VERSION=5.1. While that doesn't seem to be the issue that's derailing me, it doesn't seem good.

A bit more output from jtop, if it's helpful.