rustboard's GLIBC dependency is too high, need to lower GLIBC requirement #6578

Closed
wookayin opened this issue Sep 13, 2023 · 8 comments
Labels
core:rustboard //tensorboard/data/server/...

Comments

@wookayin

wookayin commented Sep 13, 2023

From #4784 (comment)

The rustboard extension (--load_fast=True) in TensorBoard 2.12 through 2.14 cannot run on older Linux systems, including Ubuntu 20.04, because it requires GLIBC >= 2.34. I think this requirement is too strict; Ubuntu 20.04 LTS is still popular in many dev environments and should still be supported. I request that the minimum required GLIBC version be at most 2.31, and the lower the better as long as we don't lose any performance (e.g., as low as 2.18).

This is probably due to the CI base image being bumped from 20.04 (used for previous releases) to 22.04 -- see #5992 (/cc @bmd3k). Reverting to 20.04 may cause other problems, so we could instead link against an older version of glibc for the best compatibility. Possible approaches include adding __asm__ directives, using a cross-compile toolchain, etc.

Ref)

Some useful facts, copied from #4784:

  • Ubuntu 20.04 has GLIBC 2.31:
    $ ldd --version | grep GLIB
    ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31
    $ cat /etc/lsb-release | grep DESCRIPTION
    DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"
  • tensorboard_data_server>=0.7 requires GLIBC>=2.34:
    $ objdump -T $(python -c "from tensorboard_data_server import server_binary; print(server_binary())")  | grep GLIBC
    ...
    0000000000000000      DF *UND*  0000000000000000  GLIBC_2.34  pthread_create
    0000000000000000      DF *UND*  0000000000000000  GLIBC_2.34  __libc_start_main
  • tensorboard_data_server==0.6.1 (tensorboard 2.11) requires GLIBC>=2.18:
    $ objdump -T $(python -c "from tensorboard_data_server import server_binary; print(server_binary())")  | grep GLIBC
    ...
    0000000000000000  w   DF *UND*  0000000000000000  GLIBC_2.18  __cxa_thread_atexit_impl
    ...
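
The manual objdump check above could be automated roughly as follows. This is only a sketch: the helper name max_glibc_required and the SAMPLE text are illustrative (the sample lines are copied from the output quoted above), and it assumes the input is the text printed by objdump -T.

```python
import re
from typing import Optional, Tuple

# Matches versioned symbol tags such as GLIBC_2.34 or GLIBC_2.2.5.
# Non-numeric tags (for example GLIBC_PRIVATE) are ignored on purpose.
_GLIBC_RE = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def max_glibc_required(objdump_output: str) -> Optional[Tuple[int, ...]]:
    """Return the highest GLIBC symbol version found in `objdump -T` text.

    Hypothetical helper for illustration; returns None if no versioned
    GLIBC symbol appears in the text.
    """
    versions = [
        tuple(int(part) for part in found.split("."))
        for found in _GLIBC_RE.findall(objdump_output)
    ]
    # Compare as integer tuples so that 2.10 sorts above 2.9.
    return max(versions, default=None)


# Lines copied from the objdump output quoted in this issue.
SAMPLE = """\
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.34  pthread_create
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.34  __libc_start_main
0000000000000000  w   DF *UND*  0000000000000000  GLIBC_2.18  __cxa_thread_atexit_impl
"""

if __name__ == "__main__":
    print(max_glibc_required(SAMPLE))
```

Versions are compared as integer tuples rather than strings, since a plain string comparison would rank "2.9" above "2.34".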

We could even publish a minor/maintenance release (say tensorboard_data_server 0.7.1) that relaxes the GLIBC requirement, without any need to wait for the next TB release cycle.

This can be done mostly on the CI side, so it would be much more convenient for Googlers to do it. But I'd be happy to contribute if any help is needed from my side.

@nik-sm

nik-sm commented Sep 26, 2023

I see an error message that I think comes from the same issue (but I'm not using rustboard AFAIK).

I'm on Ubuntu 20.04, using python3.11. This ubuntu version has glibc 2.31 as you mentioned:

$ ldd --version 
ldd (Ubuntu GLIBC 2.31-0ubuntu9.7) 2.31
...

Using pip install tensorboard==2.11.2, then I can launch tensorboard without errors.

But using any of these more recent versions:
tensorboard==2.12.3
tensorboard==2.13.0
tensorboard==2.14.0
... I get this error (paths below are truncated):

$ tensorboard
TensorFlow installation not found - running with reduced feature set.
tensorboard --logdir checkpoints/tutorial8/tensorboards

/.../venv/lib/python3.11/site-packages/tensorboard_data_server/bin/server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /.../venv/lib/python3.11/site-packages/tensorboard_data_server/bin/server)
/.../venv/lib/python3.11/site-packages/tensorboard_data_server/bin/server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /.../venv/lib/python3.11/site-packages/tensorboard_data_server/bin/server)
/.../venv/lib/python3.11/site-packages/tensorboard_data_server/bin/server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /.../venv/lib/python3.11/site-packages/tensorboard_data_server/bin/server)
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.12.3 at http://localhost:6006/ (Press CTRL+C to quit)

Even though these errors are printed, I can still access the tensorboard server at localhost:6006 and things look normal. Should I be worried that something is not functioning right?

I found this old issue that seems somewhat related: #4928
I'm not quite sure where to check the rules about "what dependencies are OK in order to use a certain pypi version string" - but it seems like this package claims to work on any linux, when in fact it does not work on Ubuntu 20.04.

If this is actually an error (and not just a warning/something I can ignore) - could I have detected this incompatibility somehow?
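
One possible way to detect this ahead of time is to compare the local glibc version against the version the binary asks for. The sketch below uses the standard-library platform.libc_ver() call; the helper names parse_version and glibc_is_sufficient are made up for illustration, and the required version "2.34" is taken from the error messages above rather than read from the binary.

```python
import platform
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.31' into (2, 31)."""
    return tuple(int(part) for part in text.split("."))


def glibc_is_sufficient(required: str, available: str) -> bool:
    """Return True if the available glibc is at least the required one.

    Hypothetical helper for illustration only.
    """
    return parse_version(available) >= parse_version(required)


if __name__ == "__main__":
    # libc_ver() reports the libc the Python interpreter is linked against.
    # It returns empty strings when it cannot determine one.
    libc_name, libc_version = platform.libc_ver()
    if libc_name == "glibc" and libc_version:
        ok = glibc_is_sufficient("2.34", libc_version)
        status = "new enough" if ok else "too old"
        print(f"glibc {libc_version} is {status} for GLIBC_2.34")
    else:
        print("could not determine a glibc version on this platform")
```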

@wookayin
Author

wookayin commented Sep 27, 2023

@nik-sm As I wrote in #4784 (comment), rustboard ("fast" mode) can't run in your situation, so tensorboard automatically falls back to the painfully slow Python implementation. You can see the plots and TB runs OK, but data loading will be very, very slow.

If you pass --load_fast=true to the tensorboard CLI, the command will emit errors and fail.

@nik-sm

nik-sm commented Sep 27, 2023

Ok I understand now - thanks for explaining. I'll stick to a version that's compatible with Ubuntu 20.04 / libc 2.31 for now.

I think this issue arose because the wheel provided for tensorboard 2.12, 2.13, 2.14 is not named correctly (https://packaging.python.org/en/latest/specifications/platform-compatibility-tags/#manylinux, https://peps.python.org/pep-0600/). For example, maybe it should use manylinux_2_32 instead of any?

(Tensorboard still does run even with an older libc version. You've explained it will be extremely slow and I should avoid this situation, so I guess I'm not sure whether one would say that "Tensorboard is technically compatible with libc 2.31"...)
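
To make the tag discussion concrete, here is a small sketch of how a wheel's platform tag maps to the glibc version it promises to work with. The legacy aliases (manylinux1, manylinux2010, manylinux2014) and their glibc versions follow PEP 600; the function name glibc_floor_for_tag is an assumption made for this example and is not part of any packaging tool.

```python
import re
from typing import Optional, Tuple

# Legacy manylinux aliases and the glibc versions PEP 600 assigns to them.
LEGACY_ALIASES = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}

# PEP 600 style tags look like manylinux_<major>_<minor>_<arch>.
_PEP600_RE = re.compile(r"^manylinux_(\d+)_(\d+)_(.+)$")


def glibc_floor_for_tag(platform_tag: str) -> Optional[Tuple[int, int]]:
    """Return the (major, minor) glibc version a platform tag claims.

    Hypothetical helper; returns None for tags that say nothing about
    glibc, such as 'any'.
    """
    match = _PEP600_RE.match(platform_tag)
    if match:
        return int(match.group(1)), int(match.group(2))
    for alias, version in LEGACY_ALIASES.items():
        if platform_tag.startswith(alias + "_"):
            return version
    return None


if __name__ == "__main__":
    for tag in ("manylinux2014_x86_64", "manylinux_2_34_x86_64", "any"):
        print(tag, glibc_floor_for_tag(tag))
```

Read this way, a binary that needs GLIBC_2.34 symbols is not consistent with a manylinux2014 tag, which only promises glibc 2.17.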

@groszewn
Contributor

groszewn commented Oct 6, 2023

The produced tensorboard-data-server wheels do specify manylinux2014, but the tensorboard wheels do not.

Looks like 0.6.1 wheels specify manylinux2010, whereas 0.7+ specifies manylinux2014.

Would either of you be able to test running the rustboard binary directly (i.e. not via the tensorboard entrypoint) and let me know if that works for you?

> pip install tensorboard-data-server
> RUSTBOARD_BINARY=$(python -c 'import tensorboard_data_server; print(tensorboard_data_server.server_binary())')
> $RUSTBOARD_BINARY --logdir <path to arbitrary logdir>

@nik-sm

nik-sm commented Oct 6, 2023

Hi - I ran the commands above, but I get the same error message:
(I've truncated paths on my system with ...)

$ python3.11 -m venv venv
$ source venv/bin/activate
(venv) $ pip install tensorboard-data-server
Collecting tensorboard-data-server
  Obtaining dependency information for tensorboard-data-server from https://files.pythonhosted.org/packages/02/52/fb9e51fba47951aabd7a6b25e41d73eae94208ccf62d886168096941a781/tensorboard_data_server-0.7.1-py3-none-manylinux2014_x86_64.whl.metadata
  Using cached tensorboard_data_server-0.7.1-py3-none-manylinux2014_x86_64.whl.metadata (1.1 kB)
Using cached tensorboard_data_server-0.7.1-py3-none-manylinux2014_x86_64.whl (6.6 MB)
Installing collected packages: tensorboard-data-server
Successfully installed tensorboard-data-server-0.7.1

(venv) $ RUSTBOARD_BINARY=$(python -c 'import tensorboard_data_server; print(tensorboard_data_server.server_binary())')

(venv) $ $RUSTBOARD_BINARY --logdir results  # Some tensorboard logs from another project
.../venv/lib/python3.11/site-packages/tensorboard_data_server/bin/server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by .../venv/lib/python3.11/site-packages/tensorboard_data_server/bin/server)
.../venv/lib/python3.11/site-packages/tensorboard_data_server/bin/server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by .../venv/lib/python3.11/site-packages/tensorboard_data_server/bin/server)
.../venv/lib/python3.11/site-packages/tensorboard_data_server/bin/server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by .../venv/lib/python3.11/site-packages/tensorboard_data_server/bin/server)

(venv) $ pip freeze | rg tensorboard
tensorboard-data-server==0.7.1
(venv) $ python --version
Python 3.11.5
(venv) $ uname -a
Linux goliath 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Let me know if you'd like me to try anything else.

@groszewn
Contributor

groszewn commented Oct 7, 2023

Thanks, confirmed that this is due to the glibc version on the specific machine used to create the published wheel. Just a note that the rustboard release process does not happen via GitHub CI, so #5992 is actually not the root cause of this issue (though it would have been the root cause if we did use GitHub CI for releases).

> pip install auditwheel
> pip download tensorboard-data-server
> auditwheel show tensorboard_data_server-0.7.1-py3-none-manylinux2014_x86_64.whl

tensorboard_data_server-0.7.1-py3-none-manylinux2014_x86_64.whl is
consistent with the following platform tag: "manylinux_2_34_x86_64".

The wheel references external versioned symbols in these
system-provided shared libraries: libm.so.6 with versions
{'GLIBC_2.2.5'}, libgcc_s.so.1 with versions {'GCC_3.0', 'GCC_4.2.0',
'GCC_3.3'}, libc.so.6 with versions {'GLIBC_2.2.5', 'GLIBC_2.9',
'GLIBC_2.32', 'GLIBC_2.3.2', 'GLIBC_2.34', 'GLIBC_2.18', 'GLIBC_2.4',
'GLIBC_2.33', 'GLIBC_2.15', 'GLIBC_2.28', 'GLIBC_2.17', 'GLIBC_2.14',
'GLIBC_2.29', 'GLIBC_2.7', 'GLIBC_2.3', 'GLIBC_2.10', 'GLIBC_2.25',
'GLIBC_2.3.4'}

This constrains the platform tag to "manylinux_2_34_x86_64". In order
to achieve a more compatible tag, you would need to recompile a new
wheel from source on a system with earlier versions of these
libraries, such as a recent manylinux image.
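
The relationship between the symbol versions and the reported tag can be sketched as below. This is a simplification and not how auditwheel is implemented: its real policy checks also cover other libraries and symbol families (the GCC_* versions above, for instance). The function name pep600_tag is hypothetical, and the symbol set is copied from the auditwheel output above.

```python
import re
from typing import Iterable

_GLIBC_RE = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def pep600_tag(symbol_versions: Iterable[str], arch: str = "x86_64") -> str:
    """Derive a manylinux_X_Y tag from the highest GLIBC symbol version.

    Hypothetical, simplified helper: it looks at GLIBC versions only and
    skips anything that is not of the form GLIBC_<digits>.<digits>...
    """
    versions = []
    for name in symbol_versions:
        match = _GLIBC_RE.fullmatch(name)
        if match:
            versions.append(tuple(int(p) for p in match.group(1).split(".")))
    if not versions:
        raise ValueError("no versioned GLIBC symbols found")
    # Only major and minor appear in the tag (GLIBC_2.2.5 would give 2_2).
    major, minor = max(versions)[:2]
    return f"manylinux_{major}_{minor}_{arch}"


# libc.so.6 symbol versions copied from the auditwheel output in this issue.
LIBC_VERSIONS = {
    "GLIBC_2.2.5", "GLIBC_2.9", "GLIBC_2.32", "GLIBC_2.3.2", "GLIBC_2.34",
    "GLIBC_2.18", "GLIBC_2.4", "GLIBC_2.33", "GLIBC_2.15", "GLIBC_2.28",
    "GLIBC_2.17", "GLIBC_2.14", "GLIBC_2.29", "GLIBC_2.7", "GLIBC_2.3",
    "GLIBC_2.10", "GLIBC_2.25", "GLIBC_2.3.4",
}

if __name__ == "__main__":
    print(pep600_tag(LIBC_VERSIONS))
```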

@groszewn
Contributor

I've released 0.7.2 which modifies the platform tag to align with PEP 600 (i.e. manylinux_2_31_x86_64). I've verified on a machine running Ubuntu 20.04 that the new release works as expected, but please feel free to reopen if you are still seeing issues on specific platforms.

@wookayin
Author

wookayin commented Oct 24, 2023

Thanks @groszewn, I can confirm that the minimum GLIBC version for tensorboard-data-server==0.7.2 is 2.29 and it works well on Ubuntu 20.04 (18.04 is not supported).

0000000000000000  w   DF *UND*  0000000000000000  GLIBC_2.29  posix_spawn_file_actions_addchdir_np

For the record, this is related to #6636, which is part of TensorBoard 2.15. (The actual fix comes from the internal release process.)
