
File not found during import #792

Closed
iskorini opened this issue Dec 20, 2019 · 6 comments


@iskorini

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • TensorFlow version and how it was installed (source or binary): 2.1.0-dev20191220 binary
  • TensorFlow-Addons version and how it was installed (source or binary): 0.7.0 source
  • Python version: 3.7.5
  • Is GPU used? (yes/no): yes

Describe the bug

When I try to import TensorFlow Addons, I get an exception about a file that cannot be found. Printing the path of that file in load_library.py gives /data/students_home/fschipani/thesis/addons/tensorflow_addons/custom_ops/activations/_activation_ops.so.
So I searched for _activation_ops.so in my home folder and found it in /anaconda3/envs/new_thesis/lib/python3.7/site-packages/tensorflow_addons/custom_ops/activations/_activation_ops.so

Code to reproduce the issue

import tensorflow as tf
import tensorflow_addons

Other info / logs

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/students_home/fschipani/thesis/addons/tensorflow_addons/__init__.py", line 21, in <module>
    from tensorflow_addons import activations
  File "/data/students_home/fschipani/thesis/addons/tensorflow_addons/activations/__init__.py", line 21, in <module>
    from tensorflow_addons.activations.gelu import gelu
  File "/data/students_home/fschipani/thesis/addons/tensorflow_addons/activations/gelu.py", line 24, in <module>
    get_path_to_datafile("custom_ops/activations/_activation_ops.so"))
  File "/data/students_home/fschipani/anaconda3/envs/new_thesis/lib/python3.7/site-packages/tensorflow_core/python/framework/load_library.py", line 60, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /data/students_home/fschipani/thesis/addons/tensorflow_addons/custom_ops/activations/_activation_ops.so: cannot open shared object file: No such file or directory
@seanpmorgan
Member

seanpmorgan commented Dec 20, 2019

Try moving out of the source directory if you're still in it. If you import while in the source directory you'll get this error, because the compiled .so files are not present in the local directory.
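For illustration, here is a minimal sketch of that shadowing behavior (the package name `shadow_demo` is made up; the same resolution rule is what makes Python pick up the tensorflow_addons source checkout instead of the installed wheel):

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package directory that will stand in for the source checkout.
tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "shadow_demo")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()

# When you run Python from inside the cloned repo, '' (the cwd) sits at the
# front of sys.path; this insert simulates that.
sys.path.insert(0, tmp)

mod = importlib.import_module("shadow_demo")
# The import resolves to the local tree, not site-packages -- so for
# tensorflow_addons it finds sources without the compiled .so files.
print(mod.__file__)
```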

@iskorini
Author

Thanks for the quick reply!
But now I get this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/students_home/fschipani/anaconda3/envs/new_thesis/lib/python3.7/site-packages/tensorflow_addons/__init__.py", line 23, in <module>
    from tensorflow_addons import image
  File "/data/students_home/fschipani/anaconda3/envs/new_thesis/lib/python3.7/site-packages/tensorflow_addons/image/__init__.py", line 24, in <module>
    from tensorflow_addons.image.distort_image_ops import adjust_hsv_in_yiq
  File "/data/students_home/fschipani/anaconda3/envs/new_thesis/lib/python3.7/site-packages/tensorflow_addons/image/distort_image_ops.py", line 24, in <module>
    get_path_to_datafile("custom_ops/image/_distort_image_ops.so"))
  File "/data/students_home/fschipani/anaconda3/envs/new_thesis/lib/python3.7/site-packages/tensorflow_core/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /data/students_home/fschipani/anaconda3/envs/new_thesis/lib/python3.7/site-packages/tensorflow_addons/custom_ops/image/_distort_image_ops.so: undefined symbol: _ZN10tensorflow7strings8internal9CatPiecesB5cxx11ESt16initializer_listIN4absl11string_viewEE

Could it be an incompatibility problem with TF 2.1.0?

@seanpmorgan
Member

#676
#574

That's an ABI incompatibility. It's part of the build process we're working to make easier, but ultimately the fix will come from: tensorflow/community#133
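As a rough illustration (a string heuristic, not a formal demangling), the mismatch can be read off the mangled name in the log itself: the `B5cxx11` tag marks symbols compiled with the new libstdc++ ABI (`-D_GLIBCXX_USE_CXX11_ABI=1`). If the addon .so references such a symbol but the TF wheel it links against was built with the old ABI (flag 0), dlopen fails with "undefined symbol":

```python
# Mangled name taken from the traceback above.
symbol = ("_ZN10tensorflow7strings8internal9CatPiecesB5cxx11"
          "ESt16initializer_listIN4absl11string_viewEE")

# The Itanium ABI tags std::__cxx11 components with "B5cxx11" in mangled
# names, so its presence indicates the new-ABI std::string/std::list types.
print("B5cxx11" in symbol)
```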

Are you compiling TF-Core from source as well? If so make sure you're using the same build environment (gcc, cxx11 flag, etc.)

Closing but feel free to continue discussion in this thread.

@seanpmorgan
Member

@iskorini as a quick test, could you edit the .bazelrc file after you run ./configure.sh and flip build --action_env TF_CXX11_ABI_FLAG="0" to 1? Then retry the build.
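A sketch of that edit, shown against a stand-in file (the real .bazelrc is the one ./configure.sh generates in the addons repo root):

```python
import pathlib
import tempfile

# Stand-in for the generated .bazelrc; only the relevant line is reproduced.
bazelrc = pathlib.Path(tempfile.mkdtemp()) / ".bazelrc"
bazelrc.write_text('build --action_env TF_CXX11_ABI_FLAG="0"\n')

# Flip the ABI flag from 0 to 1, leaving the rest of the file untouched.
bazelrc.write_text(bazelrc.read_text().replace(
    'TF_CXX11_ABI_FLAG="0"', 'TF_CXX11_ABI_FLAG="1"'))

print(bazelrc.read_text().strip())
```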

@iskorini
Author

iskorini commented Dec 21, 2019

Same results. So I tried to rebuild TensorFlow (RC 2.1) from source and TF-Addons with the same environment, but nothing changed.
.bazelrc from TF-Addons:

build --action_env TF_HEADER_DIR="/data/students_home/fschipani/anaconda3/envs/aaa/lib/python3.7/site-packages/tensorflow_core/include"
build --action_env TF_SHARED_LIBRARY_DIR="/data/students_home/fschipani/anaconda3/envs/aaa/lib/python3.7/site-packages/tensorflow_core"
build --action_env TF_SHARED_LIBRARY_NAME="libtensorflow_framework.so.2"
build --action_env TF_CXX11_ABI_FLAG="0"
build --action_env TF_NEED_CUDA="1"
build --action_env CUDNN_INSTALL_PATH="/data/students_home/fschipani/anaconda3/envs/cuda"
build --action_env TF_CUDA_VERSION="10.1"
build --action_env TF_CUDNN_VERSION="7"
build --action_env CUDA_TOOLKIT_PATH="/data/students_home/fschipani/tmp/cuda-10.1"
test --config=cuda
build --config=cuda
build --spawn_strategy=local
build --strategy=Genrule=local
build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true
build:cuda --crosstool_top=@local_config_cuda//crosstool:toolchain

.bazelrc from tf:

# Android configs. Bazel needs to have --cpu and --fat_apk_cpu both set to the
# target CPU to build transient dependencies correctly. See
# https://docs.bazel.build/versions/master/user-manual.html#flag--fat_apk_cpu
build:android --crosstool_top=//external:android/crosstool
build:android --host_crosstool_top=@bazel_tools//tools/cpp:toolchain
build:android_arm --config=android
build:android_arm --cpu=armeabi-v7a
build:android_arm --fat_apk_cpu=armeabi-v7a
build:android_arm64 --config=android
build:android_arm64 --cpu=arm64-v8a
build:android_arm64 --fat_apk_cpu=arm64-v8a
build:android_x86 --config=android
build:android_x86 --cpu=x86
build:android_x86 --fat_apk_cpu=x86
build:android_x86_64 --config=android
build:android_x86_64 --cpu=x86_64
build:android_x86_64 --fat_apk_cpu=x86_64

# Sets the default Apple platform to macOS.
build --apple_platform_type=macos

# iOS configs for each architecture and the fat binary builds.
build:ios --apple_platform_type=ios
build:ios --apple_bitcode=embedded --copt=-fembed-bitcode
build:ios_armv7 --config=ios
build:ios_armv7 --cpu=ios_armv7
build:ios_armv7 --copt -Wno-c++11-narrowing
build:ios_arm64 --config=ios
build:ios_arm64 --cpu=ios_arm64
build:ios_x86_64 --config=ios
build:ios_x86_64 --cpu=ios_x86_64
build:ios_fat --config=ios
build:ios_fat --ios_multi_cpus=armv7,arm64,x86_64
build:ios_fat --copt -Wno-c++11-narrowing

# Config to use a mostly-static build and disable modular op registration
# support (this will revert to loading TensorFlow with RTLD_GLOBAL in Python).
# By default, TensorFlow will build with a dependence on
# //tensorflow:libtensorflow_framework.so.
build:monolithic --define framework_shared_object=false

# For projects which use TensorFlow as part of a Bazel build process, putting
# nothing in a bazelrc will default to a monolithic build. The following line
# opts in to modular op registration support by default.
build --define framework_shared_object=true

# Flags for open source build, always set to be true.
build --define open_source_build=true
test --define open_source_build=true

# For workaround https://github.com/bazelbuild/bazel/issues/8772 with Bazel >= 0.29.1
build --java_toolchain=//third_party/toolchains/java:tf_java_toolchain
build --host_java_toolchain=//third_party/toolchains/java:tf_java_toolchain

# Please note that MKL on MacOS or windows is still not supported.
# If you would like to use a local MKL instead of downloading, please set the
# environment variable "TF_MKL_ROOT" every time before build.
build:mkl --define=build_with_mkl=true --define=enable_mkl=true
build:mkl --define=tensorflow_mkldnn_contraction_kernel=0
build:mkl -c opt

# This config option is used to enable MKL-DNN open source library only,
# without depending on MKL binary version.
build:mkl_open_source_only --define=build_with_mkl_dnn_only=true
build:mkl_open_source_only --define=build_with_mkl_dnn_v1_only=true
build:mkl_open_source_only --define=build_with_mkl=true --define=enable_mkl=true
build:mkl_open_source_only --define=tensorflow_mkldnn_contraction_kernel=0

build:download_clang --crosstool_top=@local_config_download_clang//:toolchain
build:download_clang --define=using_clang=true
build:download_clang --action_env TF_DOWNLOAD_CLANG=1
# Instruct clang to use LLD for linking.
# This only works with GPU builds currently, since Bazel sets -B/usr/bin in
# auto-generated CPU crosstool, forcing /usr/bin/ld.lld to be preferred over
# the downloaded one.
build:download_clang_use_lld --linkopt='-fuse-ld=lld'

# This config refers to building with CUDA available. It does not necessarily
# mean that we build CUDA op kernels.
build:using_cuda --define=using_cuda=true
build:using_cuda --action_env TF_NEED_CUDA=1
build:using_cuda --crosstool_top=@local_config_cuda//crosstool:toolchain

# This config refers to building CUDA op kernels with nvcc.
build:cuda --config=using_cuda
build:cuda --define=using_cuda_nvcc=true

# This config refers to building CUDA op kernels with clang.
build:cuda_clang --config=using_cuda
build:cuda_clang --define=using_cuda_clang=true
build:cuda_clang --define=using_clang=true
build:cuda_clang --action_env TF_CUDA_CLANG=1

build:tensorrt --action_env TF_NEED_TENSORRT=1

build:rocm --crosstool_top=@local_config_rocm//crosstool:toolchain
build:rocm --define=using_rocm=true --define=using_rocm_hipcc=true
build:rocm --action_env TF_NEED_ROCM=1

build:sycl --crosstool_top=@local_config_sycl//crosstool:toolchain
build:sycl --define=using_sycl=true
build:sycl --action_env TF_NEED_OPENCL_SYCL=1

build:sycl_nodouble --config=sycl
build:sycl_nodouble --cxxopt -DTENSORFLOW_SYCL_NO_DOUBLE

build:sycl_nodouble --config=sycl
build:sycl_asan --copt -fno-omit-frame-pointer --copt -fsanitize-coverage=3 --copt -DGPR_NO_DIRECT_SYSCALLS --linkopt -fPIC --linkopt -fsanitize=address

build:sycl_nodouble --config=sycl
build:sycl_trisycl --define=using_trisycl=true

# Options extracted from configure script
build:ngraph --define=with_ngraph_support=true
build:numa --define=with_numa_support=true

# Options to disable default on features
build:noaws --define=no_aws_support=true
build:nogcp --define=no_gcp_support=true
build:nohdfs --define=no_hdfs_support=true
build:nonccl --define=no_nccl_support=true

build --define=use_fast_cpp_protos=true
build --define=allow_oversize_protos=true

build --spawn_strategy=standalone
build --strategy=Genrule=standalone
build -c opt

# By default, build TF in C++ 14 mode.
build --cxxopt=-std=c++14
build --host_cxxopt=-std=c++14

# Make Bazel print out all options from rc files.
build --announce_rc

# Other build flags.
build --define=grpc_no_ares=true

# See https://github.com/bazelbuild/bazel/issues/7362 for information on what
# --incompatible_remove_legacy_whole_archive flag does.
# This flag is set to true in Bazel 1.0 and newer versions. We tried to migrate
# Tensorflow to the default, however test coverage wasn't enough to catch the
# errors.
# There is ongoing work on Bazel team's side to provide support for transitive
# shared libraries. As part of migrating to transitive shared libraries, we
# hope to provide a better mechanism for control over symbol exporting, and
# then tackle this issue again.
#
# TODO: Remove this line once TF doesn't depend on Bazel wrapping all library
# archives in -whole_archive -no_whole_archive.
build --noincompatible_remove_legacy_whole_archive

# Modular TF build options
build:dynamic_kernels --define=dynamic_loaded_kernels=true
build:dynamic_kernels --copt=-DAUTOLOAD_DYNAMIC_KERNELS

# Build TF with C++ 17 features.
build:c++17 --cxxopt=-std=c++1z
build:c++17 --cxxopt=-stdlib=libc++
build:c++1z --config=c++17

# Default paths for TF_SYSTEM_LIBS
build --define=PREFIX=/usr
build --define=LIBDIR=$(PREFIX)/lib
build --define=INCLUDEDIR=$(PREFIX)/include

# Suppress all warning messages.
build:short_logs --output_filter=DONT_MATCH_ANYTHING

# Options when using remote execution
build:rbe --action_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1
build:rbe --auth_enabled=true
build:rbe --auth_scope=https://www.googleapis.com/auth/cloud-source-tools
build:rbe --define=EXECUTOR=remote
build:rbe --flaky_test_attempts=3
build:rbe --jobs=200
build:rbe --remote_accept_cached=true
build:rbe --remote_cache=remotebuildexecution.googleapis.com
build:rbe --remote_executor=remotebuildexecution.googleapis.com
build:rbe --remote_local_fallback=false
build:rbe --remote_timeout=600
build:rbe --spawn_strategy=remote,worker,sandboxed,local
build:rbe --strategy=Genrule=remote,worker,sandboxed,local
build:rbe --strategy=Closure=remote,worker,sandboxed,local
build:rbe --strategy=Javac=remote,worker,sandboxed,local
build:rbe --strategy=TestRunner=remote,worker,sandboxed,local
build:rbe --tls_enabled
test:rbe --test_env=USER=anon

# Options to build TensorFlow 1.x or 2.x.
build:v1 --define=tf_api_version=1
build:v2 --define=tf_api_version=2
build:v1 --action_env=TF2_BEHAVIOR=0
build:v2 --action_env=TF2_BEHAVIOR=1
build --config=v2
test --config=v2

# Default options should come above this line

# Options from ./configure
try-import %workspace%/.tf_configure.bazelrc

# Put user-specific options in .bazelrc.user
try-import %workspace%/.bazelrc.user

.tf_configure.bazelrc:

build --action_env PYTHON_BIN_PATH="/data/students_home/fschipani/anaconda3/envs/aaa/bin/python"
build --action_env PYTHON_LIB_PATH="/data/students_home/fschipani/anaconda3/envs/aaa/lib/python3.7/site-packages"
build --python_path="/data/students_home/fschipani/anaconda3/envs/aaa/bin/python"
build:xla --define with_xla_support=true
build --config=xla
build --action_env CUDA_TOOLKIT_PATH="/data/students_home/fschipani/tmp/cuda-10.1"
build --action_env CUDNN_INSTALL_PATH="/data/students_home/fschipani/anaconda3/envs/cuda"
build --action_env TF_CUDA_COMPUTE_CAPABILITIES="5.2"
build --action_env LD_LIBRARY_PATH="/data/students_home/fschipani/tmp/cuda-10.1/lib64"
build --action_env GCC_HOST_COMPILER_PATH="/usr/bin/x86_64-linux-gnu-gcc-7"
build --config=cuda
build:opt --copt=-march=native
build:opt --copt=-Wno-sign-compare
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true
test --flaky_test_attempts=3
test --test_size_filters=small,medium
test --test_tag_filters=-benchmark-test,-no_oss,-oss_serial
test --build_tag_filters=-benchmark-test,-no_oss
test --test_tag_filters=-no_gpu
test --build_tag_filters=-no_gpu
test --test_env=LD_LIBRARY_PATH
build --action_env TF_CONFIGURE_IOS="0"

Environment variable for TF:
export TF_CUDA_PATHS="/data/students_home/fschipani/tmp/cuda-10.1, /data/students_home/fschipani/anaconda3/envs/cuda"

@seanpmorgan
Member

Thanks @iskorini for your help troubleshooting this issue. Would you mind trying to build this GPU custom op on the same system setup:
https://github.com/tensorflow/custom-op#bazel-1

That'll greatly help us going forward.
