Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicit Symbol Exports #2136

Closed
4 of 6 tasks
benjaminwinger opened this issue Oct 2, 2023 · 15 comments
Closed
4 of 6 tasks

Explicit Symbol Exports #2136

benjaminwinger opened this issue Oct 2, 2023 · 15 comments
Assignees
Labels
compiler Issues related to building from source files, clang-tidy, and other compiler tech

Comments

@benjaminwinger
Copy link
Collaborator

benjaminwinger commented Oct 2, 2023

In #2118 we switched from exporting all symbols on Windows to explicit export of symbols to avoid hitting the limit in a single windows dll. This change would have a number of benefits on other platforms too (smaller dynamic libraries, faster library loading, no accidental use of private APIs), and needs to be more rigorously tested.

TODO:

  • Make default visibility of functions hidden (by making these statements unconditional)
  • Remove re-exported static dependency symbols from the dynamic libraries on Linux (probably macOS too since I think it's related to ELF). Maybe there are better solutions, but what I could find before was using linker version scripts to filter the public symbols. They support a wildcard syntax which can probably be used to avoid explicitly listing all the symbols. I don't know how it interacts with C++ name mangling, but maybe we could just allow a variation of kuzu* to filter everything not in the kuzu namespace (or C functions prefixed with kuzu_).
  • Get the API tests to link against the dynamic libraries with the restricted symbols so that we can properly test the exported APIs (the main issue is that the testing helper classes use private APIs, so we either need to remove those, or maybe set up an internal static library for those helpers that they can link with alongside the dynamic library).
  • Produce the single file header as part of the cmake build (maybe we could set up a cmake install command at the same time).
    • Get the API tests to use the single file header.
    • Test linking the rust API against the dynamic library and single file header.
@Riolku
Copy link
Contributor

Riolku commented Oct 2, 2023

From LD-Version-Scripts

If you target platforms that do not support linker scripts (i.e., all platforms that doesn’t use GNU LD)
you may want to consider a more portable but less powerful alternative: libtool -export-symbols.
It will hide internal symbols from your library, but will not add ELF versioning symbols. 

We should investigate if the more portable solution is possible. Otherwise, we can't build on Alpine linux, and potentially not on MacOS(?).

@Riolku
Copy link
Contributor

Riolku commented Oct 2, 2023

It seems the portable solution is a file that lists all symbols to export. I am in favor of this solution, because it's very explicit and we have to explicitly specify when we want to export new symbols (and we don't do this very often, afaik).

@benjaminwinger
Copy link
Collaborator Author

specify when we want to export new symbols (and we don't do this very often, afaik).

With a growing and unstable API, we might do it a little more frequently than is convenient, but as long as there's a relatively easy way to get the symbol names I don't think it should be a huge issue (at the moment all I can think of is grepping nm libkuzu.a for the function names you want). But it's not like we aren't already explicitly annotating the functions.

There's also the --exclude-libs option (see the ld man page), which I forgot to include above, but that again will only work with some linkers.

And another interesting option I just saw is extracting the object files from the static libraries (see https://stackoverflow.com/a/2251477) and adding them directly, which will make them be affected by -fvisibility=hidden. That might work more easily across platforms.

@Riolku
Copy link
Contributor

Riolku commented Oct 2, 2023

Interesting. I wonder how easy the last thing is to integrate into cmake.

@Riolku
Copy link
Contributor

Riolku commented Oct 3, 2023

Updates from today:

  • Global hidden symbols (so, anything in our objects or third party libraries that are not explicitly exported) are converted to local symbols when linking. This means they are accessible within the same link step, but invisible after we link the symbols into the shared library. Furthermore, this means we can strip them at install time.
  • We can use CMake install scripts for stripping release binaries and libraries, which will reduce our size, which is probably desirable given our aim to be an embedded database.
  • The reason a stripped binary still contains many Arrow symbols is because Arrow is compiled without fvisibility=hidden, but it doesn't seem like there's an easy way to change that, so we will ignore the problem, since we will soon be removing arrow for good.
  • This is not a concern on windows, since windows uses hidden visibility (or, the equivalent), by default.

As for the mentioned issue about building precompiled headers & library in CMake, I don't see anything easy to do that, though we could integrate our existing script into a release build. Either way, it should definitely make it into CI.

Tasks:

  • Integrate precompiled file and API tests into CI.
  • Use an install target to strip binaries when publishing releases.
  • Enable visibility=hidden on all platforms.

@benjaminwinger
Copy link
Collaborator Author

I think for the existing API tests we'll probably want to more or less re-write the DBTest class to use only public APIs (or otherwise re-write the API tests to simplify them).

There are too many private APIs being used, and they are too closely related to the public ones. E.g. if it was just stuff like StringUtils, we could create a separate static StringUtils library, which is relatively unrelated to the APIs used in the API tests and shouldn't give any linking issues or false positives for API coverage, but the test runner, for example, uses Connection::prepareNoLock, which I don't think we can really expose while also properly testing that the dlls provide the public Connection functions.
For the most part, I think the private APIs are just being used for extra verbosity in how things are failing, particularly when testing queries, but the API tests don't really need that sort of information as they aren't meant to test the correctness of the underlying database engine, and other tests should cover the data and queries used in the API tests.

@Riolku
Copy link
Contributor

Riolku commented Oct 4, 2023

Sounds good to me. It'd be good to have someone look at that class (I think it's been a while since anyone has). I'm going to prepare a PR for stripping symbols and using CMake install.

@benjaminwinger
Copy link
Collaborator Author

This was causing ASAN failures on macOS (see #2204).

@Riolku
Copy link
Contributor

Riolku commented Oct 12, 2023

What?? How? Can we find a solution that's better than "disable explicit exports"?

@benjaminwinger
Copy link
Collaborator Author

I don't really understand why it's occurring, so I'm not really sure what we can do beyond disabling it on macOS at the moment.

Interestingly, linking against the shared library is also meaning that spdlog is not producing output in the API tests (when the calls are from within the kuzu code in the shared library; using spdlog in the test code works as expected).

Riolku added a commit that referenced this issue Nov 11, 2023
Exception information necessary for properly catching exceptions, seems
to not be exported on Mac unless we export the exception, which seems to
be different from Linux & Windows.

In additional, Mac issues a warning when building the Python module
(which itself builds as a shared library). The issue is that the python
module links with our static library, and the static library does not
export symbols (since it doesn't do anything for static libraries).

However, this mismatch in visibility causes a warning. Technically, we
shouldn't build a shared library with a static library since the static
library isn't position-independent code, but we don't really have other
options unless we embed all of the object files?

My current solution is to keep the visibility annotations in the static
library, which seems to work fine and avoids the warning. However, we
should probably find some other solution. Embedding all of the object
files isn't super appealing from a design perspective but seems the
easiest. It'd be great to embed the shared library.. but that would
require some changes to how we package for python, and I haven't looked
into it.

Ref #2136.
Riolku added a commit that referenced this issue Nov 13, 2023
Exception information necessary for properly catching exceptions, seems
to not be exported on Mac unless we export the exception, which seems to
be different from Linux & Windows.

In additional, Mac issues a warning when building the Python module
(which itself builds as a shared library). The issue is that the python
module links with our static library, and the static library does not
export symbols (since it doesn't do anything for static libraries).

However, this mismatch in visibility causes a warning. Technically, we
shouldn't build a shared library with a static library since the static
library isn't position-independent code, but we don't really have other
options unless we embed all of the object files?

We should probably link the python library against `libkuzu.so` and ship
it alongside the python bindings. However, this will be delayed until
after the release.

Ref #2136.
Riolku added a commit that referenced this issue Nov 13, 2023
Exception information necessary for properly catching exceptions, seems
to not be exported on Mac unless we export the exception, which seems to
be different from Linux & Windows.

In additional, Mac issues a warning when building the Python module
(which itself builds as a shared library). The issue is that the python
module links with our static library, and the static library does not
export symbols (since it doesn't do anything for static libraries).

However, this mismatch in visibility causes a warning. Technically, we
shouldn't build a shared library with a static library since the static
library isn't position-independent code, but we don't really have other
options unless we embed all of the object files?

We should probably link the python library against `libkuzu.so` and ship
it alongside the python bindings. However, this will be delayed until
after the release.

Ref #2136.
Riolku added a commit that referenced this issue Nov 13, 2023
Exception information necessary for properly catching exceptions, seems
to not be exported on Mac unless we export the exception, which seems to
be different from Linux & Windows.

In additional, Mac issues a warning when building the Python module
(which itself builds as a shared library). The issue is that the python
module links with our static library, and the static library does not
export symbols (since it doesn't do anything for static libraries).

However, this mismatch in visibility causes a warning. Technically, we
shouldn't build a shared library with a static library since the static
library isn't position-independent code, but we don't really have other
options unless we embed all of the object files?

We should probably link the python library against `libkuzu.so` and ship
it alongside the python bindings. However, this will be delayed until
after the release.

Ref #2136.
@benjaminwinger
Copy link
Collaborator Author

It should be feasible to remove main/kuzu.h and generate the single file header exclusively from files that use KUZU_API. That may omit things if there are headers which are supposed to be public APIs which are entirely inline and don't need KUZU_API for symbol exports, but we could work around that issue (e.g. if we're just searching for headers which mention KUZU_API anywhere in the file's text, then including it in a comment would be sufficient).

We'd also discussed at one point renaming KUZU_API to KUZU_EXPORT, which is a little closer to its actual behaviour of declaring functions whose symbols need to be exported (and matches the pattern produced by CMake's GenerateExportHeader).

@Riolku
Copy link
Contributor

Riolku commented Dec 22, 2023

I think overall it would be better to be more diligent about what our API actually is instead of hacking it into our C++ codebase. Our C++ API should, ideally, be more similar to our C API instead. This should be much more stable and lightweight.

@benjaminwinger
Copy link
Collaborator Author

That's true, but on the other hand, kuzu.hpp is already ~6000 lines of code. Even if we were to hand-write it and omit as much internal stuff as possible, it's still probably going to be much larger than the C API header (~1000 lines of code), so we might still want to have a way to collect the files together. And there are also headers which can probably be shared between internal code and the API, like a lot of the stuff in common/types which are just type definitions, operators and getter functions.

@Riolku
Copy link
Contributor

Riolku commented Dec 22, 2023

I agree that a lot of code should be shared.

The current generation is opaque. When a symbol is missed, there isnt a clear way to solve it, and it's easy to forget to include symbols, as we saw.

We also have a lot of inline code that is in the header and exported that really need not be. It's just not a good design.

@semihsalihoglu-uw semihsalihoglu-uw added the compiler Issues related to building from source files, clang-tidy, and other compiler tech label Jan 8, 2024
@benjaminwinger
Copy link
Collaborator Author

Closing as this has been more or less finished.
One outstanding part I created a new issue for (#3632) and, the other (second bullet point in the original description), I'm not sure if it was fully completed, but the vast majority of unnecessary third-party symbols have been removed (it was mostly arrow I think), and the release build is now much smaller and no longer seems to be full of third party symbols, so even if there are more that can be removed I don't think it's particularly important at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler Issues related to building from source files, clang-tidy, and other compiler tech
Projects
None yet
Development

No branches or pull requests

3 participants