Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide on linker to use in runtime builds #84794

Closed
sbomer opened this issue Apr 13, 2023 · 24 comments
Closed

Decide on linker to use in runtime builds #84794

sbomer opened this issue Apr 13, 2023 · 24 comments

Comments

@sbomer
Copy link
Member

sbomer commented Apr 13, 2023

#84148 changed the linker that we use in runtime builds (including when running NativeAOT on our binaries like ILCompiler and crossgen2) to lld. This was motivated by a desire to move to cross-builds (mariner host OS, targeting old linux rootfs). Using lld simplifies the cross-compilation scenarios because it is a cross-linker, so doesn't require a separate bfd linker with appropriate target prefix. We have also been wanting to move away from binutils as a dependency. Note that it doesn't change the default linker for customer NativeAOT scenarios, which still use bfd.

There are a few concerns with this approach - see @am11's comments in #84493 (comment), #84493 (comment), #84148 (comment) for full context. In summary:

# dotnet new console (~2.2 K diff)
- 1496072
+ 1498192

# dotnet new api -aot (~41 K diff)
- 12743672
+ 12785208

@am11 I hope that's a fair summary - please add to this if I missed anything.

So I'm opening this to discuss the choice of linker when building our product. Options are:

  • Use lld as we do with Build on CBL-mariner host with rootfs #84148.
  • Add binutils linker (need separate cross-linker for each supported architecture) to the mariner build images. We will need to pick the right prefixed linker to use based on the target platform. This is not currently supported in NativeAot. (edit: NativeAot will detect and used the right prefixed linker if available).

My preference is to continue using lld, but add separate test legs that validate the customer scenarios using bfd. One challenge is that our NativeAOT tests usually run NativeAOT on the build machine, not on helix - so they are testing a cross-compilation scenario, not the customer scenario. This is simpler than getting all NativeAOT prerequisites onto the target machines to test native compilation. I think it would be worth adding separate tests for NativeAOT compilation that run in an environment that represents customer scenarios (for example, ubuntu with bfd).

I'm opening this issue to continue the discussion and hopefully reach agreement on the approach.

@am11 @janvorli @agocke @MichalStrehovsky @jkotas

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Apr 13, 2023
@ghost
Copy link

ghost commented Apr 13, 2023

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

#84148 changed the linker that we use in runtime builds (including when running NativeAOT on our binaries like ILCompiler and crossgen2) to lld. This was motivated by a desire to move to cross-builds (mariner host OS, targeting old linux rootfs). Using lld simplifies the cross-compilation scenarios because it is a cross-linker, so doesn't require a separate bfd linker with appropriate target prefix. We have also been wanting to move away from binutils as a dependency. Note that it doesn't change the default linker for customer NativeAOT scenarios, which still use bfd.

There are a few concerns with this approach - see @am11's comments in #84493 (comment), #84493 (comment), #84148 (comment) for full context. In summary:

# dotnet new console (~2.2 K diff)
- 1496072
+ 1498192

# dotnet new api -aot (~41 K diff)
- 12743672
+ 12785208

@am11 I hope that's a fair summary - please add to this if I missed anything.

So I'm opening this to discuss the choice of linker when building our product. Options are:

  • Use lld as we do with Build on CBL-mariner host with rootfs #84148.
  • Add binutils linker (need separate cross-linker for each supported architecture) to the mariner build images. We will need to pick the right prefixed linker to use based on the target platform. This is not currently supported in NativeAot.

My preference is to continue using lld, but add separate test legs that validate the customer scenarios using bfd. One challenge is that our NativeAOT tests usually run NativeAOT on the build machine, not on helix - so they are testing a cross-compilation scenario, not the customer scenario. This is simpler than getting all NativeAOT prerequisites onto the target machines to test native compilation. I think it would be worth adding separate tests for NativeAOT compilation that run in an environment that represents customer scenarios (for example, ubuntu with bfd).

I'm opening this issue to continue the discussion and hopefully reach agreement on the approach.

@am11 @janvorli @agocke @MichalStrehovsky @jkotas

Author: sbomer
Assignees: -
Labels:

untriaged, area-NativeAOT-coreclr

Milestone: -

@agocke
Copy link
Member

agocke commented Apr 13, 2023

I thought there was another consideration: we need a relatively recent version of lld to produce good results. How many distros ship with a recent-enough lld for us to use it as the default?

@sbomer
Copy link
Member Author

sbomer commented Apr 13, 2023

We need at least lld 13 to use @am11's fix that improves output size: #84493. But I am not suggesting to change the default for customers - that would stay bfd either way. We will be able to use @am11's fix in runtime builds as soon as we update to llvm 16 (which I'm working on).

@ghost
Copy link

ghost commented Apr 13, 2023

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Issue Details

#84148 changed the linker that we use in runtime builds (including when running NativeAOT on our binaries like ILCompiler and crossgen2) to lld. This was motivated by a desire to move to cross-builds (mariner host OS, targeting old linux rootfs). Using lld simplifies the cross-compilation scenarios because it is a cross-linker, so doesn't require a separate bfd linker with appropriate target prefix. We have also been wanting to move away from binutils as a dependency. Note that it doesn't change the default linker for customer NativeAOT scenarios, which still use bfd.

There are a few concerns with this approach - see @am11's comments in #84493 (comment), #84493 (comment), #84148 (comment) for full context. In summary:

# dotnet new console (~2.2 K diff)
- 1496072
+ 1498192

# dotnet new api -aot (~41 K diff)
- 12743672
+ 12785208

@am11 I hope that's a fair summary - please add to this if I missed anything.

So I'm opening this to discuss the choice of linker when building our product. Options are:

  • Use lld as we do with Build on CBL-mariner host with rootfs #84148.
  • Add binutils linker (need separate cross-linker for each supported architecture) to the mariner build images. We will need to pick the right prefixed linker to use based on the target platform. This is not currently supported in NativeAot.

My preference is to continue using lld, but add separate test legs that validate the customer scenarios using bfd. One challenge is that our NativeAOT tests usually run NativeAOT on the build machine, not on helix - so they are testing a cross-compilation scenario, not the customer scenario. This is simpler than getting all NativeAOT prerequisites onto the target machines to test native compilation. I think it would be worth adding separate tests for NativeAOT compilation that run in an environment that represents customer scenarios (for example, ubuntu with bfd).

I'm opening this issue to continue the discussion and hopefully reach agreement on the approach.

@am11 @janvorli @agocke @MichalStrehovsky @jkotas

Author: sbomer
Assignees: -
Labels:

area-Infrastructure, untriaged, area-NativeAOT-coreclr

Milestone: -

@am11
Copy link
Member

am11 commented Apr 14, 2023

Note that it doesn't change the default linker for customer NativeAOT scenarios, which still use bfd.

We recommend clang compiler in docs, we build NativeAOT runtime with llvm-libunwind, we use llvm-based objwriter to create the linkable objects, so lets make lld as a default and document it as a recommended linker for PublishAot for linux. This will be aligned with other aspects and understandable. The upside is that we will be testing with the same component in CI which we recommend for users in docs.

So lets apply the patch, turn default to lld and cleanup LinkerFlavor=lld sprinkled in various places. That will hopefully resolve this matter satisfying everyone.

Alternatives are making less sense to me:

We have also been wanting to move away from binutils as a dependency.

Why is this? binutils is used since .NET Core 1.0, there is no issue with licensing in using binutils to build any software; proprietary or otherwise.

We will need to pick the right prefixed linker to use based on the target platform. This is not currently supported in NativeAot.

This is not correct. The "toolchain" does this job. If configured properly, we don't need to manually specify prefixes for cross-compilation.

How many distros ship with a recent-enough lld for us to use it as the default?

You will find it pretty hard to discover a distro which is not using bfd as their default linker, let alone the versioning concerns. bfd, aka GNU linker, has decades of more experience than other linkers, still actively maintained, choice of top distros and beats other linkers in binary-size benchmarks.

@am11 I hope that's a fair summary - please add to this if I missed anything.

  • we can no longer build runtime repo with binutils alone.

i.e. LinkerFlavor=lld if target==linux constructs added are missing and Compiler!=gcc condition; we set Compiler at the start of build:

arguments="$arguments /p:Compiler=$compiler /p:CppCompilerAndLinker=$compiler"
(maps to ./build.sh -gcc)

@MichalStrehovsky
Copy link
Member

I don't have a strong opinion on this. We're already in a situation where we can't test E2E the thing that will be produced by the toolchain because the version of ld is out of our control. The solution to that could be to just ship lld in the ILCompiler package. But that's a can of worms since constructing the command line to invoke the linker is per-distro bespoke magic and it's also the reason why we use clang as the "linker" instead of invoking ld directly.

Ubuntu 18 comes with lld that is ancient. Ubuntu 22 has a good one. I don't have a good sense of the landscape. If we tell people to use lld but the version we need is hard to obtain, that would be a problem.

I also don't like testing with a completely different linker codebase (bfd vs lld) than what we use on customer machines. We did run into linker issues in the past. At least using the same codebase (even though a different version) would keep things more sane.

@kg
Copy link
Contributor

kg commented Apr 15, 2023

This has broken builds on Debian for me.

ILCompiler -> /home/kate/Projects/dotnet-runtime-wasm/artifacts/bin/coreclr/linux.x64.Release/ilc/ilc.dll
  Generating native code
clang : error : invalid linker name in argument '-fuse-ld=lld' [/home/kate/Projects/dotnet-runtime-wasm/src/coreclr/tools/aot/ILCompiler/ILCompiler.csproj]
/home/kate/.nuget/packages/microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/build/Microsoft.NETCore.Native.targets(335,5): error MSB3073: The command ""/usr/bin/clang-11" "/home/kate/Projects/dotnet-runtime-wasm/artifacts/obj/coreclr/ILCompiler/x64/Release/native/ilc.o" -o "/home/kate/Projects/dotnet-runtime-wasm/artifacts/bin/coreclr/linux.x64.Release/ilc/native/ilc" -fuse-ld=lld /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/sdk/libbootstrapper.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/sdk/libRuntime.ServerGC.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/sdk/libeventpipe-disabled.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/sdk/libstdc++compat.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/sdk/libnumasupportdynamic.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/framework/libSystem.Native.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/framework/libSystem.Globalization.Native.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/framework/libSystem.IO.Compression.Native.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/framework/libSystem.Net.Security.Native.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/framework/libSystem.Security.Cryptography.Native.OpenSsl.a -g -Wl,-rpath,'$ORIGIN' -Wl,--build-id=sha1 -Wl,--as-needed -pthread -ldl -lz -lrt -lm -pie -Wl,-pie -Wl,-z,relro -Wl,-z,now -Wl,--discard-all" exited with code 1. [/home/kate/Projects/dotnet-runtime-wasm/src/coreclr/tools/aot/ILCompiler/ILCompiler.csproj]

I do have llvm, clang, and lld in my /usr/bin, but they have a version number on the end. I tried helping the build process out by using ln -s to put a 'lld' in my PATH pointing at the right one, but that didn't fix it. What do I do?

@am11
Copy link
Member

am11 commented Apr 15, 2023

@kg, this is the situation when we have different versions of clang and lld installed. e.g. if I have clang-16, clang-14 and lld-14, installed, then the following things happen:

  1. ./build.sh via init-compiler will select clang-16 (because it is the latest version)
  2. init-compiler will introspect if fuse-ld=lld and based on the result, will select the linker
  3. then changes in Build on CBL-mariner host with rootfs #84148 will "force" fuse-ld=lld unconditionally, disregarding #2's introspection based linker selection, and fail the build because we didn't install lld-16

This is exactly how

  • we can no longer build runtime repo with binutils alone.

community-driven scenarios are also affected..

Workaround is to apt install lld-<version of clang which build is picking up>

@kg
Copy link
Contributor

kg commented Apr 15, 2023

Workaround is to apt install lld-<version of clang which build is picking up>

Thanks, that indeed worked. I did
/usr/bin/clang -v
which output Debian clang version 11.0.1-2, so sudo apt-get install lld-11 caused the build to succeed.

@tmds
Copy link
Member

tmds commented Apr 17, 2023

cc @omajid

@sbomer
Copy link
Member Author

sbomer commented Apr 17, 2023

We have also been wanting to move away from binutils as a dependency.

Why is this?

I think mainly for the same reasoning as with lld lld - to simplify our official (cross) builds and avoid having to detect or install prefixed versions of objcopy and other tools. But I agree with your feedback that we should continue supporting binutils-based builds, at least for native builds.

We will need to pick the right prefixed linker to use based on the target platform. This is not currently supported in NativeAot.

This is not correct. The "toolchain" does this job. If configured properly, we don't need to manually specify prefixes for cross-compilation.

Thanks for the correction - fixed above. I got my wires crossed when I was thinking about the gcc support mentioned in #78559.

we can no longer build runtime repo with binutils alone.

Filed #84934 to track this.

Regarding runtime linker defaults: it sounds like we are all on board with using lld as the default for runtime builds, as long as it's optional.

Regarding customer NativeAot linker defaults: I don't have a strong opinion on what the default should be, but if we consider both of them to be supported, we should have testing in place for both, at least to cover basic scenarios, even if most of our testing uses the default.

@agocke
Copy link
Member

agocke commented Apr 20, 2023

Regarding runtime linker defaults: it sounds like we are all on board with using lld as the default for runtime builds, as long as it's optional.

I'm not quite sure what this means. I think lld should be the default for CI specifically, but I think ld should be the default in dev builds, same as the Native AOT default.

@sbomer
Copy link
Member Author

sbomer commented Apr 20, 2023

I was thinking that the defaults used in ci should match those for the local dev scenario, but if there's a desire for the local dev scenario to use ld, that's certainly an option. My read of the discussion here was that we were actually leaning towards changing the NativeAOT defaults to use lld in the customer scenario too.

@am11
Copy link
Member

am11 commented Apr 20, 2023

My read of the discussion here was that we were actually leaning towards changing the NativeAOT defaults to use lld in the customer scenario too.

Yup, or more generally, the linker the CI legs are frequently using to test all NativeAOT tests is the linker we should recommend in the docs. In case it wasn't obvious, LinkerFlavor supports all four linkers which clang's/gcc' fuse-ld option recognize: bfd, gold, lld and mold -- and (empirically speaking) they are normally favored in that order by linux devs; gold > lld. Also, fuse-ld=lld is supported by clang7+ and gcc9+, which is old enough to be a viable candidate for default.

@agocke
Copy link
Member

agocke commented Apr 20, 2023

I think we could credibly test both linkers. It may not be in PR, but it probably is not a big deal to spin up an ld-specific run of the test suite with Ubuntu-latest-LTS.

It seems like there's two separate considerations -- for our official build we want to keep the dependencies as tight as possible, and using LLVM for everything makes sense.

But unlike with coreclr, the vast majority of linking is going to happen on user machines, so it's not particularly relevant what linker the official build machines use.

In other words, if we think that bfd is a better experience for customers, I don't think it's bad to pick that and use it in most of our testing.

@am11
Copy link
Member

am11 commented Apr 20, 2023

But unlike with coreclr, the vast majority of linking is going to happen on user machines, so it's not particularly relevant what linker the official build machines use.

Yes, no confusion here that this issue is about which linker is used to exercise the tests in CI; it is not about which linker is used to build official coreclr release.

The linker we use in CI is the linker we should recommend for user. It is a "singular" linker, because we normally do not force non-default NativeAOT options in tests. There are number of options which we don't test and they are left as experimental / community-supported; e.g. static lib/exe, no-pie, static-pie etc. The red flag is our usage of LinkerFlavor outside src/nativeaot/BuildIntegration directory introduced in #84148 which is a clear deviation from what we set as default.

In other words, if we think that bfd is a better experience for customers, I don't think it's bad to pick that and use it in most of our testing.

Makes sense. Alternatively, we can set lld as default in BuildIntegration for linux, update Microsoft docs to include lld in apt-get install clang ... line, cleanup <LinkerFlavor>lld</LinkerFlavor> instances from rest of the repo and continue testing with lld. Personally, alternative is making more sense to me because:

  1. we are using other llvm components in NativeAOT: objwriter at build-time , llvm-objcopy (preferred with clang) at publish-time, llvm-libunwind at run-time and clang is used to compile NativeAOT Runtime as well as recommended linker-driver in docs.
  2. cross-PublishAot is currently only supported with clang.
  3. since we compile NativeAOT Runtime with clang, using bfd has potential to run into this issue on some platforms: Fix redhat arm64 #52244 (the sole reason we switched coreclr to build with lld which may or may not be applicable to NativeAOT.. there is no easy way to tell with 100% certainty)

@agocke
Copy link
Member

agocke commented Apr 21, 2023

@jkotas @MichalStrehovsky @sbomer How do we feel about changing the default linker to lld?

Presumably we would still use bfd if lld is not installed, but this would become the default.

@am11
Copy link
Member

am11 commented Apr 21, 2023

Presumably we would still use ld

nit: its name is bfd because ld can be anything; ld.lld gold.ld etc.

@jkotas
Copy link
Member

jkotas commented Apr 21, 2023

How do we feel about changing the default linker to lld?

Fine with me.

@MichalStrehovsky
Copy link
Member

How do we feel about changing the default linker to lld?

Sounds good. Only one question:

Presumably we would still use bfd if lld is not installed, but this would become the default.

Is the proposal that we would also check if lld is recent enough and consider it missing if it e.g. doesn't support the command line arguments we want to specify? I want to avoid situations when someone is e.g. on Ubuntu 18, installs lld, and gets worse off than if we used bfd (I might be wrong about this one - I don't know how old distros do we have to care about but do know that lld only became "good" "recently").

@am11
Copy link
Member

am11 commented Apr 24, 2023

Is the proposal that we would also check if lld is recent enough and consider it missing if it e.g. doesn't support the command line arguments we want to specify?

If lld is available, we support it. We already have a version check whether or not use a linker script needed by lld-13+. The rest of the arguments are supported by both old and new versions. We just need a condition (like we have for llvm-objcopy) i.e.; if lld is available on linux, use it otherwise fallback to bfd.

@MichalStrehovsky
Copy link
Member

What I meant is that lld is doing worse than bfd size-wise in general, but if we can't do linker script, it's significantly worse and users would be better off using bfd. I'm okay with the regression with recent lld's, but it's more than 5% for older llds and that makes me raise an eyebrow (correct me if I'm wrong).

@am11
Copy link
Member

am11 commented Apr 24, 2023

The size difference with rid: linux-arm64, template: api, configuration: release, stripped: true, executable: true, static-executable: false are mixed (v10 is better than v11):

clang/lld version published binary size comments
7 8.8M (9183120) first version to support fuse-ld=lld
8 8.8M (9183120)
9 8.8M (9190264)
10 8.7M (9058800)
11 8.7M (9060144)
12 8.7M (9060152)
13 8.7M (9060152)
14 8.7M (9058552)
15 8.7M (9058552)

I think we can keep it simple. The standing logic is that the newer toolchain brings about better optimizers and analyzers, and older libc is best to increase portability. This combination renders best results on any build machine. (for static executables, old libc is not necessary)

@sbomer
Copy link
Member Author

sbomer commented May 16, 2023

Closing this since we decided that:

@sbomer sbomer closed this as completed May 16, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label May 16, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Archived in project
Development

No branches or pull requests

7 participants