
RFC: Allow changing the default allocator #1183

Merged (1 commit), Jul 29, 2015

@alexcrichton (Member) commented Jun 30, 2015

Add support to the compiler to override the default allocator, allowing a
different allocator to be used by default in Rust programs. Additionally,
switch the default allocator for dynamic libraries and static libraries to
the system malloc instead of jemalloc.



```rust
extern {
    fn __rust_allocate(size: usize, align: usize) -> *mut u8;
    // ...
}
```

Member:

Why are we using magic symbol names instead of annotation-tagged functions a la #[lang_item="foo"] or #[plugin_registrar]?

Member Author:

Implementation-wise, this is what everything will boil down to (pre-defined symbols), and this is currently the path of least resistance forward. This is all unstable, however, so we'll definitely be able to change it in the future to perhaps using lang items or more official attributes. The current downsides of attributes are:

  • During a compilation, there may actually be two loaded allocators in the crate store (but we won't link one of them), so the compiler would detect duplicate lang items and yield an error. Extra logic would have to be added to "not worry about" the allocator lang items.
  • None of the signatures are currently typechecked, and having an official attribute makes it feel like it should be typechecked.

Basically I'd love to move to using attributes and such, but I don't see much immediate benefit over just defining some symbols in the short-term. I also don't mind adding some words to this effect in the RFC, though, and we could perhaps spec the "ideal implementation" here where the actual implementation just has some TODOs.

My ideal situation would be to have an attribute-per-function which defines the symbol, visibility, and typechecks the signature. We'd then also have a check that an #![allocator] crate contains the necessary functions (tagged with attributes). That's a good deal of attribute-surface-area to start stabilizing right off the bat though.
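For illustration only: the per-function signature check described above can be approximated in plain Rust by coercing each allocator function to a function-pointer type alias, so that a signature mismatch becomes a compile error. This is a sketch, not the proposed attribute; `AllocateFn`, `DeallocateFn`, `example_allocate`, and `example_deallocate` are hypothetical names, and the bodies assume today's `std::alloc` API, which postdates this discussion.

```rust
use std::alloc::{alloc, dealloc, Layout};

// Hypothetical signature "contracts" mirroring the shape of the
// `__rust_allocate` symbol quoted in the RFC excerpt.
type AllocateFn = unsafe extern "C" fn(size: usize, align: usize) -> *mut u8;
type DeallocateFn = unsafe extern "C" fn(ptr: *mut u8, old_size: usize, align: usize);

/// Hypothetical allocation entry point backed by `std::alloc`.
/// Returns null for an invalid layout, a zero size, or allocator failure.
unsafe extern "C" fn example_allocate(size: usize, align: usize) -> *mut u8 {
    match Layout::from_size_align(size, align) {
        Ok(layout) if layout.size() != 0 => unsafe { alloc(layout) },
        _ => std::ptr::null_mut(),
    }
}

/// Hypothetical deallocation entry point; `old_size` and `align` must
/// match the values that were passed to `example_allocate`.
unsafe extern "C" fn example_deallocate(ptr: *mut u8, old_size: usize, align: usize) {
    if ptr.is_null() {
        return;
    }
    // The layout was valid when the pointer was handed out.
    let layout = unsafe { Layout::from_size_align_unchecked(old_size, align) };
    unsafe { dealloc(ptr, layout) };
}

// Compile-time signature checks: if either function's signature drifts
// from its alias, these constants fail to type-check.
const _: AllocateFn = example_allocate;
const _: DeallocateFn = example_deallocate;
```

Unlike the attribute imagined above, this only checks signatures that the author remembers to list; it cannot verify that a crate defines every required function.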

@alexcrichton added the T-libs-api and T-lang labels Jul 1, 2015
@Ericson2314 (Contributor)

Are any stable interfaces proposed here? Or are we just changing the way the allocator is automatically picked as far as stable rust is concerned? I find it hard to tell.

I like the general goal, but as I said elsewhere, core, alloc, and log all need to use functionality defined elsewhere, and traits won't cut it, so it would be nice to really think through a language-level way to solve this problem once and for all (something like ML functors at the crate level, probably).

If nothing is being stabilized here, great! This is definitely a better situation than what we have currently. If interfaces are being stabilized, then I'd rather wait for a general solution for all three crates.

@Tobba commented Jul 1, 2015

I'm pretty sure what everyone has wanted in this area for a very long time is trait-based allocator selection a la RFC #39 (which we sadly never got due to some GC-related concerns, and the GC-aware version was such an abomination everyone pretends it never happened). This would allow you to adjust the allocator for not just an entire crate, but for individual objects and in a much cleaner fashion.

allocation functions used by Rust, defined as:

```rust
extern {
    // ...
}
```

Member:

Must it be C ABI?

I’d rather have something #[lang]-ish here as well.

Member Author:

The C ABI is not required, but leaves the door open to allowing external implementations of an allocator in the future (e.g. implementing one in C instead of Rust).

I discussed #[lang] above which may be of interest as well.

@Ericson2314 (Contributor)

I'm more sympathetic to not stabilizing an allocator interface until we have GC, but it seems pretty harmless to implement something like #39 without stabilizing it, and just use it behind std.

@alexcrichton (Member Author)

@Ericson2314

> Are any stable interfaces proposed here?

Currently, no.


@Tobba

> I'm pretty sure what everyone has wanted in this area for a very long time is trait-based allocator selection a la RFC #39

I see the concept of collection-specific allocators as orthogonal to this RFC, and implementation-wise there basically must be some global symbols which represent the "allocator interface". This RFC is just connecting the dots to allow programs to switch the global allocator, not to provide a full-blown allocation API (hence the instability of all items proposed here).

@nnethercote

From my point of view this all looks quite plausible. Thank you, @alexcrichton.

@alexcrichton self-assigned this Jul 2, 2015

* `alloc_system` is a crate that will be tagged with `#![allocator]` and will
redirect allocation requests to the system allocator.
* `alloc_jemalloc` is another allocator crate that will bundle a static copy of ...

Member:

Writing #![allocator] instead of allocator would be less confusing (I wasn't sure whether it was implied that the crate would not have the tag).

@gnzlbg (Contributor) commented Jul 6, 2015

It might be worth discussing how this RFC solves or improves on the situation described in Reenix: Implementing a Unix-Like Operating System in Rust 3.3 Critical Problem: Allocation.

@alexcrichton (Member Author)

@gnzlbg this is somewhat orthogonal in the sense that it's not stabilizing an allocator API, nor is it altering the semantics of what to do on a failed allocation. It would only help in terms of switching out which allocator is used by default.

@Ericson2314 (Contributor)

If an allocator trait is created, it would be possible to introduce the system allocator (as per this RFC) only in std. That would force libcollections to be allocator-agnostic :D.
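To illustrate that layering (this is not something the RFC proposes), here is a sketch in which a collection is generic over an allocator trait and only the top layer names the system allocator, via a default type parameter. `RawAlloc`, `SystemAlloc`, and `Buffer` are hypothetical names, and today's `std::alloc` stands in for the system allocator.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Hypothetical allocator interface a collections crate could depend on
/// without knowing which allocator backs it.
trait RawAlloc {
    /// Returns null on failure or for a zero-size layout.
    fn allocate(&self, layout: Layout) -> *mut u8;
    /// # Safety
    /// `ptr` must come from `allocate` with the same `layout`.
    unsafe fn deallocate(&self, ptr: *mut u8, layout: Layout);
}

/// The one concrete allocator, introduced only at the "std" layer.
struct SystemAlloc;

impl RawAlloc for SystemAlloc {
    fn allocate(&self, layout: Layout) -> *mut u8 {
        if layout.size() == 0 {
            return std::ptr::null_mut();
        }
        unsafe { alloc(layout) }
    }
    unsafe fn deallocate(&self, ptr: *mut u8, layout: Layout) {
        unsafe { dealloc(ptr, layout) }
    }
}

/// A fixed-size byte buffer that is agnostic to its allocator; the
/// default type parameter plays the role of std's choice.
struct Buffer<A: RawAlloc = SystemAlloc> {
    ptr: *mut u8,
    layout: Layout,
    alloc: A,
}

impl<A: RawAlloc> Buffer<A> {
    /// Returns `None` for a zero length or if allocation fails.
    fn new_in(len: usize, alloc: A) -> Option<Self> {
        let layout = Layout::array::<u8>(len).ok()?;
        let ptr = alloc.allocate(layout);
        if ptr.is_null() {
            return None;
        }
        // Zero the buffer so every byte is initialized.
        unsafe { ptr.write_bytes(0, len) };
        Some(Buffer { ptr, layout, alloc })
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }
}

impl<A: RawAlloc> Drop for Buffer<A> {
    fn drop(&mut self) {
        // Return the memory to the same allocator that provided it.
        unsafe { self.alloc.deallocate(self.ptr, self.layout) }
    }
}
```

Only the default type parameter mentions `SystemAlloc`, so the collection code itself stays allocator-agnostic.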

@gnzlbg (Contributor) commented Jul 7, 2015

> @gnzlbg this is somewhat orthogonal in the sense that it's not stabilizing an allocator API, nor is it altering the semantics of what to do on a failed allocation.

@alexcrichton would it be possible to modify this API to return Result or Option?
A particular allocator could still panic, but this might allow writing a wrapper over the system's allocator that does not panic but just returns None when allocation fails. Of course this would be the subject of a different RFC, but it would be nice to know whether this can be added without too much trouble in the future.
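A sketch of the kind of non-panicking wrapper described here, written against today's `std::alloc` rather than the unstable symbols in this RFC: a null pointer from the allocator becomes `None`, and a caller that wants to abort can still panic explicitly. `try_allocate`, `try_deallocate`, and `allocate_or_panic` are hypothetical names, and rejecting zero-size requests is an assumption of this sketch, not a documented rule.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr::NonNull;

/// Hypothetical fallible allocation: `None` on an invalid layout,
/// a zero-size request, or allocator failure, instead of panicking.
fn try_allocate(size: usize, align: usize) -> Option<NonNull<u8>> {
    let layout = Layout::from_size_align(size, align).ok()?;
    if layout.size() == 0 {
        return None; // zero-size requests are rejected in this sketch
    }
    // SAFETY: `layout` has a non-zero size; `alloc` returns null on failure.
    NonNull::new(unsafe { alloc(layout) })
}

/// Frees memory obtained from `try_allocate`.
///
/// # Safety
/// `ptr` must come from `try_allocate(size, align)` and must not be freed twice.
unsafe fn try_deallocate(ptr: NonNull<u8>, size: usize, align: usize) {
    let layout = unsafe { Layout::from_size_align_unchecked(size, align) };
    unsafe { dealloc(ptr.as_ptr(), layout) };
}

/// A caller that prefers the panicking behaviour can still opt in.
fn allocate_or_panic(size: usize, align: usize) -> NonNull<u8> {
    try_allocate(size, align).expect("allocation failed")
}
```

The design choice is that failure policy moves to the call site: the allocator reports, and each caller decides whether `None` is recoverable.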

@kornelski (Contributor)

I'm writing Rust libraries that I expect to be linked statically with both C and Rust programs.

Would there be a way to say "Use malloc if linked with C, and whatever Rust program wants when linked with Rust"?

i.e. my library doesn't care about which allocator is used, but doesn't want to impose any allocator on the client.

@retep998 (Member) commented Jul 7, 2015

@pornel malloc might not be the allocator that the C code is using. If your library needs to be compatible with allocations coming from an external location, then it would probably be best to provide your own API to consumers of your library to set allocator callbacks.
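A minimal sketch of that callback approach: the library routes its own allocations through a table of function pointers that the host application can install once, before the first allocation, and otherwise falls back to Rust's allocator. `AllocCallbacks`, `set_alloc_callbacks`, `lib_alloc`, and `lib_free` are hypothetical names, the fixed 16-byte alignment is an assumption of the sketch, and `std::sync::OnceLock` postdates this thread. A C-facing version would also need `extern "C"` entry points for the host to call.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::sync::OnceLock;

/// Hypothetical callback table a host application can install.
#[derive(Clone, Copy)]
pub struct AllocCallbacks {
    pub alloc: unsafe extern "C" fn(size: usize) -> *mut u8,
    pub free: unsafe extern "C" fn(ptr: *mut u8, size: usize),
}

static CALLBACKS: OnceLock<AllocCallbacks> = OnceLock::new();

// Fixed alignment used for every allocation in this sketch.
const ALIGN: usize = 16;

// Default callbacks: Rust's global allocator (zero sizes rounded up to 1).
unsafe extern "C" fn default_alloc(size: usize) -> *mut u8 {
    match Layout::from_size_align(size.max(1), ALIGN) {
        Ok(layout) => unsafe { alloc(layout) },
        Err(_) => std::ptr::null_mut(),
    }
}

unsafe extern "C" fn default_free(ptr: *mut u8, size: usize) {
    if !ptr.is_null() {
        let layout = unsafe { Layout::from_size_align_unchecked(size.max(1), ALIGN) };
        unsafe { dealloc(ptr, layout) };
    }
}

/// Installs the host's callbacks. Returns `false` if callbacks (or the
/// defaults) are already in use, so allocators are never mixed.
pub fn set_alloc_callbacks(cb: AllocCallbacks) -> bool {
    CALLBACKS.set(cb).is_ok()
}

fn callbacks() -> AllocCallbacks {
    *CALLBACKS.get_or_init(|| AllocCallbacks { alloc: default_alloc, free: default_free })
}

/// Every allocation the library makes goes through the installed table.
pub fn lib_alloc(size: usize) -> *mut u8 {
    unsafe { (callbacks().alloc)(size) }
}

/// # Safety
/// `ptr` must come from `lib_alloc(size)` and must not be freed twice.
pub unsafe fn lib_free(ptr: *mut u8, size: usize) {
    unsafe { (callbacks().free)(ptr, size) }
}
```

Refusing late registration is deliberate here: once the defaults are in use, switching allocators would let memory from one allocator be freed by the other.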

@alexcrichton (Member Author)

@gnzlbg Sure it could possibly use one of those types eventually, but this RFC isn't stabilizing the signatures of these functions currently, just adding infrastructure to swap them out.


@pornel You could manually link to alloc_jemalloc or alloc_system and then toggle between the two with a --cfg, but you probably wouldn't actually need to do anything in practice. To link into C, you need to build a staticlib from the Rust code at some point, at which point the system allocator will be linked in. To link into Rust, you follow all the normal standard paths and get the default allocator.

@kornelski (Contributor)

@alexcrichton Great! 👍

@pnkfelix (Member) commented Jul 8, 2015

@Tobba

> trait-based allocator selection a la RFC #39 (which we sadly never got due to some GC-related concerns, and the GC-aware version was such an abomination everyone pretends it never happened)

I think that is an unfair characterization on multiple levels.

In the second RFC you are referencing (#244), the handling of GC issues certainly had problems, but feeding more type-metadata into a high-level allocator is not an inherently bad idea, IMO.


Anyway, trait-based allocator selection is a distinct issue that we are planning to address independently of this RFC.

Having a high-level / low-level split in the trait definitions may or may not be necessary, but I suspect it will be the only way to actually placate all of the parties involved.

@erickt commented Jul 8, 2015

@Tobba: I suspect you were making a joke, but please keep from describing other people's work in that way.

@nikomatsakis (Contributor)

👍 from me. I'm still in favor of this plan. It does have this "complex" feeling -- but all the "simple" alternatives seem to have real downsides. That said, I think it is imperative that we be able to link rustc such that it and LLVM use the same allocator. I'm intrigued by your question about whether that is possible -- if it is not possible, why not? What would it take to make it work? It feels like precisely the kind of scenario other people will hit and that we are trying to make seamless, no?

I guess another way to put it is: this unresolved question suggests that there is one rather obvious case we didn't analyze as thoroughly as the others. We know that calling Rust from C makes Rust use the system allocator. We know that pure Rust gets to use the builtin jemalloc. But we really want to make sure that C used by Rust will use jemalloc too! And naturally this gets into the static/dynamic linking question, and (for dynamic linking in particular) the differences between platforms, right?

It feels like there ought to be some obvious precedent to follow here! Why don't other big C frameworks have this sort of problem? I guess nobody is in quite our position of wanting to simultaneously function as main and as a callee, and do the best thing in both cases?

@cuviper (Member) commented Jul 8, 2015

@nikomatsakis I expect the more general approach is to just use malloc/free, and let the executable link an unprefixed jemalloc implementation if desired, or let the user set LD_PRELOAD=libjemalloc.so.

Of course, you don't get any advanced jemalloc functionality this way, unless perhaps you create weak fallbacks for those extra functions.

@alexcrichton (Member Author)

> That said, I think it is imperative that we be able to link rustc such that it and LLVM use the same allocator. I'm intrigued by your question about whether that is possible -- if it is not possible, why not?

Ah, I should clarify that I'm not sure how to do this on all platforms. On Linux I believe that if we just don't prefix jemalloc then "everything should work out", but I'm less certain how to override the system allocator on OSX and Windows. I think we can coerce the system allocator on OSX to be overridden (and jemalloc may already do this), but I haven't tested any of these use cases.

> But we really want to make sure that C used by Rust will use jemalloc too!

I agree! This is a very good point. I think one of the problems here is that it's a very platform-specific issue. For example, on many Unixes you can just use LD_PRELOAD to load in a different allocator, or even override the default allocator by defining malloc and free. On OSX and Windows, however, I'm less sure that it's possible to do this in a robust fashion.

Otherwise, some C libraries provide the ability to define an allocator (e.g. via a virtual function call), but that's definitely a library-specific concern.

@nikomatsakis (Contributor)

Right. The goal of the current design was to give us the full advantage of jemalloc when Rust was in charge, and to fall back to the system allocator otherwise.

Niko


funnel Rust allocations to the same source as the host application's allocations
then a crate can be written and linked in.

Finally, providers of allocators will simply provide a crate to do so, and then ...

Member:

Can you add text to this section (either in this paragraph or in a separate one) spelling out how a client who wants to provide a wrapper around Rust's default allocator (or otherwise instrument it) would do so?

This use case was alluded to at the end of the motivation section, but I am not 100% clear on how arduous the process will be, in particular whether one can be confident that the allocator one is injecting is truly a wrapper around the allocator that Rust would have selected otherwise (that is, without the injection).

Member:

(If the answer is "It is indeed a bit arduous to write such a wrapper robustly, e.g. involving cfg switches to select properly between alloc_system and alloc_jemalloc in the alloc crate one is injecting", that is acceptable. I just want to know up front whether that is the expectation.)

Member:

(It's also possible that the answer involves somehow observing the values of lib_allocation_crate and exe_allocation_crate during the compilation of the crate I want to inject, and just assuming they will stay the same at the time of the final link where I am being injected? Still wondering out loud; probably should just wait for @alexcrichton to answer...)

Member Author:

Unfortunately this RFC doesn't currently easily allow this sort of instrumentation to happen. If we wanted to support this right out of the gate, this RFC would necessitate four crates:

  • Two crates for implementing the allocation API, but not tagged with #![allocator]. There'd be one crate for jemalloc and one for the system.
  • Two crates that link to the previous crates but are tagged with #![allocator] and redirect the formal allocation API into the desired crate.

In a nutshell, if you want to write an allocator which can be instrumented or shimmed, then you need to write a crate which is not tagged #![allocator] but probably still exposes the allocation API via normal Rust functions. The provider of the allocator would then write their own shims that redirect to the desired allocator after the instrumentation has happened.

Does that make sense? If so I'll add some words.
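The arrangement described above can be modeled in ordinary Rust, with modules standing in for crates: `system_impl` plays the untagged implementation crate and `instrumented_shim` plays the crate that would carry `#![allocator]`, recording each request before forwarding it. This is a sketch under stated assumptions: the module names and counters are hypothetical, today's `std::alloc` stands in for the system allocator or jemalloc, and the `#![allocator]` tagging itself is omitted because it was a nightly-only attribute.

```rust
/// Stands in for an implementation crate that is *not* tagged
/// `#![allocator]` and exposes the API as plain Rust functions.
mod system_impl {
    use std::alloc::{alloc, dealloc, Layout};

    pub unsafe fn allocate(size: usize, align: usize) -> *mut u8 {
        match Layout::from_size_align(size, align) {
            Ok(layout) if layout.size() != 0 => unsafe { alloc(layout) },
            _ => std::ptr::null_mut(),
        }
    }

    /// `ptr` must be a non-null pointer returned by `allocate(old_size, align)`.
    pub unsafe fn deallocate(ptr: *mut u8, old_size: usize, align: usize) {
        let layout = unsafe { Layout::from_size_align_unchecked(old_size, align) };
        unsafe { dealloc(ptr, layout) };
    }
}

/// Stands in for the shim crate that would carry `#![allocator]`:
/// it records each request, then forwards to the implementation.
mod instrumented_shim {
    use super::system_impl;
    use std::sync::atomic::{AtomicUsize, Ordering};

    pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);
    pub static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

    pub unsafe fn allocate(size: usize, align: usize) -> *mut u8 {
        let ptr = unsafe { system_impl::allocate(size, align) };
        if !ptr.is_null() {
            // Count only successful allocations.
            ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
            LIVE_BYTES.fetch_add(size, Ordering::Relaxed);
        }
        ptr
    }

    /// `ptr` must be a non-null pointer returned by `allocate(old_size, align)`.
    pub unsafe fn deallocate(ptr: *mut u8, old_size: usize, align: usize) {
        LIVE_BYTES.fetch_sub(old_size, Ordering::Relaxed);
        unsafe { system_impl::deallocate(ptr, old_size, align) };
    }
}
```

Because the shim forwards to a named implementation module, the wrapper is known to wrap exactly that allocator, which is the guarantee asked about in the review comment; the cost is that each choice of underlying allocator needs its own shim.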

Member:

Hmm, I missed this response back when it was written.

I guess I would have liked some more concrete details in the RFC regarding use cases like this, i.e. spelling out the steps for the expected uses of this RFC, and then also including little sketches like the one in your comment for unexpected use cases.

Anyway, I plan to have a go at playing around with the PR rust-lang/rust#27400, since I find myself needing to do some allocation debugging. Perhaps it will inspire me to write an amendment to the RFC with such notes.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Aug 14, 2015
This commit is an implementation of [RFC 1183][rfc] which allows swapping out
the default allocator on nightly Rust. No new stable surface area should be
added as a part of this commit.

[rfc]: rust-lang/rfcs#1183

Two new attributes have been added to the compiler:

* `#![needs_allocator]` - this is used by liballoc (and likely only liballoc) to
  indicate that it requires an allocator crate to be in scope.
* `#![allocator]` - this is an indicator that the crate is an allocator which can
  satisfy the `needs_allocator` attribute above.

The ABI of the allocator crate is defined to be a set of symbols that implement
the standard Rust allocation/deallocation functions. The symbols are not
currently checked for exhaustiveness or typechecked. There are also a number of
restrictions on these crates:

* An allocator crate cannot transitively depend on a crate that is flagged as
  needing an allocator (e.g. allocator crates can't depend on liballoc).
* There can only be one explicitly linked allocator in a final image.
* If no allocator is explicitly requested one will be injected on behalf of the
  compiler. Binaries and Rust dylibs will use jemalloc by default where
  available and staticlibs/other dylibs will use the system allocator by
  default.

Two allocators are provided by the distribution by default, `alloc_system` and
`alloc_jemalloc` which operate as advertised.

Closes rust-lang#27389
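The injection rules in this commit message can be summarized as a small decision function. This is a hypothetical model for reading purposes, not rustc's code: `CrateType`, `select_allocator`, and the `jemalloc_available` flag are invented names, and the rule that an allocator crate cannot depend on a `needs_allocator` crate is not modeled.

```rust
/// Output kinds distinguished by the rules above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CrateType {
    Executable,
    RustDylib,
    StaticLib,
    OtherDylib,
}

/// Hypothetical model of allocator selection; returns the allocator crate name.
fn select_allocator(
    crate_type: CrateType,
    explicit: &[&'static str],
    jemalloc_available: bool,
) -> Result<&'static str, String> {
    match explicit {
        // An explicitly linked allocator always wins.
        [one] => Ok(*one),
        // Only one explicitly linked allocator is allowed in a final image.
        [_, _, ..] => Err(format!(
            "found {} explicitly linked allocators, expected at most one",
            explicit.len()
        )),
        // Otherwise the compiler injects a default based on the output kind.
        [] => Ok(match crate_type {
            CrateType::Executable | CrateType::RustDylib if jemalloc_available => "alloc_jemalloc",
            _ => "alloc_system",
        }),
    }
}
```

Under these rules a staticlib linked into a C program gets `alloc_system` unless an allocator crate is linked explicitly, which matches the answer given to @pornel earlier in the thread.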
bors added a commit to rust-lang/rust that referenced this pull request Aug 14, 2015
@froydnj commented Apr 15, 2016

It would be splendid if the RFC described the semantics of the various __rust_* functions that an allocator crate must implement. While the functions straightforwardly map onto malloc et al, the failure modes could be quite different. For instance, does __rust_allocate panic on failure to allocate, or does it simply return a null pointer? Can any of these functions be called with a zero size? What does __rust_reallocate do around the edge cases of realloc (see this comment in Firefox, for instance)?

Some of these things can be derived from exploring the built-in crates of Rust, but it'd be much nicer for people who have to implement custom allocators to have the function semantics written down somewhere.

@alexcrichton (Member Author)

@froydnj this RFC actually intentionally left out the specifications for each symbol (because they're all unstable), and the exact semantics/requirements may change over time (depending on how allocators shake out). So in that sense I don't believe these have been highly scrutinized in terms of solidifying what the semantics should be vs what they do now. Essentially the only "stable implementations" of a custom allocator are alloc_jemalloc and alloc_system as they're what we maintain.

You can learn more about what we currently require, however, from reading the heap.rs documentation for each wrapper function. Does that help out for now?

@froydnj commented Apr 15, 2016

@alexcrichton thanks for the explanation! It seems quite odd to introduce an interface that's stable (that's my understanding of the Rust RFC process, anyway), but then to not define interface semantics because the interface is subject to change over time. I see after a more careful reading that the RFC does call this out, though. I guess at some point these interfaces will be stabilized and then their API will be documented?

The heap.rs comments are helpful, thanks for pointing them out!

@Ericson2314 (Contributor)

It's not stable.

@sfackler (Member)

The acceptance of an RFC is only the first step on the road to stability. The implementation of an RFC will almost always land unstable, and can still change after that point, until it is formally stabilized.

@alexcrichton (Member Author)

@froydnj yeah as mentioned by @Ericson2314 and @sfackler most of this RFC isn't actually stable. The only stable feature is that dylibs/staticlibs use the system allocator whereas executables use jemalloc. Beyond that everything is unstable and feature gated.

Now that being said, if you guys need any help about clarifications of the current implementation or find it falls short, please let me know as I'd love to help out or help tweak the design :)

@froydnj commented Apr 18, 2016

Thanks for the clarifications! I feel enlightened. :)

@Centril added the A-allocation and A-attributes labels Nov 23, 2018
Labels:
  • A-allocation: Proposals relating to allocation.
  • A-attributes: Proposals relating to attributes.
  • final-comment-period: Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised.
  • T-lang: Relevant to the language team, which will review and decide on the RFC.
  • T-libs-api: Relevant to the library API team, which will review and decide on the RFC.