Incremental recompilation #2369

Closed
catamorphism opened this issue May 8, 2012 · 25 comments
Labels
A-incr-comp Area: Incremental compilation A-linkage Area: linking into static, shared libraries and binaries E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority

Comments

@catamorphism
Contributor

I want to be able to recompile only dependencies (rather than everything in a crate), like SML's compilation manager or GHC's --make mode. I don't really care about the approach so long as it works and the outcome is that adding one #debug call in one file doesn't trigger recompilation of 100 other files in the same crate :-) I am volunteering to work on this after 0.3, but suggestions are welcome. Patrick suggested a good place to start would be to generate a (visualizable) graph of item dependencies, which makes sense to me.
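As a rough illustration of what an item dependency graph buys you, here is a minimal sketch (not rustc code; `DepGraph`, `add_dep` and `dirty_set` are hypothetical names made up for this example). Given "item A uses item B" edges, it computes the set of items that would need recompiling after one item is edited.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Hypothetical item-level dependency graph: `deps[a]` lists the items `a` uses.
pub struct DepGraph {
    deps: HashMap<String, Vec<String>>,
}

impl DepGraph {
    pub fn new() -> Self {
        DepGraph { deps: HashMap::new() }
    }

    /// Record that `item` depends on (uses) `used`.
    pub fn add_dep(&mut self, item: &str, used: &str) {
        self.deps.entry(item.to_string()).or_default().push(used.to_string());
        // Make sure `used` is a node even if it has no dependencies itself.
        self.deps.entry(used.to_string()).or_default();
    }

    /// Items to recompile when `changed` is edited: `changed` itself plus
    /// everything that transitively depends on it.
    pub fn dirty_set(&self, changed: &str) -> BTreeSet<String> {
        // Invert the edges so we can walk from an item to its users.
        let mut users: HashMap<&str, Vec<&str>> = HashMap::new();
        for (item, used) in &self.deps {
            for u in used {
                users.entry(u.as_str()).or_default().push(item.as_str());
            }
        }
        // Breadth-first walk over the inverted edges.
        let mut dirty = BTreeSet::new();
        let mut queue = VecDeque::new();
        queue.push_back(changed);
        while let Some(item) = queue.pop_front() {
            if dirty.insert(item.to_string()) {
                if let Some(us) = users.get(item) {
                    queue.extend(us.iter().copied());
                }
            }
        }
        dirty
    }
}
```

With edges `main -> parse`, `parse -> lex` and `util -> lex`, editing `parse` dirties only `parse` and `main`; `lex` and `util` are left alone, which is the "one #debug call shouldn't rebuild 100 files" outcome in miniature.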

@ghost ghost assigned catamorphism May 8, 2012
@nikomatsakis
Contributor

There is code in trans that tries to determine dependencies for the purposes of metadata export. This seems like a reasonable starting point for such a graph.

@Dretch
Contributor

Dretch commented May 9, 2012

I think it would be nice if this were done in such a way that incremental compilation appears no different to the programmer than full compilation. For example:

  1. Partially-compiled files, like object files, should be kept in a hidden (say .build) directory that is not normally seen by the programmer.
  2. Manually deleting the hidden files should never be required - e.g. right now I sometimes have to delete .cargo when I update rustc from git, but I think that the incremental compiler should detect when rustc/libstd/core have changed and automatically delete the now-invalid .build files.

@catamorphism
Contributor Author

@Dretch -- I totally agree; the only thing the programmer should notice is that recompilation will usually be much faster :-) (1) and (2) are great goals to have written down; if we don't achieve those (especially (2)), I won't consider this issue to be completed.

@graydon
Contributor

graydon commented May 15, 2012

If we do this at the level of "caching bitcode for distinguished subtrees of a crate", it's probably not too hard (though I'm not sure it'll make things a lot faster). If we do this at the level of "trying to cache target-machine object files", things get a fair bit weirder. We (or, well, I) chose the existing compilation model for crates based on the need to "eventually" do cross-module optimization (in particular, single-instantiation of monomorphic code and inlining). Crates were the optimization boundary, the compilation-unit boundary.

We have subsequently grown cross-crate inlining. So this distinction is a bit less meaningful, at least in terms of optimization. There is still a meaningful (in the sense of "not easy to eliminate or blur") linkage boundary at work in two terms: monomorphization instantiations and version-insensitivity (the "anything called 1.0 with the same types is the same crate" rule discussed last week).

Overall, this is one of a few bugs pointing in a similar direction: #2176, #1980, #2238, #558, #2166, #456 and even #552 to some extent.

I am not saying these are all wrong. They are all pointing to similar sets of semantic weaknesses .. or "surprises" .. in the existing compilation model. I would like to have a conversation at some point (probably in public, or videoconf, or both) where we approach this problem as a design problem, and try to work out a new set of agreeable principles and plan-of-action for future work that spans the whole set of related bugs. I do want to fix them, but do not want to go much further down this road without having a map of where we're going.

As an example: it could be that we wind up treating all inter-module functionality uniformly via multiple dimensions of a single kind of link item, with inter-link versioning (either "symbolic" or "by content") managed orthogonally from the nested-source, recycled-bitcode, static-library or dynamic-library linkage format. Being able to vary these independently -- and even switch between them depending on selected configuration -- might make a lot more sense than endlessly patching up the increasingly-vague "crate" concept. It might be past its expiration date.

@catamorphism
Contributor Author

Yes, don't worry, I won't jump into this without some serious design discussions.

@bblum
Contributor

bblum commented Aug 20, 2013

issue appears to be properly classified

@catamorphism
Contributor Author

High, not 1.0

@thestinger
Contributor

Do we still want to attempt this, or is splitting into crates viewed as enough now that we have static linking? I guess the missing feature would be combining multiple static libraries into a dynamic library.

@catamorphism catamorphism removed their assignment Jun 16, 2014
@errordeveloper
Contributor

Right... Ideally this should take into account whether compiler flags and/or environment variables have changed. The compiler version would perhaps be another thing to take care of. So a simple solution is to store output in directories whose names contain a checksum of all the parameters we care about.
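A minimal sketch of that idea, assuming a hypothetical `BuildKey` struct whose fields (compiler version, flags, environment variables) are chosen only for illustration. It hashes the parameters and uses the result as the output directory name, so any change in them lands in a fresh directory.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

/// Everything that should invalidate cached output when it changes.
/// The field set is an assumption for illustration.
#[derive(Hash)]
pub struct BuildKey {
    pub compiler_version: String,
    pub flags: Vec<String>,
    /// BTreeMap iterates in key order, so the hash does not depend on
    /// the order in which variables were inserted.
    pub env: BTreeMap<String, String>,
}

impl BuildKey {
    /// Hex fingerprint of the build parameters.
    /// DefaultHasher is not guaranteed stable across Rust releases; a real
    /// tool would want a fixed algorithm such as SHA-256.
    pub fn fingerprint(&self) -> String {
        let mut h = DefaultHasher::new();
        self.hash(&mut h);
        format!("{:016x}", h.finish())
    }

    /// Directory under `root` where outputs for this configuration live.
    pub fn cache_dir(&self, root: &str) -> PathBuf {
        PathBuf::from(root).join(self.fingerprint())
    }
}
```

Two builds with identical parameters map to the same directory and can reuse its contents; changing a flag or the compiler version yields a different directory, so stale output is never picked up and nothing needs to be deleted by hand.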

@errordeveloper
Contributor

Personally, I do still believe that compilers shouldn't be too clever about how to build projects and attempting to replace tools like make wouldn't likely end quite well. I know that clang has those JSON files, which I haven't quite looked into... Also wanted to point out that Qt's QBS certainly looks very appealing.

@l0kod
Contributor

l0kod commented Aug 31, 2014

Distributed build systems for Rust: http://discuss.rust-lang.org/t/distributed-build-systems-for-rust/400

Ninja should be an inspiration for rustc: https://martine.github.io/ninja/

Or maybe Rust should use Ninja…

cc #8456, #16367

@l0kod
Contributor

l0kod commented Sep 6, 2014

Shake is another interesting build system: https://github.com/ndmitchell/shake

A comparison with Ninja: http://neilmitchell.blogspot.fr/2014/05/build-system-performance-shake-vs-ninja.html

@suhr

suhr commented Feb 7, 2015

Worth noting: there's also tup, which takes a somewhat different approach.

@nh2

nh2 commented Aug 7, 2015

Hi, let me give some pointers to the Haskell world. GHC has this solved - and probably the best working incremental recompilation engine on the planet.

It can do:

  • recompilation avoidance for module imports (easy for any language that doesn't implement imports by string concatenation - hi C++)
  • recompilation avoidance within Haskell packages (equivalent to crates)
  • recompilation avoidance across Haskell packages (this even works for system-installed packages: GHC notices when a globally installed library changes its interface files - this has the same effect as if a Makefile included correct dependencies on all of the system level /usr/include/**.h files, and gives correct builds even in changing environments)
  • recompilation avoidance on a semantic level (changing a comment doesn't invalidate downstream modules like it would in make)
  • recompilation avoidance at function granularity (changed function not used => don't need to recompile downstream if the includer doesn't use that very function)
  • recompilation avoidance even if TemplateHaskell (AST modifying macro language) is involved
  • detection of compilation flag changes
  • all of the above in presence of whole program optimization (cross-module inlining)
  • all of the above in presence of parallel builds (ghc --make -j)
  • all of the above with full correctness (in my 5 years of Haskell programming I have not once encountered a case where the equivalent of a make clean would have been necessary with ghc --make)
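To make the "semantic level" point concrete, here is a toy sketch of interface fingerprinting. It is not GHC's or rustc's actual scheme; `Module`, `interface_hash` and `downstream_needs_rebuild` are hypothetical names. Downstream code records a hash of the interface it was compiled against and is rebuilt only when that hash changes, so editing a function body or a comment leaves dependants untouched. Bodies that are exposed for inlining would have to count as part of the interface as well, which this sketch ignores.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy module: exported signatures form the "interface"; bodies do not.
pub struct Module {
    pub signatures: Vec<String>,
    pub bodies: Vec<String>,
}

/// Fingerprint of the interface only; bodies are deliberately left out.
pub fn interface_hash(m: &Module) -> u64 {
    let mut h = DefaultHasher::new();
    m.signatures.hash(&mut h);
    h.finish()
}

/// A downstream module needs recompiling only if the interface hash it
/// recorded at its last build differs from the current one.
pub fn downstream_needs_rebuild(recorded: u64, current: &Module) -> bool {
    recorded != interface_hash(current)
}
```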

GHC has documented its approach in high detail, and highlighted what the problems are. See here:

https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/RecompilationAvoidance

I believe that Rust is in perfect shape to reach the same level of incremental compilation, and can likely apply the same techniques.

It's all there, we just have to copy it ;)

> Personally, I do still believe that compilers shouldn't be too clever about how to build projects and attempting to replace tools like make wouldn't likely end quite well

I once also thought so, but this is not the case. External build tools like make, tup, or even Shake (which has really figured it out) can never reach the same level of granularity that a compiler can, given that it understands what syntax (comments etc.) and semantics (includes, unused functions) are.

@ebassi

ebassi commented Aug 11, 2015

An important side effect of incremental recompilation that was not mentioned, and I think it's more important than building environment enablement, is enabling tooling (like IDEs) to perform operations like on-the-fly code analysis and prompt the user for warnings, code completion, etc.

@da-x
Member

da-x commented Aug 20, 2015

The problem with 99% of build tools out there is that they duplicate the dependency management by either letting the user specify dependencies prior to target execution and/or by scanning them heuristically with prior knowledge about the type of executed target, and often doing so wrongly, failing to duplicate intrinsic logic, resulting in broken or failed builds. The builds are sometimes slower than they could be because the developer forgoes specifying dependencies in favor of safe but slow re-execution, to avoid the aforementioned broken builds.

I suggest that you look into a newer approach, where dependencies are detected reliably for any kind of intermediate. The developer only needs to worry about target invocation itself. So, if you neatly split the build process to a DAG of separate process invocations, even with extremely complicated dependencies, this tool can track them easily.

@huonw
Member

huonw commented Jan 5, 2016

@nikomatsakis nikomatsakis self-assigned this Jan 6, 2016
@nh2

nh2 commented Jan 23, 2016

@huonw Is this issue the right place to discuss the RFC?

@nh2

nh2 commented Jan 23, 2016

Even if not, I'll ask some questions here:

> the compiler will always parse and macro expand the entire crate

Why was this chosen? Wouldn't it make sense to include source files in the dependency graph as well, so that you can skip parsing and even reading the file contents if the file modification time suggests that the file has not changed?

> Optimization and codegen units

I'm not familiar with codegen units, and where their boundaries would be set, but if it's typically on the library/crate level, that could make some inlining problematic. Take for example Haskell's Data.Bits module in the base package. You would definitely want functions like bitwise-or to be inlined. If the inlining boundary was at the library level, this would not be possible. GHC solves this by using a heuristic on the size of the function; if it's small (or otherwise inline-worthy), an "unfolding" (IR syntax tree) is put into the Interface File Data/Bits.hi (on which GHC's incremental compilation works) so that other modules can inline the unfolding. If Data.Bits was updated to a different implementation of bitwise-or, incremental compilation would "just work": The Haskell file would be detected as being changed, the Interface File would be updated, and all users of that unfolding would be recompiled.
It is not clear to me if the current RFC permits this type of cross-library inlining or not.

@nikomatsakis
Contributor

@nh2

> Why was this chosen? Wouldn't it make sense to include source files in the dependency graph as well, so that you can skip parsing and even reading the file contents if the file modification time suggests that the file has not changed?

Eventually perhaps yes. But for the initial versions, we're targeting the things in compilation that are most expensive: LLVM and type-checking. Hashing the HIR also means that we can avoid doing recompilation for smaller, trivial changes, like tweaking a comment -- at least in some cases (it turns out that because that affects the line/col number of all statements, we would need to change at least debuginfo, but we can hopefully isolate the effects of that in the future.)
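The comment-insensitivity point can be illustrated with a deliberately naive sketch. This is not how rustc hashes the HIR; `semantic_hash` is a hypothetical helper that works on raw text, and it ignores the line/column debuginfo issue mentioned above. It hashes only the whitespace-separated tokens outside `//` comments, so comment and indentation edits produce the same fingerprint while a real code change does not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for "hash the HIR": hash the source with `//` comments and
/// whitespace layout removed. It does not understand string literals or
/// block comments; a real compiler hashes a parsed representation instead.
pub fn semantic_hash(source: &str) -> u64 {
    let mut h = DefaultHasher::new();
    for line in source.lines() {
        // Drop a trailing line comment, if any.
        let code = line.split("//").next().unwrap_or("");
        // Hash whitespace-separated tokens so indentation changes are ignored.
        for token in code.split_whitespace() {
            token.hash(&mut h);
        }
    }
    h.finish()
}
```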

There are also just practical concerns. It's much easier to reduce the amount of the compiler we have to instrument.

> I'm not familiar with codegen units, and where their boundaries would be set, but if it's typically on the library/crate level, that could make some inlining problematic.

Users can always add #[inline] manually to indicate things that should be inlined widely (e.g., across crates).
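For instance, a small library function along the lines of the bitwise-or example could be marked like this (`bit_or` is a made-up name for illustration). The attribute is a hint to the compiler, not a guarantee that inlining happens.

```rust
/// In a library crate: `#[inline]` makes the body available to downstream
/// crates so the optimizer may inline it across the crate boundary.
#[inline]
pub fn bit_or(a: u32, b: u32) -> u32 {
    a | b
}
```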

@bluss bluss added the A-incr-comp Area: Incremental compilation label Apr 30, 2016
@steveklabnik
Member

#34956 🎊

@nh2

nh2 commented Aug 19, 2016

@nikomatsakis Thanks for your explanation.

@brson
Contributor

brson commented Jun 5, 2017

@michaelwoerister is this still the best tracking issue for incremental? What's the current status?

@nikomatsakis
Contributor

I forgot this issue existed. The preferred tracker is rust-lang/rust-roadmap-2017#4. In fact, I'm just going to close this issue.

@kmcallister
Contributor

@nh2: It's frustrating (and hardly uncommon) for people to show up and say we "just need to copy Haskell" while ignoring the very real differences between the languages. In this case, the most important difference is that a whole Rust crate is semantically a single compilation unit, in fact a single syntax tree. Any .rs file in a crate can use stuff from any other .rs file, without any kind of forward declaration. This simply isn't true for the .hs files in a Haskell package. Mutually recursive imports will give you an error:

$ ghc --make Main.hs
Module imports form a cycle:
         module ‘Foo’ (./Foo.hs)
        imports ‘Bar’ (./Bar.hs)
  which imports ‘Foo’ (./Foo.hs)

The only way around this is the tedious and error-prone approach of writing a hs-boot file, equivalent to writing header files in C. In practice almost nobody does this; they simply structure their programs so the module dependency graph is acyclic. In terms of the build system (not in terms of packaging / versioning), this is like placing every .rs file into its own crate.

So, most of the features you highlight in ghc --make are present in Rust -- but they're features of Cargo, not rustc! True incremental compilation, within a single unit of mutually-referential stuff, is to my knowledge not a problem GHC tries to solve.
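For comparison, the Rust side of that difference looks roughly like this: two modules of one crate that call into each other with no forward declarations. The module and function names are made up for illustration, and in a real project the modules could just as well live in separate files such as even.rs and odd.rs.

```rust
// Two modules of the same crate that refer to each other directly.
pub mod even {
    pub fn is_even(n: u32) -> bool {
        // Calls into the sibling module, which in turn calls back here.
        if n == 0 { true } else { super::odd::is_odd(n - 1) }
    }
}

pub mod odd {
    pub fn is_odd(n: u32) -> bool {
        if n == 0 { false } else { super::even::is_even(n - 1) }
    }
}
```

Because the cycle is legal, neither module can be compiled in isolation before the other, which is why file-level dependency tracking alone does not give incremental compilation within a crate.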

bors added a commit to rust-lang-ci/rust that referenced this issue Sep 22, 2022
reborrow error: clarify that we are reborrowing *from* that tag

`@saethlin` I found the current message not entirely clear, so what do you think about this?