Cranelift: describe CLIF opcodes declaratively #1150
Comments
My thoughts: I like the idea of making it easier to add and understand instructions! However, along the lines of @bjorn3's point 2 above, I think exposing the encodings in this way is problematic because it starts to leak the inner workings of Cranelift -- would you still want to do this without the encoding section? Another thought would be to add a section with the semantics of each instruction, making it possible to generate an interpreter for CLIF. I have some minimal experience with DynSem, and it seems possible to create DynSem definitions for a large portion of the instructions. |
What about operations like popcnt, bitrev, clz, ctz and other bit-level operations? I think they would be easier to implement in Rust instead. |
I think they would all be easier to implement in Rust, but the point of using some other semantics language (it doesn't have to be DynSem) is that outside tools can do things like mechanized proofs (e.g., as in #1138). |
@bjorn3 The point about magic does not make sense to me; could you explain where you see the magic in what I proposed? The YAML is mostly a mapping of the existing encoding code. For me it would be more ideal if there was something like a trait that needed to be implemented for every instruction and that contained the code for the encoding transformation, so you could see all of the logic contained in Rust rather than in fragments:

```rust
impl InstructionEncoding<Riscv> for Jump {
    // Note: the parameter types here are illustrative; the originals were
    // garbled in this comment.
    fn encode(&self, func: &Function, destination: Block, sink: &mut CodeSink) {
        let dest = i64::from(func.offsets[destination]);
        let disp = dest - i64::from(sink.offset());
        put_uj(bits, disp, 0, sink);
    }
}
```

I also don't understand how this prevents keeping the legalisation code you mentioned, though I have no experience with the legalisation step. I'm also not against the separation between ISAs, which is why I included an example of how the original could simply reference a specific instruction set such as x86 that could be defined in another file, so you could have the exact same structure that exists today but using a data format. What's important to me is that the instruction has a reference, so that it is clear from looking at it what steps you need to take to implement it. Right now this is not the case. From my own experience: I tried to add a single instruction that took zero operands in or out and just needed to be translated directly into opcodes, and I found the experience of trying to find out how to do this so frustrating that I gave up on implementing it. |
I had a similar experience trying to add an instruction. Here's another, more extreme, data point: speaking only for myself, I believe the metalanguage system is complex and burdensome for all involved. It should be simplified and replaced by straight Rust code wherever feasible.

In the 2004-2005 timeframe I implemented all the functionality that the metalanguage currently gives us, in a different compiler targeting x86_32/64, arm32 and Power32. This was without any metalanguage, just using normal structs, switches, and functions. The result was simple, obvious, did good-quality instruction selection and was easily maintained. It also seemed to be easy for others to work with -- other folks subsequently added Power64, MIPS32/64 and S390 support and greatly extended the existing targets without apparent difficulty.

I would be somewhat sympathetic to the metalanguage idea if the underlying targets -- the instruction sets -- changed frequently. Then it would save us from having to rewrite said switches, structs, etc. frequently. But instruction semantics and encodings never change once they are defined, since that would break backwards compatibility of the architectures. So in practice we don't get that benefit from the metalanguage. |
Current and future roles of the meta-language

Answering the latest comments first: the meta language does much more than machine encodings. In addition to handling instructions, it also allows you to:
Things we haven't done yet, but that we could do with such a meta-language:
I fully agree that encodings and machine code generation are a bit hard to understand, but I would argue that this is because we're in a weird halfway position between using the meta language and not using it, and that there is room to improve. So overall, I don't think getting rid of the meta language is a good idea. Also look at future directions for optimizing compilers for more detailed motivations to use meta languages.

Using static descriptions

This part answers the original point of this issue. Note my remarks are common to your proposition (using static descriptors in files) and to a proposition I thought of a while ago to use our own, very small meta-language (a proper DSL with a parser, etc.).

As stated by other people, this makes the code more magic, in the sense that you have to know the names of the keys, and it's easy to make a mistake -- for instance, a typo in a name, since names would be looked up dynamically with descriptor files (note there's also some dynamic lookup happening right now, but it ought to disappear).

Such an approach doesn't allow easy code reuse either: for instance, having the meta language specified in Rust allows us to have real control flow structures (for loops) that reduce the number of lines of code needed to specify multiple definitions. We would lose this with descriptions in files. Using Rust lets us make use of other developer-oriented facilities too: having a language-server plugin around allows me to jump to definitions of things, see how concepts are used, autocomplete names, and get compilation feedback when I make a typo in a name. The one downside is that the more definitions we add, the longer the meta crate takes to compile.

How we can help with this

Locality of definitions

Locality of meta definitions could be improved in the code too, and it would be a great thing to enhance. We could imagine that each IR instruction has its own file, and that related entities would be defined throughout the file:
It would imply having many more files and many more functions, so this might regress compile times of the meta crate badly. Or it could make them better in a future where rustc can parallelize compilation more effectively.

Improve the documentation

Having a high-level explanation of how things interact with each other would be helpful, as would detailed examples of how to add instructions and encodings. We could check in such tutorials, or put them in a wiki so they don't need reviews and can be updated faster by members of the community.

Removal of recipes, and general simplifications of the meta crate |
I would disagree with the assertion that either a DSL or static descriptions would be magic. To me, what makes something feel "magic" is implicit behaviour that is hard to track down. This isn't inherent to descriptor files or a DSL; it's something that comes from the design itself.

I should have been a bit clearer: I don't think descriptor files would be a replacement for the Rust meta language, nor would I want that. Rather, I would want them to act as a complement to the meta language, reducing boilerplate and providing more locality.

If I could use one of my own projects as an example of how a descriptor file could be used: Tokei uses a descriptor file for all of the programming languages it supports, in a single JSON file. This file is used with the Handlebars templating engine to generate the corresponding Rust definitions.

I don't think every instruction should have its own file -- not for the performance reasons you mentioned, but rather because it would create a lot of boilerplate and overhead in terms of having to hop between a lot of files when improving certain parts. To me, grouping instructions by their related domain (e.g. jump instructions in one module, comparison instructions in another) would be ideal. |
I agree (kind of). To me, grouping code by functionality would be ideal. In particular, when writing instruction selectors I have found it useful to ask questions about how an instruction is used -- questions that can only be answered (optimally) when one knows the context in which the instruction appears.

Also, if the instruction selector can merge multiple IR instructions into a single machine instruction, should that logic be associated with the description of the IR instructions or of the machine instruction? |
I think now we are discussing pretty much the same thing in bytecodealliance/rfcs#15. |
Not exactly -- this is also about Cranelift IR instruction definitions, while ISLE is only about lowering of instructions. |
Motivation
Cranelift's IR went through a huge overhaul from the original Python generator to one written in Rust. This has made it easier to work on, as we're using the same language to generate the code that is used in the rest of the project. However, it is still not ideal in terms of contributing new instructions or changing existing ones, as it is not clear what steps are required, and which parts of the meta code generator need to be modified to correctly add an instruction.
This could be partially solved by providing better documentation. However, documentation doesn't improve the experience of contributing to the codebase itself, and it is likely to fall out of date over time, much like the current situation, as there's no strict link between an instruction and its documentation besides what's available in `InstBuilder`.

Instead, I would like to propose rewriting at least some of the meta code generator with a greater focus on data locality, so that if someone wants to contribute to the IR, the number of overall steps is reduced. This should also help with documenting the IR format, as the documentation can be more closely coupled.
Proposed Solution
I would like to propose something along the lines of using a data format to encode information about an instruction and how it is encoded. I've used YAML to illustrate what this could look like, though I'm not advocating for YAML specifically.
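As an illustrative sketch of what such a descriptor might look like (every key and field name here is hypothetical, not taken from Cranelift):

```yaml
# Hypothetical descriptor sketch; key names are illustrative only.
iadd:
  doc: Wrapping integer addition.
  operands:
    in: [x, y]
    out: [a]
  encodings:
    x86: rex_op(0x01)
    riscv: r_type(0b0110011)
```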
The main advantage of this approach is that it moves the current imperative style of `cranelift-codegen` into something more declarative and data oriented. To me this provides more clarity around how an instruction is defined and used; if you wanted to create a new instruction, you could simply copy and paste from another, already working instruction.

Drawbacks
While the goal of this proposal is to simplify working on the IR language itself, it could make the underlying meta code more complex, as well as potentially increasing the time needed to compile `cranelift-codegen`, since there would now be a deserialisation step that wasn't there before.

Alternatives
Instead of changing the system, there could be a greater focus on building resources for contributing to Cranelift that explain how to use the current system.