
✨[Zxcfu ISA ext.] add option to implement custom RISC-V instructions #264

Merged: 34 commits merged into master from zxcfu_isa_extension on Jan 31, 2022

Conversation

@stnolting (Owner) commented Jan 29, 2022

With this PR the NEORV32 now provides an option to add custom RISC-V instructions. 🚀

This PR adds a Custom Functions Unit (CFU) wrapped in the Zxcfu ISA extension, which is a NEORV32-specific custom ISA extension. The extension's name follows the RISC-V naming scheme:

  • Z = this is a sub-extension
  • x = the second letter behind the Z defines the "parent extension" to which this sub-extension belongs: in this case it belongs to the X "custom extensions" extension (a platform-specific extension that is not defined by the RISC-V spec.)
  • cfu = name of the extension (Custom Functions Unit)

[figure: neorv32_processor block diagram]

The CFU is implemented as a new hardware module (rtl/core/neorv32_cpu_cp_cfu.vhd) that is integrated right into the CPU's ALU. Thus, the CFU has direct access to the core's register file, which provides minimal data transfer latency. A special OPCODE, which has been officially reserved for custom extensions by the RISC-V spec, is used to build the custom instructions. The custom instructions supported by the CFU use the R2-type format, which provides two source registers, one destination register and a 10-bit immediate (split into two bit-fields):

[figure: cfu_r2type_instruction format]

The funct7 and funct3 bit-fields can be used to pass immediates to the CFU for certain computations (for example offsets, addresses, shift-amounts, ...) or they can be used to select the actual custom instruction to be executed (allowing up to 1024 different instructions).
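
To make the bit-field layout more concrete, the following sketch shows how such an R2-type instruction word could be assembled. It assumes the standard RISC-V R-type field ordering and the custom-0 opcode (0b0001011); the helper name is made up, and the data sheet defines the encoding actually used by the CFU.

#include <stdint.h>

// Illustration only: pack an R2-type CFU instruction word.
// Assumes the standard R-type field layout and the custom-0 opcode;
// the NEORV32 data sheet defines the encoding that is actually used.
static inline uint32_t cfu_encode_r2type(uint32_t funct7, uint32_t rs2,
                                         uint32_t rs1, uint32_t funct3,
                                         uint32_t rd) {
  return ((funct7 & 0x7f) << 25) | // funct7: immediate or instruction-select bits
         ((rs2    & 0x1f) << 20) | // source register 2
         ((rs1    & 0x1f) << 15) | // source register 1
         ((funct3 & 0x07) << 12) | // funct3: selects intrinsic cmd0..cmd7
         ((rd     & 0x1f) <<  7) | // destination register
         0x0b;                     // custom-0 opcode (assumption)
}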

Software can utilize the custom instructions by using the provided intrinsics (defined in sw/lib/include/neorv32_cpu_cfu.h). These pre-defined functions implicitly set the funct3 bit-field. Each intrinsic can be treated as a "normal C function" (see #263). A simple demo program using the default CFU hardware is available in sw/example/demo_cfu/main.c.

// custom instruction prototypes
neorv32_cfu_cmd0(funct7, rs1, rs2); // funct3 = 000
neorv32_cfu_cmd1(funct7, rs1, rs2); // funct3 = 001
neorv32_cfu_cmd2(funct7, rs1, rs2); // funct3 = 010
neorv32_cfu_cmd3(funct7, rs1, rs2); // funct3 = 011
neorv32_cfu_cmd4(funct7, rs1, rs2); // funct3 = 100
neorv32_cfu_cmd5(funct7, rs1, rs2); // funct3 = 101
neorv32_cfu_cmd6(funct7, rs1, rs2); // funct3 = 110
neorv32_cfu_cmd7(funct7, rs1, rs2); // funct3 = 111
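
As an illustration of how these intrinsics are used (a sketch loosely based on the demo program; the funct7 value and variable names are made up, and neorv32.h is assumed to pull in the CFU intrinsics):

#include <stdint.h>
#include <neorv32.h> // assumption: main runtime header, provides the CFU intrinsics

void cfu_demo(void) {
  uint32_t a = 0xCAFE1234;
  uint32_t b = 0x00000007;

  // issue the custom instruction with funct3 = 000;
  // the first argument is the 7-bit funct7 immediate (0 here, arbitrary),
  // rs1/rs2 are loaded from a/b and the result is returned like a normal C function
  uint32_t result = neorv32_cfu_cmd0(0, a, b);
  (void)result; // the result would normally be used by the application
}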

This new feature was highly inspired by @google's CFU-Playground - thanks again to @umarcor for showing me that framework. With some logic plumbing it should be possible to install the CFUs from the CFU-Playground into the NEORV32.

📚 Documentation

The documentation of the CFU module is available in the online processor data sheet, section "Custom Functions Unit (CFU)". A comparison of the different processor extension options is available in the user guide, section "Adding Custom Hardware Modules".


CFU vs. CFS

There are two processor-internal options for custom hardware now: the Custom Functions Subsystem (CFS) and the Custom Functions Unit (CFU).

  • Custom Functions Subsystem (CFS): The CFS is a memory-mapped peripheral that is accessed using load/store instructions. It is intended for complex accelerators that - once triggered - perform some "long" processing in a CPU-independent manner (like a complete AES encryption). The CFS also provides the option to implement custom interfaces as it has direct access to special top entity signals.
  • Custom Functions Unit (CFU): The CFU is located right inside the CPU's pipeline. It is intended for custom instructions that implement functionality not covered by the official RISC-V ISA extensions. These instructions should be rather simple data transformations (like bit reversal, summing the elements of a vector, elementary AES operations, ...) rather than complete algorithms (even though that is also possible), since CFU instructions are entirely CPU-bound and stall the core until they complete (see the sketch below).
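
From the software point of view the difference boils down to load/store accesses versus a single custom instruction. Below is a minimal sketch contrasting the two; the CFS base address and register layout are made up for illustration (the real, user-defined register map is described in the data sheet), and neorv32.h is assumed to provide the CFU intrinsics.

#include <stdint.h>
#include <neorv32.h> // assumption: provides the CFU intrinsics used below

void accel_demo(uint32_t x, uint32_t y) {
  // CFS: memory-mapped peripheral, accessed with plain load/store instructions;
  // base address and register indices are placeholders for illustration only
  volatile uint32_t *cfs = (volatile uint32_t *)0xFFFFEB00UL; // made-up address
  cfs[0] = x;                // write operands, trigger the (long-running) operation
  cfs[1] = y;
  uint32_t cfs_res = cfs[2]; // read back the result once the CFS is done

  // CFU: a single custom instruction executed inside the CPU pipeline;
  // operands come from the register file, the core stalls until it completes
  uint32_t cfu_res = neorv32_cfu_cmd0(0, x, y); // funct7 = 0 (arbitrary)

  (void)cfs_res;
  (void)cfu_res;
}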

@stnolting added the labels enhancement (New feature or request), HW (hardware-related) and SW (software-related) on Jan 29, 2022
@stnolting self-assigned this on Jan 29, 2022
@stnolting changed the title from "✨[Zxcfu ISA extensions] add option to implement custom RISC-V instructions" to "✨[Zxcfu ISA ext.] add option to implement custom RISC-V instructions" on Jan 29, 2022
@stnolting (Owner, Author)

I have added a summarized comparison of the four most obvious (IMHO) options for adding custom hardware modules to the processor (user guide, section "Adding Custom Hardware Modules"). These options are:

  • attach custom hardware via the external memory interface (WISHBONE)
  • attach custom hardware via the stream link interface (SLINK)
  • implement custom hardware via the custom functions subsystem (CFS)
  • implement custom hardware via the custom functions unit (CFU)

@umarcor

We recently had a short discussion about this topic. Could you have a look at the comparison table (-> https://github.com/stnolting/neorv32/blob/zxcfu_isa_extension/docs/userguide/adding_custom_hw_modules.adoc#16-comparative-summary)? Maybe you have some ideas for additional (or better) comparison "metrics". 😉

@stnolting marked this pull request as ready for review on January 31, 2022, 04:32
@stnolting merged commit 3ac6303 into master on Jan 31, 2022
@stnolting deleted the zxcfu_isa_extension branch on January 31, 2022, 07:10
@umarcor (Collaborator) commented Feb 1, 2022

@stnolting you are so fast! I commented it mostly for gathering some knowledge and you implemented all of it in 1-2 days! That's impressive! Thank you so much!

/cc @tcal-x @mithro @kgugala: you might be interested in knowing that NEORV32 now supports a CFU and that it might be combined with the content from google/CFU-Playground.

@tcal-x commented Feb 3, 2022

Hi! That's great news! Sorry I forgot to follow up on this. It would be super awesome to have an alternative CPU that would connect to CFUs similar to VexRiscv, even if an adapter is needed (I haven't yet checked out the CFU interface on NEORV32 to see how similar it is). Connecting VexRiscv to CFU is actually done in LiteX: https://github.com/enjoy-digital/litex/blob/master/litex/soc/cores/cpu/vexriscv/core.py#L275-L328 .

Does NEORV32 currently plug into LiteX?

@stnolting (Owner, Author)

> Does NEORV32 currently plug into LiteX?

Not yet, but that is already on the to-do stack (#115) 😉

> haven't yet checked out the CFU interface on NEORV32 to see how similar it is

According to this CFU-Playground template the interface seems to be quite similar. However, I need to find some real specification for that and test some existing CFU setups (see #269).

@umarcor (Collaborator) commented Feb 5, 2022

FTR, there is https://github.com/umarcor/neorv32-setups/commits/umarcor/edaa, which is a work-in-progress prj.py to declare the NEORV32 sources through pyEDAA.ProjectModel. Having them defined in Python should make them easier to reuse in LiteX. However, I'm not sure about the steps required to support a new CPU in LiteX (most of the uses I see are packaging SoCs with already supported cores/modules). I've seen Migen and SpinalHDL designs being available in the ecosystem, but not VHDL. Is there any LiteX example using VHDL or do we need to convert NEORV32 to Verilog (#266)?

On the other hand, it might be interesting to add a .core file to stnolting/neorv32-setups. Can LiteX read FuseSoC's .core files or does it need some specific declarative format?

/cc @enjoy-digital @olofk

@enjoy-digital (Contributor)

Hi @umarcor,

we currently have one VHDL CPU integrated in LiteX: Microwatt. It can be used and integrated as VHDL or pre-converted to Verilog through GHDL/Yosys.

To add a CPU, you first need to create the LiteX wrapper around it. You can use these PRs as a reference: CV32E40P or FemtoRV, initially still with local sources. Once working, we could package NEORV32 in a pythondata-xxyy package as we are doing for other CPUs.

@stnolting's NEORV32 looks awesome and I would be really happy to help with the integration into LiteX. This would also probably be a good stress test for the GHDL/Yosys plugin :) (proprietary tools will accept VHDL, but conversion to Verilog is useful for simulating with litex_sim (through Verilator) or for implementing the design on hardware with open-source tools).

@stnolting (Owner, Author)

@enjoy-digital

Thank you very much! I will take a closer look at the Microwatt integration and try to find out how things work 😉

@enjoy-digital (Contributor)

@stnolting: In fact, since NeoRV32 seems to be a lot better documented than LiteX and I'm also a VHDL developer, it would probably be more efficient if I at least create the skeleton for the integration to initiate the work and allow us to work on this together. Several issues are related to this or to derived aspects (Verilog generation, CFU, etc...), so this would let you go further on those aspects, and doing the NeoRV32 integration could also be a good occasion for me to write a CPU integration tutorial for LiteX :) I'll have a look at integrating neorv32_cpu.vhd as a LiteX CPU in the next days and will share progress here.

@umarcor (Collaborator) commented Feb 13, 2022

@enjoy-digital, please, don't do it alone. I mean, @stnolting is the author of NEORV32 as a whole, and I wrote most of the Makefiles in neorv32-setups. See the diagram in hdl.github.io/constraints/Usage. So, please, ask as soon as anything about the structure is unclear.

I suggest you take a look at the processor_templates and system_integration subdirs in this repo. Rather than integrating neorv32_cpu, you might want to start with one of those. With regard to the board_tops in neorv32-setups, maybe you don't want to use them in LiteX at all (because that's the core functionality of LiteX and you already have litex-boards); however, they can serve as inspiration.

@enjoy-digital (Contributor)

> @enjoy-digital, please, don't do it alone. I mean, @stnolting is the author of NEORV32 as a whole, and I wrote most of the Makefiles in neorv32-setups. See the diagram in hdl.github.io/constraints/Usage. So, please, ask as soon as anything about the structure is unclear.

Sure, that's what I meant with the integration skeleton. It's easier for me to put things in place for LiteX; once that is done, and if there are issues, I'll be able to share a simulation environment we can use to continue the bring-up.

> I suggest you take a look at the processor_templates and system_integration subdirs in this repo. Rather than integrating neorv32_cpu, you might want to start with one of those. With regard to the board_tops in neorv32-setups, maybe you don't want to use them in LiteX at all (because that's the core functionality of LiteX and you already have litex-boards); however, they can serve as inspiration.

In fact the NeoRV32 CPU seems to be the equivalent of the other CPUs integrated in LiteX, so it's easier to start with this. If LiteX can also be useful for running the NeoRV32 Processor on different hardware or for providing peripherals not present in the NeoRV32 Processor, I'll also be happy to provide help/directions.

@enjoy-digital (Contributor)

@umarcor: This is working and I suggest moving the LiteX specific discussion to #115 instead of this closed PR :)

@umarcor (Collaborator) commented Feb 14, 2022

@enjoy-digital ack and agree. Thanks for your awesomely fast and effective response!

@stnolting (Owner, Author)

I would recommend using the processor top entity (rtl/core/neorv32_top.vhd) for the integration (or maybe one of the template processor wrappers). The bus interfaces of the stand-alone CPU are somewhat proprietary, even though they were inspired by Wishbone.

You can disable all processor-internal modules (even the memories) via generics. With this configuration you get a processor setup that provides just a Wishbone-compatible bus interface.

> This is working and I suggest moving the LiteX specific discussion to #115 instead of this closed PR :)

I agree 😉
