Usage of Logic Elements almost double when synthesizing for Max 10 #140

Closed
SaabFAN opened this issue Aug 12, 2021 · 7 comments

@SaabFAN

SaabFAN commented Aug 12, 2021

Hi!

Great project, I really would like to implement it in my Signal Generator-Project, but I only have a 10M04SCE144C8G (Max 10 FPGA with 4k Logic Elements) on the board.
The smallest configuration I can synthesize (only the Processor), however, requires something between 5 and 6K LEs.
If I synthesize the Processor for a Cyclone V, or Cyclone 10 LP FPGA, I can get the LE count to somewhere around 2.5 to 3K.
Other VHDL Projects consume roughly (1% difference) the same amount of LEs on both Cyclone 10 and Max 10.

Do you have any idea why this might be happening? And maybe even a solution in the works already? :)

I use Quartus Lite 20.1.0 Build 711.
To reproduce the issue, just start a fresh project in Quartus and specify any Max 10 FPGA as the target device, add all the VHDL files and one of the templates, and then make the template the top-level design entity. After that, just hit Compile and note the LE usage.
After that, change to any other FPGA and try again. You'll see the Max 10 is using almost twice the LEs.

PS: Upgrade to a larger FPGA was my first idea, but neither Mouser, nor Digikey or Farnell have pin compatible devices in stock and list them to be available sometime in 2022 ;)

@stnolting
Owner

Hey there!

I just tried a quick-and-dirty setup for a 10M04SCE144C8G. Seems like I have the same version of Quartus on my system as you do.

I am using a setup that uses the bootloader. The IMEM (internal instruction memory) and DMEM (internal data memory) are mapped to block RAM primitives without problems. However, the bootloader ROM is a problem: Quartus uses LUTs to implement this 4kB ROM. That is the cause of the excessive LUT usage. For my test setup, the bootloader ROM takes up 2902 LUTs (more than 50% of the whole FPGA).

The MAX10 FPGAs contain M9K BRAM primitives that support pre-initialization, so they can be used as ROM. On my Cyclone 4 setup, which also contains M9K primitives, Quartus is capable of generating the corresponding MIF (memory initialization file) and can also map the bootloader ROM to M9K blocks. I am not sure why Quartus has problems with the MAX10... I will investigate further. However, it seems to be a "common" problem: enjoy-digital/litex#228
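
As a rough back-of-the-envelope check of what the bootloader ROM would cost in block RAM on a device where memory initialization works, here is a small C sketch that counts M9K blocks. It assumes 8192 usable data bits per M9K (parity bits unused); `m9k_blocks_for` is a hypothetical helper name, and the real packing is decided by Quartus.

```c
#include <stdint.h>

/* Assumption: 8192 data bits per M9K block, parity bits left unused. */
#define M9K_DATA_BITS 8192u

/* Number of M9K blocks needed to hold `bytes` bytes (rounded up). */
uint32_t m9k_blocks_for(uint32_t bytes)
{
    uint32_t bits = bytes * 8u;
    return (bits + M9K_DATA_BITS - 1u) / M9K_DATA_BITS;
}
```

Under that assumption, `m9k_blocks_for(4096)` gives 4 blocks for the 4kB bootloader ROM, compared with the 2902 LUTs reported above.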

@stnolting
Owner

stnolting commented Aug 12, 2021

Oh, I think I just found the reason: https://community.intel.com/t5/Intel-Quartus-Prime-Software/Inferring-ROM-for-MAX-10-FPGA/td-p/271693?profile.language=en

Seems like the "compact" SC MAX10 FPGAs do not support memory initialization via the bitstream... :(

There are two options left to get an executable into the NEORV32 without the default bootloader:

  1. Use the on-chip debugger, but that requires an external JTAG adapter: 📚 https://stnolting.github.io/neorv32/ug/#_debugging_using_the_on_chip_debugger
  2. You could still use the bootloader, but then you have to make it smaller so that it fits into fewer LUTs. You can remove code from the bootloader sources (for example all the SPI flash stuff) or you can use the pre-defined bootloader configuration options: 📖 https://stnolting.github.io/neorv32/ug/#_customizing_the_internal_bootloader.

Unfortunately, the bootloader does not provide a configuration option to disable all the SPI-related options (like booting from an external SPI flash). I think I need to fix that 😉

@SaabFAN
Author

SaabFAN commented Aug 12, 2021

What about the User Flash-Area in the Max 10?
As far as I know, it's something that sets the Max 10 Series apart from all other FPGAs.

I've never tinkered with it (I'm pretty much a beginner in terms of FPGA-Stuff), but would it be possible to store the Bootloader in it?
One of the ideas I had for my device was to use one of the IPs available in Quartus to connect to the User-Flash via SPI and use it basically as an internal SPI-Flash for the Application via the External Memory-Interface.
Maybe if the Bootloader is stored in the User-Flash and just dumped from there into IMEM (implemented as RAM) by a simple state-machine before the processor is started, it could basically work around the problem that memory cannot be initialized by the bit-stream directly.
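
The copy-then-release idea can be modeled in software before writing any HDL. The following is only a behavioral sketch in C, not synthesizable code: `boot_copier`, `copier_tick`, and the fixed `flash_latency` parameter are hypothetical names and assumptions, and a real user-flash IP will have its own handshake.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* States of the hypothetical copy state machine. */
typedef enum { COPY_REQ, COPY_WAIT, COPY_DONE } copy_state;

typedef struct {
    copy_state state;
    size_t     addr;       /* index of the word currently being copied */
    size_t     wait_left;  /* remaining flash wait cycles */
    bool       cpu_reset;  /* held high until the whole image is copied */
} boot_copier;

void copier_init(boot_copier *c)
{
    c->state = COPY_REQ;
    c->addr = 0;
    c->wait_left = 0;
    c->cpu_reset = true;
}

/* Advance the model by one clock cycle. */
void copier_tick(boot_copier *c, const uint32_t *flash, uint32_t *imem,
                 size_t words, size_t flash_latency)
{
    switch (c->state) {
    case COPY_REQ:
        if (c->addr == words) {
            c->state = COPY_DONE;
            c->cpu_reset = false;          /* image complete: release the CPU */
        } else {
            c->wait_left = flash_latency;  /* issue a flash read, then wait */
            c->state = COPY_WAIT;
        }
        break;
    case COPY_WAIT:
        if (c->wait_left > 0) {
            c->wait_left--;
        } else {
            imem[c->addr] = flash[c->addr]; /* store the word into IMEM (RAM) */
            c->addr++;
            c->state = COPY_REQ;
        }
        break;
    case COPY_DONE:
        break;                              /* nothing left to do */
    }
}
```

Calling `copier_tick` in a loop until `cpu_reset` goes low mimics the start-up sequence: the processor only starts once IMEM holds the complete image.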

@stnolting
Owner

stnolting commented Aug 12, 2021

> I've never tinkered with it

Me neither. I always wanted to check out the MAX10 FPGAs but I don't have an actual board (yet).

> but would it be possible to store the Bootloader in it?

Sure. The core does not care where the executable comes from. Furthermore, the interface constraints are quite relaxed. So there is no problem if reading a single instruction word takes several cycles to complete.

> One of the ideas I had for my device was to use one of the IPs available in Quartus to connect to the User-Flash via SPI and use it

I think there might also be a Quartus IP block for wrapping up the user flash. Connecting it to the external memory interface shouldn't be a big deal, even if it uses some Intel-specific Avalon bus interface.

> Maybe if the Bootloader is stored in the User-Flash and just dumped from there into IMEM (implemented as RAM) by a simple state-machine before the processor is started, it could basically work around the problem that memory cannot be initialized by the bit-stream directly.

This is indeed a feasible option. Another option would be to use a tiny, minimalist bootloader (implemented as tiny LUT ROM) that only provides the very basic features to allow upload of executables via UART. I think this could be done in ~512 bytes (or maybe even less) memory size. I think this would be the most flexible solution (besides the on-chip debugger approach). If the executable is fetched from internal flash, one needs to "re-program" the FPGA (or at least the flash section) every time the software changes. I am currently tinkering with the NEORV32 bootloader - I think I can shrink that a little bit more 😉
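
To give a feeling for how little logic such a UART-only loader needs, here is a minimal sketch in C. It is not the NEORV32 bootloader and does not use the real NEORV32 executable format: the header layout (signature, size, checksum), the `BOOT_SIGNATURE` value, and all function names are assumptions made up for illustration, and the UART is replaced by a hypothetical byte-source callback so the code can run on a PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte source standing in for a UART receiver.
 * Returns 0..255, or -1 when no more data is available. */
typedef int (*read_byte_fn)(void *ctx);

/* Assumption: made-up magic word, not the real NEORV32 signature. */
#define BOOT_SIGNATURE 0xB007C0DEu

/* In-memory stream used in place of a real UART (illustration only). */
typedef struct { const uint8_t *data; size_t len, pos; } byte_stream;

int stream_read(void *ctx)
{
    byte_stream *s = (byte_stream *)ctx;
    return (s->pos < s->len) ? s->data[s->pos++] : -1;
}

/* Assemble one little-endian 32-bit word; returns 0 on success. */
static int read_word(read_byte_fn rd, void *ctx, uint32_t *out)
{
    uint32_t w = 0;
    for (int i = 0; i < 4; i++) {
        int b = rd(ctx);
        if (b < 0) return -1;
        w |= (uint32_t)(b & 0xFF) << (8 * i);
    }
    *out = w;
    return 0;
}

/* Receive header + image into imem.
 * Returns the number of words stored, or a negative error code. */
int boot_receive(read_byte_fn rd, void *ctx, uint32_t *imem, size_t imem_words)
{
    uint32_t sig, size_bytes, checksum, sum = 0;

    if (read_word(rd, ctx, &sig) || sig != BOOT_SIGNATURE) return -1;
    if (read_word(rd, ctx, &size_bytes)) return -2;
    if (read_word(rd, ctx, &checksum)) return -2;
    if (size_bytes % 4 != 0 || size_bytes / 4 > imem_words) return -3;

    size_t n = size_bytes / 4;
    for (size_t i = 0; i < n; i++) {
        if (read_word(rd, ctx, &imem[i])) return -2;
        sum += imem[i];
    }
    /* The checksum is the two's complement of the word sum. */
    if ((uint32_t)(sum + checksum) != 0) return -4;
    return (int)n;
}
```

On real hardware `read_byte_fn` would poll the UART receive register, and after a successful `boot_receive` the loader would jump to the start of IMEM.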

@SaabFAN
Author

SaabFAN commented Aug 12, 2021

Looking forward to the possibility to cram the processor into the FPGA on my Signal Generator.

Regarding the Avalon-Interface: There are Wishbone to Avalon-Bridges available, which consume very little resources.

Btw. I've sent you an email regarding the FPGA-Board - I've got a Board with a 10M08SAU here that I don't need anymore. ;)

@stnolting
Owner

> Looking forward to the possibility to cram the processor into the FPGA on my Signal Generator.

I am curious. Is your setup based on any publicly available board? 🤔

> Regarding the Avalon-Interface: There are Wishbone to Avalon-Bridges available, which consume very little resources.

Right. I think the whole "bridging" can be done just by some simple gates in-between the control signals.
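
As a rough illustration of the "simple gates" idea, the signal mapping can be written down as two pure functions in C. This is a sketch under strong assumptions, not a verified bridge: it assumes classic (non-pipelined) Wishbone cycles, an Avalon-MM agent that uses only `waitrequest` with zero read latency (no `readdatavalid`, no bursts), and identical address units on both sides. All struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wishbone master outputs (classic, non-pipelined cycle). */
typedef struct { bool cyc, stb, we; uint32_t adr, dat_w; uint8_t sel; } wb_request;
/* Wishbone master inputs. */
typedef struct { bool ack; uint32_t dat_r; } wb_response;
/* Avalon-MM host outputs. */
typedef struct { bool read, write; uint32_t address, writedata; uint8_t byteenable; } av_request;
/* Avalon-MM host inputs (assumption: waitrequest only, zero read latency). */
typedef struct { bool waitrequest; uint32_t readdata; } av_response;

/* Request path: a few AND gates on the control signals, the rest is wiring. */
av_request wb_to_avalon(wb_request wb)
{
    bool active = wb.cyc && wb.stb;
    av_request av = {
        .read       = active && !wb.we,
        .write      = active && wb.we,
        .address    = wb.adr,
        .writedata  = wb.dat_w,
        .byteenable = wb.sel
    };
    return av;
}

/* Response path: acknowledge as soon as the agent stops stalling. */
wb_response avalon_to_wb(wb_request wb, av_response av)
{
    wb_response r = {
        .ack   = wb.cyc && wb.stb && !av.waitrequest,
        .dat_r = av.readdata
    };
    return r;
}
```

An agent that signals read data with `readdatavalid` or supports bursts would need extra state, so this mapping only covers the simplest case.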

> Btw. I've sent you an email regarding the FPGA-Board - I've got a Board with a 10M08SAU here that I don't need anymore. ;)

I will have a look! 😉

@stnolting
Owner

I think this can be closed. Feel free to open another issue or start a new discussion if you have further questions. 😉


2 participants