
NVS manual garbage collection (IDFGH-13397) #14305

Open
bryghtlabs-richard opened this issue Aug 5, 2024 · 9 comments
Labels: Status: Opened (Issue is new) · Type: Feature Request (Feature request for IDF)

Comments

@bryghtlabs-richard
Contributor

Is your feature request related to a problem?

When NVS has plenty of erased space, writes are fast. But once NVS has been fully written, writes require erasing previously written data, which makes my UI stutter.

Describe the solution you'd like.

I would like the ability to request garbage collection and erasure of free space at a time that makes sense for my application, similar to the TRIM command for managed-flash SSDs. Once complete, most unused space in NVS would be erased.

Describe alternatives you've considered.

I've considered deferring writes or commits to NVS, but this is not a great workaround, as it increases the window during which data could be lost before NVS writes it to flash.

Additional context.

It might be nice to be able to specify a limit in terms of time.
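
As a purely hypothetical illustration of the requested API (no such function exists in ESP-IDF today; the name and parameters here are invented), it might look something like:

    /* Hypothetical, not an existing ESP-IDF function: reclaim erased entries in
     * the partition behind `handle`, stopping once roughly `max_duration_ms` of
     * work has been done. ESP_OK would mean the partition is fully compacted;
     * a timeout-style return would mean the time budget ran out first. */
    esp_err_t nvs_gc(nvs_handle_t handle, uint32_t max_duration_ms);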

bryghtlabs-richard added the Type: Feature Request label on Aug 5, 2024
github-actions bot changed the title from "NVS manual garbage collection" to "NVS manual garbage collection (IDFGH-13397)" on Aug 5, 2024
espressif-bot added the Status: Opened label on Aug 5, 2024
@rrtandler
Collaborator

Hi @bryghtlabs-richard,

The different response of the NVS API in the "empty" vs. "full" scenarios is a result of how free space for newly written data is found. If NVS has already written into all of the 4 KB pages reserved for it, a subsequent write either fits into the remaining space in the active NVS page (then it is fast) or needs to reclaim space from NVS pages marked as full (with some of their entries marked as erased). In the latter case, a "local" defragmentation takes place as part of the nvs_set_* operation, which, because of its unpredictability, causes the annoying delays.

If I understand your proposal correctly, you would like to extend the API with a function that defragments the complete NVS space. In the context of this proposal, what exactly do you mean by the note "It might be nice to be able to specify a limit in terms of time"?

Please also share more about the granularity of your data storage (whether you use blobs or store individual items using single-entry operations) and the frequency of updates.

@bryghtlabs-richard
Contributor Author

bryghtlabs-richard commented Aug 12, 2024

Hi @rrtandler, you understand my problem well.

In the context of this proposal, what exactly do you mean by the note "It might be nice to be able to specify a limit in terms of time"?

Optional partial defragmentation. On startup, I'd like to defrag the whole partition. But once our UI is running, there are sometimes windows that are good for defragmentation but not long enough to defrag the entire partition. In those cases I'd like to request defragmentation with a time limit: once defragmentation exceeds the specified time, it should return soon after.

Please also share more about the granularity of your data storage (whether you use blobs or store individual items using single-entry operations) and the frequency of updates.

We have a 64 KB NVS partition at about 25% utilization on most devices, with a mix of items:

  • Most values are small individual items. There are also ESP-IDF items for things like Wi-Fi, which can trigger writes during UI updates.
  • One 8 KB blob contains periodic calibration data from a suite of analog sensors. This is not updated very often.

rrtandler self-assigned this on Sep 12, 2024
@rrtandler
Collaborator

Hi @bryghtlabs-richard - sorry for the late response. I'll take this topic to our internal feature-candidate discussion.

@bryghtlabs-richard
Contributor Author

Thank you kindly. I haven't had time to look into this myself, but it still causes our UI animations to stutter sometimes - it seems that almost every time the 8 KB blob is updated during an animation, the UI stutters.

I wonder if a different way to specify the limit would be useful: if I could ask NVS to defragment until at least 12 KB of space were free, or until the time limit were exceeded, this would keep my UI working without defragging unnecessarily.
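
Expressed again as a hypothetical prototype (not an existing ESP-IDF function; the name and parameters are only illustrative):

    /* Hypothetical: defragment until at least `min_free_bytes` are erased and
     * ready to write, or until `max_duration_ms` elapses, whichever comes first. */
    esp_err_t nvs_gc_until(nvs_handle_t handle, size_t min_free_bytes,
                           uint32_t max_duration_ms);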

@rrtandler
Collaborator

There is one thing that comes to mind related to the animation blocking: is your animation running in a separate FreeRTOS task, in parallel with the task that calls the NVS blob update?

@bryghtlabs-richard
Contributor Author

It occurs in both cases:

  • When a non-UI thread writes to NVS, it's more noticeable. Example: the user is transitioning from one screen to another, or is on a page with an animation, and the UI stalls due to the flash erase time.
  • When the UI thread writes to NVS, it's due to a button click or other user-triggered event, so it's not very noticeable. Example: the user clicks the language chooser, the button animates, the NVS write stalls the UI thread (but not for very long), then the screen transition animation occurs. What I mean is that stalls caused by the UI thread are usually scheduled at times that won't impact the UI badly, because they are synchronous with UI interaction.

@rrtandler
Collaborator

I have looked into options that might help you right now, and there are actually two potential settings in place:

  1. You can take a look at the Kconfig parameters that influence how the spi_flash component interacts with the FreeRTOS scheduler during erase operations.
    Run menuconfig and look at the menu path: (Top) -> Component config -> SPI Flash driver ->
    SPI_FLASH_YIELD_DURING_ERASE
    SPI_FLASH_ERASE_YIELD_DURATION_MS
    SPI_FLASH_ERASE_YIELD_TICKS
    Provided SPI_FLASH_YIELD_DURING_ERASE is enabled, try tuning the duration / ticks (see the sdkconfig sketch after this list).

  2. If you are running your application on an ESP32-S2 or S3 and your module is equipped with PSRAM, it is possible to let the application execute from PSRAM instead of SPI flash. This can help during flash erase, because by default that operation blocks the CPU cache. There is an example demonstrating this technique in examples/system/xip_from_psram; you can start with the README.md there.
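
For reference, a matching sdkconfig fragment could look like the following; the option names are the real Kconfig symbols listed above, but the values are only example placeholders to tune for a specific application:

    CONFIG_SPI_FLASH_YIELD_DURING_ERASE=y
    # Erase in chunks of at most this many milliseconds before yielding (example value)
    CONFIG_SPI_FLASH_ERASE_YIELD_DURATION_MS=20
    # Yield to the scheduler for this many ticks between chunks (example value)
    CONFIG_SPI_FLASH_ERASE_YIELD_TICKS=1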

@bryghtlabs-richard
Contributor Author

Thanks for those ideas, Radek:

  1. We've already enabled SPI_FLASH_YIELD_DURING_ERASE and tuned the granularity. It did help somewhat.
  2. I'd like to move my application to PSRAM, but we don't have enough space to move the entire application, and it doesn't seem possible to move individual functions to PSRAM the way it is with IRAM. I had brought this up on the forum; if you think it would help, should I file an ESP-IDF feature request? Moving the LVGL graphics code to PSRAM could be useful, but I don't think we can move all of our image resources.

@rrtandler
Collaborator

Thanks for the update.

At the moment, we do not have a plan to implement NVS defragmentation. That said, I have already been thinking about how it could be implemented, and as a side effect the following NVS usage scheme came to mind for your particular use case:

  1. Have two NVS partitions: permanent_nvs (the one you have at the moment) and cache_nvs. Make cache_nvs the same size as permanent_nvs.
  2. On init, or whenever defrag is requested, use the NVS iterator to visit every record in cache_nvs and update permanent_nvs with the data. After the loop ends, erase the complete cache_nvs partition and reinitialize it.
  3. Wrap nvs_get_* so that it first looks into cache_nvs and, in case of any error, returns whatever the call to permanent_nvs returns.
  4. To write data, call nvs_set_* on cache_nvs only.
  5. Call the defrag function periodically, so that unwanted space reclamation and the associated flash erase do not happen at the wrong moment.

I hope this can inspire and help you.

Radek
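
A minimal sketch of the scheme described above, assuming the ESP-IDF v5.x NVS API (nvs_open_from_partition, the nvs_entry_find iterator, nvs_flash_erase_partition) and two partitions labelled permanent_nvs and cache_nvs that have both already been initialized with nvs_flash_init_partition(). Only int32 values are copied for brevity; other types would be dispatched on nvs_entry_info_t::type in the same way:

    #include "nvs.h"
    #include "nvs_flash.h"

    /* Step 3: read from the cache partition first, fall back to the permanent one. */
    esp_err_t cached_get_i32(const char *ns, const char *key, int32_t *out)
    {
        nvs_handle_t h;
        esp_err_t err = nvs_open_from_partition("cache_nvs", ns, NVS_READONLY, &h);
        if (err == ESP_OK) {
            err = nvs_get_i32(h, key, out);
            nvs_close(h);
            if (err == ESP_OK) {
                return ESP_OK;
            }
        }
        err = nvs_open_from_partition("permanent_nvs", ns, NVS_READONLY, &h);
        if (err != ESP_OK) {
            return err;
        }
        err = nvs_get_i32(h, key, out);
        nvs_close(h);
        return err;
    }

    /* Step 4: all writes go to the cache partition only. */
    esp_err_t cached_set_i32(const char *ns, const char *key, int32_t value)
    {
        nvs_handle_t h;
        esp_err_t err = nvs_open_from_partition("cache_nvs", ns, NVS_READWRITE, &h);
        if (err != ESP_OK) {
            return err;
        }
        err = nvs_set_i32(h, key, value);
        if (err == ESP_OK) {
            err = nvs_commit(h);
        }
        nvs_close(h);
        return err;
    }

    /* Steps 2 and 5: copy every cached entry into the permanent partition, then
     * erase and reinitialize the cache. Called at startup or at a quiet moment
     * chosen by the application, so the long flash erases happen only here. */
    esp_err_t cache_defrag(void)
    {
        nvs_iterator_t it = NULL;
        esp_err_t err = nvs_entry_find("cache_nvs", NULL, NVS_TYPE_I32, &it);
        while (err == ESP_OK) {
            nvs_entry_info_t info;
            nvs_entry_info(it, &info);

            int32_t value;
            nvs_handle_t dst;
            if (cached_get_i32(info.namespace_name, info.key, &value) == ESP_OK &&
                nvs_open_from_partition("permanent_nvs", info.namespace_name,
                                        NVS_READWRITE, &dst) == ESP_OK) {
                nvs_set_i32(dst, info.key, value);
                nvs_commit(dst);
                nvs_close(dst);
            }
            err = nvs_entry_next(&it);
        }
        nvs_release_iterator(it);

        /* Reclaim all cache space in one go, then bring the partition back up. */
        nvs_flash_deinit_partition("cache_nvs");
        err = nvs_flash_erase_partition("cache_nvs");
        if (err != ESP_OK) {
            return err;
        }
        return nvs_flash_init_partition("cache_nvs");
    }

In this sketch the only long erase happens inside cache_defrag(), so the application decides when to pay for it; the hypothetical cached_get_i32/cached_set_i32 wrappers keep the rest of the code unchanged apart from the call sites.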
