[CRASH][v4.4.3][ESP32-S3] "Timed out waiting for completion of AES Interrupt" (IDFGH-9264) #10647

chipweinberger · 2023-01-30T09:30:19Z

Answers checklist.

I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

v4.4.

Operating System used.

macOS

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

ESP32-S3 Dev Kit C

Power Supply used.

USB

What is the expected behavior?

AES CBC should complete.

What is the actual behavior?

Hits AES timeout.

/* Wait for AES hardware block operation to complete */
static void esp_aes_dma_wait_complete(bool use_intr, lldesc_t *output_desc)
{
#if defined (CONFIG_MBEDTLS_AES_USE_INTERRUPT)
    if (use_intr) {
        if (!xSemaphoreTake(op_complete_sem, 2000 / portTICK_PERIOD_MS)) {
            /* indicates a fundamental problem with driver */
            ESP_LOGE("AES", "Timed out waiting for completion of AES Interrupt");
            abort();
        }
#ifdef CONFIG_PM_ENABLE
        esp_pm_lock_release(s_pm_cpu_lock);
        esp_pm_lock_release(s_pm_sleep_lock);
#endif  // CONFIG_PM_ENABLE
    }
#endif

Steps to reproduce.

I've never hit this before. Might be tempermental.

Debug Logs.

0x40376487: panic_abort at /esp-idf/components/esp_system/panic.c:429

0x40383581: esp_system_abort at /esp-idf/components/esp_system/esp_system.c:128

0x4038daab: abort at /esp-idf/components/newlib/abort.c:46

0x4211057e: esp_aes_dma_wait_complete at /esp-idf/components/mbedtls/port/aes/dma/esp_aes.c:215
 (inlined by) esp_aes_process_dma at /esp-idf/components/mbedtls/port/aes/dma/esp_aes.c:435

0x421109a3: esp_aes_crypt_cbc at /esp-idf/components/mbedtls/port/aes/dma/esp_aes.c:729

0x4207f297: pd_ota_session_write(unsigned long long, pd_buf_t, pd_error_t*) at /jamcorder-firmware/jamcorder_common/jcommon/ota/pd_ota_session.cpp:467
 (inlined by) pd_ota_session_write(unsigned long long, pd_buf_t, pd_error_t*) at /jamcorder-firmware/jamcorder_common/jcommon/ota/pd_ota_session.cpp:672

More Information.

No response

The text was updated successfully, but these errors were encountered:

chipweinberger · 2023-01-30T09:31:07Z

This is really bad behavior. Crashing is not acceptable during an ota update (where I use AES CBC).

Can we not return an error? Should we increase this timeout? is 2 seconds really enough?

chipweinberger · 2023-01-30T09:41:32Z

PR to do something more friendly: https://github.com/espressif/esp-idf/pull/10648/files

ESP-Marius · 2023-02-08T03:40:05Z

Hmm, I dont really see how this could happen. It would help a lot if you were able to find a way to reproduce it.

Adding an error return instead of abort seems reasonable, but I would still like to get to the bottom of how this happened.

ESP-Marius · 2023-02-08T03:43:13Z

@chipweinberger are you by any chance using the Digital signature peripheral as part of your OTA process? (or at all in your app)

chipweinberger · 2023-02-08T04:45:59Z

No I don't use DS peripheral for OTA.

Wifi + aes.

Yes it would be great to repro. I've only hit it once.

Thanks for merging the band-aid fix.

chipweinberger · 2023-02-08T04:51:08Z

Oh one thing! When I hit this I was debugging a big memory leak in my OTA code! I was allocating a freertos semaphore in internal ram ~20 times a second.

My OTA would fail due to out of memory half way into the process.

Maybe ota + aes + memory exhaustion.

I am also using PSRAM, not sure if related.

mahavirj · 2023-03-20T06:33:02Z

@chipweinberger

Is it possible to create a small reproducible use-case for this issue? So far, we have not been able to recreate this scenario on our side (tried different configurations including SPIRAM).

chipweinberger · 2023-03-20T09:14:19Z

I appreciate your effort!

I remember reproducing it a couple times at least, but I imagine a minimal example will take awhile for me to create.

Did you try creating a new mutex every time new ota data arrived?

I was doing the update over wifi.

mahavirj · 2023-03-20T09:21:09Z

Did you try creating a new mutex every time new ota data arrived?

No, I didn't. Do you mean OTA with low internal memory could run into this situation?

chipweinberger · 2023-03-20T16:00:10Z

yes. low internal memory, which was then fully exhausted during the ota update.

I started the update with 20KB internal ram, and due to a bug was allocating and acquiring a mutex every 4KB of the ota update.

dsiganos · 2023-03-28T17:45:57Z

I am hitting this problem every time. Increasing the main stack size to 6000 bytes makes it work. I do not know what is eating up the stack memory yet. However this likely means that the problem happens when the task performing the AES operations runs out of stack.

For certain data lengths, the last input descriptor was not getting appended correctly and hence the EOF flag in the DMA descriptor link list was set at incorrect location. This was resulting in the peripheral being stalled expecting more data and eventually the code used to timeout waiting for the AES completion interrupt. Required configs for this issue: CONFIG_MBEDTLS_HARDWARE_AES CONFIG_SOC_AES_SUPPORT_DMA This observation is similar to the issue reported in: #10647 To recreate this issue, start the AES-GCM DMA operation with data length 12280 bytes and this should stall the operation forever. In this fix, we are tracing the entire descriptor list and then appending the extra bytes descriptor at correct position (as the last node).

mahavirj · 2023-09-11T05:32:12Z

Update:

We have added few fixes in the AES DMA port layer recently, following are pointers for v4.4 branch

First one is the fix for heap corruption issue that may occur with specific AES dma lengths and second is specific to incorrect DMA descriptor setup. Please have a look at them, they are relevant to this issue. If you could not recreate "AES timeout" issue on latest release/v4.4 branch then we can close this tracker.

Thanks.

For certain data lengths, the last input descriptor was not getting appended correctly and hence the EOF flag in the DMA descriptor link list was set at incorrect location. This was resulting in the peripheral being stalled expecting more data and eventually the code used to timeout waiting for the AES completion interrupt. Required configs for this issue: CONFIG_MBEDTLS_HARDWARE_AES CONFIG_SOC_AES_SUPPORT_DMA This observation is similar to the issue reported in: espressif#10647 To recreate this issue, start the AES-GCM DMA operation with data length 12280 bytes and this should stall the operation forever. In this fix, we are tracing the entire descriptor list and then appending the extra bytes descriptor at correct position (as the last node).

For certain data lengths, the last input descriptor was not getting appended correctly and hence the EOF flag in the DMA descriptor link list was set at incorrect location. This was resulting in the peripheral being stalled expecting more data and eventually the code used to timeout waiting for the AES completion interrupt. Required configs for this issue: CONFIG_MBEDTLS_HARDWARE_AES CONFIG_SOC_AES_SUPPORT_DMA This observation is similar to the issue reported in: #10647 To recreate this issue, start the AES-GCM DMA operation with data length 12280 bytes and this should stall the operation forever. In this fix, we are tracing the entire descriptor list and then appending the extra bytes descriptor at correct position (as the last node).

chipweinberger · 2023-11-28T11:10:36Z

im still seeing this error in v5.1.2

as a workaround im using CONFIG_MBEDTLS_AES_USE_INTERRUPT=n

we otherwise hit the error below, when parsing mainnet block 812548: Timed out waiting for completion of AES Interrupt espressif/esp-idf#10647 This measure does not incur a memory, or latency cost

kel30a · 2024-01-19T14:35:05Z

im still seeing this error in v5.1.2

as a workaround im using CONFIG_MBEDTLS_AES_USE_INTERRUPT=n

Same. It was happening on 5.0.2 for me and still on 5.1.2 after upgrading.

Maldus512 · 2024-01-19T16:20:04Z

I'm facing the same issue while generating the private key for a Rainmaker application. In order to make it work I have to increase ESP_RMAKER_CLAIM_TASK_STACK_SIZE to at least 22 * 1024 bytes (up from 10 * 1024).

chipweinberger added the Type: Bug bugs in IDF label Jan 30, 2023

chipweinberger changed the title ~~[v4.4.3][ESP32-S3] "Timed out waiting for completion of AES Interrupt"~~ [CRASH][v4.4.3][ESP32-S3] "Timed out waiting for completion of AES Interrupt" Jan 30, 2023

espressif-bot added the Status: Opened Issue is new label Jan 30, 2023

github-actions bot changed the title ~~[CRASH][v4.4.3][ESP32-S3] "Timed out waiting for completion of AES Interrupt"~~ [CRASH][v4.4.3][ESP32-S3] "Timed out waiting for completion of AES Interrupt" (IDFGH-9264) Jan 30, 2023

chipweinberger mentioned this issue Jan 30, 2023

[AES] Timeout: return error dont abort (IDFGH-9265) #10648

Merged

espressif-bot assigned mahavirj Mar 20, 2023

espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Sep 21, 2023

espressif-bot added Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: In Progress Work is in progress labels Oct 3, 2023

mahavirj closed this as completed Oct 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CRASH][v4.4.3][ESP32-S3] "Timed out waiting for completion of AES Interrupt" (IDFGH-9264) #10647

[CRASH][v4.4.3][ESP32-S3] "Timed out waiting for completion of AES Interrupt" (IDFGH-9264) #10647

chipweinberger commented Jan 30, 2023

chipweinberger commented Jan 30, 2023 •

edited

Loading

chipweinberger commented Jan 30, 2023

ESP-Marius commented Feb 8, 2023

ESP-Marius commented Feb 8, 2023

chipweinberger commented Feb 8, 2023

chipweinberger commented Feb 8, 2023 •

edited

Loading

mahavirj commented Mar 20, 2023

chipweinberger commented Mar 20, 2023 •

edited

Loading

mahavirj commented Mar 20, 2023

chipweinberger commented Mar 20, 2023 •

edited

Loading

dsiganos commented Mar 28, 2023

mahavirj commented Sep 11, 2023

chipweinberger commented Nov 28, 2023

kel30a commented Jan 19, 2024

Maldus512 commented Jan 19, 2024

[CRASH][v4.4.3][ESP32-S3] "Timed out waiting for completion of AES Interrupt" (IDFGH-9264) #10647

[CRASH][v4.4.3][ESP32-S3] "Timed out waiting for completion of AES Interrupt" (IDFGH-9264) #10647

Comments

chipweinberger commented Jan 30, 2023

Answers checklist.

IDF version.

Operating System used.

How did you build your project?

If you are using Windows, please specify command line type.

Development Kit.

Power Supply used.

What is the expected behavior?

What is the actual behavior?

Steps to reproduce.

Debug Logs.

More Information.

chipweinberger commented Jan 30, 2023 • edited Loading

chipweinberger commented Jan 30, 2023

ESP-Marius commented Feb 8, 2023

ESP-Marius commented Feb 8, 2023

chipweinberger commented Feb 8, 2023

chipweinberger commented Feb 8, 2023 • edited Loading

mahavirj commented Mar 20, 2023

chipweinberger commented Mar 20, 2023 • edited Loading

mahavirj commented Mar 20, 2023

chipweinberger commented Mar 20, 2023 • edited Loading

dsiganos commented Mar 28, 2023

mahavirj commented Sep 11, 2023

chipweinberger commented Nov 28, 2023

kel30a commented Jan 19, 2024

Maldus512 commented Jan 19, 2024

chipweinberger commented Jan 30, 2023 •

edited

Loading

chipweinberger commented Feb 8, 2023 •

edited

Loading

chipweinberger commented Mar 20, 2023 •

edited

Loading

chipweinberger commented Mar 20, 2023 •

edited

Loading