Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #10804 by running continuous ADC DMA in endless loop instead of restarting after each run to avoid losing samples (IDFGH-10235) #11500

Closed
wants to merge 2 commits into from

Conversation

Erlkoenig90
Copy link
Contributor

This fixes #10804 by running the ADC DMA in a continuous loop. A function adc_hal_read_desc_finish is introduced to set a DMA descriptor's owner field to 1 such that it can be re-used by DMA.

As discussed in #11411, this requires PR #11499 to be merged too as it relies on the new on_descr_err callback to detect/debug GDMA buffer overruns.

Erlkoenig90 and others added 2 commits May 25, 2023 10:11
…tead of restarting after each run (descriptor chain) to avoid losing samples. Use descriptor error callback for GDMA to check for DMA buffer overrun.
@espressif-bot espressif-bot added the Status: Opened Issue is new label May 25, 2023
@github-actions github-actions bot changed the title Fix #10804 by running continuous ADC DMA in endless loop instead of restarting after each run to avoid losing samples Fix #10804 by running continuous ADC DMA in endless loop instead of restarting after each run to avoid losing samples (IDFGH-10235) May 25, 2023
@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed Status: Done Issue is done internally Resolution: NA Issue resolution is unavailable and removed Status: Opened Issue is new Status: Reviewing Issue is being reviewed labels Jun 5, 2023
espressif-bot pushed a commit that referenced this pull request Jun 14, 2023
DMA EOF may happens per multiple dma descriptors, instead of only one.

Closes #11500
espressif-bot pushed a commit that referenced this pull request Jul 22, 2023
DMA EOF may happens per multiple dma descriptors, instead of only one.

Closes #11500
@Erlkoenig90
Copy link
Contributor Author

Erlkoenig90 commented Nov 14, 2023

@Icarus113 I think the changes introduced in 9bec423 don't guarantee consistency of the DMA buffers and contain a race condition, i.e. #10804 would not be fully fixed:

adc_hal_get_reading_result now sets eof_desc->dw0.owner = 1; i.e. before the caller can even read the buffer data. This means that the DMA could overwrite this buffer even before adc_hal_get_reading_result returns.

E.g. when s_adc_dma_intr from adc_continuous calls adc_hal_get_reading_result, the data in finished_buffer might get overwritten by DMA even before it is copied to the ringbuffer by xRingbufferSendFromISR.

I think this can't be properly implemented without providing a function adc_hal_read_desc_finish as proposed which should be called after the user of adc_hal_get_reading_result is done with the data.

Since a descriptor error callback is not used, we don't get notified if DMA attempts to overwrite the buffer before setting owner.

It could be fixed like this;

adc_hal_dma_desc_status_t adc_hal_get_reading_result(adc_hal_dma_ctx_t *hal, const intptr_t eof_desc_addr, uint8_t **buffer, uint32_t *len)
{
   ...

    //Find the eof list start
    eof_desc = eof_desc->next;
    buffer_start = eof_desc->buffer;
    eof_len += eof_desc->dw0.length;
    if ((intptr_t)eof_desc == eof_desc_addr) {
        goto valid;
    }

    //Find the eof list end
    for (int i = 1; i < hal->eof_step; i++) {
        eof_desc = eof_desc->next;
        eof_len += eof_desc->dw0.length;
        if ((intptr_t)eof_desc == eof_desc_addr) {
            goto valid;
        }
    }

valid:
    *buffer = buffer_start;
    *len = eof_len;

    return ADC_HAL_DMA_DESC_VALID;
}

void adc_hal_read_desc_finish(adc_hal_dma_ctx_t *hal, const intptr_t eof_desc_addr) {
    dma_descriptor_t *eof_desc = hal->cur_desc_ptr;

    //Find the eof list start
    eof_desc = eof_desc->next;
    
    eof_desc->dw0.owner = 1;
    //Find the eof list end
    for (int i = 1; ((intptr_t)eof_desc != eof_desc_addr) && i < hal->eof_step; i++) {
        eof_desc = eof_desc->next;
        eof_desc->dw0.owner = 1;
    }
    hal->cur_desc_ptr = eof_desc;
}

@espressif-bot espressif-bot added Status: Selected for Development Issue is selected for development and removed Status: Done Issue is done internally labels Nov 20, 2023
@Icarus113
Copy link
Collaborator

Hi @Erlkoenig90 , yes, this is actually the competition between the A. software (dealing with the ready dma desc) and B. hardware (moving new dma data). If the A is less than B then

  • the setting-owner-to-1 operation is dangerous
  • if owner isn't DMA, the dma desc err error event will be triggered.

We adopted the DMA rx desc err event PR. For ADC side we didn't adopt the error log logic yet. On chips later than S2, the ADC sampling speed is not very fast, so this shouldn't be an issue. May I know if you observe that under certain condition A is less than B?

@Erlkoenig90
Copy link
Contributor Author

May I know if you observe that under certain condition A is less than B?

Hi @Icarus113 , I did not observe this in practice (yet). I just thought this might potentially be an issue from looking at the code and that it might be worth fixing before an issue arises. This also depends on CPU load and interrupt load; an application with high CPU load might experience this problem (when the interrupt is processed right before the DMA attempts to re-use the buffer). This would cause data corruption which would be hard to debug.

@Icarus113
Copy link
Collaborator

Icarus113 commented Nov 21, 2023

The downside for your solution of delaying the set owner to 1 is:

  • it makes the change to a single descriptor scattered, thus further lengthen the A (software dealing speed)

So if it's really in an extreme condition for example

  • A is just a little bit faster than B, this change will make DMA check owner fail, then leading desc err event, and DMA will stops fetching dma descriptors.

Another condition is:

  • A is much slower than B, DMA may overwrite some descriptors whose owner is set to 1, and it will finally find an owner-equals-0 descriptor then lead to DMA stop with err event

So changing the descriptor update timing isn't a good way for such condition.

A better solution is to avoid such extreme condition. There are few solutions when designing the driver:

  • make the conversion frame longer by setting conv_frame_size bigger
  • as you can see the dma descriptor number is fixed in the current driver, there is a plan to make it configurable by users. (We didn't implement this yet)

So by increasing these we can avoid the extreme condition.

On the other hand,

I just thought this might potentially be an issue from looking at the code and that it might be worth fixing before an issue arises

Having a tip for this condition is acceptable. We can try to plan it then and give some hints accordingly.

@Erlkoenig90

@Erlkoenig90
Copy link
Contributor Author

  • A is just a little bit faster than B, this change will make DMA check owner fail, then leading desc err event, and DMA will stops fetching dma descriptors.

  • A is much slower than B, DMA may overwrite some descriptors whose owner is set to 1, and it will finally find an owner-equals-0 descriptor then lead to DMA stop with err event

I think it is better to get a DMA error rather than having the DMA causing data corruption by overwriting a buffer that is still being copied to the ringbuffer ("silent error").

There are few solutions when designing the driver:

Yes, but my suggestion does not attempt to solve the issue, it just allows the application developer to notice the issue (by getting a DMA error) at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: NA Issue resolution is unavailable Status: Selected for Development Issue is selected for development
Projects
None yet
Development

Successfully merging this pull request may close these issues.

adc_digi_read_bytes() messes the order of channels between read chuncks (IDFGH-9437)
3 participants