Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esp_eth_transmit tries to transmit packets even if the link is down. (IDFGH-9490) #10851

Closed
3 tasks done
feelfreelinux opened this issue Feb 27, 2023 · 17 comments
Closed
3 tasks done
Assignees
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally Type: Bug bugs in IDF

Comments

@feelfreelinux
Copy link

feelfreelinux commented Feb 27, 2023

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

ESP-IDF v5.0.1-3-g886e98a2c1

Operating System used.

macOS

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

What is the expected behavior?

Assuming the following state

  • Ethernet driver is properly initialized
  • Ethernet netif is attached to TCP/IP stack using esp_netif_attach
  • Link is not up, cable disconnected

If any traffic occurs on the network stack, the function esp_eth_transmit should prevent it's transmission, due to the link not being up.

The expected behaviour is implemented in current IDF 5.1 master, commit 256d4579695a67d5fabaa3b678da46c59757790e.

if (atomic_load(&eth_driver->link) != ETH_LINK_UP) {

What is the actual behavior?

In ESP-IDF v5.0.1-3-g886e98a2c1, function esp_eth_transmit only checks if the interface is started, hence allowing for transmission on an interface that is not capable to do so. This causes undefined behaviour of esp.emac: emac_esp32_transmit(232): insufficient TX buffer size further down.

https://github.com/espressif/esp-idf/blob/release/v5.0/components/esp_eth/src/esp_eth.c#L339

Issue in ESP-IDF v5.0.1 can be resolved by applying the same logic as IDF 5.1 does.

Steps to reproduce.

Explained in expected behaviour section

Build or installation Logs.

No response

More Information.

No response

@feelfreelinux feelfreelinux added the Type: Bug bugs in IDF label Feb 27, 2023
@github-actions github-actions bot changed the title esp_eth_transmit tries to transmit packets even if the link is down. esp_eth_transmit tries to transmit packets even if the link is down. (IDFGH-9490) Feb 27, 2023
@espressif-bot espressif-bot added the Status: Selected for Development Issue is selected for development label Mar 3, 2023
@espressif-bot espressif-bot added Status: In Progress Work is in progress Status: Reviewing Issue is being reviewed and removed Status: Selected for Development Issue is selected for development Status: In Progress Work is in progress labels Apr 5, 2023
@kostaond
Copy link
Collaborator

kostaond commented May 9, 2023

@AxelLin thanks for the reminder. I am aware of this issue. The update is ready for some time but non-related CI constantly failed so it didn't make it to the latest release :/

@AxelLin
Copy link
Contributor

AxelLin commented May 21, 2023

@kostaond
Could you confirm if v4.x branches have this issue or the is a regression in v5.0+ only?

@kostaond
Copy link
Collaborator

@AxelLin esp_eth_transmit is allowed to transmit frames even if the link is down in v4.4. However, insufficient TX buffer size is not observed in v4.4. The fundamental root cause of insufficient TX buffer size in v5.0 is the GARP is periodically issued even when link is down. This issue is not present in v4.4. Preventing transmission of frames by Ethernet driver level is extra check which could be considered as extra feature and therefore not needed to be backported to v4.4.

@rretanubun
Copy link
Contributor

@kostaond : Sharing a data point on 4.4 in case it helps

My codebase uses esp-idf at https://github.com/espressif/esp-idf/commit/6a7d83af1984b93ebaefa7ca9be36304806c3dc8/

I am seeing the issue on 4.4 and added an assert(0) to get a backtrace:

static esp_err_t emac_esp32_transmit(esp_eth_mac_t *mac, uint8_t *buf, uint32_t length)
{
    esp_err_t ret = ESP_OK;
    emac_esp32_t *emac = __containerof(mac, emac_esp32_t, parent);
    uint32_t sent_len = emac_hal_transmit_frame(&emac->hal, buf, length);
    ESP_GOTO_ON_FALSE(sent_len == length, ESP_ERR_INVALID_SIZE, err, TAG, "insufficient TX buffer size");
    return ESP_OK;

err:
    assert(0); // HACK: Let's assert() here so we reboot (and get a nice call-stack?)
    return ret;
}

Call stack on assert:

0x4008269d: panic_abort at /esp-idf/components/esp_system/panic.c:402
0x4009a372: abort at /esp-idf/components/newlib/abort.c:46
0x40226f30: __assert_func at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/newlib/newlib/libc/stdlib/assert.c:62 (discriminator 8)
0x401ef70c: emac_esp32_transmit at /esp-idf/components/esp_eth/src/esp_eth_mac_esp.c:238 (discriminator 5)
0x401d46f5: esp_eth_transmit at /esp-idf/components/esp_eth/src/esp_eth.c:329 (discriminator 2)
0x4024978b: esp_netif_transmit at /esp-idf/components/esp_netif/lwip/esp_netif_lwip.c:884
0x401e8ea9: ethernet_low_level_output at /esp-idf/components/lwip/port/esp32/netif/ethernetif.c:127
0x40141395: ethernet_output at /esp-idf/components/lwip/lwip/src/netif/ethernet.c:311 (discriminator 2)
0x401e99d1: ethip6_output at /esp-idf/components/lwip/lwip/src/core/ipv6/ethip6.c:122
0x4013cc69: ip6_output_if_src at /esp-idf/components/lwip/lwip/src/core/ipv6/ip6.c:1354
0x4013cce2: ip6_output_if at /esp-idf/components/lwip/lwip/src/core/ipv6/ip6.c:1249
0x4013d8ec: mld6_send at /esp-idf/components/lwip/lwip/src/core/ipv6/mld6.c:659 (discriminator 12)
0x4013de87: mld6_tmr at /esp-idf/components/lwip/lwip/src/core/ipv6/mld6.c:529
0x4013ded7: mld6_timeout_cb at /esp-idf/components/lwip/lwip/src/core/ipv6/mld6.c:501
0x401365ad: sys_check_timeouts at /esp-idf/components/lwip/lwip/src/core/timeouts.c:400
0x4012db56: tcpip_timeouts_mbox_fetch at /esp-idf/components/lwip/lwip/src/api/tcpip.c:118
0x4012dc17: tcpip_thread at /esp-idf/components/lwip/lwip/src/api/tcpip.c:151
0x40097a16: vPortTaskWrapper at /esp-idf/env/esp-idf/components/freertos/port/xtensa/port.c:130

In this setup we use LWIP with IPv6 and the trigger is frequent link-up <-> link-down event on the ethernet port.
Originally I suspected CONFIG_LWIP_TIMERS_ONDEMAND until I found this discussion.

@rretanubun
Copy link
Contributor

rretanubun commented May 26, 2023

@AxelLin : Thanks for the fix. Do you know if your fix can be applied on 4.4 as-is? (it seems to compile okay for me) or is there other patches needed to make sure eth_driver->link is set to ETH_LINK_UP that only exist after 4.4?

Also, did you come across a setup that can induce this issue easily? The observed reproduction rate is admittedly very low on 4.4 (sometimes > 10 link-down/link-up events before I maybe see one Insufficient Tx Buffer event).

@kostaond
Copy link
Collaborator

@rretanubun thanks for sharing your observation. The fix will be backported to v4.4 too.

The observed reproduction rate is admittedly very low on 4.4 (sometimes > 10 link-down/link-up events before I maybe see one Insufficient Tx Buffer event).

That sounds possible since there is 10 Tx buffers by default.

@rretanubun
Copy link
Contributor

@kostaond and @AxelLin : Thanks for the responses (apologies for misunderstanding on the author), also thanks for the backport confirmation.

@AxelLin
Copy link
Contributor

AxelLin commented Jun 5, 2023

Just note that there is a follow up fix 5ebe6b5 needs backport as well.

@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: Reviewing Issue is being reviewed labels Jun 5, 2023
@kostaond
Copy link
Collaborator

kostaond commented Jun 5, 2023

Fixed by 0872feb in v5.0.

@kostaond kostaond closed this as completed Jun 5, 2023
@AxelLin
Copy link
Contributor

AxelLin commented Jun 19, 2023

@rretanubun thanks for sharing your observation. The fix will be backported to v4.4 too.

@kostaond Just remind that the fix is not yet available in v4.4 branch.

@kostaond
Copy link
Collaborator

@AxelLin it's been fixed by a0c87d6 in v4.4.

@AxelLin
Copy link
Contributor

AxelLin commented Jun 19, 2023

@AxelLin it's been fixed by a0c87d6 in v4.4.

Thanks for the update. So the fix was there even before you mentioned will backport to v4.4 3 weeks ago.

@kostaond
Copy link
Collaborator

The MR with this commit was stuck in review for some time and I thought I did not include the "link up" fix when I mentioned it here.

@AxelLin
Copy link
Contributor

AxelLin commented Jun 19, 2023

@kostaond : Sharing a data point on 4.4 in case it helps

@rretanubun I think you should upgrade to v4.4.5.

@kostaond
Copy link
Collaborator

@rretanubun I think you should upgrade to v4.4.5.

The fix is present in v4.4. release branch. However, unfortunately, it didn't get to v4.4.5 release :/

@rretanubun
Copy link
Contributor

Thank you both for the updates, I'll open an internal tracker ticket on our in-house project to monitor this thread.

@phatpaul
Copy link
Contributor

phatpaul commented Sep 8, 2023

I think I am having this issue on 4.3.5.
It happens when I remove and re-plug in the Ethernet. Ethernet is no longer working now because it doesn't get an IP. Does this look like the same issue?

I (31378) ethernet_init.c: Ethernet Link Down
...
I (67388) ethernet_init.c: Ethernet Link Up
I (67388) ethernet_init.c: Ethernet HW Addr 84:cc:a8:0d:5c:d3
E (74208) emac_esp32: emac_esp32_transmit(262): insufficient TX buffer size
E (74298) emac_esp32: emac_esp32_transmit(262): insufficient TX buffer size
E (74388) emac_esp32: emac_esp32_transmit(262): insufficient TX buffer size
E (74488) emac_esp32: emac_esp32_transmit(262): insufficient TX buffer size
E (74588) emac_esp32: emac_esp32_transmit(262): insufficient TX buffer size

EDIT - I upgraded to 4.3.6 and it seems to be fixed. I didn't see any mention of it in the changelog for that release, so maybe it was another issue that's now fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally Type: Bug bugs in IDF
Projects
None yet
Development

No branches or pull requests

6 participants