Bug SDK 5.2.2 in BLESCAN using NIMBLE causing a panic'ed (LoadProhibited). Exception was unhandled in commit d1ed3a8c5cdd5b3b128a94bfa04cb1a7752b2bed . [BUG] (IDFGH-13398) #14306

Closed
3 tasks done
filzek opened this issue Aug 5, 2024 · 10 comments
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally


@filzek

filzek commented Aug 5, 2024

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

ESP32 with PSRAM

NimBLE sometimes crashes in the ble_gap_disc() call.

commit d1ed3a8 (HEAD -> release/v5.2, origin/release/v5.2)
Merge: f36b6a0 6d19ff6
Author: Rahul Tank rahul.tank@espressif.com
Date: Thu Aug 1 20:26:25 2024 +0800
Merge branch 'bugfix/add_return_value_to_rpa_to_api_v5.2' into 'release/v5.2'
fix(nimble): Add return value to RPA Timeout API (v5.2)
See merge request espressif/esp-idf!32476

The commit is the cause of this problem.

Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.

Core 1 register dump:
PC : 0x4011f6dd PS : 0x00060330 A0 : 0x80122ed9 A1 : 0x3ffcda30
0x4011f6dd: ble_adv_list_refresh at C:/Espressif/frameworks/esp-idf-v5.2/components/bt/host/nimble/nimble/nimble/host/src/ble_hs_hci_evt.c:1273 (discriminator 3)

A2 : 0x00000001 A3 : 0x00000001 A4 : 0x3ffcda80 A5 : 0x00000000
A6 : 0xfffffffd A7 : 0x3f80014c A8 : 0xfffffffd A9 : 0x3ffcda10
A10 : 0xfffffffd A11 : 0x00000002 A12 : 0x0000000c A13 : 0x0000000b
A14 : 0x00000001 A15 : 0x3ffb65fc SAR : 0x00000014 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000005 LBEG : 0x400914de LEND : 0x400914e9 LCOUNT : 0x00000000
0x400914de: memset at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp-elf/src/newlib/newlib/libc/machine/xtensa/memset.S:150
0x400914e9: memset at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp-elf/src/newlib/newlib/libc/machine/xtensa/memset.S:160

Backtrace: 0x4011f6da:0x3ffcda30 0x40122ed6:0x3ffcda50 0x400e4081:0x3ffcdab0 0x40100b19:0x3ffcdad0
0x4011f6da: ble_adv_list_refresh at C:/Espressif/frameworks/esp-idf-v5.2/components/bt/host/nimble/nimble/nimble/host/src/ble_hs_hci_evt.c:1273 (discriminator 1)
0x40122ed6: ble_gap_disc at C:/Espressif/frameworks/esp-idf-v5.2/components/bt/host/nimble/nimble/nimble/host/src/ble_gap.c:5284
0x400e4081: ble_scan at D:/Dropbox/Dev/working/main/include/blescan.c:239

Line 239 is:
int rc = ble_gap_disc(BLE_OWN_ADDR_PUBLIC, 30000, &ble_scan_params, ble_gap_event_handler, NULL); // 30000 ms = 30 s, like Bluedroid; BLE_HS_FOREVER would scan indefinitely

Sometimes this call triggers the error above.

So a problem with NimBLE still exists.

@espressif-bot espressif-bot added the Status: Opened Issue is new label Aug 5, 2024
@github-actions github-actions bot changed the title Bug SDK 5.2.2 in BLESCAN using NIMBLE causing a panic'ed (LoadProhibited). Exception was unhandled in commit d1ed3a8c5cdd5b3b128a94bfa04cb1a7752b2bed . [BUG] Bug SDK 5.2.2 in BLESCAN using NIMBLE causing a panic'ed (LoadProhibited). Exception was unhandled in commit d1ed3a8c5cdd5b3b128a94bfa04cb1a7752b2bed . [BUG] (IDFGH-13398) Aug 5, 2024
@rahult-github
Collaborator

Hi @filzek ,

The commit is the cause of this problem.

Could you revert the commit on your end to double-confirm that it is indeed the failing commit? The change in this commit only adds a return value to an RPA-timeout-related API, which should not be invoked in the BLE scan path.

Also, can you confirm whether you have enabled CONFIG_BT_NIMBLE_HOST_QUEUE_CONG_CHECK? Does disabling it help?

@filzek
Author

filzek commented Aug 8, 2024

Hi @rahult-github,

CONFIG_BT_NIMBLE_HOST_QUEUE_CONG_CHECK=y is enabled.

We have made some changes to the scanning interval to allow more time for Wi-Fi operations before the scan starts, reducing the occurrence of concurrency issues. This serves as a workaround to bypass the error.

Tomorrow, we will try disabling this setting and running it simultaneously with other tasks to see whether that resolves the error.

@filzek
Author

filzek commented Aug 12, 2024

@rahult-github We tried both options and the same problem occurs.

@rahult-github
Collaborator

Hi @filzek ,

We tried both options and the same problem occurs.

Just to clarify, the "both" options being referred to are:

  1. Reverting the commit you identified as the cause? If so, that means the issue is not related to that commit.
  2. Disabling CONFIG_BT_NIMBLE_HOST_QUEUE_CONG_CHECK, and the issue is still seen?

@filzek
Author

filzek commented Aug 12, 2024

We tried updating to the latest commit, but now the problem is much worse: with commit 876eaf8, BLE returns ESP_ERR_NO_MEM.
The result is the same whether the BLE memory allocation mode in sdkconfig is set to Internal, External, or Default:

Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.

Core 1 register dump:
PC : 0x40116444 PS : 0x00060430 A0 : 0x80116765 A1 : 0x3ffd2630
0x40116444: ble_npl_os_started at C:/Espressif/frameworks/esp-idf-v5.2/components/bt/host/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:143

A2 : 0x00000000 A3 : 0x3ffd2640 A4 : 0x00000010 A5 : 0x3ffc0b5c
A6 : 0x38695a4a A7 : 0x6e683575 A8 : 0x00000000 A9 : 0x3ffd2570
A10 : 0x0000000a A11 : 0x3f400a08 A12 : 0x00000000 A13 : 0x3f400b8b
A14 : 0x00000007 A15 : 0x3ffb6a04 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000 LBEG : 0x400911fc LEND : 0x40091218 LCOUNT : 0x00000000
0x400911fc: memcpy at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp-elf/src/newlib/newlib/libc/machine/xtensa/memcpy.S:162
0x40091218: memcpy at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp-elf/src/newlib/newlib/libc/machine/xtensa/memcpy.S:197

Backtrace: 0x40116441:0x3ffd2630 0x40116762:0x3ffd2650 0x40117409:0x3ffd2670 0x400fa588:0x3ffd2690
0x40116441: ble_npl_os_started at C:/Espressif/frameworks/esp-idf-v5.2/components/bt/host/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:143
0x40116762: ble_hs_lock at C:/Espressif/frameworks/esp-idf-v5.2/components/bt/host/nimble/nimble/nimble/host/src/ble_hs.c:218
0x40117409: ble_att_set_preferred_mtu at C:/Espressif/frameworks/esp-idf-v5.2/components/bt/host/nimble/nimble/nimble/host/src/ble_att.c:551

I have also opened a separate issue for this.

I have reverted by running git reset --hard 3883a17.

Regarding your questions:

Reverting the commit you identified as the cause: does that mean the issue is not related to that commit?
We do not know when this started to happen, so we cannot determine which commit is causing the issue.

Disabling CONFIG_BT_NIMBLE_HOST_QUEUE_CONG_CHECK: is the issue still seen?
The same problem occurs whether it is enabled or disabled.

@filzek
Author

filzek commented Aug 12, 2024

static const struct ble_gap_disc_params ble_scan_params = {
    .itvl = 0x80,      // Scan interval, equivalent to Bluedroid's scan_interval (was 50)
    .window = 0x40,    // Scan window, equivalent to Bluedroid's scan_window (was 30)
    .filter_policy = BLE_HCI_SCAN_FILT_NO_WL,
    .limited = 0,
    .passive = 0       // Active scanning; set to 1 for passive scanning
};

int rc = ble_gap_disc(BLE_OWN_ADDR_PUBLIC, 30000, &ble_scan_params, ble_gap_event_handler, NULL); // 30000 ms = 30 s, like Bluedroid; BLE_HS_FOREVER would scan indefinitely
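For reference, LE scan interval and window values are expressed in units of 0.625 ms (per the Bluetooth Core Specification), while the duration argument of ble_gap_disc() is in milliseconds. The host-runnable sketch below is not part of the reporter's code, and the helper names are hypothetical; it only converts the itvl/window values above so the scan duty cycle is easy to see. With 0x80/0x40 the controller listens for 40 ms out of every 80 ms, which in principle leaves the other half of each interval for other radio users such as Wi-Fi under coexistence.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* LE scan interval and window are expressed in units of 0.625 ms (625 us). */
#define SCAN_UNIT_US 625u

/* Hypothetical helper: convert a scan interval/window value to microseconds. */
static uint32_t scan_units_to_us(uint16_t units)
{
    return (uint32_t)units * SCAN_UNIT_US;
}

/* Hypothetical helper: percentage of each scan interval spent listening. */
static unsigned scan_duty_pct(uint16_t itvl, uint16_t window)
{
    assert(itvl != 0); /* a zero interval is invalid and would divide by zero */
    return (unsigned)(scan_units_to_us(window) * 100u / scan_units_to_us(itvl));
}

/* Prints interval, window, idle time and duty cycle for an itvl/window pair. */
static void print_scan_timing(uint16_t itvl, uint16_t window)
{
    uint32_t itvl_us = scan_units_to_us(itvl);
    uint32_t window_us = scan_units_to_us(window);

    printf("interval=%lu us, window=%lu us, idle=%lu us, duty=%u%%\n",
           (unsigned long)itvl_us, (unsigned long)window_us,
           (unsigned long)(itvl_us - window_us), scan_duty_pct(itvl, window));
}
```

Calling print_scan_timing(0x80, 0x40) prints interval=80000 us, window=40000 us, idle=40000 us, duty=50%. Lowering window relative to itvl frees more airtime between scan windows, at the cost of hearing fewer advertisements.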

@rahult-github
Collaborator

Hi @filzek ,

I think you are posting two different issues in a single thread. The initial crash pointed to this path:

0x4011f6da: ble_adv_list_refresh at C:/Espressif/frameworks/esp-idf-v5.2/components/bt/host/nimble/nimble/nimble/host/src/ble_hs_hci_evt.c:1273 (discriminator 1)

As seen here, this function is only compiled in when CONFIG_BT_NIMBLE_HOST_QUEUE_CONG_CHECK is enabled. If this option is disabled, the code never reaches this function, so that crash should not be observed.

As for the crash reported in issue #14355, I have commented there that it works fine locally, so please make sure your codebase is correct (fetch and update the submodules).
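The guard described above follows the usual ESP-IDF pattern: an enabled bool Kconfig option is emitted into sdkconfig.h as a CONFIG_... macro defined to 1, and a disabled one is left undefined, so code wrapped in #ifdef on that macro is not compiled at all. The minimal host-runnable sketch below only mimics that pattern; fake_adv_list_refresh, start_discovery and the counter are hypothetical stand-ins, not NimBLE's actual code.

```c
#include <assert.h>
#include <stdio.h>

/* Uncomment to emulate CONFIG_BT_NIMBLE_HOST_QUEUE_CONG_CHECK=y in sdkconfig.h.
 * When the option is disabled, the macro is simply not defined. */
/* #define CONFIG_BT_NIMBLE_HOST_QUEUE_CONG_CHECK 1 */

/* Hypothetical counter recording how often the guarded code path runs. */
static int adv_list_refresh_calls = 0;

#ifdef CONFIG_BT_NIMBLE_HOST_QUEUE_CONG_CHECK
/* Hypothetical stand-in for a function that exists only when the option is on. */
static void fake_adv_list_refresh(void)
{
    adv_list_refresh_calls++;
}
#endif

/* Hypothetical stand-in for the scan-start path. */
static void start_discovery(void)
{
#ifdef CONFIG_BT_NIMBLE_HOST_QUEUE_CONG_CHECK
    fake_adv_list_refresh(); /* compiled in only when the option is enabled */
#else
    /* With the option disabled the guarded function does not exist at all,
     * so it can never be reached from this path. */
    assert(adv_list_refresh_calls == 0);
#endif
    printf("discovery started, refresh calls = %d\n", adv_list_refresh_calls);
}
```

With the define left commented out, start_discovery() prints a refresh count of 0, mirroring the point that disabling the option removes that code path entirely.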

@filzek
Author

filzek commented Aug 13, 2024

I will try running a fullclean and rebasing everything.

@rahult-github
Collaborator

Hi @filzek, did you get a chance to try after the rebase/update?

@rahult-github
Collaborator

Closing this issue. Feel free to reopen in case of any further updates.

@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: Opened Issue is new labels Sep 24, 2024

3 participants