Rare segmentation fault - node v10.13 - CentOS (ip 0000000000efb532, node[400000+1e8c000]) #24955

Closed
assaf-xm opened this issue Dec 11, 2018 · 5 comments
Labels
memory Issues and PRs related to the memory management or memory footprint.

Comments

@assaf-xm

node -v
v10.13.0
Installed using: node-v10.13.0-linux-x64.tar.xz

uname -a
Linux <> 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

--------------- Bug description -------------

After upgrading from node v10.7.0 to node v10.13.0, we started to see crashes in the node process due to segmentation faults.
Crashes happen every day or every few days, so it's really hard to reproduce, but it does happen.

dmesg logs show a consistent, fixed instruction pointer (ip 0000000000efb532):

[Mon Dec 3 21:10:54 2018] node[25407]: segfault at 3ff0c07598f0 ip 0000000000efb532 sp 00007f1a1a830c40 error 4 in node[400000+1e8c000]
[Tue Dec 4 13:28:47 2018] node[11340]: segfault at 3aea57a0a9c0 ip 0000000000efb532 sp 00007f0ff238dc40 error 4 in node[400000+1e8c000]
[Wed Dec 5 16:13:54 2018] node[13359]: segfault at 329fc81dd2f0 ip 0000000000efb532 sp 00007fc21effcc40 error 4 in node[400000+1e8c000]
[Fri Dec 7 19:36:45 2018] node[29239]: segfault at 13604b37a558 ip 0000000000efb532 sp 00007f57ebffec40 error 4 in node[400000+1e8c000]
[Sat Dec 8 18:53:54 2018] node[30821]: segfault at 204d978e2e50 ip 0000000000efb532 sp 00007f212de0cc40 error 4 in node[400000+1e8c000]
[Sun Dec 9 18:05:08 2018] node[10990]: segfault at 9888ab3a790 ip 0000000000efb532 sp 00007f26261c2c40 error 4 in node[400000+1e8c000]
[Mon Dec 10 20:21:07 2018] node[14981]: segfault at 3cbdf8b02340 ip 0000000000efb532 sp 00007f38a894fc40 error 4 in node[400000+1e8c000]

The failing PID is not the exact PID of the node process, but usually close to it.

Looking at the symbols around ip 0000000000efb532:
It points to the 'Process' function of the RememberedSetUpdatingItem class:
0000000000efaf50 W v8::internal::RememberedSetUpdatingItem<v8::internal::MajorNonAtomicMarkingState>::Process()
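
For reference, a minimal sketch of how that lookup can be reproduced, assuming the crashing binary is the stock release tarball's node at /usr/local/bin/node (the dmesg lines show it mapped at 0x400000, i.e. non-PIE, so the faulting ip is a direct address inside the binary):

gdb -batch -ex 'info symbol 0xefb532' /usr/local/bin/node
nm -C /usr/local/bin/node | sort | grep RememberedSetUpdatingItem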

I haven't been able to capture a core dump yet.
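
A minimal sketch for catching a core on the next crash, assuming the node process is started from a shell (note that on CentOS 7, abrt may already redirect cores via kernel.core_pattern):

ulimit -c unlimited                                   # allow core files for this shell's child processes
sudo sysctl -w kernel.core_pattern=/tmp/core.%e.%p    # write cores to /tmp, named by executable and PID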

@bnoordhuis
Member

That's part of the GC. If the PID isn't that of the process, it means it's happening on a thread that isn't the main thread (because the main thread has PID == TID).
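
As a quick illustration of the PID/TID point (a sketch, assuming a single running node process matched by pgrep), the thread IDs of a live process can be listed with:

ls /proc/$(pgrep -n node)/task

The entry equal to the PID is the main thread; the others are worker and GC threads, and the kernel's segfault line reports whichever TID faulted.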

Unfortunately GC crashes are hard to debug because nine times out of ten the real bug is elsewhere; e.g., memory corruption that doesn't manifest until the GC runs.

But let's try anyway. Does find node_modules/ -name \*.node print anything? Does node --predictable app.js work better?

@bnoordhuis bnoordhuis added memory Issues and PRs related to the memory management or memory footprint. v10.x labels Dec 11, 2018
@assaf-xm
Author

Thanks for the quick response!

I'll try running the system with the '--predictable' flag for a while.

Regarding the native modules:

find node_modules/ -name *.node
node_modules/uws/uws_linux_51.node
node_modules/uws/uws_darwin_48.node
node_modules/uws/uws_win32_51.node
node_modules/uws/uws_linux_48.node
node_modules/uws/uws_linux_47.node
node_modules/uws/uws_darwin_47.node
node_modules/uws/uws_darwin_46.node
node_modules/uws/uws_darwin_51.node
node_modules/uws/uws_win32_48.node
node_modules/uws/uws_linux_46.node
node_modules/grpc/src/node/extension_binary/node-v64-linux-x64-glibc/grpc_node.node
node_modules/ref/build/Release/obj.target/binding.node
node_modules/ref/build/Release/binding.node
node_modules/modern-syslog/build/Release/obj.target/core.node
node_modules/modern-syslog/build/Release/core.node
node_modules/heapdump/build/Release/obj.target/addon.node
node_modules/heapdump/build/Release/addon.node
node_modules/sleep/build/Release/obj.target/node_sleep.node
node_modules/sleep/build/Release/node_sleep.node
node_modules/ffi/build/Release/obj.target/ffi_bindings.node
node_modules/ffi/build/Release/ffi_bindings.node
node_modules/diskusage/build/Release/obj.target/diskusage.node
node_modules/diskusage/build/Release/diskusage.node

Thanks,
Assaf

@bnoordhuis
Member

Right, that's quite a few native modules, any one of which might be the culprit. ref and ffi are the most likely, but it could be any one of them.¹ Try excluding them and see if the crashes go away.

¹ I'm reasonably sure it's not heapdump (I'm its author) because it doesn't do anything unless activated, but still.
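
A related sanity check on the prebuilt binaries listed above (a sketch): compare the running Node's ABI version against the addon filenames. Node 10.x reports ABI 64, which matches the node-v64 grpc binary; the uws_linux_46-51 files appear to target older ABIs.

node -p process.versions.modules    # prints the NODE_MODULE_VERSION (ABI), 64 on v10.x
node -p process.version             # prints the Node version, e.g. v10.13.0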

@bnoordhuis
Member

I'm going to close this out for lack of follow-up. Let me know if you still want to pursue this and I'll reopen.

@assaf-xm
Author

Thanks, it's still relevant, but we'll try other Node.js versions (10.14 / 10.15) and reopen if we're able to reproduce it more frequently. Currently it's really hard to narrow down.
