[BUG] malloc error when using likwid-pin on ARM Jetson AGX Xavier (ARMv8) #488

Open · SF-N opened this issue Oct 20, 2022 · 11 comments

SF-N commented Oct 20, 2022

With likwid-pin -- Version 5.2.2 (commit: 233ab943543480cd46058b34616c174198ba0459) I get the following error on an ARMv8 processor (running Linux) right at startup, before the program begins, e.g. when calling likwid-pin -c S0:0-3 ./executable:

malloc(): invalid size (unsorted)
Aborted (core dumped)
SF-N changed the title from "malloc error when using likwid-pin on ARM Jetson AGX Xavier (ARMv8)" to "[BUG] malloc error when using likwid-pin on ARM Jetson AGX Xavier (ARMv8)" on Oct 20, 2022
@TomTheBear (Member)

It is hard to tell exactly where the problem is. It might be the build options for the Lua interpreter or something inside the LIKWID library. To find the location:

  • Rebuild LIKWID with DEBUG=true in config.mk (do make distclean)
  • Install it
  • Run gdb likwid-lua
  • Run your command inside gdb: r likwid-pin -c S0:0-3 ./executable

As soon as it aborts, type bt for a backtrace and supply the output.

SF-N (Author) commented Oct 21, 2022

Thanks. When trying this inside the likwid-5.2.2 folder, I get:

icarus@ubuntu:~/likwid-5.2.2$ gdb likwid-lua
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from likwid-lua...
(gdb) r likwid-pin -c S0:0-3 ./linpackc 
Starting program: /usr/local/bin/likwid-lua likwid-pin -c S0:0-3 ./linpackc
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
malloc(): invalid size (unsorted)

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000fffff7e10aac in __GI_abort () at abort.c:79
#2  0x0000fffff7e5df40 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0xfffff7f1f518 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x0000fffff7e65344 in malloc_printerr (str=str@entry=0xfffff7f1b4b0 "malloc(): invalid size (unsorted)") at malloc.c:5347
#4  0x0000fffff7e67edc in _int_malloc (av=av@entry=0xfffff7f5ea98 <main_arena>, bytes=bytes@entry=32) at malloc.c:3736
#5  0x0000fffff7e694ac in __GI___libc_malloc (bytes=32) at malloc.c:3058
#6  0x0000fffff6d5aa88 in create_lookups () at ./src/affinity.c:201
#7  0x0000fffff6d5c604 in affinity_init () at ./src/affinity.c:645
#8  0x0000fffff6d46d90 in lua_likwid_getNumaInfo (L=0xaaaaaaab52a8) at ./src/luawid.c:1172
#9  0x0000fffff7f8fb9c in luaD_precall (L=L@entry=0xaaaaaaab52a8, func=func@entry=0xaaaaaaabc7a0, nresults=nresults@entry=1) at ./src/ldo.c:360
#10 0x0000fffff7fa3968 in luaV_execute (L=L@entry=0xaaaaaaab52a8) at ./src/lvm.c:1115
#11 0x0000fffff7f8ffbc in luaD_call (L=L@entry=0xaaaaaaab52a8, func=<optimized out>, nResults=<optimized out>) at ./src/ldo.c:491
#12 0x0000fffff7f90000 in luaD_callnoyield (L=0xaaaaaaab52a8, func=<optimized out>, nResults=<optimized out>) at ./src/ldo.c:501
#13 0x0000fffff7f8f37c in luaD_rawrunprotected (L=L@entry=0xaaaaaaab52a8, f=f@entry=0xfffff7fa82b8 <f_call>, ud=ud@entry=0xffffffffeb18) at ./src/ldo.c:142
#14 0x0000fffff7f90298 in luaD_pcall (L=L@entry=0xaaaaaaab52a8, func=func@entry=0xfffff7fa82b8 <f_call>, u=u@entry=0xffffffffeb18, old_top=80, ef=<optimized out>) at ./src/ldo.c:722
#15 0x0000fffff7fa9c34 in lua_pcallk (L=0xaaaaaaab52a8, nargs=<optimized out>, nresults=-1, errfunc=<optimized out>, ctx=<optimized out>, k=<optimized out>) at ./src/lapi.c:968
#16 0x0000aaaaaaaa1ad4 in docall (L=0xaaaaaaab52a8, narg=3, nres=-1) at ./src/lua.c:203
#17 0x0000aaaaaaaa2810 in handle_script (argv=<optimized out>, L=0xaaaaaaab52a8) at ./src/lua.c:443
#18 pmain (L=0xaaaaaaab52a8) at ./src/lua.c:577
#19 0x0000fffff7f8fb9c in luaD_precall (L=L@entry=0xaaaaaaab52a8, func=0xaaaaaaab58d0, nresults=1) at ./src/ldo.c:360
#20 0x0000fffff7f8ff80 in luaD_call (L=L@entry=0xaaaaaaab52a8, func=<optimized out>, nResults=<optimized out>) at ./src/ldo.c:490
#21 0x0000fffff7f90000 in luaD_callnoyield (L=0xaaaaaaab52a8, func=<optimized out>, nResults=<optimized out>) at ./src/ldo.c:501
#22 0x0000fffff7f8f37c in luaD_rawrunprotected (L=L@entry=0xaaaaaaab52a8, f=f@entry=0xfffff7fa82b8 <f_call>, ud=ud@entry=0xffffffffee68) at ./src/ldo.c:142
#23 0x0000fffff7f90298 in luaD_pcall (L=L@entry=0xaaaaaaab52a8, func=func@entry=0xfffff7fa82b8 <f_call>, u=u@entry=0xffffffffee68, old_top=16, ef=<optimized out>) at ./src/ldo.c:722
#24 0x0000fffff7fa9c34 in lua_pcallk (L=0xaaaaaaab52a8, nargs=<optimized out>, nresults=1, errfunc=<optimized out>, ctx=<optimized out>, k=<optimized out>) at ./src/lapi.c:968
#25 0x0000aaaaaaaa1878 in main (argc=5, argv=0xfffffffff008) at ./src/lua.c:603
(gdb) 

Does this help?

@TomTheBear (Member)

Yes, it helps, thank you.

But it is still not easy to find the problem. The only cause I can think of is that the detection of how many hardware threads the system has failed. That is, of course, fundamental information, and I'm surprised execution reaches this point without it.

Can you please send me the content of /proc/cpuinfo? It might be a bug earlier in the execution, in the cpuinfo parser on ARM.
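
For illustration: on ARM, each hardware thread appears in /proc/cpuinfo as its own "processor : N" stanza, so counting hardware threads essentially means counting those keys. A minimal, self-contained sketch of that idea (this is not LIKWID's actual parser):

```c
#include <stdio.h>
#include <string.h>

/* Illustrative only: count hardware threads by counting "processor"
 * keys in /proc/cpuinfo. Not taken from the LIKWID sources. */
int main(void)
{
    FILE *fp = fopen("/proc/cpuinfo", "r");
    if (!fp) {
        perror("fopen");
        return 1;
    }
    char line[512];
    int numHWThreads = 0;
    while (fgets(line, sizeof(line), fp)) {
        /* Match the "processor" key at the start of a line;
         * other keys like "model name" do not match. */
        if (strncmp(line, "processor", 9) == 0)
            numHWThreads++;
    }
    fclose(fp);
    printf("hardware threads: %d\n", numHWThreads);
    return 0;
}
```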

SF-N (Author) commented Oct 21, 2022

icarus@ubuntu:/$ cat /proc/cpuinfo 
processor	: 0
model name	: ARMv8 Processor rev 0 (v8l)
BogoMIPS	: 62.50
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm dcpop
CPU implementer	: 0x4e
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x004
CPU revision	: 0
MTS version	: 55637613

processor	: 1
model name	: ARMv8 Processor rev 0 (v8l)
BogoMIPS	: 62.50
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm dcpop
CPU implementer	: 0x4e
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x004
CPU revision	: 0
MTS version	: 55637613

processor	: 2
model name	: ARMv8 Processor rev 0 (v8l)
BogoMIPS	: 62.50
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm dcpop
CPU implementer	: 0x4e
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x004
CPU revision	: 0
MTS version	: 55637613

processor	: 3
model name	: ARMv8 Processor rev 0 (v8l)
BogoMIPS	: 62.50
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm dcpop
CPU implementer	: 0x4e
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x004
CPU revision	: 0
MTS version	: 55637613

processor	: 4
model name	: ARMv8 Processor rev 0 (v8l)
BogoMIPS	: 62.50
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm dcpop
CPU implementer	: 0x4e
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x004
CPU revision	: 0
MTS version	: 55637613

processor	: 5
model name	: ARMv8 Processor rev 0 (v8l)
BogoMIPS	: 62.50
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm dcpop
CPU implementer	: 0x4e
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x004
CPU revision	: 0
MTS version	: 55637613

processor	: 6
model name	: ARMv8 Processor rev 0 (v8l)
BogoMIPS	: 62.50
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm dcpop
CPU implementer	: 0x4e
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x004
CPU revision	: 0
MTS version	: 55637613

processor	: 7
model name	: ARMv8 Processor rev 0 (v8l)
BogoMIPS	: 62.50
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm dcpop
CPU implementer	: 0x4e
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x004
CPU revision	: 0
MTS version	: 55637613

@TomTheBear (Member)

I tested the current parser and found a bug, but it is not relevant to your issue.

Can you please run the gdb steps again and, when it fails, do a
print(cputopo->numHWThreads)

Also, please provide the contents of the files /sys/devices/system/cpu/present and /sys/devices/system/cpu/online.

As a last resort, comment out the line DEFINES += -DLIKWID_USE_HWLOC in make/config_defines.mk and rebuild (make distclean && make). This will use a different parser.
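
For reference, those two files use the kernel's "cpulist" format: comma-separated CPU numbers and ranges such as "0-7" or "0-3,5,7". A small hypothetical parser, just to show what the files contain (not LIKWID code):

```c
#include <stdio.h>
#include <string.h>

/* Illustrative parser for the Linux cpulist format used by
 * /sys/devices/system/cpu/present and .../online. */
static int count_cpus(const char *list)
{
    char buf[256];
    int count = 0;
    strncpy(buf, list, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
        int lo, hi;
        if (sscanf(tok, "%d-%d", &lo, &hi) == 2)
            count += hi - lo + 1;   /* a range like "0-7" */
        else if (sscanf(tok, "%d", &lo) == 1)
            count += 1;             /* a single CPU like "5" */
    }
    return count;
}

int main(void)
{
    printf("%d\n", count_cpus("0-7"));     /* prints 8 */
    printf("%d\n", count_cpus("0-3,5,7")); /* prints 6 */
    return 0;
}
```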

SF-N (Author) commented Oct 21, 2022

I get:

(gdb) print(cputopo->numHWThreads)
$1 = 8

And

cat /sys/devices/system/cpu/present
0-7
cat /sys/devices/system/cpu/online
0-7

When commenting out line 360 (#DEFINES += -DLIKWID_USE_HWLOC), I get the following error during make:

===>  COMPILE  GCCARMv8/topology_hwloc.o
./src/topology_hwloc.c:50:1: error: unknown type name ‘hwloc_topology_t’
   50 | hwloc_topology_t hwloc_topology = NULL;
      | ^~~~~~~~~~~~~~~~
./src/topology_hwloc.c:50:35: warning: initialization of ‘int’ from ‘void *’ makes integer from pointer without a cast [-Wint-conversion]
   50 | hwloc_topology_t hwloc_topology = NULL;
      |                                   ^~~~
make: *** [Makefile:302: GCCARMv8/topology_hwloc.o] Error 1

@TomTheBear (Member)

OK, very surprising. The failing line's only input that could lead to an 'invalid size' is cputopo->numHWThreads, and there are two other malloc calls with the same input in the preceding lines that seem to work. I have to think about other causes.
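
For context: glibc prints "malloc(): invalid size (unsorted)" when it finds a chunk with a corrupted size field in the unsorted bin, so the aborting malloc call is often innocent; the real bug is usually an earlier out-of-bounds write. A minimal, glibc-specific sketch of that failure mode (unrelated to the LIKWID sources; the exact behavior depends on the glibc version):

```c
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *a = malloc(40);    /* buffer that will be overflowed */
    char *b = malloc(2000);  /* too large for tcache: frees into the unsorted bin */
    char *c = malloc(40);    /* guard: keeps b from merging with the top chunk */

    free(b);                 /* b now sits in the unsorted bin */

    /* Bug: write 48 bytes into the 40-byte buffer a. The last 8 bytes
     * clobber the size field of the freed chunk b. */
    memset(a, 0x41, 48);

    /* This otherwise-correct allocation scans the unsorted bin, sees
     * the bogus size, and aborts with "malloc(): invalid size (unsorted)". */
    char *d = malloc(32);

    (void)c; (void)d;
    return 0;
}
```

That pattern would also explain why the two earlier malloc calls with the same input succeed: they run before the allocator reaches the corrupted chunk.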

It seems I broke the "disabling of hwloc" path at some point in the past. Hwloc works on almost all systems, which is why the non-hwloc path is tested rarely, if ever.

@TomTheBear (Member)

Can you please run likwid-topology -V 3 and send the output? I assume there is some failure before the actual abort, like "no NUMA domains". I have seen that in the past on exotic hardware.

SF-N (Author) commented Nov 3, 2022

Here it is:

likwid-topology -V 3
DEBUG - [proc_init_cpuInfo:336] PROC CpuInfo Family 8 Model 0 Stepping 0 isIntel 0 numHWThreads 8
DEBUG - [proc_init_nodeTopology:712] PROC Thread Pool PU 0 Thread 0 Core 0 Die 0 Socket 0 inCpuSet 1
DEBUG - [proc_init_nodeTopology:712] PROC Thread Pool PU 1 Thread 0 Core 1 Die 0 Socket 0 inCpuSet 1
DEBUG - [proc_init_nodeTopology:712] PROC Thread Pool PU 2 Thread 0 Core 0 Die 0 Socket 1 inCpuSet 1
DEBUG - [proc_init_nodeTopology:712] PROC Thread Pool PU 3 Thread 0 Core 1 Die 0 Socket 1 inCpuSet 1
DEBUG - [proc_init_nodeTopology:712] PROC Thread Pool PU 4 Thread 0 Core 0 Die 0 Socket 2 inCpuSet 1
DEBUG - [proc_init_nodeTopology:712] PROC Thread Pool PU 5 Thread 0 Core 1 Die 0 Socket 2 inCpuSet 1
DEBUG - [proc_init_nodeTopology:712] PROC Thread Pool PU 6 Thread 0 Core 0 Die 0 Socket 3 inCpuSet 1
DEBUG - [proc_init_nodeTopology:712] PROC Thread Pool PU 7 Thread 0 Core 1 Die 0 Socket 3 inCpuSet 1
DEBUG - [affinity_init:539] Affinity: Socket domains 4
DEBUG - [affinity_init:541] Affinity: CPU die domains 4
DEBUG - [affinity_init:546] Affinity: CPU cores per LLC 8
DEBUG - [affinity_init:549] Affinity: Cache domains 0
DEBUG - [affinity_init:553] Affinity: NUMA domains 1
DEBUG - [affinity_init:554] Affinity: All domains 10
DEBUG - [affinity_addNodeDomain:370] Affinity domain N: 8 HW threads on 8 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S0: 2 HW threads on 2 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S1: 2 HW threads on 2 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S2: 2 HW threads on 2 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S3: 2 HW threads on 2 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D0: 2 HW threads on 2 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D1: 2 HW threads on 2 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D2: 2 HW threads on 2 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D3: 2 HW threads on 2 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C0: 2 HW threads on 2 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C0: 2 HW threads on 2 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C0: 2 HW threads on 2 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C0: 2 HW threads on 2 cores
DEBUG - [affinity_addMemoryDomain:504] Affinity domain M0: 8 HW threads on 2 cores
DEBUG - [create_lookups:290] T 0 T2C 0 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 1 T2C 1 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 2 T2C 0 T2S 1 T2D 1 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 3 T2C 1 T2S 1 T2D 1 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 4 T2C 0 T2S 2 T2D 2 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 5 T2C 1 T2S 2 T2D 2 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 6 T2C 0 T2S 3 T2D 3 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 7 T2C 1 T2S 3 T2D 3 T2LLC 0 T2M 0
--------------------------------------------------------------------------------
CPU name:	
CPU type:	nil
CPU stepping:	0
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets:		4
Cores per socket:	2
Threads per core:	1
--------------------------------------------------------------------------------
HWThread        Thread        Core        Die        Socket        Available
0               0             0           0          0             *                
1               0             1           0          0             *                
2               0             0           0          1             *                
3               0             1           0          1             *                
4               0             0           0          2             *                
5               0             1           0          2             *                
6               0             0           0          3             *                
7               0             1           0          3             *                
--------------------------------------------------------------------------------
Socket 0:		( 0 1 )
Socket 1:		( 2 3 )
Socket 2:		( 4 5 )
Socket 3:		( 6 7 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level:			1
Size:			64 kB
Cache groups:		( 0 ) ( 1 ) ( 2 ) ( 3 ) ( 4 ) ( 5 ) ( 6 ) ( 7 )
--------------------------------------------------------------------------------
Level:			2
Size:			2 MB
Cache groups:		( 0 1 ) ( 2 3 ) ( 4 5 ) ( 6 7 )
--------------------------------------------------------------------------------
Level:			3
Size:			4 MB
Cache groups:		( 0 1 2 3 4 5 6 7 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains:		1
--------------------------------------------------------------------------------
Domain:			0
Processors:		( 0 1 2 3 4 5 6 7 )
Distances:		10
Free memory:		7138.31 MB
Total memory:		14898.7 MB
--------------------------------------------------------------------------------

@TomTheBear (Member)

As I thought:

DEBUG - [affinity_init:549] Affinity: Cache domains 0

but it should be 1, since you seem to have a single L3 cache. The architecture looks quite strange in the output: there are four sockets, each with 2 cores, but all four share a single L3 cache. That may well be accurate, but it goes against LIKWID's current assumption that each socket has its own L3 cache.

I'll check the cache domain detection.

@TomTheBear (Member)

So, it seems the four sockets cause the problem. The cache domain detection divides the total number of last-level caches by the socket count and truncates the result to an integer; with one L3 cache and four sockets this yields 0 cache domains per socket.

This is hard to fix without access to the hardware for testing, but I'll try to create a patch.
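
To make the arithmetic concrete, here is a hypothetical sketch of the suspected logic (the names are invented and not taken from the LIKWID sources), together with a ceiling division that would avoid the zero result:

```c
#include <stdio.h>

int main(void)
{
    int numLLCaches = 1;  /* one L3 cache shared by all cores */
    int numSockets  = 4;  /* the Xavier reports four 2-core sockets */

    /* Integer division truncates toward zero: 1 / 4 == 0. */
    int cachesPerSocket = numLLCaches / numSockets;
    printf("cache domains per socket: %d\n", cachesPerSocket);  /* 0 */

    /* Any buffer later sized by this value has zero length, so filling
     * in even one cache domain writes out of bounds and can corrupt the
     * heap metadata that a later malloc() then trips over. */

    /* One possible fix: round up instead of truncating. */
    int ceiled = (numLLCaches + numSockets - 1) / numSockets;
    printf("with ceiling division: %d\n", ceiled);              /* 1 */
    return 0;
}
```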
