Build OpenBLAS 0.3.6 for iOS #2275

Closed
L1onKing opened this issue Sep 30, 2019 · 71 comments · Fixed by #2277

Comments

@L1onKing

Hello! I'm trying to build OpenBLAS 0.3.5 for iOS. Here's the shell script I'm using for the build:

TOOLCHAIN_PATH=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
SYSROOT_PATH=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk
make TARGET=ARMV8 BINARY=64 HOSTCC=clang CC="$TOOLCHAIN_PATH/clang -isysroot $SYSROOT_PATH -arch arm64 -miphoneos-version-min=10.0 -O2" NOFORTRAN=1 libs

When I run it, I get this error:

clang: error: unknown argument: '-ru'
clang: error: no such file or directory: '../libopenblas_armv8p-r0.3.6.a'
clang: warning: no such sysroot directory: '/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/-ar' [-Wmissing-sysroot]

How can this be fixed? Thanks!

@martin-frbg
Collaborator

Try adding "AR=ar" to your make command line (the "-ru" is an option for the ar program which creates the .a file). Not sure why this goes wrong though...
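
For reference, a minimal sketch of the command with that suggestion applied, reusing the paths from the script above. The `openblas_make` function and the `DRY_RUN` variable are illustrative names and not part of OpenBLAS, and whether `AR=ar` alone resolves the error is not confirmed at this point in the thread.

```shell
#!/bin/sh
# Paths copied from the script in the first comment; adjust for your Xcode install.
TOOLCHAIN_PATH=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
SYSROOT_PATH=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk

# Hypothetical helper: runs make with AR set explicitly, or only prints the
# command when DRY_RUN=1 (the default in this sketch).
openblas_make() {
    cc="$TOOLCHAIN_PATH/clang -isysroot $SYSROOT_PATH -arch arm64 -miphoneos-version-min=10.0 -O2"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf 'make TARGET=ARMV8 BINARY=64 HOSTCC=clang CC="%s" AR=ar NOFORTRAN=1 libs\n' "$cc"
    else
        make TARGET=ARMV8 BINARY=64 HOSTCC=clang CC="$cc" AR=ar NOFORTRAN=1 libs
    fi
}
```

Calling `openblas_make` prints the command for inspection; setting `DRY_RUN=0` beforehand runs the actual build.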

@L1onKing
Author

Sorry, my bad. I found my mistake: I forgot to add this line to c_check:

$cross_suffix = "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/";

But a new problem arose. Now I get these errors:

<instantiation>:7:2: note: while in macro instantiation
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk -arch arm64 -miphoneos-version-min=10.0 -O2 -O2 -DMAX_STACK_ALLOC=2048 -Wall -DF_INTERFACE_GFORT -fPIC -DNO_LAPACK -DNO_LAPACKE -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.6\" -march=armv8-a -DASMNAME=_sgemm_oncopy -DASMFNAME=_sgemm_oncopy_ -DNAME=sgemm_oncopy_ -DCNAME=sgemm_oncopy -DCHAR_NAME=\"sgemm_oncopy_\" -DCHAR_CNAME=\"sgemm_oncopy\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -c -UDOUBLE -UCOMPLEX ../kernel/arm64/../generic/gemm_ncopy_4.c -o sgemm_oncopy.o
 KERNEL_F1
 ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:146:2: note: while in macro instantiation
 KERNEL_F8
 ^
<instantiation>:14:22: error: unknown token in expression
KERNEL_F1_SCALE_GE_X_\@:
                     ^
<instantiation>:7:2: note: while in macro instantiation
 KERNEL_F1
 ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:146:2: note: while in macro instantiation
 KERNEL_F8
 ^
<instantiation>:14:22: error: invalid operand
KERNEL_F1_SCALE_GE_X_\@:
                     ^
<instantiation>:7:2: note: while in macro instantiation
 KERNEL_F1
 ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:146:2: note: while in macro instantiation
 KERNEL_F8
 ^
../kernel/arm64/nrm2.S:87:16: error: unknown token in expression
KERNEL_F1_NEXT_\@:
               ^
<instantiation>:7:2: note: while in macro instantiation
 KERNEL_F1
 ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:146:2: note: while in macro instantiation
 KERNEL_F8
 ^
../kernel/arm64/nrm2.S:87:16: error: invalid operand
KERNEL_F1_NEXT_\@:
               ^
<instantiation>:7:2: note: while in macro instantiation
 KERNEL_F1
 ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:146:2: note: while in macro instantiation
 KERNEL_F8
 ^
<instantiation>:4:21: error: unexpected token in argument list
 beq KERNEL_F1_NEXT_\@
                    ^
<instantiation>:8:2: note: while in macro instantiation
 KERNEL_F1
 ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:146:2: note: while in macro instantiation
 KERNEL_F8
 ^
<instantiation>:7:27: error: unexpected token in argument list
 bge KERNEL_F1_SCALE_GE_X_\@
                          ^
<instantiation>:8:2: note: while in macro instantiation
 KERNEL_F1
 ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:146:2: note: while in macro instantiation
 KERNEL_F8
 ^
<instantiation>:13:19: error: unexpected token in argument list
 b KERNEL_F1_NEXT_\@
                  ^
<instantiation>:8:2: note: while in macro instantiation
 KERNEL_F1
 ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:146:2: note: while in macro instantiation
 KERNEL_F8
 ^
<instantiation>:14:22: error: unknown token in expression
KERNEL_F1_SCALE_GE_X_\@:
                     ^
<instantiation>:8:2: note: while in macro instantiation
 KERNEL_F1
 ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:146:2: note: while in macro instantiation
 KERNEL_F8
 ^
<instantiation>:14:22: error: invalid operand
KERNEL_F1_SCALE_GE_X_\@:
                     ^
<instantiation>:8:2: note: while in macro instantiation
 KERNEL_F1
 ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:146:2: note: while in macro instantiation
 KERNEL_F8
 ^
../kernel/arm64/nrm2.S:87:16: error: unknown token in expression
KERNEL_F1_NEXT_\@:
               ^
<instantiation>:8:2: note: while in macro instantiation
 KERNEL_F1
 ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:146:2: note: while in macro instantiation
 KERNEL_F8
 ^
../kernel/arm64/nrm2.S:87:16: error: invalid operand
KERNEL_F1_NEXT_\@:
               ^
<instantiation>:8:2: note: while in macro instantiation
 KERNEL_F1
 ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:146:2: note: while in macro instantiation
 KERNEL_F8
 ^
<instantiation>:4:21: error: unexpected token in argument list
 beq KERNEL_F1_NEXT_\@
                    ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:159:2: note: while in macro instantiation
 KERNEL_F1
 ^
<instantiation>:7:27: error: unexpected token in argument list
 bge KERNEL_F1_SCALE_GE_X_\@
                          ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:159:2: note: while in macro instantiation
 KERNEL_F1
 ^
<instantiation>:13:19: error: unexpected token in argument list
 b KERNEL_F1_NEXT_\@
                  ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:159:2: note: while in macro instantiation
 KERNEL_F1
 ^
<instantiation>:14:22: error: unknown token in expression
KERNEL_F1_SCALE_GE_X_\@:
                     ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:159:2: note: while in macro instantiation
 KERNEL_F1
 ^
<instantiation>:14:22: error: invalid operand
KERNEL_F1_SCALE_GE_X_\@:
                     ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:159:2: note: while in macro instantiation
 KERNEL_F1
 ^
../kernel/arm64/nrm2.S:87:16: error: unknown token in expression
KERNEL_F1_NEXT_\@:
               ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:159:2: note: while in macro instantiation
 KERNEL_F1
 ^
../kernel/arm64/nrm2.S:87:16: error: invalid operand
KERNEL_F1_NEXT_\@:
               ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-bd3346.s:159:2: note: while in macro instantiation
 KERNEL_F1

That's only part of the output, but the rest is basically the same. I think I saw a fix for this somewhere in the issues list.

@martin-frbg
Collaborator

Pretty sure it was in the closed ticket #1531 you commented on earlier... though I think I had put in a fix for this back then (maybe it was affected by later changes).

@L1onKing
Author

I found a fix for that in #569. But now, when I try to build my Xcode project with the new library, I get these errors:

ld: warning: directory not found for option '-L/Users/user/Documents/Work/AlgofaceWork/algoface-ios-209-landmarks-tracker/Algoface-Landmarks-Tracker/tracker/3rdparty/eigen/OpenBLAS_iOS_0.2.21'

Undefined symbols for architecture arm64:
  "_sdot_k", referenced from:
      _strmv_TLN in libopenblas.a(strmv_TLN.o)
      _strmv_TLU in libopenblas.a(strmv_TLU.o)
      _strmv_TUN in libopenblas.a(strmv_TUN.o)
      _strmv_TUU in libopenblas.a(strmv_TUU.o)
      _trmv_kernel in libopenblas.a(strmv_thread_TLN.o)
      _trmv_kernel in libopenblas.a(strmv_thread_TLU.o)
      _trmv_kernel in libopenblas.a(strmv_thread_TUN.o)
      ...
  "_scopy_k", referenced from:
      _strmv_NLN in libopenblas.a(strmv_NLN.o)
      _strmv_NLU in libopenblas.a(strmv_NLU.o)
      _strmv_NUN in libopenblas.a(strmv_NUN.o)
      _strmv_NUU in libopenblas.a(strmv_NUU.o)
      _strmv_TLN in libopenblas.a(strmv_TLN.o)
      _strmv_TLU in libopenblas.a(strmv_TLU.o)
      _strmv_TUN in libopenblas.a(strmv_TUN.o)
      ...
  "_strmm_kernel_RN", referenced from:
      _strmm_RNUN in libopenblas.a(strmm_RNUN.o)
      _strmm_RNUU in libopenblas.a(strmm_RNUU.o)
      _strmm_RTLN in libopenblas.a(strmm_RTLN.o)
      _strmm_RTLU in libopenblas.a(strmm_RTLU.o)
  "_strmm_kernel_LN", referenced from:
      _strmm_LNUN in libopenblas.a(strmm_LNUN.o)
      _strmm_LNUU in libopenblas.a(strmm_LNUU.o)
      _strmm_LTLN in libopenblas.a(strmm_LTLN.o)
      _strmm_LTLU in libopenblas.a(strmm_LTLU.o)
  "_strmm_kernel_LT", referenced from:
      _strmm_LNLN in libopenblas.a(strmm_LNLN.o)
      _strmm_LNLU in libopenblas.a(strmm_LNLU.o)
      _strmm_LTUN in libopenblas.a(strmm_LTUN.o)
      _strmm_LTUU in libopenblas.a(strmm_LTUU.o)
  "_sscal_k", referenced from:
      _sgemv_ in libopenblas.a(sgemv.o)
      _trmv_kernel in libopenblas.a(strmv_thread_NLN.o)
      _trmv_kernel in libopenblas.a(strmv_thread_NLU.o)
      _trmv_kernel in libopenblas.a(strmv_thread_NUN.o)
      _trmv_kernel in libopenblas.a(strmv_thread_NUU.o)
      _trmv_kernel in libopenblas.a(strmv_thread_TLN.o)
      _trmv_kernel in libopenblas.a(strmv_thread_TLU.o)
      ...
  "_sgemm_kernel", referenced from:
      _sgemm_nn in libopenblas.a(sgemm_nn.o)
      _sgemm_nt in libopenblas.a(sgemm_nt.o)
      _inner_thread in libopenblas.a(sgemm_thread_nn.o)
      _inner_thread in libopenblas.a(sgemm_thread_nt.o)
      _inner_thread in libopenblas.a(sgemm_thread_tn.o)
      _inner_thread in libopenblas.a(sgemm_thread_tt.o)
      _sgemm_tn in libopenblas.a(sgemm_tn.o)
      ...
  "_strmm_kernel_RT", referenced from:
      _strmm_RNLN in libopenblas.a(strmm_RNLN.o)
      _strmm_RNLU in libopenblas.a(strmm_RNLU.o)
      _strmm_RTUN in libopenblas.a(strmm_RTUN.o)
      _strmm_RTUU in libopenblas.a(strmm_RTUU.o)
  "_sgemv_n", referenced from:
      l___const.sgemv_.gemv in libopenblas.a(sgemv.o)
      _gemv_kernel in libopenblas.a(sgemv_thread_n.o)
      _strmv_NLN in libopenblas.a(strmv_NLN.o)
      _strmv_NLU in libopenblas.a(strmv_NLU.o)
      _strmv_NUN in libopenblas.a(strmv_NUN.o)
      _strmv_NUU in libopenblas.a(strmv_NUU.o)
      _trmv_kernel in libopenblas.a(strmv_thread_NLN.o)
      ...
  "_sgemv_t", referenced from:
      l___const.sgemv_.gemv in libopenblas.a(sgemv.o)
      _gemv_kernel in libopenblas.a(sgemv_thread_t.o)
      _strmv_TLN in libopenblas.a(strmv_TLN.o)
      _strmv_TLU in libopenblas.a(strmv_TLU.o)
      _strmv_TUN in libopenblas.a(strmv_TUN.o)
      _strmv_TUU in libopenblas.a(strmv_TUU.o)
      _trmv_kernel in libopenblas.a(strmv_thread_TLN.o)
      ...
     (maybe you meant: _sgemv_thread_n, _sgemv_thread_t )
  "_saxpy_k", referenced from:
      _saxpy_ in libopenblas.a(saxpy.o)
      _strmv_NLN in libopenblas.a(strmv_NLN.o)
      _strmv_NLU in libopenblas.a(strmv_NLU.o)
      _strmv_NUN in libopenblas.a(strmv_NUN.o)
      _strmv_NUU in libopenblas.a(strmv_NUU.o)
      _strmv_thread_NLN in libopenblas.a(strmv_thread_NLN.o)
      _trmv_kernel in libopenblas.a(strmv_thread_NLN.o)
      ...
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Could the issue be that Xcode can't compile the assembly files?

I think this matters quite a bit, because I previously forced Xcode to build the C versions of these functions, and the library is not as fast as I think it should be.

@martin-frbg
Collaborator

Hard to tell from the partial log (there should have been earlier errors from trying to compile the respective OpenBLAS sources), but the previous reports (e.g. #1531) made it look like Xcode only has problems with some of the assembly files. On the other hand, some enhancements have been made since #1531 was last updated (in particular, #1821 brought in more assembly kernels, including one for SAXPY), so the situation may have become worse on iOS since it was last tried.
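
One way to confirm which kernels never made it into the archive is to compare the symbols the linker complains about with what `libopenblas.a` actually defines. A rough sketch, assuming `nm` prints its usual `address type name` lines; the `defined_in` helper name is made up for this illustration:

```shell
#!/bin/sh
# Hypothetical helper: reads `nm` output on stdin and reports, for each symbol
# given as an argument, whether some object defines it in the text section (T).
defined_in() {
    nm_output=$(cat)
    for sym in "$@"; do
        if printf '%s\n' "$nm_output" | grep -q " T $sym\$"; then
            echo "$sym: defined"
        else
            echo "$sym: MISSING"
        fi
    done
}

# Usage against a real archive (the path is an assumption):
#   nm libopenblas.a 2>/dev/null | defined_in _sdot_k _saxpy_k _sgemm_kernel
```

Any symbol reported as MISSING points at a kernel source file that failed to compile earlier in the build, which is where the first real error should be.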

@L1onKing
Author

@martin-frbg do you have any suggestions for what I should do? Maybe I can download a different clang compiler and compile the assembly files with it?

I understand that the best solution would be to adapt the assembly files so that the Xcode compiler can compile them, but unfortunately I don't have that kind of experience.

@L1onKing
Author

I thought I might try an older version of Xcode, but I highly doubt that an older version of the compiler will do a better job.

@martin-frbg
Collaborator

No suggestion at the moment, unfortunately. I am still trying to set up an equivalent cross-build on the Travis service for experimenting. As far as I know, the standard compiler in Xcode is clang, so if anything one would have to try building with gcc (and the assembler from GNU binutils).
Note that the whole issue could be something silly. I actually do not know if there is any special meaning to the use of the @ sign as the last character in a label (which seems to be what Xcode hates most about our code) or if this was just one contributor's coding style.

@L1onKing
Author

As far as I'm aware, Xcode's clang is modified in some way. I saw a few articles where developers prefer to use a manually downloaded clang, like here for instance: https://embeddedartistry.com/blog/2017/2/20/installing-clangllvm-on-osx

So I will try to install clang via Homebrew and compile with it. Let's see what happens.

@martin-frbg
Collaborator

Probably easier to just strip all the "@" from the offending .S files and see what blows up.

@L1onKing
Author

@martin-frbg I noticed that every @ is used together with a \ (backslash). Does that mean I should strip the backslash as well?

Sorry, I know almost nothing about assembly :(

@L1onKing
Author

I removed the \@ parts and now I get these errors:

<instantiation>:14:1: error: invalid symbol redefinition
KERNEL_F1_SCALE_GE_X_:
^
../kernel/arm64/nrm2.S:87:1: error: invalid symbol redefinition
KERNEL_F1_NEXT_:
^
<instantiation>:14:1: error: invalid symbol redefinition
KERNEL_F1_SCALE_GE_X_:
^
../kernel/arm64/nrm2.S:87:1: error: invalid symbol redefinition
KERNEL_F1_NEXT_:
^
<instantiation>:14:1: error: invalid symbol redefinition
KERNEL_F1_SCALE_GE_X_:
^
../kernel/arm64/nrm2.S:87:1: error: invalid symbol redefinition
KERNEL_F1_NEXT_:
^
<instantiation>:14:1: error: invalid symbol redefinition
KERNEL_F1_SCALE_GE_X_:
^
../kernel/arm64/nrm2.S:87:1: error: invalid symbol redefinition
KERNEL_F1_NEXT_:
^
/usr/local/opt/llvm/bin/clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk -arch arm64 -miphoneos-version-min=8.0 -O2 -c -O2 -DMAX_STACK_ALLOC=2048 -Wall -DF_INTERFACE_GFORT -fPIC -DNO_LAPACK -DNO_LAPACKE -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.5\" -march=armv8-a -DASMNAME=_ssymv_L -DASMFNAME=_ssymv_L_ -DNAME=ssymv_L_ -DCNAME=ssymv_L -DCHAR_NAME=\"ssymv_L_\" -DCHAR_CNAME=\"ssymv_L\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -UCOMPLEX -UDOUBLE -DLOWER ../kernel/arm64/../generic/symv_k.c -o ssymv_L.o
<instantiation>:14:1: error: invalid symbol redefinition
KERNEL_F1_SCALE_GE_X_:
^
../kernel/arm64/nrm2.S:87:1: error: invalid symbol redefinition
KERNEL_F1_NEXT_:
^
<instantiation>:14:1: error: invalid symbol redefinition
KERNEL_F1_SCALE_GE_X_:
^
../kernel/arm64/nrm2.S:87:1: /usr/local/opt/llvm/bin/clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk -arch arm64 -miphoneos-version-min=8.0 -O2 -c -O2 -DMAX_STACK_ALLOC=2048 -Wall -DF_INTERFACE_GFORT -fPIC -DNO_LAPACK -DNO_LAPACKE -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.5\" -march=armv8-a -DASMNAME=_sger_k -DASMFNAME=_sger_k_ -DNAME=sger_k_ -DCNAME=sger_k -DCHAR_NAME=\"sger_k_\" -DCHAR_CNAME=\"sger_k\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -UDOUBLE ../kernel/arm64/../generic/ger.c -o sger_k.o
error: invalid symbol redefinition
KERNEL_F1_NEXT_:
^
<instantiation>:14:1: error: invalid symbol redefinition
KERNEL_F1_SCALE_GE_X_:
^
../kernel/arm64/nrm2.S:87:1: error: invalid symbol redefinition
KERNEL_F1_NEXT_:
^
<instantiation>:14:1: error: invalid symbol redefinition
KERNEL_F1_SCALE_GE_X_:
^
../kernel/arm64/nrm2.S:87:1: error: invalid symbol redefinition
KERNEL_F1_NEXT_:
^
/usr/local/opt/llvm/bin/clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk -arch arm64 -miphoneos-version-min=8.0 -O2 -O2 -DMAX_STACK_ALLOC=2048 -Wall -DF_INTERFACE_GFORT -fPIC -DNO_LAPACK -DNO_LAPACKE -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.5\" -march=armv8-a -DASMNAME=_sgemm_kernel -DASMFNAME=_sgemm_kernel_ -DNAME=sgemm_kernel_ -DCNAME=sgemm_kernel -DCHAR_NAME=\"sgemm_kernel_\" -DCHAR_CNAME=\"sgemm_kernel\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -c -UDOUBLE -UCOMPLEX ../kernel/arm64/sgemm_kernel_16x4.S -o sgemm_kernel.o
make[1]: *** [snrm2_k.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [libs] Error 1
MacBook-Pro-Mojo:OpenBLAS-release_0.3.5 user$ 

Do you have any suggestions on how this can be fixed in the assembly code?

@martin-frbg
Collaborator

Sorry, probably that was just a silly suggestion. Not clear to me where the "redefinition" happens (unless the labels are local to the file when the @ is present, and become global ones when not).

@martin-frbg
Collaborator

Found a possible explanation (though for an entirely different compiler/assembler):

If a macro containing a label is used more than once, unique label names need to be generated to avoid multiple definition errors. A backslash followed by an at sign (@) appearing in a label within a macro expansion is replaced with a macro expansion serial number.

If this is widely used, I wonder why the Xcode assembler does not understand it (and what its Apple equivalent would be).

@L1onKing
Author

L1onKing commented Oct 1, 2019

I just tried to build with gcc instead of clang and got the same errors:

../kernel/arm64/nrm2.S:87:16: error: unknown token in expression
KERNEL_F1_NEXT_\@:
               ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-e25b4c.s:159:2: note: while in macro instantiation
 KERNEL_F1
 ^
../kernel/arm64/nrm2.S:87:16: error: invalid operand
KERNEL_F1_NEXT_\@:

There are a lot of these errors. I won't post the full text, since it has appeared in the issues list many times already. I'm trying to figure out what this @ does in assembly, but no luck so far.

@martin-frbg
Collaborator

As I understand it now, the problem is not exactly the compiler, but the assembler that Xcode uses. (I think clang also has an option -integrated-as to use the assembler functionality built into it.) And from what I found, the "\@" is a placeholder that gets replaced by a number whenever the macro is instantiated in the code, so when KERNEL_F1 is used e.g. in KERNEL_F8, the first invocation gets its internal label expanded to KERNEL_F1_NEXT_1, the second invocation gets KERNEL_F1_NEXT_2 and so on, so that all actual label names in the preprocessed code stay unique.
(So a tedious workaround could be to replace all KERNEL_F1 calls with the complete macro text and do the substitution in the labels manually. This should be functionally equivalent but will make the code much harder to read for a human.)
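
To make the `\@` behaviour concrete, here is a toy simulation in shell of what an assembler that supports it does on each macro instantiation. This is only a model, not how any assembler is implemented; the `expand_macro` function and the two-line macro body are invented for illustration, and real assemblers may start counting at a different number.

```shell
#!/bin/sh
# Toy model: print a macro body N times, replacing every "\@" with the
# instantiation counter so that each copy gets its own label names.
expand_macro() {
    count=$1
    body=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # sed receives s/\\@/<i>/g, i.e. "literal backslash followed by @".
        printf '%s\n' "$body" | sed "s/\\\\@/$i/g"
        i=$((i + 1))
    done
}

BODY=' beq KERNEL_F1_NEXT_\@
KERNEL_F1_NEXT_\@:'

expand_macro 2 "$BODY"
```

Each copy ends up with a distinct label (`KERNEL_F1_NEXT_0:`, `KERNEL_F1_NEXT_1:`). Stripping the `\@` by hand instead makes every copy define the same `KERNEL_F1_NEXT_:` label, which matches the "invalid symbol redefinition" errors reported earlier in this thread.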

@L1onKing
Author

L1onKing commented Oct 1, 2019

@martin-frbg is there an article where I could read about this '@' placeholder? There must be an alternative that Xcode will read and understand.

@brada4
Contributor

brada4 commented Oct 1, 2019

Please try adding this parameter to the long CC parameter: -fno-integrated-as
Or its opposite, to force the clang assembler. At a minimum, those should change the error output. And run make clean before builds of v0.3.7.

@L1onKing
Author

L1onKing commented Oct 1, 2019

Hello @brada4!

Now my openBlasBuild.sh looks like this:

TOOLCHAIN_PATH=/usr/bin
SYSROOT_PATH=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk
make TARGET=ARMV8 BINARY=64 HOSTCC=gcc CC="$TOOLCHAIN_PATH/clang -isysroot $SYSROOT_PATH -arch arm64 -miphoneos-version-min=8.0 -O2 -fno-integrated-as" NOFORTRAN=1 libs

I ran make clean and launched this script. I got these errors in the output:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/as: can't specifiy -Q with -arch arm64
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/as: can't specifiy -Q with -arch arm64
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/as: can't specifiy -Q with -arch arm64
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/as: can't specifiy -Q with -arch arm64
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/as: can't specifiy -Q with -arch arm64
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/as: can't specifiy -Q with -arch arm64
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/as: can't specifiy -Q with -arch arm64
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/as: can't specifiy -Q with -arch arm64
clangclang: error: clangassembler command failed with exit code 1 (use -v to see invocation): 
error: assembler command failed with exit code 1 (use -v to see invocation)
clang: error: assembler command failed with exit code 1 (use -v to see invocation)clang
: error: assembler command failed with exit code 1 (use -v to see invocation)
clang: error: assembler command failed with exit code 1 (use -v to see invocation): 
error: assembler command failed with exit code 1 (use -v to see invocation)
clang: error: assembler command failed with exit code 1 (use -v to see invocation)
clang: error: assembler command failed with exit code 1 (use -v to see invocation)
make[1]: *** [sasum.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [dsdot.o] Error 1
make[1]: *** [sdsdot.o] Error 1
make[1]: *** [sdot.o] Error 1
make[1]: *** [sscal.o] Error 1
make[1]: *** [scopy.o] Error 1
make[1]: *** [sswap.o] Error 1
make[1]: *** [saxpy.o] Error 1
make: *** [libs] Error 1

@martin-frbg
Collaborator

What I quoted yesterday was from some other compiler manual, but I did not keep the reference.
I see now that the topic was also discussed here https://www.avrfreaks.net/forum/labels-inside-macros (not all contributions there were helpful, but one suggestion was to use numeric labels and a different jump instruction that tells the assembler to jump back to the previous occurrence of the label).
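
The numeric-label idea can be tried mechanically. The sketch below rewrites the two `\@` labels of the KERNEL_F1 macro in nrm2.S into numeric local labels (`1:` and `2:`) and the branches to them into forward references (`1f` and `2f`). It assumes that every branch comes before its label, which holds for the KERNEL_F1 body quoted in this thread, and that the target assembler accepts numeric local labels, which is untested here. The `rewrite_labels` name is illustrative.

```shell
#!/bin/sh
# Rewrites KERNEL_F1's "\@" labels to numeric local labels.
# Label definitions (at line start, ending in ":") become "1:" / "2:";
# the remaining uses are branch targets and become forward refs "1f" / "2f".
rewrite_labels() {
    sed -e 's/^KERNEL_F1_SCALE_GE_X_\\@:/1:/' \
        -e 's/^KERNEL_F1_NEXT_\\@:/2:/' \
        -e 's/KERNEL_F1_SCALE_GE_X_\\@/1f/g' \
        -e 's/KERNEL_F1_NEXT_\\@/2f/g'
}

# Usage (writes a modified copy, leaving the original untouched):
#   rewrite_labels < kernel/arm64/nrm2.S > nrm2_numeric.S
```

The forum post talks about jumping back to the previous occurrence (a `b` suffix, as in `1b`); in this macro all branches go forward, hence the `f` suffix. Numeric labels may be defined repeatedly, so the macro can be instantiated any number of times without a unique counter.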

@L1onKing
Author

L1onKing commented Oct 1, 2019

If I use -integrated-as, I get the same output as without it. I guess that's because the integrated assembler is enabled by default.

And for the record, I'm building 0.3.6, not 0.3.7. Could that make a difference?

@L1onKing
Author

L1onKing commented Oct 1, 2019

@martin-frbg thank you for the link. I found out that the %= symbol can also be used for macro counting, but that didn't work either. I saw the comment you're referring to, but I know too little to apply that approach to the OpenBLAS code. I'll try anyway :)

@L1onKing
Author

L1onKing commented Oct 1, 2019

@martin-frbg @brada4 OK, I think I've made some progress.

Here's what nrm2.S looks like. It's one of the files that cause trouble. This is only the beginning of the file, because I think that's the part that matters right now:

.macro KERNEL_F1
#if !defined(DOUBLE)
	ldr	s4, [X], #4
	fcmp	s4, REGZERO
	beq	KERNEL_F1_NEXT_\@
	fabs	s4, s4
	fcmp	SCALE, s4
	bge	KERNEL_F1_SCALE_GE_X_\@
	fdiv	s2, SCALE, s4
	fmul	s2, s2, s2
	fmul	s3, SSQ, s2
	fadd	SSQ, REGONE, s3
	fmov	SCALE, s4
	b	KERNEL_F1_NEXT_\@
KERNEL_F1_SCALE_GE_X_\@:
	fdiv	s2, s4, SCALE
	fmla	SSQ, s2, v2.s[0]
#else
	ldr	d4, [X], #8
	fcmp	d4, REGZERO
	beq	KERNEL_F1_NEXT_\@
	fabs	d4, d4
	fcmp	SCALE, d4
	bge	KERNEL_F1_SCALE_GE_X_\@
	fdiv	d2, SCALE, d4
	fmul	d2, d2, d2
	fmul	d3, SSQ, d2
	fadd	SSQ, REGONE, d3
	fmov	SCALE, d4
	b	KERNEL_F1_NEXT_\@
KERNEL_F1_SCALE_GE_X_\@:
	fdiv	d2, d4, SCALE
	fmla	SSQ, d2, v2.d[0]
#endif
KERNEL_F1_NEXT_\@:
.endm

.macro KERNEL_S1
#if !defined(DOUBLE)
	ldr	s4, [X]
	fcmp	s4, REGZERO
	beq	KERNEL_S1_NEXT
	fabs	s4, s4
	fcmp	SCALE, s4
	bge	KERNEL_S1_SCALE_GE_X
	fdiv	s2, SCALE, s4
	fmul	s2, s2, s2
	fmul	s3, SSQ, s2
	fadd	SSQ, REGONE, s3
	fmov	SCALE, s4
	b	KERNEL_S1_NEXT
KERNEL_S1_SCALE_GE_X:
	fdiv	s2, s4, SCALE
	fmla	SSQ, s2, v2.s[0]
#else
	ldr	d4, [X]
	fcmp	d4, REGZERO
	beq	KERNEL_S1_NEXT
	fabs	d4, d4
	fcmp	SCALE, d4
	bge	KERNEL_S1_SCALE_GE_X
	fdiv	d2, SCALE, d4
	fmul	d2, d2, d2
	fmul	d3, SSQ, d2
	fadd	SSQ, REGONE, d3
	fmov	SCALE, d4
	b	KERNEL_S1_NEXT
KERNEL_S1_SCALE_GE_X:
	fdiv	d2, d4, SCALE
	fmla	SSQ, d2, v2.d[0]
#endif
KERNEL_S1_NEXT:
	add	X, X, INC_X
.endm

.macro KERNEL_F8
	KERNEL_F1
	KERNEL_F1
	KERNEL_F1
	KERNEL_F1
	KERNEL_F1
	KERNEL_F1
	KERNEL_F1
	KERNEL_F1
.endm

Here's how I changed it. @martin-frbg please verify if I understood your advice correctly:

.macro KERNEL_F1_0
#if !defined(DOUBLE)
	ldr	s4, [X], #4
	fcmp	s4, REGZERO
	beq	KERNEL_F1_NEXT_0
	fabs	s4, s4
	fcmp	SCALE, s4
	bge	KERNEL_F1_SCALE_GE_X_0
	fdiv	s2, SCALE, s4
	fmul	s2, s2, s2
	fmul	s3, SSQ, s2
	fadd	SSQ, REGONE, s3
	fmov	SCALE, s4
	b	KERNEL_F1_NEXT_0
KERNEL_F1_SCALE_GE_X_0:
	fdiv	s2, s4, SCALE
	fmla	SSQ, s2, v2.s[0]
#else
	ldr	d4, [X], #8
	fcmp	d4, REGZERO
	beq	KERNEL_F1_NEXT_0
	fabs	d4, d4
	fcmp	SCALE, d4
	bge	KERNEL_F1_SCALE_GE_X_0
	fdiv	d2, SCALE, d4
	fmul	d2, d2, d2
	fmul	d3, SSQ, d2
	fadd	SSQ, REGONE, d3
	fmov	SCALE, d4
	b	KERNEL_F1_NEXT_0
KERNEL_F1_SCALE_GE_X_0:
	fdiv	d2, d4, SCALE
	fmla	SSQ, d2, v2.d[0]
#endif
KERNEL_F1_NEXT_0:
.endm

.macro KERNEL_F1_1
#if !defined(DOUBLE)
	ldr	s4, [X], #4
	fcmp	s4, REGZERO
	beq	KERNEL_F1_NEXT_1
	fabs	s4, s4
	fcmp	SCALE, s4
	bge	KERNEL_F1_SCALE_GE_X_1
	fdiv	s2, SCALE, s4
	fmul	s2, s2, s2
	fmul	s3, SSQ, s2
	fadd	SSQ, REGONE, s3
	fmov	SCALE, s4
	b	KERNEL_F1_NEXT_1
KERNEL_F1_SCALE_GE_X_1:
	fdiv	s2, s4, SCALE
	fmla	SSQ, s2, v2.s[0]
#else
	ldr	d4, [X], #8
	fcmp	d4, REGZERO
	beq	KERNEL_F1_NEXT_1
	fabs	d4, d4
	fcmp	SCALE, d4
	bge	KERNEL_F1_SCALE_GE_X_1
	fdiv	d2, SCALE, d4
	fmul	d2, d2, d2
	fmul	d3, SSQ, d2
	fadd	SSQ, REGONE, d3
	fmov	SCALE, d4
	b	KERNEL_F1_NEXT_1
KERNEL_F1_SCALE_GE_X_1:
	fdiv	d2, d4, SCALE
	fmla	SSQ, d2, v2.d[0]
#endif
KERNEL_F1_NEXT_1:
.endm

.macro KERNEL_F1_2
#if !defined(DOUBLE)
	ldr	s4, [X], #4
	fcmp	s4, REGZERO
	beq	KERNEL_F1_NEXT_2
	fabs	s4, s4
	fcmp	SCALE, s4
	bge	KERNEL_F1_SCALE_GE_X_2
	fdiv	s2, SCALE, s4
	fmul	s2, s2, s2
	fmul	s3, SSQ, s2
	fadd	SSQ, REGONE, s3
	fmov	SCALE, s4
	b	KERNEL_F1_NEXT_2
KERNEL_F1_SCALE_GE_X_2:
	fdiv	s2, s4, SCALE
	fmla	SSQ, s2, v2.s[0]
#else
	ldr	d4, [X], #8
	fcmp	d4, REGZERO
	beq	KERNEL_F1_NEXT_2
	fabs	d4, d4
	fcmp	SCALE, d4
	bge	KERNEL_F1_SCALE_GE_X_2
	fdiv	d2, SCALE, d4
	fmul	d2, d2, d2
	fmul	d3, SSQ, d2
	fadd	SSQ, REGONE, d3
	fmov	SCALE, d4
	b	KERNEL_F1_NEXT_2
KERNEL_F1_SCALE_GE_X_2:
	fdiv	d2, d4, SCALE
	fmla	SSQ, d2, v2.d[0]
#endif
KERNEL_F1_NEXT_2:
.endm

.macro KERNEL_F1_3
#if !defined(DOUBLE)
	ldr	s4, [X], #4
	fcmp	s4, REGZERO
	beq	KERNEL_F1_NEXT_3
	fabs	s4, s4
	fcmp	SCALE, s4
	bge	KERNEL_F1_SCALE_GE_X_3
	fdiv	s2, SCALE, s4
	fmul	s2, s2, s2
	fmul	s3, SSQ, s2
	fadd	SSQ, REGONE, s3
	fmov	SCALE, s4
	b	KERNEL_F1_NEXT_3
KERNEL_F1_SCALE_GE_X_3:
	fdiv	s2, s4, SCALE
	fmla	SSQ, s2, v2.s[0]
#else
	ldr	d4, [X], #8
	fcmp	d4, REGZERO
	beq	KERNEL_F1_NEXT_3
	fabs	d4, d4
	fcmp	SCALE, d4
	bge	KERNEL_F1_SCALE_GE_X_3
	fdiv	d2, SCALE, d4
	fmul	d2, d2, d2
	fmul	d3, SSQ, d2
	fadd	SSQ, REGONE, d3
	fmov	SCALE, d4
	b	KERNEL_F1_NEXT_3
KERNEL_F1_SCALE_GE_X_3:
	fdiv	d2, d4, SCALE
	fmla	SSQ, d2, v2.d[0]
#endif
KERNEL_F1_NEXT_3:
.endm

.macro KERNEL_F1_4
#if !defined(DOUBLE)
	ldr	s4, [X], #4
	fcmp	s4, REGZERO
	beq	KERNEL_F1_NEXT_4
	fabs	s4, s4
	fcmp	SCALE, s4
	bge	KERNEL_F1_SCALE_GE_X_4
	fdiv	s2, SCALE, s4
	fmul	s2, s2, s2
	fmul	s3, SSQ, s2
	fadd	SSQ, REGONE, s3
	fmov	SCALE, s4
	b	KERNEL_F1_NEXT_4
KERNEL_F1_SCALE_GE_X_4:
	fdiv	s2, s4, SCALE
	fmla	SSQ, s2, v2.s[0]
#else
	ldr	d4, [X], #8
	fcmp	d4, REGZERO
	beq	KERNEL_F1_NEXT_4
	fabs	d4, d4
	fcmp	SCALE, d4
	bge	KERNEL_F1_SCALE_GE_X_4
	fdiv	d2, SCALE, d4
	fmul	d2, d2, d2
	fmul	d3, SSQ, d2
	fadd	SSQ, REGONE, d3
	fmov	SCALE, d4
	b	KERNEL_F1_NEXT_4
KERNEL_F1_SCALE_GE_X_4:
	fdiv	d2, d4, SCALE
	fmla	SSQ, d2, v2.d[0]
#endif
KERNEL_F1_NEXT_4:
.endm

.macro KERNEL_F1_5
#if !defined(DOUBLE)
	ldr	s4, [X], #4
	fcmp	s4, REGZERO
	beq	KERNEL_F1_NEXT_5
	fabs	s4, s4
	fcmp	SCALE, s4
	bge	KERNEL_F1_SCALE_GE_X_5
	fdiv	s2, SCALE, s4
	fmul	s2, s2, s2
	fmul	s3, SSQ, s2
	fadd	SSQ, REGONE, s3
	fmov	SCALE, s4
	b	KERNEL_F1_NEXT_5
KERNEL_F1_SCALE_GE_X_5:
	fdiv	s2, s4, SCALE
	fmla	SSQ, s2, v2.s[0]
#else
	ldr	d4, [X], #8
	fcmp	d4, REGZERO
	beq	KERNEL_F1_NEXT_5
	fabs	d4, d4
	fcmp	SCALE, d4
	bge	KERNEL_F1_SCALE_GE_X_5
	fdiv	d2, SCALE, d4
	fmul	d2, d2, d2
	fmul	d3, SSQ, d2
	fadd	SSQ, REGONE, d3
	fmov	SCALE, d4
	b	KERNEL_F1_NEXT_5
KERNEL_F1_SCALE_GE_X_5:
	fdiv	d2, d4, SCALE
	fmla	SSQ, d2, v2.d[0]
#endif
KERNEL_F1_NEXT_5:
.endm

.macro KERNEL_F1_6
#if !defined(DOUBLE)
	ldr	s4, [X], #4
	fcmp	s4, REGZERO
	beq	KERNEL_F1_NEXT_6
	fabs	s4, s4
	fcmp	SCALE, s4
	bge	KERNEL_F1_SCALE_GE_X_6
	fdiv	s2, SCALE, s4
	fmul	s2, s2, s2
	fmul	s3, SSQ, s2
	fadd	SSQ, REGONE, s3
	fmov	SCALE, s4
	b	KERNEL_F1_NEXT_6
KERNEL_F1_SCALE_GE_X_6:
	fdiv	s2, s4, SCALE
	fmla	SSQ, s2, v2.s[0]
#else
	ldr	d4, [X], #8
	fcmp	d4, REGZERO
	beq	KERNEL_F1_NEXT_6
	fabs	d4, d4
	fcmp	SCALE, d4
	bge	KERNEL_F1_SCALE_GE_X_6
	fdiv	d2, SCALE, d4
	fmul	d2, d2, d2
	fmul	d3, SSQ, d2
	fadd	SSQ, REGONE, d3
	fmov	SCALE, d4
	b	KERNEL_F1_NEXT_6
KERNEL_F1_SCALE_GE_X_6:
	fdiv	d2, d4, SCALE
	fmla	SSQ, d2, v2.d[0]
#endif
KERNEL_F1_NEXT_6:
.endm

.macro KERNEL_F1_7
#if !defined(DOUBLE)
	ldr	s4, [X], #4
	fcmp	s4, REGZERO
	beq	KERNEL_F1_NEXT_7
	fabs	s4, s4
	fcmp	SCALE, s4
	bge	KERNEL_F1_SCALE_GE_X_7
	fdiv	s2, SCALE, s4
	fmul	s2, s2, s2
	fmul	s3, SSQ, s2
	fadd	SSQ, REGONE, s3
	fmov	SCALE, s4
	b	KERNEL_F1_NEXT_7
KERNEL_F1_SCALE_GE_X_7:
	fdiv	s2, s4, SCALE
	fmla	SSQ, s2, v2.s[0]
#else
	ldr	d4, [X], #8
	fcmp	d4, REGZERO
	beq	KERNEL_F1_NEXT_7
	fabs	d4, d4
	fcmp	SCALE, d4
	bge	KERNEL_F1_SCALE_GE_X_7
	fdiv	d2, SCALE, d4
	fmul	d2, d2, d2
	fmul	d3, SSQ, d2
	fadd	SSQ, REGONE, d3
	fmov	SCALE, d4
	b	KERNEL_F1_NEXT_7
KERNEL_F1_SCALE_GE_X_7:
	fdiv	d2, d4, SCALE
	fmla	SSQ, d2, v2.d[0]
#endif
KERNEL_F1_NEXT_7:
.endm

.macro KERNEL_S1
#if !defined(DOUBLE)
	ldr	s4, [X]
	fcmp	s4, REGZERO
	beq	KERNEL_S1_NEXT
	fabs	s4, s4
	fcmp	SCALE, s4
	bge	KERNEL_S1_SCALE_GE_X
	fdiv	s2, SCALE, s4
	fmul	s2, s2, s2
	fmul	s3, SSQ, s2
	fadd	SSQ, REGONE, s3
	fmov	SCALE, s4
	b	KERNEL_S1_NEXT
KERNEL_S1_SCALE_GE_X:
	fdiv	s2, s4, SCALE
	fmla	SSQ, s2, v2.s[0]
#else
	ldr	d4, [X]
	fcmp	d4, REGZERO
	beq	KERNEL_S1_NEXT
	fabs	d4, d4
	fcmp	SCALE, d4
	bge	KERNEL_S1_SCALE_GE_X
	fdiv	d2, SCALE, d4
	fmul	d2, d2, d2
	fmul	d3, SSQ, d2
	fadd	SSQ, REGONE, d3
	fmov	SCALE, d4
	b	KERNEL_S1_NEXT
KERNEL_S1_SCALE_GE_X:
	fdiv	d2, d4, SCALE
	fmla	SSQ, d2, v2.d[0]
#endif
KERNEL_S1_NEXT:
	add	X, X, INC_X
.endm

.macro KERNEL_F8
	KERNEL_F1_0
	KERNEL_F1_1
	KERNEL_F1_2
	KERNEL_F1_3
	KERNEL_F1_4
	KERNEL_F1_5
	KERNEL_F1_6
	KERNEL_F1_7
.endm
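
For readers following the arithmetic: each KERNEL_F1_n copy above performs the same scaled sum-of-squares update on one element of X. Below is a minimal Python sketch of that update. It assumes the usual reference-BLAS starting values scale = 0 and ssq = 1 (the initialisation is not shown in this excerpt), and the function name scaled_nrm2 is made up for illustration:

```python
import math

def scaled_nrm2(xs):
    """Euclidean norm via the scale/ssq update used in the macros above."""
    scale = 0.0  # assumed initial value (reference BLAS convention)
    ssq = 1.0    # assumed initial value (reference BLAS convention)
    for x in xs:
        if x == 0.0:
            continue                   # beq KERNEL_F1_NEXT_n
        ax = abs(x)                    # fabs
        if scale >= ax:                # bge KERNEL_F1_SCALE_GE_X_n
            r = ax / scale
            ssq += r * r               # fdiv + fmla
        else:
            r = scale / ax
            ssq = 1.0 + ssq * r * r    # fdiv, fmul, fmul, fadd
            scale = ax                 # fmov SCALE, s4
    return scale * math.sqrt(ssq)

print(scaled_nrm2([3.0, 4.0]))  # 5.0
```

Keeping the largest magnitude seen so far in scale means only ratios no larger than 1 are squared, which avoids overflow for large elements.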

When I try to compile the library with those changes, I get the following error. Here is the whole error log:

/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/nrm2-c640f1.s:306:2: error: unrecognized instruction mnemonic
 KERNEL_F1
 ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk -arch arm64 -miphoneos-version-min=8.0 -O2 -c -O2 -DMAX_STACK_ALLOC=2048 -Wall -DF_INTERFACE_GFORT -fPIC -DNO_LAPACK -DNO_LAPACKE -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.6\" -march=armv8-a -DASMNAME=_srot_k -DASMFNAME=_srot_k_ -DNAME=srot_k_ -DCNAME=srot_k -DCHAR_NAME=\"srot_k_\" -DCHAR_CNAME=\"srot_k\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -UCOMPLEX -UCOMPLEX -UDOUBLE  ../kernel/arm64/rot.S -o srot_k.o
make[1]: *** [snrm2_k.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [libs] Error 1

I see that KERNEL_F1 is also used here in nrm2.S:

.Lnrm2_kernel_F10:

	KERNEL_F1

	subs	I, I, #1
	bne	.Lnrm2_kernel_F10

	b	.Lnrm2_kernel_L999

The error is pretty straightforward: the compiler can't find KERNEL_F1 because it no longer exists; there are now eight KERNEL_F1_[number] macros instead.

What should I do to fix that? Should I create another copy of KERNEL_F1 to use here?

I understand that I'm solving the problem with an "axe", but at least it's an approach I can understand at this point :)

@L1onKing
Author

L1onKing commented Oct 1, 2019

I did the same trick for znrm2.S. Now my errors look like this:

<instantiation>:4:21: error: unexpected token in argument list
 beq KERNEL_S1_NEXT_\@
                    ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
<instantiation>:7:28: error: unexpected token in argument list
 bge KERNEL_S1_SCALE_GE_XR_\@
                           ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
<instantiation>:13:19: error: unexpected token in argument list
 b KERNEL_S1_NEXT_\@
                  ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
<instantiation>:14:23: error: unknown token in expression
KERNEL_S1_SCALE_GE_XR_\@:
                      ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
<instantiation>:14:23: error: invalid operand
KERNEL_S1_SCALE_GE_XR_\@:
                      ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
<instantiation>:17:16: error: unknown token in expression
KERNEL_S1_NEXT_\@:
               ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
<instantiation>:17:16: error: invalid operand
KERNEL_S1_NEXT_\@:
               ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
<instantiation>:20:20: error: unexpected token in argument list
 beq KERNEL_S1_END_\@
                   ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
<instantiation>:23:28: error: unexpected token in argument list
 bge KERNEL_S1_SCALE_GE_XI_\@
                           ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
<instantiation>:29:18: error: unexpected token in argument list
 b KERNEL_S1_END_\@
                 ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
<instantiation>:30:23: error: unknown token in expression
KERNEL_S1_SCALE_GE_XI_\@:
                      ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
<instantiation>:30:23: error: invalid operand
KERNEL_S1_SCALE_GE_XI_\@:
                      ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
../kernel/arm64/znrm2.S:740:15: error: unknown token in expression
KERNEL_S1_END_\@:
              ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
../kernel/arm64/znrm2.S:740:15: error: invalid operand
KERNEL_S1_END_\@:
              ^
/var/folders/xt/7v6bxk2n19zft1y3yzymdc580000gq/T/znrm2-f2371d.s:503:2: note: while in macro instantiation
 KERNEL_S1
 ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk -arch arm64 -miphoneos-version-min=8.0 -O2 -c -O2 -DMAX_STACK_ALLOC=2048 -Wall -DF_INTERFACE_GFORT -fPIC -DNO_LAPACK -DNO_LAPACKE -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.6\" -march=armv8-a -DASMNAME=_caxpby_k -DASMFNAME=_caxpby_k_ -DNAME=caxpby_k_ -DCNAME=caxpby_k -DCHAR_NAME=\"caxpby_k_\" -DCHAR_CNAME=\"caxpby_k\" -DNO_AFFINITY -I.. -UDOUBLE  -DCOMPLEX -DCOMPLEX -UCONJ -UDOUBLE ../kernel/arm64/../arm/zaxpby.c -o caxpby_k.o
make[1]: *** [cnrm2_k.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [libs] Error 1

What's suspicious to me here is that this macro is used only here in znrm2.S:

.Lznrm2_kernel_S10:

	KERNEL_S1

	subs	I, I, #1
	bne	.Lznrm2_kernel_S10

Now it makes me think: how often does this .Lznrm2_kernel_S10 get called? Maybe the automatic label counter is important after all.

Or was it done this way just to keep the coding style consistent?

@L1onKing
Author

L1onKing commented Oct 1, 2019

@martin-frbg Here are the modified nrm2.S and znrm2.S for version 0.3.6. With those, the library compiles successfully.

Archive.zip

But when I try to build the project with the new library, I get the following errors:

Undefined symbols for architecture arm64:
  "_sdot_k", referenced from:
      _strmv_TLN in libopenblas.a(strmv_TLN.o)
      _strmv_TLU in libopenblas.a(strmv_TLU.o)
      _strmv_TUN in libopenblas.a(strmv_TUN.o)
      _strmv_TUU in libopenblas.a(strmv_TUU.o)
      _trmv_kernel in libopenblas.a(strmv_thread_TLN.o)
      _trmv_kernel in libopenblas.a(strmv_thread_TLU.o)
      _trmv_kernel in libopenblas.a(strmv_thread_TUN.o)
      ...
  "_scopy_k", referenced from:
      _strmv_NLN in libopenblas.a(strmv_NLN.o)
      _strmv_NLU in libopenblas.a(strmv_NLU.o)
      _strmv_NUN in libopenblas.a(strmv_NUN.o)
      _strmv_NUU in libopenblas.a(strmv_NUU.o)
      _strmv_TLN in libopenblas.a(strmv_TLN.o)
      _strmv_TLU in libopenblas.a(strmv_TLU.o)
      _strmv_TUN in libopenblas.a(strmv_TUN.o)
      ...
  "_strmm_kernel_RN", referenced from:
      _strmm_RNUN in libopenblas.a(strmm_RNUN.o)
      _strmm_RNUU in libopenblas.a(strmm_RNUU.o)
      _strmm_RTLN in libopenblas.a(strmm_RTLN.o)
      _strmm_RTLU in libopenblas.a(strmm_RTLU.o)
  "_strmm_kernel_LN", referenced from:
      _strmm_LNUN in libopenblas.a(strmm_LNUN.o)
      _strmm_LNUU in libopenblas.a(strmm_LNUU.o)
      _strmm_LTLN in libopenblas.a(strmm_LTLN.o)
      _strmm_LTLU in libopenblas.a(strmm_LTLU.o)
  "_strmm_kernel_LT", referenced from:
      _strmm_LNLN in libopenblas.a(strmm_LNLN.o)
      _strmm_LNLU in libopenblas.a(strmm_LNLU.o)
      _strmm_LTUN in libopenblas.a(strmm_LTUN.o)
      _strmm_LTUU in libopenblas.a(strmm_LTUU.o)
  "_sscal_k", referenced from:
      _sgemv_ in libopenblas.a(sgemv.o)
      _trmv_kernel in libopenblas.a(strmv_thread_NLN.o)
      _trmv_kernel in libopenblas.a(strmv_thread_NLU.o)
      _trmv_kernel in libopenblas.a(strmv_thread_NUN.o)
      _trmv_kernel in libopenblas.a(strmv_thread_NUU.o)
      _trmv_kernel in libopenblas.a(strmv_thread_TLN.o)
      _trmv_kernel in libopenblas.a(strmv_thread_TLU.o)
      ...
  "_sgemm_kernel", referenced from:
      _sgemm_nn in libopenblas.a(sgemm_nn.o)
      _sgemm_nt in libopenblas.a(sgemm_nt.o)
      _inner_thread in libopenblas.a(sgemm_thread_nn.o)
      _inner_thread in libopenblas.a(sgemm_thread_nt.o)
      _inner_thread in libopenblas.a(sgemm_thread_tn.o)
      _inner_thread in libopenblas.a(sgemm_thread_tt.o)
      _sgemm_tn in libopenblas.a(sgemm_tn.o)
      ...
  "_strmm_kernel_RT", referenced from:
      _strmm_RNLN in libopenblas.a(strmm_RNLN.o)
      _strmm_RNLU in libopenblas.a(strmm_RNLU.o)
      _strmm_RTUN in libopenblas.a(strmm_RTUN.o)
      _strmm_RTUU in libopenblas.a(strmm_RTUU.o)
  "_sgemv_n", referenced from:
      l___const.sgemv_.gemv in libopenblas.a(sgemv.o)
      _gemv_kernel in libopenblas.a(sgemv_thread_n.o)
      _strmv_NLN in libopenblas.a(strmv_NLN.o)
      _strmv_NLU in libopenblas.a(strmv_NLU.o)
      _strmv_NUN in libopenblas.a(strmv_NUN.o)
      _strmv_NUU in libopenblas.a(strmv_NUU.o)
      _trmv_kernel in libopenblas.a(strmv_thread_NLN.o)
      ...
  "_sgemv_t", referenced from:
      l___const.sgemv_.gemv in libopenblas.a(sgemv.o)
      _gemv_kernel in libopenblas.a(sgemv_thread_t.o)
      _strmv_TLN in libopenblas.a(strmv_TLN.o)
      _strmv_TLU in libopenblas.a(strmv_TLU.o)
      _strmv_TUN in libopenblas.a(strmv_TUN.o)
      _strmv_TUU in libopenblas.a(strmv_TUU.o)
      _trmv_kernel in libopenblas.a(strmv_thread_TLN.o)
      ...
     (maybe you meant: _sgemv_thread_n, _sgemv_thread_t )
  "_saxpy_k", referenced from:
      _saxpy_ in libopenblas.a(saxpy.o)
      _strmv_NLN in libopenblas.a(strmv_NLN.o)
      _strmv_NLU in libopenblas.a(strmv_NLU.o)
      _strmv_NUN in libopenblas.a(strmv_NUN.o)
      _strmv_NUU in libopenblas.a(strmv_NUU.o)
      _strmv_thread_NLN in libopenblas.a(strmv_thread_NLN.o)
      _trmv_kernel in libopenblas.a(strmv_thread_NLN.o)
      ...
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

@martin-frbg
Collaborator

Yes, your changes look correct (if ugly), but all the other "missing" functions like sdot, caxpy, etc. also come from assembler files (dot.S etc.; the assignments are in KERNEL.ARMV8), and the Xcode assembler probably has some problem with them as well. (None of them uses \@, so the problem must be a different one, but it should show up somewhere in the make output.)

@L1onKing
Author

L1onKing commented Oct 1, 2019

I found a proposed fix in #569 (comment). Unfortunately, no luck: the functions are still missing.

Could you suggest what I should look for in the make output? And by make output, do you mean the log produced during the build?

@ashwinyes

@martin-frbg
Collaborator

I had assumed there would be actual error messages from the compilation of those files. But re-reading #569, it could be that the functions just got "wrong" names in the respective object files (and the final library). If OSX has the nm command, you could run it on your libopenblas.a to see how function names are formed in general (the entries marked "T" are the actual functions, those with "U" are calls to them) and whether e.g. sdot is actually present among them. Also check whether that entry looks like the others: on Linux you would see it as "sdot_", on OSX there may be leading and/or trailing underscores.
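
The check described above can also be scripted. Below is a small Python sketch that assumes the plain-text nm output has been captured as a string. The function name exported_functions is made up for illustration, and the sample lines are shortened from the sdot_k.o listings posted further down this thread:

```python
def exported_functions(nm_output):
    """Map each archive member to the symbols nm marks with a capital 'T'."""
    result = {}
    member = None
    for line in nm_output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Member header such as "libopenblas.a(sdot_k.o):"
            member = line[:-1]
            result[member] = []
            continue
        parts = line.split()
        # Defined symbols look like "<address> <type> <name>";
        # undefined ones ("U <name>") have only two fields and are skipped.
        if member is not None and len(parts) == 3 and parts[1] == "T":
            result[member].append(parts[2])
    return result

broken = """libopenblas.a(sdot_k.o):
0000000000000050 t .Ldot_kernel_F1
0000000000000000 t ltmp0
"""
fixed = """libopenblas.a(sdot_k.o):
0000000000000050 t .Ldot_kernel_F1
0000000000000000 T _sdot_k
0000000000000000 t ltmp0
"""
print(exported_functions(broken))
print(exported_functions(fixed))
```

A member whose list comes back empty defines only local labels, so nothing in it can satisfy a linker reference.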

@L1onKing
Author

L1onKing commented Oct 1, 2019

Here's the output of the nm command. This is what the entry for sgemm_kernel looks like:

libopenblas.a(sgemm_kernel.o):
0000000000001d90 t .Lsgemm_kernel_L1_BEGIN
00000000000024ec t .Lsgemm_kernel_L1_END
0000000000001ff0 t .Lsgemm_kernel_L1_M16_100
0000000000001db8 t .Lsgemm_kernel_L1_M16_20
0000000000001de0 t .Lsgemm_kernel_L1_M16_22
0000000000001fa8 t .Lsgemm_kernel_L1_M16_40
0000000000001fb0 t .Lsgemm_kernel_L1_M16_42
0000000000001da8 t .Lsgemm_kernel_L1_M16_BEGIN
0000000000002010 t .Lsgemm_kernel_L1_M16_END
00000000000024d8 t .Lsgemm_kernel_L1_M1_100
00000000000023f8 t .Lsgemm_kernel_L1_M1_20
000000000000240c t .Lsgemm_kernel_L1_M1_22
00000000000024b4 t .Lsgemm_kernel_L1_M1_40
00000000000024bc t .Lsgemm_kernel_L1_M1_42
00000000000023f0 t .Lsgemm_kernel_L1_M1_BEGIN
00000000000023dc t .Lsgemm_kernel_L1_M2_100
00000000000022fc t .Lsgemm_kernel_L1_M2_20
0000000000002310 t .Lsgemm_kernel_L1_M2_22
00000000000023b8 t .Lsgemm_kernel_L1_M2_40
00000000000023c0 t .Lsgemm_kernel_L1_M2_42
00000000000022e8 t .Lsgemm_kernel_L1_M2_BEGIN
00000000000023f0 t .Lsgemm_kernel_L1_M2_END
00000000000022d0 t .Lsgemm_kernel_L1_M4_100
00000000000021c4 t .Lsgemm_kernel_L1_M4_20
00000000000021e0 t .Lsgemm_kernel_L1_M4_22
00000000000022a8 t .Lsgemm_kernel_L1_M4_40
00000000000022b0 t .Lsgemm_kernel_L1_M4_42
00000000000021b0 t .Lsgemm_kernel_L1_M4_BEGIN
00000000000022e8 t .Lsgemm_kernel_L1_M4_END
0000000000002198 t .Lsgemm_kernel_L1_M8_100
000000000000202c t .Lsgemm_kernel_L1_M8_20
0000000000002060 t .Lsgemm_kernel_L1_M8_22
0000000000002168 t .Lsgemm_kernel_L1_M8_40
0000000000002170 t .Lsgemm_kernel_L1_M8_42
0000000000002018 t .Lsgemm_kernel_L1_M8_BEGIN
00000000000021b0 t .Lsgemm_kernel_L1_M8_END
0000000000001450 t .Lsgemm_kernel_L2_BEGIN
0000000000001d8c t .Lsgemm_kernel_L2_END
0000000000001760 t .Lsgemm_kernel_L2_M16_100
0000000000001480 t .Lsgemm_kernel_L2_M16_20
00000000000014c0 t .Lsgemm_kernel_L2_M16_22
0000000000001708 t .Lsgemm_kernel_L2_M16_40
0000000000001710 t .Lsgemm_kernel_L2_M16_42
0000000000001470 t .Lsgemm_kernel_L2_M16_BEGIN
000000000000179c t .Lsgemm_kernel_L2_M16_END
0000000000001d6c t .Lsgemm_kernel_L2_M1_100
0000000000001c8c t .Lsgemm_kernel_L2_M1_20
0000000000001ca0 t .Lsgemm_kernel_L2_M1_22
0000000000001d48 t .Lsgemm_kernel_L2_M1_40
0000000000001d50 t .Lsgemm_kernel_L2_M1_42
0000000000001c84 t .Lsgemm_kernel_L2_M1_BEGIN
0000000000001c60 t .Lsgemm_kernel_L2_M2_100
0000000000001b58 t .Lsgemm_kernel_L2_M2_20
0000000000001b70 t .Lsgemm_kernel_L2_M2_22
0000000000001c38 t .Lsgemm_kernel_L2_M2_40
0000000000001c40 t .Lsgemm_kernel_L2_M2_42
0000000000001b44 t .Lsgemm_kernel_L2_M2_BEGIN
0000000000001c84 t .Lsgemm_kernel_L2_M2_END
0000000000001b18 t .Lsgemm_kernel_L2_M4_100
00000000000019a4 t .Lsgemm_kernel_L2_M4_20
00000000000019e0 t .Lsgemm_kernel_L2_M4_22
0000000000001ae8 t .Lsgemm_kernel_L2_M4_40
0000000000001af0 t .Lsgemm_kernel_L2_M4_42
0000000000001990 t .Lsgemm_kernel_L2_M4_BEGIN
0000000000001b44 t .Lsgemm_kernel_L2_M4_END
0000000000001960 t .Lsgemm_kernel_L2_M8_100
00000000000017b8 t .Lsgemm_kernel_L2_M8_20
00000000000017e0 t .Lsgemm_kernel_L2_M8_22
0000000000001928 t .Lsgemm_kernel_L2_M8_40
0000000000001930 t .Lsgemm_kernel_L2_M8_42
00000000000017a4 t .Lsgemm_kernel_L2_M8_BEGIN
0000000000001990 t .Lsgemm_kernel_L2_M8_END
0000000000000054 t .Lsgemm_kernel_L4_BEGIN
0000000000001444 t .Lsgemm_kernel_L4_END
0000000000000c60 t .Lsgemm_kernel_L4_M16_100
0000000000000080 t .Lsgemm_kernel_L4_M16_20
0000000000000360 t .Lsgemm_kernel_L4_M16_22
0000000000000620 t .Lsgemm_kernel_L4_M16_22a
00000000000008e0 t .Lsgemm_kernel_L4_M16_32
0000000000000b9c t .Lsgemm_kernel_L4_M16_40
0000000000000bdc t .Lsgemm_kernel_L4_M16_44
0000000000000c00 t .Lsgemm_kernel_L4_M16_46
000000000000006c t .Lsgemm_kernel_L4_M16_BEGIN
0000000000000d20 t .Lsgemm_kernel_L4_M16_END
0000000000001408 t .Lsgemm_kernel_L4_M1_100
0000000000001300 t .Lsgemm_kernel_L4_M1_20
0000000000001318 t .Lsgemm_kernel_L4_M1_22
00000000000013e0 t .Lsgemm_kernel_L4_M1_40
00000000000013e8 t .Lsgemm_kernel_L4_M1_42
00000000000012f8 t .Lsgemm_kernel_L4_M1_BEGIN
00000000000012b4 t .Lsgemm_kernel_L4_M2_100
0000000000001180 t .Lsgemm_kernel_L4_M2_20
00000000000011a0 t .Lsgemm_kernel_L4_M2_22
0000000000001288 t .Lsgemm_kernel_L4_M2_40
0000000000001290 t .Lsgemm_kernel_L4_M2_42
000000000000116c t .Lsgemm_kernel_L4_M2_BEGIN
00000000000012f8 t .Lsgemm_kernel_L4_M2_END
0000000000001128 t .Lsgemm_kernel_L4_M4_100
0000000000000fc8 t .Lsgemm_kernel_L4_M4_20
0000000000001040 t .Lsgemm_kernel_L4_M4_22
0000000000001080 t .Lsgemm_kernel_L4_M4_22a
00000000000010b0 t .Lsgemm_kernel_L4_M4_32
00000000000010f4 t .Lsgemm_kernel_L4_M4_40
0000000000001104 t .Lsgemm_kernel_L4_M4_44
000000000000110c t .Lsgemm_kernel_L4_M4_46
0000000000000fb4 t .Lsgemm_kernel_L4_M4_BEGIN
000000000000116c t .Lsgemm_kernel_L4_M4_END
0000000000000f60 t .Lsgemm_kernel_L4_M8_100
0000000000000d3c t .Lsgemm_kernel_L4_M8_20
0000000000000de0 t .Lsgemm_kernel_L4_M8_22
0000000000000e48 t .Lsgemm_kernel_L4_M8_22a
0000000000000e9c t .Lsgemm_kernel_L4_M8_32
0000000000000f08 t .Lsgemm_kernel_L4_M8_40
0000000000000f28 t .Lsgemm_kernel_L4_M8_44
0000000000000f30 t .Lsgemm_kernel_L4_M8_46
0000000000000d28 t .Lsgemm_kernel_L4_M8_BEGIN
0000000000000fb4 t .Lsgemm_kernel_L4_M8_END
00000000000024ec t .Lsgemm_kernel_L999
0000000000000000 t .Lsgemm_kernel_begin
0000000000000000 t ltmp0

And here is the entry for sdot_k:

libopenblas.a(sdot_k.o):
0000000000000050 t .Ldot_kernel_F1
0000000000000058 t .Ldot_kernel_F10
0000000000000028 t .Ldot_kernel_F4
000000000000001c t .Ldot_kernel_F_BEGIN
00000000000000d8 t .Ldot_kernel_L999
00000000000000bc t .Ldot_kernel_S1
00000000000000c4 t .Ldot_kernel_S10
0000000000000084 t .Ldot_kernel_S4
0000000000000070 t .Ldot_kernel_S_BEGIN
0000000000000000 t ltmp0

So in both cases there's no capital-T function.

@martin-frbg
Collaborator

Hmm, those are just the locally defined labels. I guess others like "snrm2" do show up with a capital "T"; any underscores in their names?

@L1onKing
Author

L1onKing commented Oct 1, 2019

I'm comparing two build-log entries, one for a .c function and one for a .S function:

/usr/bin/clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk 
-arch arm64 -miphoneos-version-min=8.0 -O2 -c -O2 -DMAX_STACK_ALLOC=2048 -Wall -DF_INTERFACE_GFORT -fPIC -DNO_LAPACK -DNO_LAPACKE 
-DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.6\" -march=armv8-a -DASMNAME=_ismin_k -DASMFNAME=_ismin_k_ 
-DNAME=ismin_k_ -DCNAME=ismin_k -DCHAR_NAME=\"ismin_k_\" -DCHAR_CNAME=\"ismin_k\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -UCOMPLEX 
-UDOUBLE -UUSE_ABS  -DUSE_MIN ../kernel/arm64/../arm/imin.c -o ismin_k.o


/usr/bin/clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk 
-arch arm64 -miphoneos-version-min=8.0 -O2 -c -O2 -DMAX_STACK_ALLOC=2048 -Wall -DF_INTERFACE_GFORT -fPIC -DNO_LAPACK -DNO_LAPACKE 
-DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.6\" -march=armv8-a -DASMNAME=_sasum_k -DASMFNAME=_sasum_k_ 
-DNAME=sasum_k_ -DCNAME=sasum_k -DCHAR_NAME=\"sasum_k_\" -DCHAR_CNAME=\"sasum_k\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -UCOMPLEX 
-UDOUBLE ../kernel/arm64/asum.S -o sasum_k.o

I noticed that the command for the .c function contains two extra macro flags, -UUSE_ABS -DUSE_MIN. I don't know if that relates to the problem, but it seemed worth noting.

@martin-frbg
Collaborator

No, nothing to do with the problem at hand - in some cases, the same source file is used to build several variants of a function via suitable #ifdef sections. So imin.c is also used to create imin.o (when USE_MIN is unset) as well as iamin.o (when USE_ABS is defined) and iamax.o.

@L1onKing
Author

L1onKing commented Oct 1, 2019

@martin-frbg @ashwinyes guys, I did a little experiment just out of curiosity.

I modified dot.S file (where _sdot_k gets built):

.... some code
/*******************************************************************************
* End of macro definitions
*******************************************************************************/

	.text
    .align 2
    .globl _sdot_k
_sdot_k:
	fmov	DOTF, REG0
...some other code

As you can see, I hard-coded PROLOGUE and, most importantly, the function name.

After that I ran nm [name].a, and the output for sdot_k.o is as follows:

libopenblas_armv8p-r0.3.6.a(sdot_k.o):
0000000000000050 t .Ldot_kernel_F1
0000000000000058 t .Ldot_kernel_F10
0000000000000028 t .Ldot_kernel_F4
000000000000001c t .Ldot_kernel_F_BEGIN
00000000000000d8 t .Ldot_kernel_L999
00000000000000bc t .Ldot_kernel_S1
00000000000000c4 t .Ldot_kernel_S10
0000000000000084 t .Ldot_kernel_S4
0000000000000070 t .Ldot_kernel_S_BEGIN
0000000000000000 T _sdot_k
0000000000000000 t ltmp0

I tried to build the project with that .a library, and the error for sdot_k disappeared, which proves that the function is there. There must be something wrong with REALNAME, I guess.

Any idea how to debug it?

@martin-frbg
Collaborator

REALNAME should be (from common_arm64.h) what the build log shows as ASMNAME on the make command line for sdot_k.o. Did you ever see a warning like "ASMNAME redefined" during the build?

@L1onKing
Author

L1onKing commented Oct 1, 2019

No, never. I even checked the build log; there's no mention of any redefinition. And -DASMNAME=_sdot_k seems to be correct. Is there a chance that one of the other names was applied to the function?

@L1onKing
Author

L1onKing commented Oct 1, 2019

@martin-frbg
Could you tell me how I can adjust the output log to see what value REALNAME carries?

@ashwinyes
Contributor

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk -arch arm64 -miphoneos-version-min=8.0 -O2 -c -O2 -DMAX_STACK_ALLOC=2048 -Wall -DF_INTERFACE_GFORT -fPIC -DNO_LAPACK -DNO_LAPACKE -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.6\" -march=armv8-a -DASMNAME=_sdot_k -DASMFNAME=_sdot_k_ -DNAME=sdot_k_ -DCNAME=sdot_k -DCHAR_NAME=\"sdot_k_\" -DCHAR_CNAME=\"sdot_k\" -DNO_AFFINITY -I.. -UDOUBLE -UCOMPLEX -UCOMPLEX -UDOUBLE ../kernel/arm64/dot.S -o sdot_k.o

You can run the above command with a '-S' option added and the '-c' and '-o sdot_k.o' removed to see the assembly output. An sdot_k.s file will be generated. You will need to run it from inside the kernel directory.
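
The same rewrite can be applied to any line copied from the build log. The Python sketch below does this on an argument list. The helper name preprocess_only is hypothetical. It also swaps in clang's -E (preprocess only) in place of the -S suggested above, on the assumption that the goal is simply to see the macro-expanded source of a .S file:

```python
def preprocess_only(argv):
    """Turn a logged compile command into one that only runs the preprocessor."""
    out = []
    skip_next = False
    for arg in argv:
        if skip_next:       # this is the file name that followed -o
            skip_next = False
            continue
        if arg == "-o":     # drop "-o <file>" so the result goes to stdout
            skip_next = True
            continue
        if arg == "-c":     # replace "compile only" with "preprocess only"
            out.append("-E")
            continue
        out.append(arg)
    if "-E" not in out:
        out.append("-E")
    return out

logged = ["clang", "-O2", "-c", "-DASMNAME=_sdot_k",
          "../kernel/arm64/dot.S", "-o", "sdot_k.o"]
print(" ".join(preprocess_only(logged)))
```

Working on an argument list rather than a raw string sidesteps the shell quoting in flags such as -DVERSION=\"0.3.6\".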

@L1onKing
Author

L1onKing commented Oct 1, 2019

@ashwinyes please correct me if I built the command incorrectly:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk -arch arm64 -miphoneos-version-min=8.0 -O2 -S -O2 -DMAX_STACK_ALLOC=2048 -Wall -DF_INTERFACE_GFORT -fPIC -DNO_LAPACK -DNO_LAPACKE -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.6\" -march=armv8-a -DASMNAME=_sdot_k -DASMFNAME=_sdot_k_ -DNAME=sdot_k_ -DCNAME=sdot_k -DCHAR_NAME=\"sdot_k_\" -DCHAR_CNAME=\"sdot_k\" -DNO_AFFINITY -I.. -UDOUBLE -UCOMPLEX -UCOMPLEX -UDOUBLE ../kernel/arm64/dot.S

And this is the output:

# 1 "../kernel/arm64/dot.S"
# 1 "<built-in>" 1
# 1 "../kernel/arm64/dot.S" 2
# 29 "../kernel/arm64/dot.S"
# 1 "../common.h" 1
# 62 "../common.h"
# 1 "../config.h" 1
# 63 "../common.h" 2
# 437 "../common.h"
# 1 "../common_arm64.h" 1
# 438 "../common.h" 2
# 542 "../common.h"
# 1 "../param.h" 1
# 543 "../common.h" 2
# 1 "../common_param.h" 1
# 544 "../common.h" 2
# 784 "../common.h"
# 1 "../common_interface.h" 1
# 785 "../common.h" 2



# 1 "../common_macro.h" 1
# 42 "../common_macro.h"
# 1 "../common_s.h" 1
# 43 "../common_macro.h" 2
# 1 "../common_d.h" 1
# 44 "../common_macro.h" 2
# 1 "../common_q.h" 1
# 45 "../common_macro.h" 2

# 1 "../common_c.h" 1
# 47 "../common_macro.h" 2
# 1 "../common_z.h" 1
# 48 "../common_macro.h" 2
# 1 "../common_x.h" 1
# 49 "../common_macro.h" 2
# 789 "../common.h" 2
# 1 "../common_level1.h" 1
# 790 "../common.h" 2
# 1 "../common_level2.h" 1
# 791 "../common.h" 2
# 1 "../common_level3.h" 1
# 792 "../common.h" 2
# 1 "../common_lapack.h" 1
# 793 "../common.h" 2
# 30 "../kernel/arm64/dot.S" 2
# 71 "../kernel/arm64/dot.S"
.macro KERNEL_F1
 ldr s2, [x1], #4
 ldr s3, [x3], #4

 fmadd s0, s2, s3, s0






.endm

.macro KERNEL_F4

 ld1 {v2.4s}, [x1], #16
 ld1 {v3.4s}, [x3], #16

 fmla v0.4s, v2.4s, v3.4s
# 108 "../kernel/arm64/dot.S"
 PRFM PLDL1KEEP, [x1, #1024]
 PRFM PLDL1KEEP, [x3, #1024]
.endm

.macro KERNEL_F4_FINALIZE


 ext v1.16b, v0.16b, v0.16b, #8
 fadd v0.2s, v0.2s, v1.2s
 faddp s0, v0.2s






.endm

.macro INIT_S

 lsl x2, x2, #2
 lsl x4, x4, #2




.endm

.macro KERNEL_S1
 ld1 {v2.s}[0], [x1], x2
 ld1 {v3.s}[0], [x3], x4

 fmadd s0, s2, s3, s0






.endm





 .text ; .align 2 ; .globl _sdot_k ;_sdot_k:

 fmov s0, wzr




 cmp x0, xzr
 ble .Ldot_kernel_L999

 cmp x2, #1
 bne .Ldot_kernel_S_BEGIN
 cmp x4, #1
 bne .Ldot_kernel_S_BEGIN

.Ldot_kernel_F_BEGIN:

 asr x5, x0, #2
 cmp x5, xzr
 beq .Ldot_kernel_F1

.Ldot_kernel_F4:

 KERNEL_F4

 subs x5, x5, #1
 bne .Ldot_kernel_F4

 KERNEL_F4_FINALIZE

.Ldot_kernel_F1:

 ands x5, x0, #3
 ble .Ldot_kernel_L999

.Ldot_kernel_F10:

 KERNEL_F1

 subs x5, x5, #1
        bne .Ldot_kernel_F10

 ret

.Ldot_kernel_S_BEGIN:

 INIT_S

 asr x5, x0, #2
 cmp x5, xzr
 ble .Ldot_kernel_S1

.Ldot_kernel_S4:

 KERNEL_S1
 KERNEL_S1
 KERNEL_S1
 KERNEL_S1

 subs x5, x5, #1
 bne .Ldot_kernel_S4

.Ldot_kernel_S1:

 ands x5, x0, #3
 ble .Ldot_kernel_L999

.Ldot_kernel_S10:

 KERNEL_S1

 subs x5, x5, #1
        bne .Ldot_kernel_S10

.Ldot_kernel_L999:

 ret

@ashwinyes
Contributor

So I guess -S just writes to stdout if you give it an assembler file instead of a C file.

Your output contains the following line, so it looks like the name is being substituted correctly:
.text ; .align 2 ; .globl _sdot_k ;_sdot_k:

@L1onKing
Author

L1onKing commented Oct 1, 2019

Can ; be the problem? I already had an issue today where the compiler didn't understand what I wanted from it.

@ashwinyes
Contributor

It could be an issue. You could hard code it in the assembly file and see whether it causes an issue.

@martin-frbg
Collaborator

martin-frbg commented Oct 1, 2019

Hmm. He hard-coded it before, and that appeared to work. Perhaps just remove all ; from the (multi-line) definition of PROLOGUE in common_arm64.h.
Unfortunately my iOS job on Travis is still failing for all the wrong reasons...

@L1onKing
Author

L1onKing commented Oct 1, 2019

I tried the following in dot.S:

/*******************************************************************************
* End of macro definitions
*******************************************************************************/

    .text
    .align 2
    .globl REALNAME
REALNAME:

	fmov	DOTF, REG0
#if defined(DOUBLE)

nm output for sdot_k.o:

libopenblas_armv8p-r0.3.6.a(sdot_k.o):
0000000000000050 t .Ldot_kernel_F1
0000000000000058 t .Ldot_kernel_F10
0000000000000028 t .Ldot_kernel_F4
000000000000001c t .Ldot_kernel_F_BEGIN
00000000000000d8 t .Ldot_kernel_L999
00000000000000bc t .Ldot_kernel_S1
00000000000000c4 t .Ldot_kernel_S10
0000000000000084 t .Ldot_kernel_S4
0000000000000070 t .Ldot_kernel_S_BEGIN
0000000000000000 T _sdot_k
0000000000000000 t ltmp0

The function is there. It must be the ;, there's no other explanation.

But simply removing the ; won't do the trick. Apparently the newline is important, and that's hard to achieve inside a preprocessor macro.

If I just remove every ;, I get this error output:

../kernel/arm64/amax.S:132:8: error: unexpected token in section switching directive
 .text .align 2 .globl _samax_k _samax_k:
       ^
../kernel/arm64/iamax.S:157:8: error: unexpected token in section switching directive
 .text .align 2 .globl _isamax_k _isamax_k:
       ^
make[1]: *** [samax_k.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [isamax_k.o] Error 1
make: *** [libs] Error 1

@ashwinyes
Contributor

I meant hard-coding ";" in the assembly file to see whether it stops working.

You can't remove ";" from the #define; it would cause an error.

@ashwinyes
Contributor

You can try changing PROLOGUE to a macro

.macro PROLOGUE
	.text
	.align	2
	.globl	REALNAME
REALNAME:
.endm

@L1onKing
Author

L1onKing commented Oct 1, 2019

Tried:

    .text;
    .align 2;
    .globl REALNAME;
REALNAME:

It works. Here's the nm output:

libopenblas_armv8p-r0.3.6.a(sdot_k.o):
0000000000000050 t .Ldot_kernel_F1
0000000000000058 t .Ldot_kernel_F10
0000000000000028 t .Ldot_kernel_F4
000000000000001c t .Ldot_kernel_F_BEGIN
00000000000000d8 t .Ldot_kernel_L999
00000000000000bc t .Ldot_kernel_S1
00000000000000c4 t .Ldot_kernel_S10
0000000000000084 t .Ldot_kernel_S4
0000000000000070 t .Ldot_kernel_S_BEGIN
0000000000000000 T _sdot_k
0000000000000000 t ltmp0

But apparently the newline really matters. I tried two other variants:

.text; .align 2; .globl REALNAME ;REALNAME: - that's how it was expanded in the output of the command @ashwinyes proposed
.text; .align 2; .globl REALNAME; REALNAME: - the same with tidier spacing.

Neither of them works.
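
These observations (every one-line form fails, one directive per line works) are what one would expect if the assembler treated ; as the start of a comment and not as a statement separator. That is an assumption here, not something established in this thread. The Python sketch below simulates such a rule. The helper surviving_statements is hypothetical and not part of any toolchain:

```python
def surviving_statements(source, comment_char=";"):
    """Statements left on each line after dropping everything from comment_char on."""
    kept = []
    for line in source.splitlines():
        code = line.split(comment_char, 1)[0].strip()
        if code:
            kept.append(code)
    return kept

one_line = ".text ; .align 2 ; .globl _sdot_k ;_sdot_k:"
multi_line = ".text;\n.align 2;\n.globl _sdot_k;\n_sdot_k:"

print(surviving_statements(one_line))    # only the first directive is left
print(surviving_statements(multi_line))  # all four statements are left
```

Under that assumption, the one-line expansion loses both the .globl directive and the label, which would match an object file that contains only local labels.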

@L1onKing
Author

L1onKing commented Oct 1, 2019

@ashwinyes I tried your GAS macro (that is a GAS macro, right? :). It looks like this:

common_arm64.h

/*#define PROLOGUE \
	.text ;\
	.align	2 ;\
	.globl	REALNAME ;\
REALNAME:*/

.macro PROLOGUE
    .text
    .align 2
    .globl REALNAME
REALNAME:
.endm

AND IT TOTALLY WORKED!!!!!!! :) The library is running like a charm! :)

COMPARING RESULTS:

I also built 0.3.6 with the C implementations instead of the assembly ones and compared the two. I'm using OpenBLAS in real time, and with assembly "ON" I got a 10 fps increase.

@ashwinyes @martin-frbg guys, thank you SO SO MUCH for being here with me all day, I really appreciate it! I wouldn't have overcome all those issues without your presence here :)

@brada4
Contributor

brada4 commented Oct 1, 2019

You can always use Apple's Accelerate framework, which provides BLAS and is consistent between OSX and iOS.

@martin-frbg
Collaborator

@L1onKing thanks for persevering. Getting OpenBLAS built on iOS has come up several times in the past, but all the others gave up at a partial solution. @brada4 an initial comparison of a generic C build to Accelerate was in #1531; the idea here was to see what speedup could be gained from making the optimized assembly kernels work.

@L1onKing
Author

L1onKing commented Oct 2, 2019

@brada4 Unfortunately, using the Accelerate framework was not an option for me. I needed the OpenBLAS library for this specific project.

@ashwinyes
Contributor

ashwinyes commented Oct 2, 2019

@L1onKing Thanks for the perseverance you showed in solving this issue. However, you should keep in mind that the iOS ARM64 ABI has certain deviations from the generic ABI (https://developer.apple.com/library/archive/documentation/Xcode/Conceptual/iPhoneOSABIReference/Articles/ARM64FunctionCallingConventions.html) and the ARM64 assembly implementations in OpenBLAS don't take that into account, so there may be some runtime issues. I don't see the current assembly implementations being affected by it, but it would be good to keep that in mind.

@martin-frbg
Converting

#define PROLOGUE \
	.text ;\
	.align	2 ;\
	.globl	REALNAME ;\
REALNAME:

to

.macro PROLOGUE
    .text
    .align 2
    .globl REALNAME
REALNAME:
.endm

i.e. changing PROLOGUE from a #define to a .macro should work for all arm64 cases, I believe. Can this change be pushed after a sanity test?

@L1onKing
Author

L1onKing commented Oct 2, 2019

@ashwinyes I understand that there's a lot of tinkering involved when it comes to using custom libraries on a mobile OS, considering that it's constantly evolving. But a lot of really good libraries offer BLAS optimizations and, unfortunately, Apple doesn't allow direct BLAS calls since that's considered a private API.

When it comes to performance, I'm doing my best to research every option I have, even if it means going the hard way.

@martin-frbg
Collaborator

I intend to create a PR from this tomorrow (which is a public holiday here); I just have other work testing my sanity today.
BTW, with OSX on x86 it was found that the Apple build tools misunderstand the .align directive, leading to a drop in performance that can be avoided by using the equivalent .p2align (with the value converted to the corresponding power of two). It could be worth checking whether this effect also exists on iOS.

@L1onKing
Author

L1onKing commented Oct 2, 2019

I saw your suggestion about using .p2align in the other thread.

I found this Apple documentation on the assembler: https://developer.apple.com/library/archive/documentation/DeveloperTools/Reference/Assembler/040-Assembler_Directives/asm_directives.html

Their examples use .align, but .p2align is indeed there as well. I didn't experiment with it since I don't really understand what it does. I understand that it has something to do with the memory placement/alignment of an assembly label, but it's not very clear to me. Perhaps I should dig deeper into this topic in order to experiment with it "consciously".

@L1onKing
Author

L1onKing commented Oct 3, 2019

@martin-frbg could you please elaborate on how to use .p2align? If I use .p2align instead of .align, should I change the number (in my case it's 2) as well?

@martin-frbg
Collaborator

Yes, the number changes to the appropriate exponent of 2 (so .align 2 ----> .p2align 1, .align 4 ---->.p2align 2 , .align 8 ----> .p2align 3 etc) - I suspect the reason that ashwinyes' tentative fix from #1531 had .align 2 in the PROLOGUE while the mainline develop has .align 4 may stem from just this problem that the xcode assembler misreads ".align" as if it was a ".p2align" of the same value, leading to inefficient alignments.
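
To make the conversion concrete, here is a small Python sketch. It follows the comment above in reading the .align argument as a byte count, and both function names are made up for illustration:

```python
def p2align_exponent(nbytes):
    """Exponent n with 2**n == nbytes, i.e. the argument to give .p2align."""
    if nbytes <= 0 or nbytes & (nbytes - 1):
        raise ValueError("alignment must be a positive power of two")
    return nbytes.bit_length() - 1

def misread_alignment(align_arg):
    """Bytes of alignment produced if '.align N' is treated like '.p2align N'."""
    return 2 ** align_arg

for nbytes in (2, 4, 8):
    print(f".align {nbytes} -> .p2align {p2align_exponent(nbytes)}")

print(misread_alignment(4))  # 16
```

Because .p2align states the exponent explicitly, its meaning does not depend on how a particular assembler interprets .align.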
BTW I am thinking about using the other option for the "_\@" label numbering problem: it should keep the code more readable if the labels inside the KERNEL_F1 macro are changed to 1:, 2:, 3:, 4: (with the previous names preserved as comments and all jumps then going to "1f" etc.) instead of having nine or ten identical copies of the entire macro.
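
The relabelling described above is mechanical enough to script. The rough Python sketch below assumes each label is defined exactly once inside the macro body, on a line of its own, and written as NAME_\@: (to_numeric_labels is a hypothetical helper, not part of OpenBLAS). References that come before the definition become Nf and references after it become Nb:

```python
import re

LABEL_DEF = re.compile(r"^(\w+)_\\@:$")   # e.g. KERNEL_S1_NEXT_\@:
LABEL_REF = re.compile(r"(\w+)_\\@")      # e.g. beq KERNEL_S1_NEXT_\@

def to_numeric_labels(lines):
    """Rewrite NAME_\\@ labels in a macro body as numeric local labels."""
    # First pass: number the label definitions in order of appearance.
    numbers, def_index = {}, {}
    for i, line in enumerate(lines):
        m = LABEL_DEF.match(line.strip())
        if m:
            numbers[m.group(1)] = len(numbers) + 1
            def_index[m.group(1)] = i

    # Second pass: rewrite definitions and references.
    out = []
    for i, line in enumerate(lines):
        m = LABEL_DEF.match(line.strip())
        if m:
            name = m.group(1)
            out.append(f"{numbers[name]}: /* {name} */")  # keep old name as comment
            continue

        def repl(ref, i=i):
            name = ref.group(1)
            direction = "f" if def_index[name] > i else "b"
            return f"{numbers[name]}{direction}"

        out.append(LABEL_REF.sub(repl, line))
    return out

macro_body = [
    r"    beq KERNEL_S1_NEXT_\@",
    r"    bge KERNEL_S1_SCALE_GE_X_\@",
    r"    b   KERNEL_S1_NEXT_\@",
    r"KERNEL_S1_SCALE_GE_X_\@:",
    r"    fdiv s2, s4, SCALE",
    r"KERNEL_S1_NEXT_\@:",
]
print("\n".join(to_numeric_labels(macro_body)))
```

Numeric local labels may be redefined any number of times, so the same macro body can be expanded repeatedly without the \@ counter.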
