LLVM Polly optimizer support #864

Merged
merged 3 commits into from
Apr 6, 2020

Conversation

yukoba
Contributor

@yukoba yukoba commented Apr 4, 2020

This is a pull request for LLVM Polly optimizer.
This pull request depends on bytedeco/javacpp#389 .
LLVM 10 uses C++14.

After many trials, I decided not to add a Java API.
In my opinion, Polly's only public API is its command-line arguments.
Many important variables have file scope and cannot be set from outside.
Therefore, I control Polly through LLVMParseCommandLineOptions.
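
To illustrate the idea of driving Polly purely through option strings, here is a minimal sketch that only assembles an argv-style array. The class name `PollyOptions` and the method `buildArgv` are hypothetical and not part of this pull request. The flags `-polly` and `-polly-vectorizer=stripmine` are the ones mentioned in this thread. Passing them without an `-mllvm` prefix is an assumption about how the options are parsed, so please check samples/MatMulBenchmark/MatMulBenchmark.java for the form actually used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects Polly flags into an argv-style array.
// It does not call LLVM; the resulting array would be handed to
// LLVMParseCommandLineOptions by the caller.
final class PollyOptions {
    private PollyOptions() {}

    // programName becomes argv[0]; vectorizer may be null to leave the default.
    static String[] buildArgv(String programName, boolean enablePolly, String vectorizer) {
        List<String> argv = new ArrayList<>();
        argv.add(programName);
        if (enablePolly) {
            argv.add("-polly");
            if (vectorizer != null) {
                argv.add("-polly-vectorizer=" + vectorizer);
            }
        }
        return argv.toArray(new String[0]);
    }
}
```

Because LLVM keeps these options in global state, a helper like this would presumably be used once, before any module is optimized.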

LLVM uses dynamic loading on UNIX and static linking on Windows.
So I added -DLLVM_POLLY_LINK_INTO_TOOLS=ON to the Windows build options.
I also added polly/LinkAllPasses.h, which is necessary to keep the compiler from eliminating the Polly passes.

I tested on Linux and Windows, but I have not tested on macOS or ARM.

Please see samples/MatMulBenchmark/MatMulBenchmark.java for usage.
Because I wanted to add a pom.xml for MatMulBenchmark,
I created samples/Fac/ and samples/Clang/ and moved the existing sample code there.

@saudet
Member

saudet commented Apr 4, 2020

LLVM 10 uses C++14, yes, but since we're mapping only the C API, it shouldn't affect anything JavaCPP is doing. Where do we need C++14? Is it just for polly/LinkAllPasses.h?

If LLVMParseCommandLineOptions() is the way to access Polly's functionality from the C API, that looks fine to me.

BTW, instead of using JNA, we could use JavaCPP to map that function pointer. We could also create presets for libffi itself, which unlike JNA we could manage with JavaCPP. It's a pretty small library.

/cc @Neiko2002

@saudet saudet requested a review from Neiko2002 April 4, 2020 23:57
@yukoba
Contributor Author

yukoba commented Apr 5, 2020

LLVM 10 uses C++14, yes, but since we're mapping only the C API, it shouldn't affect anything JavaCPP is doing. Where do we need C++14? Is it just for polly/LinkAllPasses.h?

Yes. polly/LinkAllPasses.h includes C++14 header files.
I didn't need to change clang.java to C++14 (cef020c#diff-f71d00b84ee3dc122ff4dc9bba03a2faR32), but I changed it for consistency.

instead of using JNA, we could use JavaCPP to map that function pointer

Is it possible without libffi or do I need to port libffi?
Sorry, I'm not understanding what you want to say.

@saudet
Member

saudet commented Apr 5, 2020

instead of using JNA, we could use JavaCPP to map that function pointer

Is it possible without libffi or do I need to port libffi?
Sorry, I'm not understanding what you want to say.

Yes, with FunctionPointer as shown in this unit test for strlen(): https://github.com/bytedeco/javacpp/blob/master/src/test/java/org/bytedeco/javacpp/PointerTest.java#L87

Since we're defining a new native function though, we have to execute JavaCPP on that, ideally via the Maven plugin.

@yukoba
Contributor Author

yukoba commented Apr 5, 2020

I see. This is the solution: yukoba@114bdbc .
But it is complicated. Would you mind if I use JNA instead?

@saudet
Member

saudet commented Apr 5, 2020

Yes, that works. I don't think it's complicated, but let's put both versions? :) Maybe someone else will find this complicated and that will motivate them to help make this easier to use... What do you think?

@Neiko2002
Member

I have only read the changes, without compiling them. To me, the version without JNA looks better, since it adds some type safety when calling MatMulFunction. It might be nice to have a static function in the MatMulFunction class that handles the instantiation and the storing of the pointer.

public static MatMulFunction create(Pointer address) {  
  MatMulFunction func = new MatMulFunction();  
  func.put(address);  
  return func;  
}  

@yukoba
Contributor Author

yukoba commented Apr 5, 2020

If I use Builder.build(), is it possible to run without Visual Studio 2017 on Windows?
yukoba@114bdbc#diff-fd9ea1da4821f979dec59387d963ee99R69

@saudet
Member

saudet commented Apr 5, 2020

If I use Builder.build(), is it possible to run without Visual Studio 2017 on Windows?
yukoba@114bdbc#diff-fd9ea1da4821f979dec59387d963ee99R69

No, it's not possible. That's what a library like libffi is for. I find it surprising that LLVM itself doesn't offer this kind of functionality, but it doesn't.

@saudet
Member

saudet commented Apr 5, 2020

BTW, the reason why your benchmark is slower than natively compiled code with clang is probably because of JNA and libffi. Those are known to be very slow. Since your goal is to increase performance, you shouldn't use libraries like libffi and JNA, and should instead compile everything with a C++ compiler. That's why JavaCPP does that, it's for performance reasons.

@yukoba
Contributor Author

yukoba commented Apr 5, 2020

Fix build on Mac

Thank you!

No, it's not possible. That's what a library like libffi is for.

Then, without JNA, you need a JavaCPP build environment to run this sample. I think most people do not have a JavaCPP build environment set up, so they would be unable to run it.

BTW, the reason why your benchmark is slower

clang -O3 -march=native -mllvm -polly -mllvm -polly-vectorizer=stripmine
can use Intel AVX-512 related optimization on Intel Skylake Xeon.

However, I have to add the equivalent of -march=native here:
https://github.com/bytedeco/javacpp-presets/pull/864/files#diff-fd9ea1da4821f979dec59387d963ee99R240
but I cannot find a way to do so, even after searching for many hours.

With -O3 -march=native, clang converts
s += a[m * K + k] * b[k * N + n]
to this,

s += a[m * K + k] * b[k * N + n] +
a[m * K + (k + 1)] * b[(k + 1) * N + n] +
a[m * K + (k + 2)] * b[(k + 2) * N + n] +
a[m * K + (k + 3)] * b[(k + 3) * N + n];

but this optimization does not happen in my sample code.
Polly and LLVM do not know that my CPU supports Intel AVX-512.
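
For readers who want to see the shape of that transformation in isolation, here is a minimal plain-Java sketch of the same inner loop, written once naively and once unrolled by four with a remainder loop. The class `UnrolledDot` and its method names are hypothetical and are not taken from the benchmark. The sketch only shows that the two forms compute the same sum; it makes no claim about what the JVM or LLVM will vectorize.

```java
// Hypothetical illustration of 4-way unrolling of the matmul inner loop.
// a is an M x K row-major matrix, b is a K x N row-major matrix.
final class UnrolledDot {
    private UnrolledDot() {}

    // One output cell c[m][n], one multiply-add per iteration.
    static float cellNaive(float[] a, float[] b, int m, int n, int K, int N) {
        float s = 0f;
        for (int k = 0; k < K; k++) {
            s += a[m * K + k] * b[k * N + n];
        }
        return s;
    }

    // Same cell, four multiply-adds per iteration.
    static float cellUnrolled4(float[] a, float[] b, int m, int n, int K, int N) {
        float s = 0f;
        int k = 0;
        for (; k + 3 < K; k += 4) {
            s += a[m * K + k] * b[k * N + n]
               + a[m * K + (k + 1)] * b[(k + 1) * N + n]
               + a[m * K + (k + 2)] * b[(k + 2) * N + n]
               + a[m * K + (k + 3)] * b[(k + 3) * N + n];
        }
        // Remainder loop for K not divisible by 4.
        for (; k < K; k++) {
            s += a[m * K + k] * b[k * N + n];
        }
        return s;
    }
}
```

Note that unrolling changes the order of the floating-point additions, so for general inputs the two versions can differ in the last bits; they agree exactly only when every intermediate value is exactly representable.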

However, I think this optimization part is using Intel AVX-512.
https://github.com/bytedeco/javacpp-presets/pull/864/files#diff-fd9ea1da4821f979dec59387d963ee99R263

@saudet
Member

saudet commented Apr 5, 2020 via email

@saudet
Member

saudet commented Apr 6, 2020

@saudet saudet merged commit c19c45e into bytedeco:ci Apr 6, 2020
@yukoba
Contributor Author

yukoba commented Apr 9, 2020

Thank you for merging.

That functionality may be missing from the C API:

I solved the optimization problem.
I have to do the same thing as main() in llvm/tools/opt/opt.cpp .

acbfeeb

I will use this personally for a week, and if there are no problems, I will send a pull request.

@saudet
Member

saudet commented Apr 9, 2020

Thanks! Please consider contributing this to the C API upstream though.

@saudet
Member

saudet commented Apr 15, 2020

I know it's hard to contribute new code to LLVM, so if you're not able to, please do send a pull request to have this patch added to the presets here at least. Thanks!

@yukoba
Contributor Author

yukoba commented Apr 16, 2020

if you're not able to, please do send a pull request to have this patch added to the presets here at least.

@saudet OK. I sent #869.

3 participants