sad_32x32 and 64x64 AVX2 has poor cache locality #3247

Open
shssoichiro opened this issue Jul 25, 2023 · 3 comments

Comments

@shssoichiro
Collaborator

This applies at least to the HBD ASM; I have not tested LBD. Benchmarking shows a large number of cache read misses in these functions. Noting this as a possible area for performance improvement.

@lu-zero
Collaborator

lu-zero commented Jul 25, 2023

Could you please add how you measured that, so that anyone interested can repeat the exercise? :)

@shssoichiro
Collaborator Author

shssoichiro commented Jul 25, 2023

Yes, this was measured using valgrind's callgrind tool. The command in this case was:

valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes --simulate-cache=yes target/release/rav1e -s 2 --no-scene-detection -i 0 -I 0 ~/xiph-media-files/objective-1-fast-10bit/speed_bag_640x360_60f.y4m -o /dev/null --limit 20

With cache simulation enabled, callgrind reports simulated cache misses as one of its metrics, and these can be viewed in kcachegrind. (The downside is that valgrind is quite a bit slower than perf.)

@tdaede
Collaborator

tdaede commented Jul 25, 2023

This might not really be the SAD itself, but rather the nature of e.g. motion compensation, which reads reference blocks from arbitrary positions. Is this specific to AVX2?
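
To make the access-pattern point concrete, here is a minimal scalar sketch. It is not rav1e's code or its AVX2 ASM. The function names `sad_u16` and `cache_lines_touched`, the 64-byte cache line size, and the tightly packed 640-sample stride (taken from the 640x360 clip mentioned above) are all assumptions for illustration. It shows that a SAD over a strided block touches a fresh cache line on every row, and that a reference block starting at an unaligned offset touches more lines than an aligned one.

```rust
use std::collections::HashSet;

/// Scalar sum of absolute differences over a `w` x `h` block of 16-bit (HBD) samples.
/// Strides are given in samples, not bytes. Hypothetical helper, not rav1e's API.
fn sad_u16(
    src: &[u16],
    src_stride: usize,
    reference: &[u16],
    ref_stride: usize,
    w: usize,
    h: usize,
) -> u32 {
    let mut sum = 0u32;
    for y in 0..h {
        // Each row starts `stride` samples after the previous one,
        // so consecutive rows live in different cache lines.
        let s = &src[y * src_stride..y * src_stride + w];
        let r = &reference[y * ref_stride..y * ref_stride + w];
        for (a, b) in s.iter().zip(r) {
            sum += (*a as i32 - *b as i32).unsigned_abs();
        }
    }
    sum
}

/// Count the distinct cache lines covered by a `w` x `h` block of 16-bit samples
/// whose first sample sits at byte offset `base`. Assumes 64-byte cache lines.
fn cache_lines_touched(base: usize, stride_samples: usize, w: usize, h: usize) -> usize {
    const LINE_BYTES: usize = 64; // assumption: typical x86-64 line size
    const SAMPLE_BYTES: usize = 2; // u16 samples
    let mut lines = HashSet::new();
    for y in 0..h {
        let start = base + y * stride_samples * SAMPLE_BYTES;
        let end = start + w * SAMPLE_BYTES - 1; // last byte of the row
        for line in (start / LINE_BYTES)..=(end / LINE_BYTES) {
            lines.insert(line);
        }
    }
    lines.len()
}
```

Under these assumptions, an aligned 32x32 block covers 32 cache lines (one per row), while the same block shifted by a single sample covers 64, because every row then straddles a line boundary. A 64x64 block goes from 128 lines to 192. The SAD kernel itself issues the same loads in both cases, which is one way the misses could come from where the reference block sits rather than from the SAD code.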
