1D inverse type-II DCT AVX2 optimisations #114
base: 20240120
@@ -172,16 +172,22 @@ typedef struct VVCDSPContext {
VVCALFDSPContext alf;
} VVCDSPContext;

-void ff_vvc_dsp_init(VVCDSPContext *hpc, int bit_depth);
+void ff_vvc_dsp_init(VVCDSPContext *hpc, int bit_depth,
IMO it would be more elegant to pass VVCFrameParamSets, VVCFrameContext or something else more general here (and the call site for ff_vvc_dsp_init supports it); however, including vvc_ps.h in vvcdsp.h introduced a lot of compilation errors.
Yes, at this level we'd better include only a small set of headers.
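When a header only needs to pass a pointer around, a forward declaration of the struct is a common way to avoid pulling a heavy header such as vvc_ps.h into another header. A minimal sketch, assuming the parameter-set struct's tag is `VVCFrameParamSets`; `DemoDSPContext`, `demo_dsp_init` and the stand-in struct body are hypothetical and are not FFVVC's API.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Header side: the parameter-set header is NOT included here ---- */

/* Forward declaration only; assumes the real struct tag is VVCFrameParamSets. */
struct VVCFrameParamSets;

/* Hypothetical stand-in for the DSP context. */
typedef struct DemoDSPContext {
    int bit_depth;
    int has_ps; /* hypothetical: records whether parameter sets were passed */
} DemoDSPContext;

/* Hypothetical init function: a pointer to an incomplete type is allowed. */
void demo_dsp_init(DemoDSPContext *c, int bit_depth,
                   const struct VVCFrameParamSets *ps);

/* ---- Source side: the full definition would come from the real header ---- */

/* Hypothetical stand-in definition with a single made-up field. */
struct VVCFrameParamSets {
    int placeholder;
};

void demo_dsp_init(DemoDSPContext *c, int bit_depth,
                   const struct VVCFrameParamSets *ps)
{
    assert(c != NULL);
    c->bit_depth = bit_depth;
    c->has_ps    = ps != NULL; /* a real init could inspect ps's fields here */
}
```

Only the .c file that dereferences `ps` needs the complete definition, so the include moves from the header into the source file.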
629db28 disables the optimisations when
Did you check the HEVC IDCT checkasm output? Is it consistent with your results?
Here are the relevant entries from the HEVC IDCT checkasm benchmark:
Note that the HEVC optimisations are performed at the 2D level rather than the 1D level. Many of the instructions in the SIMD optimisations are spent loading data into and extracting data from the SIMD registers. This is all the more true for FFVVC because of the strides in the IDCT function signature. The FFVVC IDCT could be optimised at the 2D level in the future to get performance gains closer to HEVC's, but for now the 1D optimisations work on their own and provide the building blocks for any future optimisation.
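To illustrate where the strides come in: a separable 2D inverse transform is a vertical 1D pass over every column followed by a horizontal 1D pass over every row. The scalar sketch below uses the 4-point DCT-II integer matrix from HEVC/VVC; `idct4_1d` and `idct4x4_2d` are hypothetical names, the signatures are not FFVVC's, and the rounding shifts and clipping between the passes are omitted. In the column pass, consecutive inputs are `in_stride` elements apart, which in SIMD code means gathering or transposing data before any arithmetic can start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4-point DCT-II integer matrix used by HEVC/VVC (rows = basis functions). */
static const int32_t dct2_4[4][4] = {
    { 64,  64,  64,  64 },
    { 83,  36, -36, -83 },
    { 64, -64, -64,  64 },
    { 36, -83,  83, -36 },
};

/* Hypothetical 1D inverse DCT-II: out[j] = sum_k M[k][j] * in[k].
 * The strides let one kernel walk a column (stride = block width) or a
 * row (stride = 1). Rounding shifts are omitted. */
static void idct4_1d(int32_t *out, ptrdiff_t out_stride,
                     const int32_t *in, ptrdiff_t in_stride)
{
    for (int j = 0; j < 4; j++) {
        int32_t sum = 0;
        for (int k = 0; k < 4; k++)
            sum += dct2_4[k][j] * in[k * in_stride];
        out[j * out_stride] = sum;
    }
}

/* Separable 2D inverse (out = M^T * C * M): vertical pass over each column,
 * then horizontal pass over each row of the intermediate result. */
static void idct4x4_2d(int32_t out[16], const int32_t coeffs[16])
{
    int32_t tmp[16];
    assert(out != coeffs); /* passes must not work in place */
    for (int x = 0; x < 4; x++)               /* columns: stride 4 */
        idct4_1d(&tmp[x], 4, &coeffs[x], 4);
    for (int y = 0; y < 4; y++)               /* rows: stride 1 */
        idct4_1d(&out[y * 4], 1, &tmp[y * 4], 1);
}
```

Only the column pass touches strided memory here; a 2D SIMD implementation can keep the intermediate in registers and transpose there instead of going back through memory.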
How about dav1d? Does it have similar 1D functions, or only 2D ones?
dav1d uses 2D optimisations and then some, also incorporating some of the vectorisation to save a transpose operation. According to this lecture, this allowed them to double performance compared with 1D-only SIMD optimisations. It's worth noting that these higher-level optimisations come at a cost in complexity, though: dav1d has over 10,000 lines of inverse transform assembly for AVX2 alone!
It was worth it. dav1d is the fastest decoder we have seen so far, and the current VVC transform functions cost 10% of CPU time for some files.
I will look into this. I don't think it will be quite this simple: some internal data representations in FFVVC will need to change, as dav1d relies on packed input data. However, it looks like there is only one place where non-packed transform coefficients are actually used in FFVVC.
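The thread does not spell out dav1d's exact coefficient layout, so the following is only a generic illustration of packed versus strided storage: it copies a `w x h` coefficient block stored with a row stride into a contiguous ("packed") buffer. `pack_coeffs` is a hypothetical helper, and the stride is assumed to be counted in elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy a w x h block of transform coefficients that sits
 * inside a larger buffer with row stride `stride` (in elements) into a
 * contiguous buffer holding w * h values, row after row. */
static void pack_coeffs(int32_t *dst, const int32_t *src, ptrdiff_t stride,
                        int w, int h)
{
    assert(w > 0 && h > 0 && stride >= w);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[y * w + x] = src[y * stride + x];
}
```

With packed input, a kernel can load whole rows (or several rows) with plain contiguous vector loads instead of stepping by a stride.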
👍 We can start with a 2x2 or 4x4 block: zero the entire block and set the first coefficient to 1.
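A minimal checkasm-style sketch of that test in 1D form, assuming a 4-point kernel: a plain matrix-product reference is compared with an even/odd ("partial butterfly") version that stands in for an optimised kernel. The DC-only input should yield 64 (the DC basis value) at every output position; unit impulses at each position then exercise every basis function. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 4-point DCT-II integer matrix used by HEVC/VVC (rows = basis functions). */
static const int32_t dct2_4[4][4] = {
    { 64,  64,  64,  64 },
    { 83,  36, -36, -83 },
    { 64, -64, -64,  64 },
    { 36, -83,  83, -36 },
};

/* Reference kernel: plain matrix product, out[j] = sum_k M[k][j] * c[k]. */
static void idct4_ref(int32_t out[4], const int32_t c[4])
{
    for (int j = 0; j < 4; j++) {
        out[j] = 0;
        for (int k = 0; k < 4; k++)
            out[j] += dct2_4[k][j] * c[k];
    }
}

/* Stand-in for an optimised kernel: even/odd ("partial butterfly") form. */
static void idct4_butterfly(int32_t out[4], const int32_t c[4])
{
    int32_t e0 = 64 * c[0] + 64 * c[2];  /* even part */
    int32_t e1 = 64 * c[0] - 64 * c[2];
    int32_t o0 = 83 * c[1] + 36 * c[3];  /* odd part */
    int32_t o1 = 36 * c[1] - 83 * c[3];
    out[0] = e0 + o0;
    out[1] = e1 + o1;
    out[2] = e1 - o1;
    out[3] = e0 - o0;
}

/* Returns 1 when both kernels produce identical output for input c. */
static int kernels_match(const int32_t c[4])
{
    int32_t a[4], b[4];
    idct4_ref(a, c);
    idct4_butterfly(b, c);
    return memcmp(a, b, sizeof(a)) == 0;
}

/* DC-only test: zero the block, set the first coefficient to 1. */
static void dc_only_test(void)
{
    int32_t c[4] = { 1, 0, 0, 0 };
    int32_t out[4];
    idct4_butterfly(out, c);
    for (int j = 0; j < 4; j++)
        assert(out[j] == 64);  /* every sample equals the DC basis value */
    assert(kernels_match(c));
}
```

Once the DC case passes, the same comparison with random inputs (as checkasm does) catches errors in the odd part that a DC-only block cannot reveal.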
m21, m22, m23, m24, \
m11, m12, m13, m14

const vvc_dct2_8_odd_mat, dw matvec_mul_4_permute(dct2_8_odd_mat_permute( \
vvc_itx_1d.asm:49: warning: single-line macro `matvec_mul_4_permute' exists, but not taking 1 parameter [-w+pp-macro-params-single]
I think this is a bug in NASM: the inner macro is expanded. These macros could be removed and the permutations applied directly to the transform matrices, at the cost of readability, or we could look at adding -w-pp-macro-params-single if we really want to get rid of the warning.
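The names `vvc_dct2_8_odd_mat` and `matvec_mul_4` suggest that the odd half of the 8-point inverse DCT-II is computed as a 4x4 matrix-vector product. Assuming that, the scalar sketch below shows the even/odd decomposition and stores the odd matrix in the order the kernel consumes it, which is one way of "applying the permutations directly to the transform matrices". The actual permutation performed by `dct2_8_odd_mat_permute` is not visible in this thread, so the row order here is only illustrative; `dct2_8_odd`, `idct8_ref`, `idct8_even_odd` and `check_idct8` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 8-point DCT-II integer matrix shared by HEVC and VVC (rows = basis functions). */
static const int32_t dct2_8[8][8] = {
    { 64,  64,  64,  64,  64,  64,  64,  64 },
    { 89,  75,  50,  18, -18, -50, -75, -89 },
    { 83,  36, -36, -83, -83, -36,  36,  83 },
    { 75, -18, -89, -50,  50,  89,  18, -75 },
    { 64, -64, -64,  64,  64, -64, -64,  64 },
    { 50, -89,  18,  75, -75, -18,  89, -50 },
    { 36, -83,  83, -36, -36,  83, -83,  36 },
    { 18, -50,  75, -89,  89, -75,  50, -18 },
};

/* Odd-part matrix: odd[m][j] = dct2_8[2m + 1][j] for j < 4, stored so that
 * coefficient c[2m + 1] multiplies one contiguous row of four constants. */
static const int32_t dct2_8_odd[4][4] = {
    { 89,  75,  50,  18 },
    { 75, -18, -89, -50 },
    { 50, -89,  18,  75 },
    { 18, -50,  75, -89 },
};

/* Reference kernel: out[j] = sum_k dct2_8[k][j] * c[k]. */
static void idct8_ref(int32_t out[8], const int32_t c[8])
{
    for (int j = 0; j < 8; j++) {
        out[j] = 0;
        for (int k = 0; k < 8; k++)
            out[j] += dct2_8[k][j] * c[k];
    }
}

/* Even/odd decomposition: even basis rows are symmetric and odd rows are
 * antisymmetric, so e and o are computed for positions 0..3 only and
 * mirrored to produce positions 4..7. */
static void idct8_even_odd(int32_t out[8], const int32_t c[8])
{
    int32_t e[4] = { 0 }, o[4] = { 0 };
    for (int m = 0; m < 4; m++)
        for (int j = 0; j < 4; j++) {
            e[j] += dct2_8[2 * m][j] * c[2 * m];      /* 4-point even part */
            o[j] += dct2_8_odd[m][j] * c[2 * m + 1];  /* 4x4 odd mat-vec */
        }
    for (int j = 0; j < 4; j++) {
        out[j]     = e[j] + o[j];
        out[7 - j] = e[j] - o[j];
    }
}

/* Asserts that both kernels agree on input c. */
static void check_idct8(const int32_t c[8])
{
    int32_t a[8], b[8];
    idct8_ref(a, c);
    idct8_even_odd(b, c);
    for (int j = 0; j < 8; j++)
        assert(a[j] == b[j]);
}
```

Baking the layout into the constant table removes the need for a nested macro call at the cost of a less readable table, which is the trade-off described above.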
YASM does not support unnamed contexts, so give all contexts names
Replace `x%[y]` with `x %+ y`
Rebase and re-target onto main.
Reset to 6105322. Work done porting FFmpeg HEVC ASM can now be found at
This PR adds AVX2 optimisations for the inverse type-II DCT. For now, these optimisations are only implemented at the 1D level.
Performance results: