[Meta] Parity with aomenc #2759

Open
2 of 19 tasks
shssoichiro opened this issue Jul 9, 2021 · 14 comments
Labels
compression performance, needs discussion, research (Difficult, needs research and an in-depth discussion), speed performance

Comments

@shssoichiro
Collaborator

shssoichiro commented Jul 9, 2021

I wanted to create a meta issue to track features or changes we can implement to reach quality parity with aomenc. Right now, our speed 6 still trails aomenc's cpu-used 6 by about 30% BD-rate, while also being slower (assuming one tile and no parallel encoding) (AWCY).

We come closer if we bump rav1e up to s0 (AWCY), where some clips even win over aomenc, but at the cost of rav1e being 2700% slower.

There's also the notable outlier of dark720, which has a 200% worse MS-SSIM BD-rate even at speed 0.

Here are the ideas so far:

@shssoichiro shssoichiro added the research, compression performance, and needs discussion labels Jul 9, 2021
@tmatth
Member

tmatth commented Jul 9, 2021

Refs #845

@BlueSwordM
Contributor

IIRC, the biggest reason that rav1e is slower than aomenc is that aomenc does a massive amount of search space pruning at the higher speed presets, particularly when it comes to motion estimation.

You can see it in low-motion clips vs high-motion clips: aomenc and rav1e at speed 6 are similar on high-motion clips in terms of speed and visual quality, but once a low-motion scene comes in, aomenc speeds up a lot more than rav1e. rav1e does essentially no search pruning of any kind.

That pruning also applies to block size selection and transform size partitioning, especially with rectangular partitions: at speed 6, aomenc restricts partition and transform selection to the 8x8-32x32 range.
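
To make the pruning idea concrete, here is a minimal sketch (hypothetical names, not rav1e internals) of SAD-threshold early termination in a candidate motion search, i.e. stop searching once a candidate is "good enough" instead of always exhausting the list:

```rust
/// Sum of absolute differences between two equally sized blocks.
fn sad(a: &[u8], b: &[u8]) -> u64 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| (x as i64 - y as i64).unsigned_abs())
        .sum()
}

/// Search a list of candidate motion vectors, but return early as soon as one
/// of them falls below a per-pixel SAD threshold. In a real encoder the
/// threshold would be scaled by speed preset and block size.
fn prune_motion_search(
    cur_block: &[u8],
    candidates: &[(i32, i32, Vec<u8>)], // (dx, dy, reference block pixels)
    per_pixel_threshold: u64,
) -> Option<(i32, i32)> {
    let early_exit = per_pixel_threshold * cur_block.len() as u64;
    let mut best: Option<((i32, i32), u64)> = None;
    for (dx, dy, ref_block) in candidates {
        let cost = sad(cur_block, ref_block);
        if cost < early_exit {
            return Some((*dx, *dy)); // good enough: skip the remaining candidates
        }
        if best.map_or(true, |(_, b)| cost < b) {
            best = Some(((*dx, *dy), cost));
        }
    }
    best.map(|(mv, _)| mv)
}
```

The same "stop early when the best candidate is already cheap" shape applies to partition and transform-size decisions.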

Another factor is that rav1e's scene detection and frame type selection are fully done during the encoding process. aomenc does this as well, but not as heavily, since it can rely on its default first pass to do a lot of the heavy lifting. That's why using the no-scene-detection flag together with a very fast external scene-detection program, or with master-of-zen's work (which should be merged IMO), nicely speeds up the encoder.
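
As an illustration of how cheap an up-front scene-cut pass can be (this is a toy heuristic, not master-of-zen's actual patch), flagging cuts from the mean absolute luma difference of consecutive downscaled frames is already enough to hand the encoder keyframe placement:

```rust
/// Mean absolute difference between two equally sized (downscaled) luma planes.
fn mean_abs_diff(prev: &[u8], cur: &[u8]) -> f64 {
    let sum: u64 = prev
        .iter()
        .zip(cur)
        .map(|(&p, &c)| (p as i64 - c as i64).unsigned_abs())
        .sum();
    sum as f64 / cur.len() as f64
}

/// Returns the indices of frames that look like the start of a new scene.
/// `threshold` is in 8-bit pixel units; a real detector would also adapt it
/// to local motion to avoid false positives on fades and pans.
fn detect_scene_cuts(frames: &[Vec<u8>], threshold: f64) -> Vec<usize> {
    frames
        .windows(2)
        .enumerate()
        .filter(|(_, pair)| mean_abs_diff(&pair[0], &pair[1]) > threshold)
        .map(|(i, _)| i + 1)
        .collect()
}
```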

On top of that, aomenc disables all loop restoration above cpu-used 5. That alone gives it an absolutely massive speed boost, around 50-70% in average encoding framerate, at a cost to metrics.

Finally, aomenc's default parameters tend to favor artifact prevention over raw detail and psycho-visual optimizations, which means most metrics usually prefer aomenc's output over rav1e's.

All in all, some suggestions:

  1. Start implementing search pruning: motion estimation, partition search, and transform size.
  2. Higher-speed scene detection via merging of master-of-zen's patch.
  3. A stronger CDEF search implementation, and perhaps low-aggressiveness Wiener filtering if someone has the time to do it.
  4. Improving luma coding performance is a good goal, but it must not obviously come at the cost of rav1e's strengths. For example, without grain synthesis and with default parameters, rav1e's low-light performance is better than both SVT-AV1's and aomenc's. The same goes for color: it handily trounces anything but CJXL intra coding.

That is all from me, for now.

@tdaede
Collaborator

tdaede commented Jul 9, 2021

I think we'll need some sort of solution to the vastly different luma/chroma balance if we want to benchmark ourselves against libaom. I'd rather not just tune the balance to win the benchmark, but rather change our benchmark for this particular case, e.g. run on grayscale, or have a special tune option that specifically chooses quantizers similar to libaom's.
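
To illustrate the last idea (hypothetical, not an existing rav1e option), such a tune could simply derive the chroma quantizer index from the luma one with a libaom-like offset; the offset value below is a placeholder, not libaom's actual behaviour:

```rust
/// Hypothetical tune switch controlling the luma/chroma quantizer balance,
/// so benchmarks against libaom compare like with like.
#[derive(Clone, Copy)]
enum QuantizerTune {
    /// rav1e's own luma/chroma balance.
    Default,
    /// Bias the chroma quantizers toward what libaom would pick.
    LibaomLike,
}

fn chroma_q_index(luma_q_index: u8, tune: QuantizerTune) -> u8 {
    let delta: i16 = match tune {
        QuantizerTune::Default => 0,
        // Placeholder offset: quantize chroma somewhat more coarsely.
        QuantizerTune::LibaomLike => 12,
    };
    (luma_q_index as i16 + delta).clamp(0, 255) as u8
}
```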

@shssoichiro
Collaborator Author

shssoichiro commented Jul 12, 2021

From what I can see in the case of dark720, the source is very noisy and aomenc smooths out the noise more than rav1e, resulting in a significantly smaller file. So in this case, rav1e is producing a file that is closer to the original, but at a much higher filesize, which is not good for BD-rate. Not immediately sure what the solution for that case is.

@kornelski
Contributor

kornelski commented Jul 12, 2021

In my tests partition_range had a huge impact on speed, so I think fast heuristics for block split would be very helpful.
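
A minimal sketch of such a heuristic (purely illustrative, not rav1e's actual RDO): skip splitting flat blocks and force splitting of very detailed ones, so the full split search only runs for blocks in between:

```rust
enum SplitDecision {
    KeepWhole,  // flat block: don't bother searching the split
    ForceSplit, // very detailed block: skip the unsplit candidate
    FullSearch, // ambiguous: run the normal RD search
}

fn block_variance(pixels: &[u8]) -> f64 {
    let n = pixels.len() as f64;
    let mean = pixels.iter().map(|&p| p as f64).sum::<f64>() / n;
    pixels.iter().map(|&p| (p as f64 - mean).powi(2)).sum::<f64>() / n
}

/// `low` and `high` would be tuned per speed level and block size.
fn split_heuristic(pixels: &[u8], low: f64, high: f64) -> SplitDecision {
    let var = block_variance(pixels);
    if var < low {
        SplitDecision::KeepWhole
    } else if var > high {
        SplitDecision::ForceSplit
    } else {
        SplitDecision::FullSearch
    }
}
```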

I suggest being careful with luma/chroma balance, because visual metrics usually handle color badly. I'm not entirely sure, but I think libvmaf ignores color entirely for SSIM. The SSIM algorithm has a luminance component, so it would be absurd if applied to Cb/Cr channels.

If you're going to change color balance, verify with butteraugli at very high bitrates. My DSSIM should be OK too, especially at lower bitrates (it does SSIM without the luminance component when comparing color).

@ghost

ghost commented Jul 12, 2021

@BlueSwordM took the words right out of my mouth

@BlueSwordM
Contributor

There's also another very important factor to take into account when comparing aomenc and SVT-AV1 against rav1e:
unless I missed something while parsing the code, rav1e never voluntarily denoises the input.

aomenc and SVT-AV1 apply temporal denoising to the input for some frame types; aomenc exposes specific control over it with arnr-strength=X, which ranges from 0-6, with 5 being the default.

I've yet to do an AWCY run detailing what happens when you disable ARNR denoising entirely, but from my subjective and anecdotal tests, it can have a large impact on quality, speed and metrics, especially on hard content like video games.
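
For reference, the rough shape of alt-ref-style temporal filtering looks like the sketch below (illustrative only, not aomenc's actual ARNR, which works on motion-compensated blocks rather than co-located pixels):

```rust
/// Blend each pixel of the centre frame with the co-located pixels of the
/// neighbouring frames; neighbours that differ a lot contribute less.
/// `strength` (> 0) plays a role similar to arnr-strength: higher values
/// trust the neighbours more and denoise harder.
fn temporal_filter(frames: &[Vec<u8>], centre: usize, strength: f64) -> Vec<u8> {
    let base = &frames[centre];
    base.iter()
        .enumerate()
        .map(|(i, &c)| {
            let mut acc = 0.0;
            let mut wsum = 0.0;
            for f in frames {
                let p = f[i] as f64;
                let d = p - c as f64;
                // Gaussian-style weighting: big differences get tiny weights.
                let w = (-(d * d) / (2.0 * strength * strength)).exp();
                acc += w * p;
                wsum += w;
            }
            // The centre frame always contributes weight 1, so wsum > 0.
            (acc / wsum).round().clamp(0.0, 255.0) as u8
        })
        .collect()
}
```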

Just a tip.

@BlueSwordM
Contributor

BlueSwordM commented Mar 17, 2022

So basically, one of the first steps we should take to improve quality is to implement the full set of CDEF search strengths.

The current method, which picks the CDEF strength from the current quantizer (i.e. CDEF pick-from-Q), is good for higher-fidelity encoding, but certainly not optimal for keeping clean edges at lower bitrates.

However, the full set of CDEF search strengths is a bit problematic for fidelity, as it can result in slight blurring in high-frequency AC blocks (hair, skin, grass, noise, etc.).

Therefore, my idea would be to separate the CDEF tuning into 2 categories:

  1. With the PSNR tune, the full set of CDEF search strengths per speed level will always be available.
  2. With the psychovisual tune, the full set of CDEF search strengths per speed level will still be available, but as the per-block quantizer decreases, a curve could be used to limit which strengths the CDEF algorithm can choose in the first place (sketched below).

Furthermore, since CDEF can actually hurt fidelity when a lot of noise is present, a simple noise estimation algorithm could be used to disable CDEF filtering once the estimated noise reaches a threshold (also based somewhat on the quantizer).
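
A minimal sketch of the tune split and the noise gate described above (hypothetical, not rav1e code; the strength range, curve, and threshold are placeholders):

```rust
#[derive(Clone, Copy)]
enum Tune {
    Psnr,
    Psychovisual,
}

/// Combined strength index used in this sketch (e.g. 16 primary x 4 secondary levels).
const MAX_CDEF_STRENGTH: u8 = 63;

/// Which CDEF strengths the encoder is allowed to search for a given block.
/// `quantizer` is a 0-255 base quantizer index, `noise_level` a normalized
/// 0.0-1.0 noise estimate for the block.
fn allowed_cdef_strengths(
    tune: Tune,
    quantizer: u8,
    noise_level: f64,
) -> std::ops::RangeInclusive<u8> {
    // Very noisy sources: CDEF mostly smears the noise, so skip it entirely.
    if noise_level > 0.8 {
        return 0..=0;
    }
    match tune {
        Tune::Psnr => 0..=MAX_CDEF_STRENGTH,
        Tune::Psychovisual => {
            // Simple linear curve: low quantizers (high fidelity) cap the
            // searchable strength, high quantizers allow the full set.
            let cap = (quantizer as u32 * MAX_CDEF_STRENGTH as u32 / 255) as u8;
            0..=cap.max(1)
        }
    }
}
```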

@shssoichiro
Collaborator Author

A couple of items that came up today:

  • Quantization matrices. In aomenc, these provide a substantial compression improvement basically for free (see the sketch below).
  • Delta-q coding. Haven't looked into it in much depth; just curious what we can do with it.
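
To show what the quantization-matrix item amounts to (illustrative only; the 32-means-unity weight scale and the values are just for this sketch, not libaom's actual tables): each coefficient gets its own effective step size, so high frequencies can be quantized more coarsely than the low ones at essentially no search cost:

```rust
/// Quantize a block of transform coefficients with a per-coefficient weight
/// matrix. A weight of 32 leaves the base step unchanged; larger weights
/// quantize that coefficient more coarsely.
fn quantize_with_qm(coeffs: &[i32], qm: &[u8], base_step: i32) -> Vec<i32> {
    assert_eq!(coeffs.len(), qm.len());
    coeffs
        .iter()
        .zip(qm)
        .map(|(&c, &w)| {
            let step = (base_step * w as i32 / 32).max(1);
            c / step // truncating quantization, kept simple for the sketch
        })
        .collect()
}
```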

@doctortheemh
Contributor

doctortheemh commented May 20, 2022

I went through the task list and found several items tagged compression performance that seem to reference coding tools that aren't implemented yet. This might account for some of the delta, too. It might be valuable to triage these based on their potential.

@CartoonFan

@shssoichiro Do you think it would be a good idea to pin this issue? It seems pretty important, IMO.

@shssoichiro shssoichiro pinned this issue Sep 25, 2022
@shssoichiro
Collaborator Author

Good idea, considering this is a meta issue gathering basically "the most important" features we need to add. Tbh I didn't even know that pinning issues was a thing on GitHub, unless it's something they added recently.

@CartoonFan

> Good idea, considering this is a meta issue gathering basically "the most important" features we need to add. Tbh I didn't even know that pinning issues was a thing on GitHub, unless it's something they added recently.

Maybe? I've seen it on some other repos, but I can't really say when they started popping up.

Thanks for pinning and replying!

@redzic redzic unpinned this issue Nov 9, 2022
@redzic redzic pinned this issue Nov 9, 2022
@redzic
Collaborator

redzic commented Nov 9, 2022

Sorry for unpinning 😅 I accidentally clicked the button; I've pinned it back.
