Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Add support for opaque, alpha mask, and alpha blend modes #3072

Conversation

superdump
Copy link
Contributor

@superdump superdump commented Nov 5, 2021

EDIT: The depth prepasses performed the same or worse on a Radeon Pro 5500M, mobile RTX 3080, and GTX 1070 and as such the depth prepass was removed from the PR prior to merge. Depth prepass can be revisited when it is needed and provides a benefit.

Objective

Add depth prepass and support for opaque, alpha mask, and alpha blend modes for the 3D PBR target.

Solution

NOTE: This is based on top of #2861 frustum culling. Just lining it up to keep @cart loaded with the review train. 🚂

There are a lot of important details here. Big thanks to @cwfitzgerald of wgpu, naga, and rend3 fame for explaining how to do it properly!

  • An AlphaMode component is added that defines whether a material should be considered opaque, an alpha mask (with a cutoff value that defaults to 0.5, the same as glTF), or transparent and should be alpha blended
  • Two depth prepasses are added:
    • Opaque does a plain vertex stage
    • Alpha mask does the vertex stage but also a fragment stage that samples the colour for the fragment and discards if its alpha value is below the cutoff value
    • Both are sorted front to back, not that it matters for these passes. (Maybe there should be a way to skip sorting?)
  • Three main passes are added:
    • Opaque and alpha mask passes use a depth comparison function of Equal such that only the geometry that was closest is processed further, due to early-z testing
    • The transparent pass uses the Greater depth comparison function so that only transparent objects that are closer than anything opaque are rendered
    • The opaque fragment shading is as before except that alpha is explicitly set to 1.0
    • Alpha mask fragment shading sets the alpha value to 1.0 if it is equal to or above the cutoff, as defined by glTF
    • Opaque and alpha mask are sorted front to back (again not that it matters as we will skip anything that is not equal... maybe sorting is no longer needed here?)
    • Transparent is sorted back to front. Transparent fragment shading uses the alpha blending over operator

@superdump
Copy link
Contributor Author

As a note, I have tested this with FlightHelmet, Sponza, and AlphaBlendModeTest from https://github.com/KhronosGroup/glTF-Sample-Models/ . Here is a screenshot from AlphaBlendModeTest:
image
(Ignore the shadows from the directional light. I haven't done anything sensible with the shadow map projection.)

@superdump superdump force-pushed the alpha-modes-depth-prepass branch 2 times, most recently from 3502c68 to 8bacd0c Compare November 7, 2021 22:28
@cart
Copy link
Member

cart commented Nov 12, 2021

I think its worth calling out that this adds 20ms to many_cubes_pipelined, dropping FPS from ~30 to ~19.5 on my machine. We're already "too slow" when it comes to drawing cubes with StandardMaterial. I don't think further regressing is a good idea. We should sort out whats costing us perf here.

@cart
Copy link
Member

cart commented Nov 12, 2021

Pushed some small (hopefully uncontroversial) style / naming changes. Feel free to push back on anything though.

@cart
Copy link
Member

cart commented Nov 12, 2021

We should sort out whats costing us perf here.

Almost certainly just the additional depth prepass. many_cubes_pipelined is a worst case scenario: no overdraw, no alpha materials, etc. So we're just drawing things twice. Have you tested this with something like the Bistro scene? Is the performance better or worse?

@cart
Copy link
Member

cart commented Nov 12, 2021

Yeah perf is tanking due to the depth prepass command encoding:
image

@cart
Copy link
Member

cart commented Nov 12, 2021

This seems like a case where we could encode in parallel (as long as we submit in the right order). But thats kind of just punting the problem until we have saturated available cores.

@superdump
Copy link
Contributor Author

Pushed some small (hopefully uncontroversial) style / naming changes. Feel free to push back on anything though.

While I am obviously fine with using let () = if .. {} else {}; I'm also fine with not using it. And the rest is all good.

@superdump
Copy link
Contributor Author

Yeah perf is tanking due to the depth prepass command encoding: image

What are the vertical lines denoting in terms of time intervals in the graph? Definitely looks like parallel building of command buffers would help there, but indeed also that maybe a depth prepass is not good for that scene. I will do some testing with and without a depth prepass also with scenes that are at the other extreme like the Emerald Square City scene from NVIDIA's ORCA scenes which has a bunch of trees with lots of alpha mask leaves where without a depth prepass it will have to discard in the main pass fragment shader.

@superdump
Copy link
Contributor Author

superdump commented Nov 12, 2021

…maybe a depth prepass is not good for that scene. I will do some testing with and without a depth prepass also with scenes that are at the other extreme like the Emerald Square City scene from NVIDIA's ORCA scenes which has a bunch of trees with lots of alpha mask leaves where without a depth prepass it will have to discard in the main pass fragment shader.

I’ve done some testing in macOS with an i9 9980HK and Radeon Pro 5500M of more advanced scenes with alpha cutout and transparency like Amazon Bistro and Emerald Square City. In Bistro the frame rate was possibly very slightly worse without a depth prepass (~5.4-6.0fps) versus with (~5.8-6.0), but it was very close. In many_cubes_pipelined I got ~9.8fps without depth prepass and ~5.7fps with. And in Emerald City Square looking over the square across the trees I got ~5.4fps without and ~4.7fps with. I want to do a bit of testing in Windows on the Ryzen 5900HS + mobile RTX 3080 to see how that performs as I’ve had problems with vsync on macOS previously. It’s looking like for now the command encoding is the bottleneck rather than overdraw or shader divergence though.

bors bot pushed a commit that referenced this pull request Nov 12, 2021
Adds new `EntityRenderCommand`, `EntityPhaseItem`, and `CachedPipelinePhaseItem` traits to make it possible to reuse RenderCommands across phases. This should be helpful for features like #3072 . It also makes the trait impls slightly less generic-ey in the common cases.

This also fixes the custom shader examples to account for the recent Frustum Culling and MSAA changes (the UX for these things will be improved later).
This allows differentiation between opaque / mask / blend.
@cart
Copy link
Member

cart commented Nov 13, 2021

What are the vertical lines denoting in terms of time intervals in the graph?

IIRC it was ~1.8ms (because each node took ~17ms to run)

@cart cart changed the title Add depth prepass and support for opaque, alpha mask, and alpha blend modes Add support for opaque, alpha mask, and alpha blend modes Nov 13, 2021
cart and others added 2 commits November 13, 2021 14:06
many_cubes_pipelined performance takes a big hit from doing a depth prepass.
@superdump
Copy link
Contributor Author

Rebased on top of pipelined-rendering, hopefully correctly.

use bevy_ecs::reflect::ReflectComponent;
use bevy_reflect::Reflect;

// FIXME: This should probably be part of bevy_render2!
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cart Where do you think this should live?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The set of available values here is specific to the bevy_pbr2 implementation, so I think the current location is good for now.

flags |= StandardMaterialFlags::NORMAL_MAP_TEXTURE;
}
let has_normal_map = material.normal_map_texture.is_some();
// NOTE: 0.5 is from the glTF default - do we want this?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think following the glTF "0.5 is the default cutoff" is reasonable. If people want a different cutoff then they just have to overwrite the cutoff value. @cart

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup this seems reasonable to me. We can adjust later as needed.

),
// TODO: change this to StandardMaterialUniformData::std140_size_static once crevice fixes this!
// Context: https://github.com/LPGhatguy/crevice/issues/29
min_binding_size: BufferSize::new(64),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also no longer needed after the crevice rewrite.

@superdump
Copy link
Contributor Author

Hrm, a bunch of scenes worked fine, but then I tried the AlphaModeBlendTest from Khronos glTF samples and it fails with:

Nov 13 14:34:26.163 ERROR wgpu::backend::direct: Handling wgpu errors as fatal by default    
thread 'main' panicked at 'wgpu error: Validation Error

Caused by:
    In Device::create_shader_module
    Entry point fragment at Fragment is invalid
    Required uniformity of control flow for IMPLICIT_LEVEL in [153] is not fulfilled because of Discard

I wonder why that specifically triggers this issue as I would have thought all of the textureSample above the discard could potentially trigger it. I'll have to have a look what [153] is I suppose. I don't have time right now though. Next time.

@superdump
Copy link
Contributor Author

I realised the problem is the textureSample that comes after the discard as the discard introduces non-uniform control flow as some fragments in the pixel quad may be discarded and others not. I moved the discard down until after the normal map textureSample call and it all works fine again now.

In order for texture sampling to be able to do mipmapping, it must be
done uniformly across a pixel quad. As such, the discard must come after
all textureSample calls. Note that one can use textureSampleLevel to
remove the requirement for uniformity, but then one has to supply a
specific mip level.
load: LoadOp::Clear(0.0),
store: true,

{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine for now, but this seems ripe for parallel command encoding (once we start doing that). Breaking this up into 3 separate nodes might make that easier.

Copy link
Member

@cart cart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! I just pushed two small commits:

  • Removes an unnecessary scope
  • Factored out ViewTarget color attachment construction boilerplate

@cart
Copy link
Member

cart commented Nov 16, 2021

bors r+

bors bot pushed a commit that referenced this pull request Nov 16, 2021
# Objective

Add depth prepass and support for opaque, alpha mask, and alpha blend modes for the 3D PBR target.

## Solution

NOTE: This is based on top of #2861 frustum culling. Just lining it up to keep @cart loaded with the review train. 🚂 

There are a lot of important details here. Big thanks to @cwfitzgerald of wgpu, naga, and rend3 fame for explaining how to do it properly!

* An `AlphaMode` component is added that defines whether a material should be considered opaque, an alpha mask (with a cutoff value that defaults to 0.5, the same as glTF), or transparent and should be alpha blended
* Two depth prepasses are added:
  * Opaque does a plain vertex stage
  * Alpha mask does the vertex stage but also a fragment stage that samples the colour for the fragment and discards if its alpha value is below the cutoff value
  * Both are sorted front to back, not that it matters for these passes. (Maybe there should be a way to skip sorting?)
* Three main passes are added:
  * Opaque and alpha mask passes use a depth comparison function of Equal such that only the geometry that was closest is processed further, due to early-z testing
  * The transparent pass uses the Greater depth comparison function so that only transparent objects that are closer than anything opaque are rendered
  * The opaque fragment shading is as before except that alpha is explicitly set to 1.0
  * Alpha mask fragment shading sets the alpha value to 1.0 if it is equal to or above the cutoff, as defined by glTF
  * Opaque and alpha mask are sorted front to back (again not that it matters as we will skip anything that is not equal... maybe sorting is no longer needed here?)
  * Transparent is sorted back to front. Transparent fragment shading uses the alpha blending over operator

Co-authored-by: Carter Anderson <mcanders1@gmail.com>
@bors
Copy link
Contributor

bors bot commented Nov 16, 2021

@bors bors bot changed the title Add support for opaque, alpha mask, and alpha blend modes [Merged by Bors] - Add support for opaque, alpha mask, and alpha blend modes Nov 16, 2021
@bors bors bot closed this Nov 16, 2021
bors bot pushed a commit that referenced this pull request Nov 19, 2021
# Objective

Allow shadow mapping to be enabled/disabled per-light.

## Solution

- NOTE: This PR is on top of #3072
- Add `shadows_enabled` boolean property to `PointLight` and `DirectionalLight` components.
- Do not update the frusta for the light if shadows are disabled.
- Do not check for visible entities for the light if shadows are disabled.
- Do not fetch shadows for lights with shadows disabled.
- I reworked a few types for clarity: `ViewLight` -> `ShadowView`, the bulk of `ViewLights` members -> `ViewShadowBindings`, the entities Vec in `ViewLights` -> `ViewLightEntities`, the uniform offset in `ViewLights` for `GpuLights` -> `ViewLightsUniformOffset`

Co-authored-by: Carter Anderson <mcanders1@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Enhancement A new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants