Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fog crashes on certain mobile devices in WASM #8506

Closed
oscrim opened this issue Apr 27, 2023 · 13 comments · Fixed by #8508
Closed

Fog crashes on certain mobile devices in WASM #8506

oscrim opened this issue Apr 27, 2023 · 13 comments · Fixed by #8508
Labels
A-Rendering Drawing game state to the screen C-Bug An unexpected or incorrect behavior P-Crash A sudden unexpected crash

Comments

@oscrim
Copy link

oscrim commented Apr 27, 2023

Bevy version

670f3f0

Relevant system information

`AdapterInfo { name: "Adreno (TM) 660", vendor: 20803, device: 0, device_type: IntegratedGpu, driver: "", driver_info: "", backend: Gl }`

What you did

I ran a basic 3D Shapes example on WASM on my phone.

What went wrong

I expected the example the example to run like it did on my computer but instead it crashes with the error

Shader translation error for stage VERTEX | FRAGMENT | VERTEX_FRAGMENT:  log.target = "wgpu::backend::direct";log.module_path = "wgpu::backend::direct";
panicked at 'wgpu error: Validation Error

Caused by:
    In Device::create_render_pipeline
      note: label = `pbr_opaque_mesh_pipeline`
    Internal error in VERTEX | FRAGMENT | VERTEX_FRAGMENT shader: 

Additional information

I first asked about it on your discord here.

Using git bisect I was able to find the commit that introduced the bug which is 1a96d82.

Further by removing the lines

    if (fog.mode != FOG_MODE_OFF && (material.flags & STANDARD_MATERIAL_FLAGS_FOG_ENABLED_BIT) != 0u) {
        output_color = apply_fog(output_color, in.world_position.xyz, view.world_position.xyz);
    }

in pbr.wgsl, in bevy_pbr, the example once again runs on my phone.

What is interesting however is that my colleague ran it on his phone without issue, his adapter info is the following.

AdapterInfo { name: "Mali-G78", vendor: 5045, device: 0, device_type: IntegratedGpu, driver: "", driver_info: "", backend: Gl }

Furthermore I tried running it on my tablet which also crashes, I unfortunately cannot read logs from the tablet but it uses the Adreno 619 GPU.

@oscrim oscrim added C-Bug An unexpected or incorrect behavior S-Needs-Triage This issue needs to be labelled labels Apr 27, 2023
@oscrim
Copy link
Author

oscrim commented Apr 27, 2023

After some more testing Ive found that removing

scattering += pow(
                max(
                    dot(view_to_world_normalized, light.direction_to_light),
                    0.0
                ),
                fog.directional_light_exponent
            ) * light.color.rgb;

from apply_fog in pbr_functions.wgsl also makes the example run

@oscrim
Copy link
Author

oscrim commented Apr 27, 2023

@coreh Do you have any idea what could be the issue?

@coreh
Copy link
Contributor

coreh commented Apr 28, 2023

Unfortunately, I don't... Can you try removing specific parts of that expression to see what could be tripping up the shader compiler?

Like the dot(), or the pow(), or specific terms within them?

What browser are you using? Firefox, Chrome?

@oscrim
Copy link
Author

oscrim commented Apr 28, 2023

I have tried it on Chrome and Firefox without success. I tried removing both dot() and pow() without any success, still crashes when only doing

scattering += light.color.rgb;

@oscrim
Copy link
Author

oscrim commented Apr 28, 2023

Also found out that after removing the scattering, transparency is also broken as shown here when running the transparency_3d example. You can see when the shadows disappear is when the object is supposed to be completely see-through. But this bug was not introduced in the same commit but instead 603cb43

2023-04-28.09-47-22.mp4

Edit: Ill edit this comment as its not really connected to this issue, but from running the blend_modes example it seems the materials arent properly being updated.

image

As we can see we are able to see through the Add and Multiply but when I change the alpha value on my phone or try any of the other functionalities the materials dont change (but the alpha value top right does). Also changing the alpha to a lower value from the start does nothing.

@TotalKrill
Copy link
Contributor

I assume this is actually the same or similar bug as in #8047 and maybe #4582. The common theme seems to be that Adreno chipset, so I assume there is something that the adreno does differently. Looking at where they landed with the issues, it also seems that the WGSL stuff is actually the error.

This issue has gone throught the git bisect though and has narrowed down basically the lines of wgsl code that seems to break. I dont know if the same lines would fix the other issues, or even if both issues should still be open

@TotalKrill
Copy link
Contributor

TotalKrill commented Apr 28, 2023

@coreh Since you have some more experience with WGSL than us, could you point us in a direction on how we could proceed in trying to debug this? Basically now, we are just acting like monkeys and adding removing, snippets of code from that, recompiling the bevy examples and loading them on the phone. But the only feedback we get is basically works/not works. Its, a bit suboptimal...

Can one get some debug information out from what crashes, I am working on the assumptions that the Adreno chipsets (very common it seems) are to blame.

from the the gfx-rs/wgpu#4487, it seems this could be a workaround:

At least some adreno compilers don't like returning an element of a UBO array that is a structure in the vertex shader. 
To work this around we have to copy the each of the structure fields.

however I do not know how to proceed in doing that

@oscrim
Copy link
Author

oscrim commented Apr 28, 2023

it turns out that changing it to

            //let light = lights.directional_lights[i];
            scattering += pow(
                max(
                    0.0,
                    0.0
                ),
                fog.directional_light_exponent
            ) * vec3<f32>(1.0);

also makes the example run

@TotalKrill
Copy link
Contributor

Presumably, this means that the lights.directional_lights array, is not the same size as the lights.n_directional_lights indicates. And that trying to access values in the lights.directional_light[i] will crash....

@oscrim
Copy link
Author

oscrim commented Apr 28, 2023

Another update, the shader will run without modification with the following change in light.rs

pub const MAX_DIRECTIONAL_LIGHTS: usize = 1;

A value of 2 or more will cause the crash

@coreh
Copy link
Contributor

coreh commented Apr 29, 2023

Hmm. What I don't fully understand is if the problem is the amount of directional lights, and they are being read from elsewhere in the code (within the lighting code) why does only the read in the fog code cause the translation issue, and not the other one?

Since you have some more experience with WGSL than us

I really don't have that much more experience with it, for the most part I have been just winging it 😜 For debugging I also just comment out random things and return different colors.

@james7132 james7132 added A-Rendering Drawing game state to the screen P-Crash A sudden unexpected crash and removed S-Needs-Triage This issue needs to be labelled labels May 9, 2023
@gnargfu
Copy link

gnargfu commented May 10, 2023

Working with @TotalKrill I've looked into this and from what I've found is that the data seems to be mangled at some point. I have as of yet not been able to determine precisely where that is as my knowledge of shaders, wgpu, bevy, naga and the Adreno architecture is quite infantile as of yet.

I managed to fix the issue with the transparency by moving the standard alpha blending bitflags to be lower in the u32 holding them. This tells me that there is probably something going on with how the wgsl gets the data. But were in the chain of things it happens is beyond me at this time. In the end I just added the fix that was proposed for this (limiting the directional lights) as a feature that can be added if the end user is targeting mobile webgl.

@gnargfu
Copy link

gnargfu commented May 11, 2023

It seems as though the Adreno series are alone in having an alignment of 64 bytes. According to the 'best practices' on the Qualcomm developer network each UBO should be no more than 8k. Even though they report that the maxUniformBufferRange is 65536 I've seen comments stating that the driver reports an inaccurate value and people are getting anomalies in the data much sooner. With 10 lights we are currently at 13120 bytes (aligned) which should be safe but that is the lights alone. I'm currently trying to understand the pipeline to see exactly how much is stuffed into the buffer in our examples.

github-merge-queue bot pushed a commit that referenced this issue Jun 29, 2023
# Objective

- This fixes a crash when loading shaders, when running an Adreno GPU
and using WebGL mode.
- Fixes #8506 
- Fixes #8047

## Solution

- The shader pbr_functions.wgsl, will fail in apply_fog function, trying
to access values that are null on Adreno chipsets using WebGL, these
devices are commonly found in android handheld devices.

---------

Co-authored-by: Carter Anderson <mcanders1@gmail.com>
Co-authored-by: François <mockersf@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Bug An unexpected or incorrect behavior P-Crash A sudden unexpected crash
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants