[RFC] Use DXGI to present OpenGL frames on Windows #94503

Draft
wants to merge 14 commits into
base: master

alvinhochun
Contributor

@alvinhochun alvinhochun commented Jul 18, 2024

godotengine/godot-proposals#10242

This is an experiment using the WGL_NV_DX_INTEROP2 extension to present frames rendered by the native OpenGL renderer. There are some compile-time options near the top of gl_manager_windows_native.cpp that change the behaviour of the code. The defaults aim for the lowest latency, but this needs proper testing.

Notes:

  • The way it works is: We first create a D3D11 device and a DXGI swap chain. Then for each frame we get the DXGI back buffer, pass it to OpenGL, then bind it to an FBO, and set GLES3::TextureStorage::system_fbo so that RasterizerGLES3 blits the render targets onto it. On swap buffers, we present the frame with DXGI and, if vsync is enabled, wait for the frame latency waitable object.
  • Because D3D11 places the screen origin at the top-left, rather than at the bottom-left as OpenGL does, I am using a hack to flip the screen Y in RasterizerGLES3. Thankfully, render targets are already rendered to their own FBOs, so this introduces no additional overhead compared to presenting with native OpenGL.
  • This does not use DirectComposition. I've read that it may have some benefits, and is supposedly the only way we can get smooth resize with flip model swap chains, so perhaps it will be worth trying.
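
To illustrate the Y-flip note above: one common way to flip during the final blit is to swap the destination Y edges passed to glBlitFramebuffer, which mirrors the image at no extra cost. The sketch below only computes those edges. `BlitRect` and `flip_rect_y` are hypothetical names for illustration, and the actual hack in RasterizerGLES3 may be implemented differently.

```cpp
// Edges of a rectangle in the order glBlitFramebuffer takes them.
// Hypothetical helper type, not taken from the PR.
struct BlitRect {
	int x0, y0, x1, y1;
};

// Mirror a rectangle vertically inside a target that is target_height pixels tall.
// The returned Y edges are in reversed order, so a blit to them flips the image.
constexpr BlitRect flip_rect_y(BlitRect rect, int target_height) {
	return BlitRect{ rect.x0, target_height - rect.y0, rect.x1, target_height - rect.y1 };
}

// Full-window example: a 1280x720 source maps to (0, 720) -> (1280, 0).
constexpr BlitRect window_rect{ 0, 0, 1280, 720 };
constexpr BlitRect flipped = flip_rect_y(window_rect, 720);

// With a GL context current, the blit would then be:
// glBlitFramebuffer(window_rect.x0, window_rect.y0, window_rect.x1, window_rect.y1,
//		flipped.x0, flipped.y0, flipped.x1, flipped.y1,
//		GL_COLOR_BUFFER_BIT, GL_NEAREST);
```

Because the flip is folded into a blit that has to happen anyway, it matches the "no additional overhead" observation in the note.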

On Windows 10, when independent flip is engaged (according to PresentMon), I seem to be able to get 1 frame of latency (originally measured as 2 frames) with v-sync on, and less than 2 ms of latency with v-sync off. (No variable refresh rate.)

Please test this if you are able to. To verify that it is using DXGI, pass --verbose and you should see "GLManagerNative_Windows: Presenting with D3D11 DXGI swap chain." on the output.

I like testing with this project: spinning-cube_multiwindow.zip

@dsnopek
Contributor

dsnopek commented Jul 18, 2024

> I am using a hack to flip the screen Y in RasterizerGLES3

I haven't had a chance to really look at the code changes here, but we already have some hacks to flip Y in RasterizerGLES3 (which is already a source of debugging headaches), so if it's possible to consolidate these hacks in any way, that would be great :-)

@Calinou
Member

Calinou commented Jul 18, 2024

> On Windows 10, when independent flip is engaged (according to PresentMon), I seem to be able to get 2 frames latency with v-sync on, less than 2 ms latency with v-sync off. (No variable refresh rate.)

This checks out with triple buffering being used. If you were able to opt into using double buffering instead, you could get a single frame of latency at the cost of less stable framerates when the framerate drops below the monitor refresh rate.

The NVIDIA Control Panel has a setting to control whether triple buffering is used in OpenGL applications.

Member

@Calinou Calinou left a comment

Tested locally, I get a crash on startup with the message Segmentation fault when the project manager starts (the crash handler doesn't run). The window is created, but it only ever renders black before crashing a second later.

Vulkan-based rendering methods work fine with the same binary I compiled. I used MSVC 2022 to compile Godot.

WinDbg says:

(3bf0.175c): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
nvwgf2umx!NVAPI_DirectMethods+0x6594d:
00007fff`a77254ed 488b12          mov     rdx,qword ptr [rdx] ds:feeefeee`feeefeee=????????????????
PC specifications
  • CPU: Intel Core i9-13900K
  • GPU: NVIDIA GeForce RTX 4090 (driver 555.85)
  • RAM: 64 GB (2×32 GB DDR5-5800 C30)
  • SSD: Solidigm P44 Pro 2 TB
  • OS: Windows 11 23H2

@alvinhochun
Contributor Author

> > On Windows 10, when independent flip is engaged (according to PresentMon), I seem to be able to get 2 frames latency with v-sync on, less than 2 ms latency with v-sync off. (No variable refresh rate.)
>
> This checks out with triple buffering being used. If you were able to opt into using double buffering instead, you could get a single frame of latency at the cost of less stable framerates when the framerate drops below the monitor refresh rate.

Turns out the reason for 2 frames is stupid: I forgot to wait for the frame latency waitable object before rendering the first frame. Fixing that gives 1 frame of latency (actually less than 1).

The number of frames in the present queue is controlled by IDXGISwapChain2::SetMaximumFrameLatency. It is currently set to 1, which is double buffering (had the code waited properly). For actual triple buffering, it should be set to 2.

> Tested locally, I get a crash on startup with the message Segmentation fault when the project manager starts (the crash handler doesn't run). The window is created, but it only ever renders black before crashing a second later.

It was also crashing the NVIDIA driver on my Optimus setup. I am not sure whether it is the same crash that you have, but from what I have managed to gather, this is a use-after-free inside NVIDIA's driver, in the wglDXRegisterObjectNV call made after resizing the swap chain. The crash can therefore only happen after the window has been resized, which the Project Manager does once during startup.

I just tried to reorder some calls. Maybe that's all it needed to fix the crash?

Member

@Calinou Calinou left a comment

Tested locally again, it mostly works as expected.

The main issue I noticed is that saving scenes can crash the editor when using the Compatibility rendering method. I couldn't reproduce this consistently, though: it only happened once.

I tested this on the Material Testers demo after switching it to use the Compatibility rendering method:

ERROR: Wait for frame latency waitable failed, WaitForSingleObject returned 0x00000102
   at: GLManagerNative_Windows::DxgiSwapChain::present (platform\windows\gl_manager_windows_native.cpp:1623)
ERROR: Wait for frame latency waitable failed, WaitForSingleObject returned 0x00000102
   at: GLManagerNative_Windows::DxgiSwapChain::present (platform\windows\gl_manager_windows_native.cpp:1623)
ERROR: Present failed, HRESULT: 0x887A0005
   at: GLManagerNative_Windows::DxgiSwapChain::present (platform\windows\gl_manager_windows_native.cpp:1631)
ERROR: Failed to connect D3D11 color buffer to WGL for interop. Error: Error 3221684334:
   at: GLManagerNative_Windows::DxgiSwapChain::setup_render_target (platform\windows\gl_manager_windows_native.cpp:1467)

================================================================
CrashHandlerException: Program crashed
Engine version: Godot Engine v4.3.beta.custom_build (0d88147bdb5e4b3e74505c9318e3e9d0a830b111)
Dumping the backtrace. Please include this when reporting the bug to the project developer.
[0] <couldn't map PC to fn name>
[1] <couldn't map PC to fn name>
[2] <couldn't map PC to fn name>
[3] GLManagerNative_Windows::DxgiSwapChain::lock_for_opengl (C:\Users\Hugo\Documents\Git\godotengine\godot\platform\windows\gl_manager_windows_native.cpp:1526)
[4] GLManagerNative_Windows::DxgiSwapChain::present (C:\Users\Hugo\Documents\Git\godotengine\godot\platform\windows\gl_manager_windows_native.cpp:1637)
[5] GLManagerNative_Windows::swap_buffers (C:\Users\Hugo\Documents\Git\godotengine\godot\platform\windows\gl_manager_windows_native.cpp:856)
[6] DisplayServerWindows::swap_buffers (C:\Users\Hugo\Documents\Git\godotengine\godot\platform\windows\display_server_windows.cpp:3049)
[7] RasterizerGLES3::end_viewport (C:\Users\Hugo\Documents\Git\godotengine\godot\drivers\gles3\rasterizer_gles3.cpp:116)
[8] RendererViewport::draw_viewports (C:\Users\Hugo\Documents\Git\godotengine\godot\servers\rendering\renderer_viewport.cpp:832)
[9] RenderingServerDefault::_draw (C:\Users\Hugo\Documents\Git\godotengine\godot\servers\rendering\rendering_server_default.cpp:88)
[10] RenderingServerDefault::draw (C:\Users\Hugo\Documents\Git\godotengine\godot\servers\rendering\rendering_server_default.cpp:410)
[11] Main::iteration (C:\Users\Hugo\Documents\Git\godotengine\godot\main\main.cpp:4118)
[12] ProgressDialog::_update_ui (C:\Users\Hugo\Documents\Git\godotengine\godot\editor\progress_dialog.cpp:134)
[13] ProgressDialog::task_step (C:\Users\Hugo\Documents\Git\godotengine\godot\editor\progress_dialog.cpp:225)
[14] EditorNode::progress_task_step (C:\Users\Hugo\Documents\Git\godotengine\godot\editor\editor_node.cpp:4876)
[15] EditorProgress::step (C:\Users\Hugo\Documents\Git\godotengine\godot\editor\editor_node.h:949)
[16] EditorNode::_save_scene_with_preview (C:\Users\Hugo\Documents\Git\godotengine\godot\editor\editor_node.cpp:1657)
[17] EditorNode::_menu_option_confirm (C:\Users\Hugo\Documents\Git\godotengine\godot\editor\editor_node.cpp:2723)
[18] EditorNode::_menu_option (C:\Users\Hugo\Documents\Git\godotengine\godot\editor\editor_node.cpp:1438)
[19] call_with_variant_args_helper<EditorNode,int,0> (C:\Users\Hugo\Documents\Git\godotengine\godot\core\variant\binder_common.h:304)
[20] call_with_variant_args<EditorNode,int> (C:\Users\Hugo\Documents\Git\godotengine\godot\core\variant\binder_common.h:418)
[21] CallableCustomMethodPointer<EditorNode,int>::call (C:\Users\Hugo\Documents\Git\godotengine\godot\core\object\callable_method_pointer.h:103)
[22] Callable::callp (C:\Users\Hugo\Documents\Git\godotengine\godot\core\variant\callable.cpp:57)
[23] Object::emit_signalp (C:\Users\Hugo\Documents\Git\godotengine\godot\core\object\object.cpp:1188)
[24] Node::emit_signalp (C:\Users\Hugo\Documents\Git\godotengine\godot\scene\main\node.cpp:3896)
[25] Object::emit_signal<int> (C:\Users\Hugo\Documents\Git\godotengine\godot\core\object\object.h:936)
[26] PopupMenu::activate_item (C:\Users\Hugo\Documents\Git\godotengine\godot\scene\gui\popup_menu.cpp:2435)
[27] PopupMenu::activate_item_by_event (C:\Users\Hugo\Documents\Git\godotengine\godot\scene\gui\popup_menu.cpp:2357)
[28] MenuBar::shortcut_input (C:\Users\Hugo\Documents\Git\godotengine\godot\scene\gui\menu_bar.cpp:166)
[29] Node::_call_shortcut_input (C:\Users\Hugo\Documents\Git\godotengine\godot\scene\main\node.cpp:3361)
[30] SceneTree::_call_input_pause (C:\Users\Hugo\Documents\Git\godotengine\godot\scene\main\scene_tree.cpp:1242)
[31] Viewport::_push_unhandled_input_internal (C:\Users\Hugo\Documents\Git\godotengine\godot\scene\main\viewport.cpp:3303)
[32] Viewport::push_input (C:\Users\Hugo\Documents\Git\godotengine\godot\scene\main\viewport.cpp:3265)
[33] Window::_window_input (C:\Users\Hugo\Documents\Git\godotengine\godot\scene\main\window.cpp:1690)
[34] call_with_variant_args_helper<Window,Ref<InputEvent> const &,0> (C:\Users\Hugo\Documents\Git\godotengine\godot\core\variant\binder_common.h:304)
[35] call_with_variant_args<Window,Ref<InputEvent> const &> (C:\Users\Hugo\Documents\Git\godotengine\godot\core\variant\binder_common.h:418)
[36] CallableCustomMethodPointer<Window,Ref<InputEvent> const &>::call (C:\Users\Hugo\Documents\Git\godotengine\godot\core\object\callable_method_pointer.h:103)
[37] Callable::callp (C:\Users\Hugo\Documents\Git\godotengine\godot\core\variant\callable.cpp:57)
[38] Callable::call<Ref<InputEvent> > (C:\Users\Hugo\Documents\Git\godotengine\godot\core\variant\variant.h:876)
[39] DisplayServerWindows::_dispatch_input_event (C:\Users\Hugo\Documents\Git\godotengine\godot\platform\windows\display_server_windows.cpp:3554)
[40] DisplayServerWindows::_dispatch_input_events (C:\Users\Hugo\Documents\Git\godotengine\godot\platform\windows\display_server_windows.cpp:3524)
[41] Input::_parse_input_event_impl (C:\Users\Hugo\Documents\Git\godotengine\godot\core\input\input.cpp:775)
[42] Input::flush_buffered_events (C:\Users\Hugo\Documents\Git\godotengine\godot\core\input\input.cpp:1056)
[43] DisplayServerWindows::process_events (C:\Users\Hugo\Documents\Git\godotengine\godot\platform\windows\display_server_windows.cpp:3020)
[44] OS_Windows::run (C:\Users\Hugo\Documents\Git\godotengine\godot\platform\windows\os_windows.cpp:1665)
[45] widechar_main (C:\Users\Hugo\Documents\Git\godotengine\godot\platform\windows\godot_windows.cpp:180)
[46] _main (C:\Users\Hugo\Documents\Git\godotengine\godot\platform\windows\godot_windows.cpp:206)
[47] main (C:\Users\Hugo\Documents\Git\godotengine\godot\platform\windows\godot_windows.cpp:220)
[48] WinMain (C:\Users\Hugo\Documents\Git\godotengine\godot\platform\windows\godot_windows.cpp:234)
[49] __scrt_common_main_seh (D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288)
[50] <couldn't map PC to fn name>
-- END OF BACKTRACE --
================================================================

PS: An indirect consequence of this PR is that it makes RTX HDR work out of the box with the Compatibility rendering method, whereas you previously had to go to NVIDIA Control Panel and set the present method to Prefer layered over DXGI swapchain. This is nice to see 🙂

@Calinou
Member

Calinou commented Jul 22, 2024

It might be worth trying this out with this PR to compare it to master and other rendering methods: https://gpuopen.com/learn/frame-latency-meter-flm-1-0/

@alvinhochun
Contributor Author

Thanks for testing. The crash is because I haven't yet written the code to gracefully handle the DXGI_ERROR_DEVICE_REMOVED case. I don't have any clue why this would happen during save, but it can happen in other situations so I will have to handle it eventually.
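
For reference, the codes in the log above can be read as follows. 0x00000102 is WAIT_TIMEOUT, the return value of WaitForSingleObject when the wait times out, and 0x887A0005 is DXGI_ERROR_DEVICE_REMOVED. The decimal value 3221684334 is 0xC007006E. Its low word is 110, which matches ERROR_OPEN_FAILED in winerror.h, but whether the driver intends exactly that meaning is an assumption. A minimal sketch for splitting such values into fields, where `StatusFields` and `decode_status` are hypothetical names and the facility mask mirrors the Windows SDK's HRESULT_FACILITY macro:

```cpp
#include <cstdint>

// Fields of a 32-bit HRESULT-style status value.
// Hypothetical helper type, not part of the PR.
struct StatusFields {
	bool failure; // Bit 31: set when the value signals failure.
	std::uint16_t facility; // Bits 16-28, same mask as HRESULT_FACILITY.
	std::uint16_t code; // Bits 0-15: facility-specific code.
};

constexpr StatusFields decode_status(std::uint32_t value) {
	return StatusFields{
		(value >> 31) != 0,
		static_cast<std::uint16_t>((value >> 16) & 0x1FFF),
		static_cast<std::uint16_t>(value & 0xFFFF),
	};
}

// 0x887A0005 -> failure, facility 0x87A (DXGI), code 5.
constexpr StatusFields present_error = decode_status(0x887A0005u);
// 3221684334 (0xC007006E) -> failure, facility 7 (Win32), code 110.
constexpr StatusFields interop_error = decode_status(3221684334u);
```

Note that 0x00000102 is a wait result rather than an HRESULT, so only its numeric value is meaningful.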

@alvinhochun
Contributor Author

My idea on how to use the frame latency waitable:

  • Wait inside DisplayServerWindows::process_events before processing input.
    • Supposedly Godot can run CPU-bound processing before the waitable is signalled, but I don't know of any processing in Godot that does not depend on handling user input.
  • If the currently focused window has v-sync enabled, we wait for this window to present.
  • If the currently focused window has v-sync disabled:
    • If there are other windows with v-sync enabled, wait for any one of these to present (wait until the earliest one has presented).
    • If none of the other windows have v-sync enabled, don't wait at all.
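
The decision rules in the list above can be sketched as a small pure function. This is only an illustration of the idea, not code from the PR: `WindowState`, `WaitTarget` and `choose_wait_target` are hypothetical names, and treating "no window focused" the same as "focused window has v-sync disabled" is an assumption the list does not cover.

```cpp
#include <cstddef>

// Minimal per-window state needed for the decision (hypothetical type).
struct WindowState {
	bool focused;
	bool vsync_enabled;
};

enum class WaitTarget {
	FOCUSED_WINDOW, // Wait for the focused window's frame latency waitable.
	ANY_VSYNC_WINDOW, // Wait until the earliest v-synced window has presented.
	NONE, // Do not wait at all.
};

// Decide what to wait on before processing input.
constexpr WaitTarget choose_wait_target(const WindowState *windows, std::size_t count) {
	bool focused_has_vsync = false;
	bool any_vsync = false;
	for (std::size_t i = 0; i < count; i++) {
		if (windows[i].vsync_enabled) {
			any_vsync = true;
			if (windows[i].focused) {
				focused_has_vsync = true;
			}
		}
	}
	if (focused_has_vsync) {
		return WaitTarget::FOCUSED_WINDOW;
	}
	// Assumption: with no focused window, fall back to the same rule as
	// a focused window with v-sync disabled.
	return any_vsync ? WaitTarget::ANY_VSYNC_WINDOW : WaitTarget::NONE;
}
```

On Windows, the "wait for any one of these" case would presumably map to WaitForMultipleObjects with bWaitAll set to FALSE over the waitables of the v-synced windows.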

@KeyboardDanni
Copy link
Contributor

Nice! Will we have a sister PR for DXGI and Vulkan?

@alvinhochun
Copy link
Contributor Author

> Nice! Will we have a sister PR for DXGI and Vulkan?

It's not really on my list yet, no.
