D3D12 Indirect Drawing




Summary

This document describes D3D12 features needed to allow applications to generate command buffers on the GPU.

Motivation

Some game developers see significant performance advantages by moving scene-traversal and culling onto the GPU. This is hard to do with the D3D API because D3D requires command buffers to be generated by the CPU. This proposal contains additions to D3D12 which would allow a limited degree of GPU-based command buffer generation.


Detailed Design


Overview

A new API object is added to D3D12, the command signature. This object enables applications to specify:

  • The indirect argument buffer format

  • The command type that will be used (DrawInstanced, DrawIndexedInstanced, Dispatch)

  • The set of resource bindings that change per command versus the set that is inherited

At startup, an application would create a small set of command signatures. At runtime, the application would fill a buffer with commands (via whatever means that application chooses). The application would then use D3D12 command list APIs to set state (render target bindings, PSO, etc.), and then use a command list API to cause the GPU to interpret the contents of the indirect argument buffer according to the format defined by a particular command signature.

For example, suppose an application wants a unique root constant to be specified per-draw call in the indirect argument buffer. The application would create a command signature that enables the indirect argument buffer to specify the following parameters per draw call:

  • Draw arguments (Vertex Count, Instance Count, ...)

  • The value of 1 root constant

The indirect argument buffer generated by the application would contain an array of fixed-size structures. Each structure corresponds to 1 draw call and contains the value of the root constant followed by the draw arguments. The number of draw calls can optionally be specified in a separate GPU-visible buffer.

An example command buffer generated by the application would look like:

Command Buffer Format

Draw structure #1
    RootConstant (RootParameterIndex=1)
    VertexCount
    InstanceCount
    StartVertexLocation
    StartInstanceLocation

Draw structure #2
    RootConstant (RootParameterIndex=1)
    VertexCount
    InstanceCount
    StartVertexLocation
    StartInstanceLocation

Draw structure #3
    RootConstant (RootParameterIndex=1)
    VertexCount
    InstanceCount
    StartVertexLocation
    StartInstanceLocation
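As a minimal sketch (assuming a C11 compiler), one entry of this buffer can be described as a C structure whose size equals the byte stride the application would use for the command signature. DrawRecord and FillRecords are hypothetical names, not part of D3D12, and D3D12_DRAW_ARGUMENTS is mirrored locally from the definition later in this document:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of D3D12_DRAW_ARGUMENTS as defined in this document. */
typedef struct D3D12_DRAW_ARGUMENTS {
    uint32_t VertexCountPerInstance;
    uint32_t InstanceCount;
    uint32_t StartVertexLocation;
    uint32_t StartInstanceLocation;
} D3D12_DRAW_ARGUMENTS;

/* Hypothetical application-defined record: the root constant comes first,
   followed by the draw arguments, matching the format above. */
typedef struct DrawRecord {
    uint32_t RootConstant;      /* value written to root parameter index 1 */
    D3D12_DRAW_ARGUMENTS Draw;  /* arguments for one draw */
} DrawRecord;

/* Five tightly packed 32-bit values, so each record is 20 bytes. */
_Static_assert(sizeof(DrawRecord) == 20, "one record per 20-byte stride");
_Static_assert(offsetof(DrawRecord, Draw) == 4, "draw args follow the constant");

/* Hypothetical CPU-side fill; a GPU culling shader could write the same layout. */
static void FillRecords(DrawRecord* records, uint32_t count, uint32_t vertsPerObject)
{
    for (uint32_t i = 0; i < count; ++i) {
        records[i].RootConstant = i; /* e.g. an object ID read by the shader */
        records[i].Draw.VertexCountPerInstance = vertsPerObject;
        records[i].Draw.InstanceCount = 1;
        records[i].Draw.StartVertexLocation = i * vertsPerObject;
        records[i].Draw.StartInstanceLocation = 0;
    }
}
```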

Indirect Argument Buffer Structures

The following structures define how particular arguments appear in an indirect argument buffer. These structures do not appear in any D3D12 API. Applications use these definitions when writing to an indirect argument buffer (with the CPU or GPU).

typedef struct D3D12_DRAW_ARGUMENTS
{
    UINT VertexCountPerInstance;
    UINT InstanceCount;
    UINT StartVertexLocation;
    UINT StartInstanceLocation;
} D3D12_DRAW_ARGUMENTS;

typedef struct D3D12_DRAW_INDEXED_ARGUMENTS
{
    UINT IndexCountPerInstance;
    UINT InstanceCount;
    UINT StartIndexLocation;
    INT BaseVertexLocation;
    UINT StartInstanceLocation;
} D3D12_DRAW_INDEXED_ARGUMENTS;

typedef struct D3D12_DISPATCH_ARGUMENTS
{
    UINT ThreadGroupCountX;
    UINT ThreadGroupCountY;
    UINT ThreadGroupCountZ;
} D3D12_DISPATCH_ARGUMENTS;

typedef struct D3D12_VERTEX_BUFFER_VIEW
{
    D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
    UINT SizeInBytes;
    UINT StrideInBytes;
} D3D12_VERTEX_BUFFER_VIEW;

typedef struct D3D12_INDEX_BUFFER_VIEW
{
    D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
    UINT SizeInBytes;
    DXGI_FORMAT Format;
} D3D12_INDEX_BUFFER_VIEW;

typedef struct D3D12_CONSTANT_BUFFER_VIEW
{
    D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
    UINT SizeInBytes;
    UINT Padding;
} D3D12_CONSTANT_BUFFER_VIEW;
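Because command signature strides are built from these structures, it can help to pin their sizes down at compile time. The sketch below mirrors the definitions above locally (so it compiles without d3d12.h) and stubs DXGI_FORMAT as a plain enum; it assumes that enum is 32 bits wide, as it is on common compilers:

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of the structures above (stubs, not the real headers). */
typedef uint64_t D3D12_GPU_VIRTUAL_ADDRESS;
typedef enum DXGI_FORMAT { DXGI_FORMAT_UNKNOWN = 0 } DXGI_FORMAT;

typedef struct D3D12_DRAW_ARGUMENTS {
    uint32_t VertexCountPerInstance, InstanceCount;
    uint32_t StartVertexLocation, StartInstanceLocation;
} D3D12_DRAW_ARGUMENTS;

typedef struct D3D12_DRAW_INDEXED_ARGUMENTS {
    uint32_t IndexCountPerInstance, InstanceCount, StartIndexLocation;
    int32_t  BaseVertexLocation; /* the only signed field */
    uint32_t StartInstanceLocation;
} D3D12_DRAW_INDEXED_ARGUMENTS;

typedef struct D3D12_DISPATCH_ARGUMENTS {
    uint32_t ThreadGroupCountX, ThreadGroupCountY, ThreadGroupCountZ;
} D3D12_DISPATCH_ARGUMENTS;

typedef struct D3D12_VERTEX_BUFFER_VIEW {
    D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
    uint32_t SizeInBytes, StrideInBytes;
} D3D12_VERTEX_BUFFER_VIEW;

typedef struct D3D12_INDEX_BUFFER_VIEW {
    D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
    uint32_t SizeInBytes;
    DXGI_FORMAT Format;
} D3D12_INDEX_BUFFER_VIEW;

typedef struct D3D12_CONSTANT_BUFFER_VIEW {
    D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
    uint32_t SizeInBytes, Padding;
} D3D12_CONSTANT_BUFFER_VIEW;

/* Sizes an application would add up when choosing a ByteStride. */
_Static_assert(sizeof(DXGI_FORMAT) == 4, "assumed 32-bit enum");
_Static_assert(sizeof(D3D12_DRAW_ARGUMENTS) == 16, "4 x UINT");
_Static_assert(sizeof(D3D12_DRAW_INDEXED_ARGUMENTS) == 20, "5 x 32-bit");
_Static_assert(sizeof(D3D12_DISPATCH_ARGUMENTS) == 12, "3 x UINT");
_Static_assert(sizeof(D3D12_VERTEX_BUFFER_VIEW) == 16, "address + 2 x UINT");
_Static_assert(sizeof(D3D12_INDEX_BUFFER_VIEW) == 16, "address + UINT + format");
_Static_assert(sizeof(D3D12_CONSTANT_BUFFER_VIEW) == 16, "address + UINT + padding");
```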

Command signature Creation

Applications use the following API to create a command signature.

typedef enum D3D12_INDIRECT_PARAMETER_TYPE
{
    D3D12_INDIRECT_PARAMETER_DRAW,
    D3D12_INDIRECT_PARAMETER_DRAW_INDEXED,
    D3D12_INDIRECT_PARAMETER_DISPATCH,
    D3D12_INDIRECT_PARAMETER_VERTEX_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_INDEX_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_CONSTANT,
    D3D12_INDIRECT_PARAMETER_CONSTANT_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_SHADER_RESOURCE_VIEW,
    D3D12_INDIRECT_PARAMETER_UNORDERED_ACCESS_VIEW,
} D3D12_INDIRECT_PARAMETER_TYPE;

typedef struct D3D12_INDIRECT_PARAMETER
{
    D3D12_INDIRECT_PARAMETER_TYPE Type;
    union
    {
        struct
        {
            UINT Slot;
        } VertexBuffer;

        struct
        {
            UINT RootParameterIndex;
            UINT DestOffsetIn32BitValues;
            UINT Num32BitValuesToSet;
        } Constant;

        struct
        {
            UINT RootParameterIndex;
        } ConstantBufferView;

        struct
        {
            UINT RootParameterIndex;
        } ShaderResourceView;

        struct
        {
            UINT RootParameterIndex;
        } UnorderedAccessView;
    };
} D3D12_INDIRECT_PARAMETER;

typedef struct D3D12_COMMAND_SIGNATURE
{
    // The number of bytes between each drawing structure
    UINT ByteStride;
    UINT ParameterCount;
    const D3D12_INDIRECT_PARAMETER* pParameters;
} D3D12_COMMAND_SIGNATURE;

HRESULT ID3D12Device::CreateCommandSignature(
    const D3D12_COMMAND_SIGNATURE* pDesc,
    ID3D12RootSignature* pRootSignature,
    REFIID riid, // Expected: ID3D12CommandSignature
    void** ppCommandSignature
);

The ordering of arguments within an indirect argument buffer is defined to exactly match the order of parameters specified in D3D12_COMMAND_SIGNATURE::pParameters. All of the arguments for 1 draw/dispatch call within an indirect argument buffer are tightly packed. However, applications are allowed to specify an arbitrary (4-byte aligned) byte stride between draw/dispatch commands in an indirect argument buffer.
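The packing rule can be sketched as two small computations: the packed size of one command is the sum of its entry sizes, and command i starts ByteStride * i bytes past ArgumentBufferOffset. This is a hedged sketch: IndirectParam is a hypothetical, trimmed stand-in for D3D12_INDIRECT_PARAMETER, and root SRV/UAV entries are assumed to use the same 16-byte layout as D3D12_CONSTANT_BUFFER_VIEW:

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the enum defined above. */
typedef enum D3D12_INDIRECT_PARAMETER_TYPE {
    D3D12_INDIRECT_PARAMETER_DRAW,
    D3D12_INDIRECT_PARAMETER_DRAW_INDEXED,
    D3D12_INDIRECT_PARAMETER_DISPATCH,
    D3D12_INDIRECT_PARAMETER_VERTEX_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_INDEX_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_CONSTANT,
    D3D12_INDIRECT_PARAMETER_CONSTANT_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_SHADER_RESOURCE_VIEW,
    D3D12_INDIRECT_PARAMETER_UNORDERED_ACCESS_VIEW
} D3D12_INDIRECT_PARAMETER_TYPE;

/* Hypothetical trimmed parameter: only what is needed to size an entry. */
typedef struct IndirectParam {
    D3D12_INDIRECT_PARAMETER_TYPE Type;
    uint32_t Num32BitValuesToSet; /* used only by CONSTANT entries */
} IndirectParam;

/* Bytes one entry occupies, from the argument structures defined above. */
static uint32_t ParamSize(const IndirectParam* p)
{
    switch (p->Type) {
    case D3D12_INDIRECT_PARAMETER_DRAW:         return 16;
    case D3D12_INDIRECT_PARAMETER_DRAW_INDEXED: return 20;
    case D3D12_INDIRECT_PARAMETER_DISPATCH:     return 12;
    case D3D12_INDIRECT_PARAMETER_CONSTANT:     return 4 * p->Num32BitValuesToSet;
    default:                                    return 16; /* VB/IB/CBV/SRV/UAV views */
    }
}

/* Arguments within one command are tightly packed: the sum of entry sizes. */
static uint32_t PackedCommandSize(const IndirectParam* params, uint32_t count)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < count; ++i)
        total += ParamSize(&params[i]);
    return total;
}

/* Byte offset of command commandIndex within the argument buffer. */
static uint64_t CommandOffset(uint64_t argumentBufferOffset, uint32_t byteStride,
                              uint32_t commandIndex)
{
    return argumentBufferOffset + (uint64_t)byteStride * commandIndex;
}
```

Any ByteStride at least as large as PackedCommandSize (and 4-byte aligned) is acceptable; the bytes between the packed arguments and the next command are ignored.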

The root signature must be specified if and only if the command signature changes one of the root arguments.

For root SRV/UAV/CBV, the application specifies the size in bytes. The debug layer will validate the following restrictions on the size and address:

  1. CBV -- Address and size must be a multiple of 256 bytes

  2. Raw UAV -- Address and size must be a multiple of 4 bytes

  3. Typed UAV -- Address and size must be a multiple of the UAV format size

  4. Structured UAV -- Address and size must be a multiple of the structure byte stride (declared in the shader)

  5. SRV -- Address and size must be a multiple of the SRV format size
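The five restrictions above reduce to one rule: both the address and the size must be multiples of a per-view alignment. A minimal sketch follows; RootViewKind and the helper names are hypothetical, and the caller is assumed to supply the format size or structure byte stride:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of a root view for the checks above. */
typedef enum RootViewKind {
    ROOT_CBV, ROOT_RAW_UAV, ROOT_TYPED_UAV, ROOT_STRUCTURED_UAV, ROOT_SRV
} RootViewKind;

/* Required alignment in bytes. elementSize is the format size (typed UAV,
   SRV) or the structure byte stride (structured UAV); unused otherwise. */
static uint64_t RequiredAlignment(RootViewKind kind, uint32_t elementSize)
{
    switch (kind) {
    case ROOT_CBV:     return 256;
    case ROOT_RAW_UAV: return 4;
    default:           return elementSize;
    }
}

/* Uses % rather than a bit mask because format sizes and structure strides
   need not be powers of two (e.g. a 12-byte structure). */
static bool RootViewIsAligned(RootViewKind kind, uint32_t elementSize,
                              uint64_t address, uint32_t sizeInBytes)
{
    uint64_t alignment = RequiredAlignment(kind, elementSize);
    if (alignment == 0)
        return false; /* unknown element size */
    return address % alignment == 0 && sizeInBytes % alignment == 0;
}
```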

A given command signature is either a graphics or a compute command signature. If a command signature contains a draw operation, then it is a graphics command signature. Otherwise, the command signature must contain a dispatch operation, and it is a compute command signature.

Graphics command signatures only affect graphics root arguments. Likewise, compute command signatures only affect compute root arguments.
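The classification above can be sketched as a scan for the operation entry. SignatureKind and ClassifySignature are hypothetical names; the enum is mirrored locally from the definition above:

```c
#include <assert.h>
#include <stdint.h>

typedef enum D3D12_INDIRECT_PARAMETER_TYPE {
    D3D12_INDIRECT_PARAMETER_DRAW,
    D3D12_INDIRECT_PARAMETER_DRAW_INDEXED,
    D3D12_INDIRECT_PARAMETER_DISPATCH,
    D3D12_INDIRECT_PARAMETER_VERTEX_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_INDEX_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_CONSTANT,
    D3D12_INDIRECT_PARAMETER_CONSTANT_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_SHADER_RESOURCE_VIEW,
    D3D12_INDIRECT_PARAMETER_UNORDERED_ACCESS_VIEW
} D3D12_INDIRECT_PARAMETER_TYPE;

typedef enum SignatureKind {
    SIGNATURE_INVALID, SIGNATURE_GRAPHICS, SIGNATURE_COMPUTE
} SignatureKind;

/* A draw entry makes it a graphics signature (its root arguments go to the
   graphics root signature); a dispatch entry makes it a compute signature. */
static SignatureKind ClassifySignature(const D3D12_INDIRECT_PARAMETER_TYPE* types,
                                       uint32_t count)
{
    for (uint32_t i = 0; i < count; ++i) {
        if (types[i] == D3D12_INDIRECT_PARAMETER_DRAW ||
            types[i] == D3D12_INDIRECT_PARAMETER_DRAW_INDEXED)
            return SIGNATURE_GRAPHICS;
        if (types[i] == D3D12_INDIRECT_PARAMETER_DISPATCH)
            return SIGNATURE_COMPUTE;
    }
    return SIGNATURE_INVALID; /* no draw or dispatch entry at all */
}
```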


Example Command signatures


Plain MultiDrawIndirect

In this example, the indirect argument buffer generated by the application holds an array of 36-byte structures. Each structure only contains the 5 parameters passed to DrawIndexedInstanced (plus padding).

The code to create the command signature description is:

D3D12_INDIRECT_PARAMETER Args[1];
Args[0].Type = D3D12_INDIRECT_PARAMETER_DRAW_INDEXED;

D3D12_COMMAND_SIGNATURE ProgramDesc;
ProgramDesc.ByteStride = 36;
ProgramDesc.ParameterCount = 1;
ProgramDesc.pParameters = Args;

The layout of a single structure within an indirect argument buffer is:

Bytes 0:3 IndexCountPerInstance
Bytes 4:7 InstanceCount
Bytes 8:11 StartIndexLocation
Bytes 12:15 BaseVertexLocation
Bytes 16:19 StartInstanceLocation
Bytes 20:35 Padding
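This layout can be expressed as a C structure and checked at compile time, which keeps CPU-side writers in sync with ByteStride = 36. MultiDrawRecord is a hypothetical name, and D3D12_DRAW_INDEXED_ARGUMENTS is mirrored locally; the sketch assumes C11 for _Static_assert:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of D3D12_DRAW_INDEXED_ARGUMENTS. */
typedef struct D3D12_DRAW_INDEXED_ARGUMENTS {
    uint32_t IndexCountPerInstance;
    uint32_t InstanceCount;
    uint32_t StartIndexLocation;
    int32_t  BaseVertexLocation;
    uint32_t StartInstanceLocation;
} D3D12_DRAW_INDEXED_ARGUMENTS;

/* Hypothetical record: 20 bytes of arguments followed by 16 unused bytes. */
typedef struct MultiDrawRecord {
    D3D12_DRAW_INDEXED_ARGUMENTS Args; /* bytes 0:19 */
    uint32_t Padding[4];               /* bytes 20:35, never read */
} MultiDrawRecord;

_Static_assert(sizeof(MultiDrawRecord) == 36, "matches ByteStride = 36");
_Static_assert(offsetof(MultiDrawRecord, Args.BaseVertexLocation) == 12, "bytes 12:15");
_Static_assert(offsetof(MultiDrawRecord, Padding) == 20, "bytes 20:35");
```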

Root Constants + Vertex Buffers

In this example, each structure in an indirect argument buffer changes 2 root constants, changes 1 vertex buffer binding, and performs 1 non-indexed draw. There is no padding between structures.

The code to create the command signature description is:

D3D12_INDIRECT_PARAMETER Args[4];
Args[0].Type = D3D12_INDIRECT_PARAMETER_CONSTANT;
Args[0].Constant.RootParameterIndex = 2;
Args[0].Constant.DestOffsetIn32BitValues = 0;
Args[0].Constant.Num32BitValuesToSet = 1;
Args[1].Type = D3D12_INDIRECT_PARAMETER_CONSTANT;
Args[1].Constant.RootParameterIndex = 6;
Args[1].Constant.DestOffsetIn32BitValues = 0;
Args[1].Constant.Num32BitValuesToSet = 1;
Args[2].Type = D3D12_INDIRECT_PARAMETER_VERTEX_BUFFER_VIEW;
Args[2].VertexBuffer.Slot = 3;
Args[3].Type = D3D12_INDIRECT_PARAMETER_DRAW;

D3D12_COMMAND_SIGNATURE ProgramDesc;
ProgramDesc.ByteStride = 40;
ProgramDesc.ParameterCount = 4;
ProgramDesc.pParameters = Args;

The layout of a single structure within the indirect argument buffer is:

Bytes 0:3 Data for root parameter index 2
Bytes 4:7 Data for root parameter index 6
Bytes 8:15 Virtual address of VB (64-bit)
Bytes 16:19 VB size
Bytes 20:23 VB stride
Bytes 24:27 VertexCountPerInstance
Bytes 28:31 InstanceCount
Bytes 32:35 StartVertexLocation
Bytes 36:39 StartInstanceLocation
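A C structure for this 40-byte record, following the field order of D3D12_VERTEX_BUFFER_VIEW as defined earlier (BufferLocation, then SizeInBytes, then StrideInBytes), could look like the sketch below. ConstantsVbDrawRecord is a hypothetical name and both D3D12 structures are local mirrors:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct D3D12_VERTEX_BUFFER_VIEW {
    uint64_t BufferLocation; /* GPU virtual address */
    uint32_t SizeInBytes;
    uint32_t StrideInBytes;
} D3D12_VERTEX_BUFFER_VIEW;

typedef struct D3D12_DRAW_ARGUMENTS {
    uint32_t VertexCountPerInstance;
    uint32_t InstanceCount;
    uint32_t StartVertexLocation;
    uint32_t StartInstanceLocation;
} D3D12_DRAW_ARGUMENTS;

/* Hypothetical record matching the 4-entry signature (ByteStride = 40). */
typedef struct ConstantsVbDrawRecord {
    uint32_t RootParam2Value;              /* bytes 0:3 */
    uint32_t RootParam6Value;              /* bytes 4:7 */
    D3D12_VERTEX_BUFFER_VIEW VertexBuffer; /* bytes 8:23, bound to slot 3 */
    D3D12_DRAW_ARGUMENTS Draw;             /* bytes 24:39 */
} ConstantsVbDrawRecord;

_Static_assert(sizeof(ConstantsVbDrawRecord) == 40, "no padding between records");
_Static_assert(offsetof(ConstantsVbDrawRecord, VertexBuffer) == 8, "64-bit VA at byte 8");
_Static_assert(offsetof(ConstantsVbDrawRecord, Draw) == 24, "draw args at byte 24");
```

Because the VB address sits at byte 8, it is naturally 8-byte aligned and the compiler inserts no hidden padding.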

Validation

The runtime will validate the following:

  • There is exactly 1 entry defining the draw/dispatch parameters (either D3D12_INDIRECT_PARAMETER_DRAW, D3D12_INDIRECT_PARAMETER_DRAW_INDEXED, or D3D12_INDIRECT_PARAMETER_DISPATCH). This entry must come last.

  • A D3D12_INDIRECT_PARAMETER_INDEX_BUFFER_VIEW entry can only be present if there is also a D3D12_INDIRECT_PARAMETER_DRAW_INDEXED entry present.

  • ByteStride is 4-byte aligned

  • ByteStride is large enough to hold all data

  • All resource bindings are compatible with the root signature.

  • Each root parameter slot is defined no more than once

  • Root parameter slots do not exceed range defined by root signature

  • If there are index buffer changes, the index buffer format is valid

  • If there are vertex buffer changes, the VB slot index is within the range allowed by D3D

  • Entries that reference root parameter slots are sorted from smallest to largest root parameter index

  • Root constant entries are sorted from smallest to largest DestOffsetIn32BitValues (with no overlap)

  • If D3D12_INDIRECT_PARAMETER_DISPATCH is used, then VB and IB bindings cannot be changed

  • The root signature is specified if and only if the command signature indicates that 1 of the root arguments changes

  • For root constants, DestOffsetIn32BitValues + Num32BitValuesToSet is within the range defined by the root signature

  • No VB slot is changed more than once

  • Only 1 IB change per command signature is allowed

  • Num32BitValuesToSet must be greater than 0

  • ByteStride must be a multiple of 4 bytes
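A subset of these rules can be sketched as a single validation pass. This is not the runtime's implementation: IndirectParam is a hypothetical flattened stand-in for D3D12_INDIRECT_PARAMETER, root SRV/UAV entries are assumed to be 16 bytes, root parameter indices are required to be strictly increasing (this sketch's reading of the "sorted" and "no more than once" rules together), and checks that need the root signature are omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum D3D12_INDIRECT_PARAMETER_TYPE {
    D3D12_INDIRECT_PARAMETER_DRAW,
    D3D12_INDIRECT_PARAMETER_DRAW_INDEXED,
    D3D12_INDIRECT_PARAMETER_DISPATCH,
    D3D12_INDIRECT_PARAMETER_VERTEX_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_INDEX_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_CONSTANT,             /* root arguments start here */
    D3D12_INDIRECT_PARAMETER_CONSTANT_BUFFER_VIEW,
    D3D12_INDIRECT_PARAMETER_SHADER_RESOURCE_VIEW,
    D3D12_INDIRECT_PARAMETER_UNORDERED_ACCESS_VIEW
} D3D12_INDIRECT_PARAMETER_TYPE;

typedef struct IndirectParam {
    D3D12_INDIRECT_PARAMETER_TYPE Type;
    uint32_t RootParameterIndex;  /* root-argument entries only */
    uint32_t Num32BitValuesToSet; /* CONSTANT entries only */
} IndirectParam;

static bool IsOperation(D3D12_INDIRECT_PARAMETER_TYPE t)
{
    return t == D3D12_INDIRECT_PARAMETER_DRAW || t == D3D12_INDIRECT_PARAMETER_DRAW_INDEXED ||
           t == D3D12_INDIRECT_PARAMETER_DISPATCH;
}

static uint32_t EntrySize(const IndirectParam* p)
{
    switch (p->Type) {
    case D3D12_INDIRECT_PARAMETER_DRAW:         return 16;
    case D3D12_INDIRECT_PARAMETER_DRAW_INDEXED: return 20;
    case D3D12_INDIRECT_PARAMETER_DISPATCH:     return 12;
    case D3D12_INDIRECT_PARAMETER_CONSTANT:     return 4 * p->Num32BitValuesToSet;
    default:                                    return 16; /* views, assumed 16 bytes */
    }
}

static bool ValidateSignature(const IndirectParam* p, uint32_t count, uint32_t byteStride)
{
    if (count == 0 || byteStride % 4 != 0)
        return false;
    uint32_t operations = 0, packedSize = 0, lastRootIndex = 0;
    bool sawRootArgument = false, changesVb = false, changesIb = false;
    for (uint32_t i = 0; i < count; ++i) {
        D3D12_INDIRECT_PARAMETER_TYPE t = p[i].Type;
        if (IsOperation(t))
            operations++;
        if (t == D3D12_INDIRECT_PARAMETER_VERTEX_BUFFER_VIEW)
            changesVb = true;
        if (t == D3D12_INDIRECT_PARAMETER_INDEX_BUFFER_VIEW)
            changesIb = true;
        if (t == D3D12_INDIRECT_PARAMETER_CONSTANT && p[i].Num32BitValuesToSet == 0)
            return false;
        if (t >= D3D12_INDIRECT_PARAMETER_CONSTANT) { /* relies on enum order */
            if (sawRootArgument && p[i].RootParameterIndex <= lastRootIndex)
                return false;
            sawRootArgument = true;
            lastRootIndex = p[i].RootParameterIndex;
        }
        packedSize += EntrySize(&p[i]);
    }
    D3D12_INDIRECT_PARAMETER_TYPE last = p[count - 1].Type;
    if (operations != 1 || !IsOperation(last))
        return false; /* exactly 1 operation, and it comes last */
    if (changesIb && last != D3D12_INDIRECT_PARAMETER_DRAW_INDEXED)
        return false;
    if (changesVb && last == D3D12_INDIRECT_PARAMETER_DISPATCH)
        return false;
    return packedSize <= byteStride; /* stride large enough to hold all data */
}
```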


Drawing

Applications perform indirect draws/dispatches via the following API:

void ID3D12CommandList::ExecuteIndirect(
    ID3D12CommandSignature* pCommandSignature,
    UINT MaxCommandCount,
    ID3D12Resource* pArgumentBuffer,
    UINT64 ArgumentBufferOffset,
    ID3D12Resource* pCountBuffer,
    UINT64 CountBufferOffset
);

There are 2 ways that command counts can be specified:

If pCountBuffer is not NULL, then MaxCommandCount specifies the maximum number of operations which will be performed. The actual number of operations to be performed is defined by the minimum of this value and a 32-bit unsigned integer contained in pCountBuffer (at the byte offset specified by CountBufferOffset).

If pCountBuffer is NULL, MaxCommandCount specifies the exact number of operations which will be performed.

The semantics of this API are defined with the following pseudo-code:

Non-NULL pCountBuffer:

// Read the command count out of the count buffer
UINT CommandCount = pCountBuffer->ReadUINT32(CountBufferOffset);
CommandCount = min(CommandCount, MaxCommandCount);

// Get a pointer to the first command's arguments
BYTE* Arguments = pArgumentBuffer->GetBase() + ArgumentBufferOffset;

for (UINT CommandIndex = 0; CommandIndex < CommandCount; CommandIndex++)
{
    // Interpret the data contained in *Arguments
    // according to the command signature
    pCommandSignature->Interpret(Arguments);
    Arguments += pCommandSignature->GetByteStride();
}

NULL pCountBuffer:

// Get a pointer to the first command's arguments
BYTE* Arguments = pArgumentBuffer->GetBase() + ArgumentBufferOffset;

for (UINT CommandIndex = 0; CommandIndex < MaxCommandCount; CommandIndex++)
{
    // Interpret the data contained in *Arguments
    // according to the command signature
    pCommandSignature->Interpret(Arguments);
    Arguments += pCommandSignature->GetByteStride();
}
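Both pseudo-code paths can be exercised on the CPU as one function. This sketch hard-codes a command signature containing a single D3D12_INDIRECT_PARAMETER_DRAW entry and only tallies the draws it would issue; EmulateExecuteIndirect and DrawStats are hypothetical names, and on real hardware the GPU performs this walk:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct D3D12_DRAW_ARGUMENTS {
    uint32_t VertexCountPerInstance;
    uint32_t InstanceCount;
    uint32_t StartVertexLocation;
    uint32_t StartInstanceLocation;
} D3D12_DRAW_ARGUMENTS;

/* Hypothetical stand-in for issuing draws: just counts the work. */
typedef struct DrawStats {
    uint32_t Draws;
    uint64_t Vertices;
} DrawStats;

static void EmulateExecuteIndirect(const uint8_t* argumentBuffer, uint64_t argumentBufferOffset,
                                   uint32_t byteStride, uint32_t maxCommandCount,
                                   const uint8_t* countBuffer, uint64_t countBufferOffset,
                                   DrawStats* stats)
{
    /* NULL count buffer: MaxCommandCount is the exact count. */
    uint32_t commandCount = maxCommandCount;
    if (countBuffer != NULL) {
        uint32_t gpuCount;
        memcpy(&gpuCount, countBuffer + countBufferOffset, sizeof gpuCount);
        commandCount = gpuCount < maxCommandCount ? gpuCount : maxCommandCount;
    }
    const uint8_t* args = argumentBuffer + argumentBufferOffset;
    for (uint32_t i = 0; i < commandCount; ++i) {
        D3D12_DRAW_ARGUMENTS draw;
        memcpy(&draw, args, sizeof draw); /* the signature's only entry */
        stats->Draws += 1;
        stats->Vertices += (uint64_t)draw.VertexCountPerInstance * draw.InstanceCount;
        args += byteStride;
    }
}
```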

The debug layer will issue an error if either the count buffer or the argument buffer is not in the D3D12_RESOURCE_USAGE_INDIRECT_ARGUMENT state.

The core runtime will validate:

  • CountBufferOffset and ArgumentBufferOffset are 4-byte aligned

  • pCountBuffer and pArgumentBuffer are buffer resources (any heap type)

  • The offset implied by MaxCommandCount, ArgumentBufferOffset, and the command signature's ByteStride does not exceed the bounds of pArgumentBuffer (and similarly, CountBufferOffset does not exceed the bounds of pCountBuffer)

  • The command list is a direct command list or a compute command list (not a copy or JPEG decode command list)
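The offset and bounds checks can be sketched as follows. The exact bound is not spelled out above, so this sketch assumes the conservative reading that all MaxCommandCount strides must fit after ArgumentBufferOffset, and that 4 bytes must be readable at CountBufferOffset. ValidateExecuteIndirectArgs is a hypothetical name, and the subtraction form avoids overflow on large offsets:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool ValidateExecuteIndirectArgs(uint64_t argumentBufferSize, uint64_t argumentBufferOffset,
                                        uint32_t byteStride, uint32_t maxCommandCount,
                                        bool hasCountBuffer, uint64_t countBufferSize,
                                        uint64_t countBufferOffset)
{
    if (argumentBufferOffset % 4 != 0)
        return false;
    /* Assumption: every one of the MaxCommandCount strides must fit. */
    uint64_t needed = (uint64_t)byteStride * maxCommandCount;
    if (argumentBufferOffset > argumentBufferSize ||
        needed > argumentBufferSize - argumentBufferOffset)
        return false;
    if (hasCountBuffer) {
        if (countBufferOffset % 4 != 0)
            return false;
        /* The 32-bit count must lie entirely inside the count buffer. */
        if (countBufferOffset > countBufferSize || countBufferSize - countBufferOffset < 4)
            return false;
    }
    return true;
}
```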

The debug layer will validate:

  • The root signature of the command list matches the root signature of the command signature

ID3D12CommandList::DrawInstancedIndirect and ID3D12CommandList::DrawIndexedInstancedIndirect are removed from the D3D12 API because they can be implemented with the features described here.


Bundles

ID3D12CommandList::ExecuteIndirect is allowed inside of bundle command lists only if all of the following are true:

  1. CountBuffer is NULL (CPU-specified count only)

  2. The command signature contains exactly 1 operation. This implies that the command signature contains no root argument changes and no VB/IB binding changes.
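The bundle restriction reduces to a two-part check. This is a sketch that assumes the command signature already passed creation-time validation, so a one-entry signature can only hold its draw/dispatch entry; ExecuteIndirectAllowedInBundle is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool ExecuteIndirectAllowedInBundle(const void* pCountBuffer,
                                           uint32_t signatureParameterCount)
{
    return pCountBuffer == NULL             /* rule 1: CPU-specified count only */
        && signatureParameterCount == 1;    /* rule 2: no root argument or VB/IB changes */
}
```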


State leakage

ExecuteIndirect is defined to reset all bindings affected by the ExecuteIndirect to known values. In particular:

  • If the command signature binds a VB to a particular slot, then after ExecuteIndirect is called, a NULL VB is bound to that slot

  • If the command signature binds an IB, then after ExecuteIndirect, a NULL IB is bound.

  • If the command signature sets a root constant, then after ExecuteIndirect is called, the root constant value is set to 0

  • If the command signature sets a root view (CBV/SRV/UAV), then after ExecuteIndirect is called, the root view is set to a NULL view.

This makes it easy for drivers to track bindings. The D3D12 runtime implements this by making a series of DDI calls after ExecuteIndirect is called.
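The reset rule can be sketched against a simplified binding tracker. Everything here is hypothetical: BindingState collapses each root parameter to a single 64-bit value (a constant or a root view address, with 0 meaning zero or NULL), and the slot capacities are assumptions for the sketch rather than limits defined by this document:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_VB_SLOTS 32     /* assumed capacity */
#define SKETCH_ROOT_PARAMS 64  /* assumed capacity */

typedef struct BindingState {
    uint64_t VertexBuffer[SKETCH_VB_SLOTS];    /* VB address per slot; 0 = NULL VB */
    uint64_t IndexBuffer;                      /* 0 = NULL IB */
    uint64_t RootArgument[SKETCH_ROOT_PARAMS]; /* 0 = zero constant / NULL view */
} BindingState;

typedef enum EntryKind {
    ENTRY_OPERATION, ENTRY_VERTEX_BUFFER, ENTRY_INDEX_BUFFER, ENTRY_ROOT_ARGUMENT
} EntryKind;

typedef struct SignatureEntry {
    EntryKind Kind;
    uint32_t Index; /* VB slot or root parameter index; ignored otherwise */
} SignatureEntry;

/* Only bindings the signature touched are reset; indices are assumed to
   have been validated when the signature was created. */
static void ResetAfterExecuteIndirect(BindingState* s, const SignatureEntry* entries,
                                      uint32_t count)
{
    for (uint32_t i = 0; i < count; ++i) {
        switch (entries[i].Kind) {
        case ENTRY_VERTEX_BUFFER: s->VertexBuffer[entries[i].Index] = 0; break;
        case ENTRY_INDEX_BUFFER:  s->IndexBuffer = 0; break;
        case ENTRY_ROOT_ARGUMENT: s->RootArgument[entries[i].Index] = 0; break;
        case ENTRY_OPERATION:     break; /* draws/dispatches bind nothing */
        }
    }
}
```

Because the post-call state depends only on the signature, a driver can know it without reading the GPU-written argument buffer.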


Obtaining buffer virtual addresses

A new API is added whereby an application can retrieve the GPU virtual address of a buffer.

typedef UINT64 D3D12_GPU_VIRTUAL_ADDRESS;

D3D12_GPU_VIRTUAL_ADDRESS ID3D12Resource::GetGPUVirtualAddress();

Applications are free to apply byte offsets to virtual addresses before placing them in an indirect argument buffer. Note that all of the D3D12 alignment requirements for VB/IB/CB still apply to the resulting GPU virtual address.

This API returns 0 for non-buffer resources.
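As an illustration of applying a byte offset while still honoring alignment, the sketch below writes a root CBV entry for a sub-allocation of a larger buffer. WriteRootCbv is a hypothetical helper; in an application, bufferBase would be the value returned by ID3D12Resource::GetGPUVirtualAddress(), and the 256-byte CBV rule is the one listed earlier in this document:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t D3D12_GPU_VIRTUAL_ADDRESS;

/* Local mirror of the argument structure defined above. */
typedef struct D3D12_CONSTANT_BUFFER_VIEW {
    D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
    uint32_t SizeInBytes;
    uint32_t Padding;
} D3D12_CONSTANT_BUFFER_VIEW;

static bool WriteRootCbv(D3D12_GPU_VIRTUAL_ADDRESS bufferBase, uint64_t byteOffset,
                         uint32_t sizeInBytes, D3D12_CONSTANT_BUFFER_VIEW* out)
{
    D3D12_GPU_VIRTUAL_ADDRESS address = bufferBase + byteOffset; /* offsets are allowed */
    if (address % 256 != 0 || sizeInBytes % 256 != 0)            /* CBV alignment still applies */
        return false;
    out->BufferLocation = address;
    out->SizeInBytes = sizeInBytes;
    out->Padding = 0;
    return true;
}
```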


Implementation Details

Both of the following implementations are acceptable:

  • Make the GPU command processor interpret the indirect argument buffer contents in the application-defined format

  • Allocate enough storage in the current command buffer to hold MaxCommandCount draws. Execute a compute shader to transform from the application-specified format to a hardware-specific format in the allocated command buffer space.

Note that in the second approach, the hidden compute shader invocations associated with many ExecuteIndirect calls can be combined together. If a command list has no resource transition barrier to the D3D12_RESOURCE_USAGE_INDIRECT_ARGUMENT state, then it is safe to move all of the hidden compute shader invocations to the beginning of the command list.


GPU Validation

In order to achieve consistent behavior across machines, GPUs are expected to perform the following validation:

  1. The command count read from pCountBuffer is guaranteed to not exceed the MaxCommandCount specified in the ExecuteIndirect API call. This is achieved by having the GPU compute min(MaxCommandCount, CommandCount).

Test Plan


Runtime Functional Tests

  • Validation during command signature creation √

  • Validation during ExecuteIndirect √

  • GetGPUVirtualAddress returns 0 for non-buffer resources √

  • Debug layer validation of buffer states √

  • Debug layer validation of buffer contents

  • The runtime sets state to correct default values after ExecuteIndirect (also the runtime should correctly select compute vs graphics root arguments to change) √

  • Debug layer validation of buffer alignments (taking into account the SRV/UAV format/structure byte stride).

  • Indirect argument structures have no padding √

  • ComandListAPITestBase::Execute is updated to call ExecuteIndirect √

  • CommandListTest::InvalidBundleAPI √

  • CommandListTest::SetCommandListError √

  • CommandSignature object takes a reference on root signature object √

  • Debug layer warning if a command signature is destroyed while there is outstanding work queued against it (like a PSO)

  • Runtime validation in GetGPUVA √

  • GetGPUVA works with placed resources √

  • GetGPUVA works with reserved resources √

  • CreateCommandSignature correctly handles the case where the first parameter changes a root argument (validation of increasing orders handles the first case). √

  • Debug layer validation during ExecuteIndirect

  • Debug layer validation of index buffer formats

  • Debug layer validation of tiled constant buffers not being supported

  • The runtime hard-coded array size of 64 constants is the correct size √


Driver Conformance Tests

  • Drivers work correctly when CountBuffer is NULL and when CountBuffer is non-NULL

  • ExecuteIndirect works correctly inside of a bundle

  • Applications can add byte offsets to GPU virtual addresses

  • Arbitrary command signatures produce the same results as corresponding non-indirect rendering API calls

  • Arbitrary 4-byte aligned byte strides are supported

  • Indirect argument buffer & Count buffer can be in any heap type

  • GPU computes min(MaxCommandCount, CommandCount)

  • Drivers do not move hidden compute shader invocations across a ResourceBarrier to the INDIRECT_ARGUMENT state.

  • Predication works correctly

  • Queries work correctly

  • GetGPUVA works with committed resources, placed resources, and reserved resources

  • GetGPUVA works with opened shared resources

  • CountBufferOffset & ArgumentBufferOffset are interpreted correctly

  • When setting root constants, DestOffsetIn32BitValues and Num32BitValuesToSet are interpreted correctly

  • Changing root arguments works with both compute and graphics

  • Drivers correctly handle shader-visibility and deny flags when setting root arguments via ExecuteIndirect

  • Out-of-bounds behavior correctly applies based on the app-specified buffer sizes (for CBV/SRV/UAV)

  • ExecuteIndirect works on compute queues

  • MaxCommandCount==0 works correctly

  • Root SRVs and UAVs work with all supported formats

  • Tiled root SRVs and UAVs work correctly (with offsets)