Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Opset11 Sequence Ops on DirectML, and make the CPU implementations agnostic to backend EP #14442

Merged
merged 44 commits into from
Feb 22, 2023

Conversation

smk2007
Copy link
Member

@smk2007 smk2007 commented Jan 26, 2023

Enable Opset11 Sequence Ops on DirectML, and make the CPU implementations agnostic to backend EP

Opset 11 introduced the following sequence related operators:
- SequenceAt
- SequenceConstruct
- SequenceEmpty
- SequenceLength
- SequenceErase
- SequenceInsert
- ConcatFromSequence

With the exception of ConcatFromSequence, all of the above operators were implemented with CPU kernels that a) required all of the contained tensors to also be on CPU, and b) would clone each tensor into a new sequence as a side effect of each operator. The implementation of sequences are backend agnostic, as they dont affect actual tensor layout or manipulate the contents of the tensors. In addition, with the exception of SequenceAt, the other operators need not make copies of the underlying referenced tensors.

Consequently, this change does the following:

  1. Sequence* operators (except SequenceAt) no longer copies the contents of a sequence of tensors on every kernel execution.
  2. SequenceAt uses the DataTransferManager to copy tensors agnostic to backend.
  3. The internal container implemented by TensorSeq has changed from onnxruntime::Tensor to OrtValue. This is because onnxruntime::Tensor does not support copy or assignment construction, so it must have a singular owner. However, is same tensor participates in multiple containers it would have multiple container "owners" and this would not be possible.
  4. Other code that accessed values from TensorSeq have associated changes to extract Tensors from OrtValues now.

In addition, DirectML execution was very slow when the above Sequence operators were added to a graph, as this caused MemcpyToHost and MemcpyFromHost kernels to be inserted between the graph and the sequence operators. To optimize DirectML,

  1. The CPU implementations for the Sequence* ops were registered as DML implementations. Since the above changes also includes making the CPU kernel implementations EP agnostic, the CPU kernels can be added as is.
  2. The ConcatFromSequence operator needed to be implemented on DirectML. However, there was little DirectML EP operator framework support for operators that accept/output sequences of tensors. This change has modified the internal COM interfaces to include new apis to interrogate for sequence shapes, and extract the needed tensors from TensorSeq.

@fdwr fdwr requested a review from jeffbloo January 28, 2023 03:20
Copy link
Member

@yuslepukhin yuslepukhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

@PatriceVignola
Copy link
Contributor

@yuslepukhin Thank you for the review. I'll just need a final approval if everything looks good.

@yuslepukhin
Copy link
Member

yuslepukhin commented Feb 21, 2023

We also need re-work the places where SetElements({}) is called. Let's add a Clear() method to make sure the intent is clear. We can still use the same method of releasing the buffer, or using the good old swap() idom.

Copy link
Member

@yuslepukhin yuslepukhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@PatriceVignola PatriceVignola merged commit 1b7f654 into main Feb 22, 2023
@PatriceVignola PatriceVignola deleted the user/sheilk/sequence-ops-on-dml branch February 22, 2023 02:08
PatriceVignola added a commit that referenced this pull request Feb 22, 2023
…ions agnostic to backend EP (#14442)

Enable Opset11 Sequence Ops on DirectML, and make the CPU
implementations agnostic to backend EP

Opset 11 introduced the following sequence related operators:
    - SequenceAt
    - SequenceConstruct
    - SequenceEmpty
    - SequenceLength
    - SequenceErase
    - SequenceInsert
    - ConcatFromSequence

With the exception of ConcatFromSequence, all of the above operators
were implemented with CPU kernels that a) required all of the contained
tensors to also be on CPU, and b) would clone each tensor into a new
sequence as a side effect of each operator. The implementation of
sequences are backend agnostic, as they dont affect actual tensor layout
or manipulate the contents of the tensors. In addition, with the
exception of SequenceAt, the other operators need not make copies of the
underlying referenced tensors.

Consequently, this change does the following:
1) Sequence* operators (except SequenceAt) no longer copies the contents
of a sequence of tensors on every kernel execution.
2) SequenceAt uses the DataTransferManager to copy tensors agnostic to
backend.
3) The internal container implemented by TensorSeq has changed from
onnxruntime::Tensor to OrtValue. This is because onnxruntime::Tensor
does not support copy or assignment construction, so it must have a
singular owner. However, is same tensor participates in multiple
containers it would have multiple container "owners" and this would not
be possible.
4) Other code that accessed values from TensorSeq have associated
changes to extract Tensors from OrtValues now.

In addition, DirectML execution was very slow when the above Sequence
operators were added to a graph, as this caused MemcpyToHost and
MemcpyFromHost kernels to be inserted between the graph and the sequence
operators. To optimize DirectML,
1) The CPU implementations for the Sequence* ops were registered as DML
implementations. Since the above changes also includes making the CPU
kernel implementations EP agnostic, the CPU kernels can be added as is.
2) The ConcatFromSequence operator needed to be implemented on DirectML.
However, there was little DirectML EP operator framework support for
operators that accept/output sequences of tensors. This change has
modified the internal COM interfaces to include new apis to interrogate
for sequence shapes, and extract the needed tensors from TensorSeq.

---------

Co-authored-by: Patrice Vignola <vignola.patrice@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants