
Add support for ragged inputs to model #666

Merged

Conversation

oliverholworthy
Member

@oliverholworthy oliverholworthy commented Apr 5, 2023

Part of NVIDIA-Merlin/Merlin#255

Goals ⚽

  • Enable Transformers4Rec model to be called with ragged input representation.

Implementation Details 🚧

  • Adds a pre-processing step to the start of the model's forward method that pads any tensors in the ragged representation.
    • A ragged feature is represented by two tensors named {feature}__values and {feature}__offsets.
    • Each is padded to the minimum of the maximum sequence length in the batch and the model's max_sequence_length (if defined).
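The padding step described above can be sketched as follows. This is a minimal illustration under the `__values`/`__offsets` naming convention from the bullet points, not the actual Transformers4Rec implementation; the `pad_ragged_inputs` name and zero-padding are assumptions:

```python
from typing import Dict, Optional

import torch


def pad_ragged_inputs(
    inputs: Dict[str, torch.Tensor], max_sequence_length: Optional[int] = None
) -> Dict[str, torch.Tensor]:
    """Pad ragged "<name>__values"/"<name>__offsets" tensor pairs to a dense 2-D tensor."""
    padded: Dict[str, torch.Tensor] = {}
    for name, values in inputs.items():
        if name.endswith("__offsets"):
            continue  # handled together with the matching "__values" tensor
        if name.endswith("__values"):
            feature = name[: -len("__values")]
            offsets = inputs[feature + "__offsets"]
            lengths = offsets[1:] - offsets[:-1]
            batch_max = int(lengths.max())
            # pad to the batch maximum, capped by max_sequence_length if set
            if max_sequence_length is None:
                target = batch_max
            else:
                target = min(batch_max, max_sequence_length)
            out = torch.zeros((len(lengths), target), dtype=values.dtype)
            for row in range(len(lengths)):
                start = int(offsets[row])
                n = min(int(lengths[row]), target)
                out[row, :n] = values[start : start + n]
            padded[feature] = out
        else:
            padded[name] = values  # non-ragged tensors pass through unchanged
    return padded
```

For example, a batch of two sequences `[1, 2]` and `[3, 4, 5]` stored as values `[1, 2, 3, 4, 5]` with offsets `[0, 2, 5]` pads to `[[1, 2, 0], [3, 4, 5]]`.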

Testing Details 🔍

  • Adds a test for a model with sequence inputs that is called with ragged representation inputs.

@oliverholworthy oliverholworthy added the enhancement New feature or request label Apr 5, 2023
@oliverholworthy oliverholworthy added this to the Merlin 23.04 milestone Apr 5, 2023
@oliverholworthy oliverholworthy self-assigned this Apr 5, 2023
Co-authored-by: Marc Romeyn <marcromeyn@gmail.com>
@oliverholworthy oliverholworthy marked this pull request as ready for review April 5, 2023 17:18
```python
)
model_output = model(inference_inputs)

# if the model is traced with ragged inputs it can only be called with ragged inputs
```
oliverholworthy (Member Author):
Note that when tracing the model, the representation used as the example input determines what inputs the traced model expects (padded vs. ragged).
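The reason the tracing-time representation is baked in can be shown with a small standalone sketch (not from this PR): `torch.jit.trace` records only the operations taken for the example input, so a branch on the input's structure is frozen at trace time.

```python
import torch


def maybe_unsqueeze(x: torch.Tensor) -> torch.Tensor:
    # branch on the input's rank: a 1-D input gets a batch dimension added
    if x.dim() == 1:
        return x.unsqueeze(0)
    return x


# traced with a 1-D example input, so the unsqueeze branch is recorded
traced = torch.jit.trace(maybe_unsqueeze, torch.arange(3))

# the recorded branch now applies to every call, even 2-D inputs
out = traced(torch.ones(2, 3))  # shape (1, 2, 3), not (2, 3)
```

In the same way, a model traced with ragged example inputs records the padding path and must always be called with ragged inputs (and vice versa for padded inputs).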

```python
batch_padded = {}
for col_name, col in TensorTable(batch).items():
```
oliverholworthy (Member Author):

TensorTable is not currently compatible with torch.jit.script compilation.

Example of one of the errors that shows up (I don't think it prints all the errors, only the first one it encounters, so there may be more unsupported things besides the example below):

```
E   torch.jit.frontend.UnsupportedNodeError: SetComp aren't supported:
E     File "/workspace/merlin/core/merlin/table/tensor_table.py", line 61
E       def _validate_columns(self, cols_dict):
E           col_types = {type(col_obj) for col_obj in cols_dict.values()}
E                       ~ <--- HERE
E           if len(col_types) >= 2:
E               raise TypeError(
E   '__torch__.merlin.table.tensor_table.TensorTable' is being compiled since it was called from 'pad_batch'
```
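For context, the usual workaround for this class of error is to rewrite the comprehension as an explicit loop. A hypothetical scriptable variant of the check is sketched below; it is not a proposed change to merlin-core, and it uses `dtype` as a stand-in for `type(col_obj)` because Python-level `type()` is not available inside TorchScript:

```python
from typing import Dict, List

import torch


@torch.jit.script
def validate_columns(cols_dict: Dict[str, torch.Tensor]) -> None:
    # the set comprehension from _validate_columns rewritten as an explicit
    # loop over values, which torch.jit.script can compile
    dtypes: List[int] = []
    for col_obj in cols_dict.values():
        if col_obj.dtype not in dtypes:
            dtypes.append(col_obj.dtype)
    if len(dtypes) >= 2:
        raise TypeError("TensorTable columns must all have the same dtype")
```

Even with such rewrites, there may be further unsupported constructs elsewhere in TensorTable, per the note above.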

```python
]
),
)
assert torch.equal(
```
oliverholworthy (Member Author):

Dense sequence inputs are not currently padded as part of pad_inputs. We're assuming inputs will be either all ragged or all padded sequences, not a mix of both.

```diff
@@ -481,6 +482,7 @@ def __init__(
     head_reduction: str = "mean",
     optimizer: Type[torch.optim.Optimizer] = torch.optim.Adam,
     name: str = None,
+    max_sequence_length: Optional[int] = None,
```
oliverholworthy (Member Author):

Added a max_sequence_length to limit the size of the padding when receiving ragged inputs.
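The capping behaviour can be stated as a small helper. This is a hypothetical illustration of the rule, not code from the PR; the `padding_target` name is an assumption:

```python
from typing import Optional


def padding_target(batch_max_len: int, max_sequence_length: Optional[int] = None) -> int:
    # the padded length is the batch maximum, capped by max_sequence_length
    # when that is configured on the model
    if max_sequence_length is None:
        return batch_max_len
    return min(batch_max_len, max_sequence_length)
```

So a batch whose longest sequence has 100 items is padded to 100 by default, but only to 20 when `max_sequence_length=20` is set.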



```python
@torch.jit.script
def pad_inputs(inputs: Dict[str, torch.Tensor], max_sequence_length: Optional[int] = None):
```
Contributor:

Suggested change:

```diff
-def pad_inputs(inputs: Dict[str, torch.Tensor], max_sequence_length: Optional[int] = None):
+def pad_inputs(
+    inputs: Dict[str, torch.Tensor], max_sequence_length: Optional[int] = None
+) -> Dict[str, torch.Tensor]:
```
