
Add dataloader pre-trained embeddings support to Merlin Models #1083

Merged
16 commits merged into main on May 17, 2023

Conversation

@gabrielspmoreira (Member) commented May 4, 2023

Fixes #1070, Fixes #1071, Fixes #1068, Fixes #1072, Fixes #1073


Implementation Details 🚧

  • Creates the PretrainedEmbeddings block, which takes pre-trained embedding features, optionally projects them to a target dim with a linear layer, applies a sequence aggregator, and normalizes (e.g., with L2-norm). All of these options are configurable (a usage sketch follows this list).
  • InputBlockV2 was changed to accept an optional pretrained_embeddings argument, which by default selects the features tagged with the EMBEDDING tag.
  • PrepareListFeatures was changed to define the shape of the last dim of the pre-trained embeddings provided by the Loader, as that dim is None in graph mode.
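A minimal usage sketch, assuming a schema whose pre-trained embedding column is tagged with Tags.EMBEDDING (the InputBlockV2 default). The parameter names follow this PR's description and docstring excerpts below, so treat the exact signature as illustrative rather than authoritative:

    import merlin.models.tf as mm
    from merlin.schema import Tags

    def build_input_block(schema):
        pretrained = mm.PretrainedEmbeddings(
            schema.select_by_tag(Tags.EMBEDDING),
            output_dims=32,            # optional linear projection to dim 32
            sequence_combiner="mean",  # aggregate over the sequence dimension
            normalizer="l2-norm",      # normalize the (projected) embeddings
        )
        # InputBlockV2 now accepts the pre-trained embeddings alongside
        # the regular (trainable) embedding tables.
        return mm.InputBlockV2(schema, pretrained_embeddings=pretrained)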

Testing Details 🔍

  • Created many tests demonstrating how pre-trained embeddings can be used with models like DLRM and DCN, and with sequential Transformer models via BroadcastFeatures and the causal and masked language modeling SequenceMasking classes.
  • I also took this opportunity to speed up many tests by drastically reducing the cardinality of categorical features in some dataset schemas used for synthetic data generation. Many tests had to be updated to match the new cardinalities.

@gabrielspmoreira gabrielspmoreira self-assigned this May 4, 2023
@gabrielspmoreira gabrielspmoreira added the enhancement New feature or request label May 4, 2023
@gabrielspmoreira gabrielspmoreira added this to the Merlin 22.05 milestone May 4, 2023
@gabrielspmoreira gabrielspmoreira marked this pull request as draft May 4, 2023 03:36
github-actions bot commented May 4, 2023

Documentation preview

https://nvidia-merlin.github.io/models/review/pr-1083

@sararb (Contributor) left a comment

Thank you for the PR @gabrielspmoreira! This looks good to me; I've just left some minor comments.

@@ -96,7 +96,12 @@ def MLPBlock(

    for idx, dim in enumerate(dimensions):
        dropout_layer = None
        activation_idx = activation if isinstance(activation, str) else activation[idx]
        activation = activation or "linear"
Contributor:
I understand that if activation is None, we will use "linear" by default. If so, can we set the default value of activation in the method's args to None?
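A hypothetical sketch of that suggestion, with activation defaulting to None in the signature and resolved to "linear" inside the function:

    from typing import List, Optional, Union

    def MLPBlock(dimensions: List[int], activation: Optional[Union[str, List[str]]] = None):
        activation = activation or "linear"  # None -> "linear", as the PR does
        for idx, dim in enumerate(dimensions):
            # A single str applies to every layer; a list gives one activation per layer
            activation_idx = activation if isinstance(activation, str) else activation[idx]
            ...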

aggregation: Optional[TabularAggregationType], optional
    Transformation block to apply for aggregating the inputs, by default None
block_name: str, optional
    Name of the block, by default "embeddings"
Contributor:

Suggested change:
- Name of the block, by default "embeddings"
+ Name of the block, by default "pretrained_embeddings"

@@ -1238,6 +1335,8 @@ def process_str_sequence_combiner(
        combiner = tf.keras.layers.Lambda(lambda x: tf.reduce_mean(x, axis=1))
    elif combiner == "sum":
        combiner = tf.keras.layers.Lambda(lambda x: tf.reduce_sum(x, axis=1))
    elif combiner == "max":
        combiner = tf.keras.layers.Lambda(lambda x: tf.reduce_max(x, axis=1))
    else:
        raise ValueError(
            "Only 'mean' and 'sum' str combiners is implemented for dense"
Contributor:

Suggested change:
- "Only 'mean' and 'sum' str combiners is implemented for dense"
+ "Only 'mean', 'sum', and 'max' str combiners is implemented for dense"

    else:
-       inputs = tf.linalg.l2_normalize(inputs, axis=axis)
+       inputs = self._l2_norm(inputs)
Contributor:
We might need to specify the axis parameter in this line.

) -> Union[tf.Tensor, tf.SparseTensor, tf.RaggedTensor]:
    """Computes L2-norm for a given axis, typically axis = -1.
    Equivalent to tf.linalg.l2_normalize(), but that function
    does not support tf.RaggedTensor
Contributor:
Good to know, thanks for the fix!
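A minimal sketch (not necessarily the PR's exact implementation) of an L2-norm that also covers tf.RaggedTensor, valid when normalizing over the innermost, uniform axis:

    import tensorflow as tf

    def l2_norm(inputs, axis=-1):
        if isinstance(inputs, tf.RaggedTensor):
            # For a ragged [batch, None, dim] tensor, flat_values is a dense
            # [total_rows, dim] tensor, so axis=-1 normalization applies row by row.
            return tf.ragged.map_flat_values(tf.linalg.l2_normalize, inputs, axis=axis)
        return tf.linalg.l2_normalize(inputs, axis=axis)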

    for col in schema:
        table_name = col.name

        tables[table_name] = NoOp()
Contributor:
Should this be configurable to allow users to do things like normalization?
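One hypothetical way to make it configurable (build_tables and custom_blocks are illustrative names, not the PR's API):

    import tensorflow as tf

    class NoOp(tf.keras.layers.Layer):
        # Identity block, standing in for the NoOp used in the snippet above
        def call(self, inputs):
            return inputs

    def build_tables(schema, custom_blocks=None):
        custom_blocks = custom_blocks or {}
        # Fall back to the identity when no custom block is supplied for a column
        return {col.name: custom_blocks.get(col.name, NoOp()) for col in schema}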

@gabrielspmoreira gabrielspmoreira merged commit 8efbd36 into main May 17, 2023