Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add as_tensor_type method to TensorTable for framework column conversion #285

Merged

Conversation

oliverholworthy
Copy link
Member

@oliverholworthy oliverholworthy commented Apr 13, 2023

Adds a convenience method to to the TensorTable for converting columns between framework types.

Example usage

table = TensorTable({"feature": np.array([1, 2, 3])})

table.as_tensor_type(torch.Tensor).to_dict()
# => {'a': torch.tensor([1, 2, 3])}

table.as_tensor_type(TorchColumn).to_dict()
# => {'a': torch.tensor([1, 2, 3])}
table = TensorTable({"feature": np.array([1, 2, 3])})
table.as_tensor_type("unknown")
# => Raises `ValueError`
# ValueError: tensor_type argument must be a type. Received: <class 'str'> 
# Supported values are: 'numpy.ndarray', 'torch.Tensor', 'tensorflow.python.framework.ops.Tensor'. 
# Or TensorColumn Types: 'merlin.table.NumpyColumn', 'merlin.table.TorchColumn', 'merlin.table.TensorflowColumn'

@oliverholworthy oliverholworthy added the enhancement New feature or request label Apr 13, 2023
@oliverholworthy oliverholworthy added this to the Merlin 23.04 milestone Apr 13, 2023
@oliverholworthy oliverholworthy self-assigned this Apr 13, 2023
@oliverholworthy
Copy link
Member Author

One of the reasons for the method being called .to was to be consistent with the Merlin DTypes API. However, after some further consideration maybe it's not that consistent because to returns a different type from the original.

from merlin.dtypes import dtype

dtype("int64")
# => DType(name='int64', element_type=<ElementType.Int: 'int'>, element_size=64, element_unit=None, signed=True, shape=Shape(dims=None))

dtype("int64").to("tensorflow")
# => tf.int64

Perhaps we could borrow the name astype from pandas which does preserve the original type. Or if we'd like to reserve that for changing the dtypes of the columns rather than the tensor types, maybe .as_tensor_type

@karlhigley
Copy link
Contributor

Those all sound like good suggestions to me, and I can see any of them working okay

@karlhigley karlhigley modified the milestones: Merlin 23.04, Merlin 23.05 Apr 25, 2023
@github-actions
Copy link

github-actions bot commented May 4, 2023

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-285

@oliverholworthy oliverholworthy changed the title Add to method to TensorTable for framework column conversion Add as_tensor_type method to TensorTable for framework column conversion May 4, 2023
@oliverholworthy oliverholworthy marked this pull request as ready for review May 4, 2023 18:11
@oliverholworthy
Copy link
Member Author

I've re-requested your review @karlhigley since I've made some updates that change things quite a bit since you last reviewed.

  • renamed .to to .as_tensor_type
  • changed the argument type from a string to Union[Type, TensorColumn]

@karlhigley karlhigley merged commit 5e75937 into NVIDIA-Merlin:main May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants