
Fix Dataloader Unittest - which broke by new DL structure #1782

Merged · 5 commits merged into NVIDIA-Merlin:main on Mar 16, 2023

Conversation

bschifferer (Contributor)

No description provided.

@bschifferer (Contributor, Author)

rerun tests

2 similar comments ("rerun tests") from @bschifferer

@review-notebook-app (bot)

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.



```python
    len(x) for x in data[mh_name][idx * batch_size : idx * batch_size + n_samples]
]
assert (nnzs == np.array(lens)).all()
array, offsets = X[f"{mh_name}__values"], X[f"{mh_name}__offsets"]
```
Member

Using merlin.table.TensorTable here would simplify this and also make the code work with the previous version of the dataloader.

```python
mh_col = TensorTable(X)[f"{mh_name}"]
values, offsets = mh_col.values.numpy(), mh_col.offsets.numpy()
```
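For illustration, a minimal sketch of how the length check in the hunk above could look on top of that suggestion (assuming the merlin.table.TensorTable API shown here; `X`, `data`, `mh_name`, `idx`, `batch_size`, and `n_samples` come from the surrounding test):

```python
import numpy as np
from merlin.table import TensorTable

# Wrap the raw dict of "<col>__values"/"<col>__offsets" arrays so the test
# no longer hard-codes the dict-key naming convention.
mh_col = TensorTable(X)[mh_name]
values, offsets = mh_col.values.numpy(), mh_col.offsets.numpy()

# Offsets are cumulative, so consecutive differences give the row lengths.
nnzs = offsets[1:] - offsets[:-1]
lens = [
    len(x) for x in data[mh_name][idx * batch_size : idx * batch_size + n_samples]
]
assert (nnzs == np.array(lens)).all()
```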

```diff
-nested_data_col = tf.RaggedTensor.from_row_lengths(
-    batch[0]["data"][0][:, 0], tf.cast(batch[0]["data"][1][:, 0], tf.int32)
-)
+nested_data_col = tf.RaggedTensor.from_row_splits(
+    batch[0]["data__values"], tf.cast(batch[0]["data__offsets"], tf.int32)
+)
```
Member

Using TensorTable here would remove the need to specify the particular naming convention we have adopted for the dictionary keys containing values/offsets.

```python
X = TensorTable(batch[0])

nested_data_col = tf.RaggedTensor.from_row_splits(
    X["data"].values, tf.cast(X["data"].offsets, tf.int32)
)
```
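For readers less familiar with the two constructors: `from_row_lengths` takes per-row element counts (the old dataloader output), while `from_row_splits` takes cumulative offsets (the new output). A small self-contained sketch of the equivalence:

```python
import tensorflow as tf

values = tf.constant([1, 2, 3, 4, 5, 6])

# Old format: per-row lengths, e.g. rows of size 2, 1, and 3.
by_lengths = tf.RaggedTensor.from_row_lengths(values, row_lengths=[2, 1, 3])

# New format: cumulative offsets; row i spans values[splits[i]:splits[i + 1]].
by_splits = tf.RaggedTensor.from_row_splits(values, row_splits=[0, 2, 3, 6])

# Both yield [[1, 2], [3], [4, 5, 6]].
assert by_lengths.to_list() == by_splits.to_list()
```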

@bschifferer (Contributor, Author)

Is it fine to leave the test as is? I was not familiar with TensorTable.

@@ -56,4 +56,9 @@ def test_example_03():

```python
        """
    )
    for cell in tb.cells:
        cell.source = cell.source.replace(
            "device_memory_limit=device_limit", "# device_memory_limit=device_limit"
        )
```
Member

Why do we need to disable these parameters? And is this related to the change to the dataloader?

@bschifferer (Contributor, Author)

No, this is not related to the dataloader changes.

I think we changed the CI tests: we define a CUDA cluster with multiple GPUs and use an RMM pool to reserve GPU memory for it. Reserving a fixed amount of GPU memory up front is more efficient (faster). The code tries to reserve X% of the total GPU memory (not the free memory), so if we run multiple tests in parallel, other tests already hold GPU memory and the reservation fails.

I deactivated the parameters so that we use the default behavior (not using RMM).
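For future reference, a hedged sketch of the setup being described, using dask_cuda's LocalCUDACluster (the pool size is a placeholder; `device_limit` mirrors the variable name in the diff above):

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# An RMM pool reserves a fixed slice of GPU memory up front, which is faster,
# but it is sized against total (not free) GPU memory. Parallel CI tests
# already hold GPU memory, so the reservation can fail; commenting the
# parameters out falls back to the default behavior (no RMM pool).
cluster = LocalCUDACluster(
    # device_memory_limit=device_limit,  # per-GPU spill threshold
    # rmm_pool_size="8GB",               # placeholder pool size
)
client = Client(cluster)
```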

Member

Maybe worth adding something along these lines as a comment above this code; it may help our future selves recover the context of why these parameters needed to be changed.

@bschifferer merged commit fdb8715 into NVIDIA-Merlin:main on Mar 16, 2023.