Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unit tests for PyTorch classes #477

Open
2 tasks
atolopko-czi opened this issue May 12, 2023 · 3 comments
Open
2 tasks

Add unit tests for PyTorch classes #477

atolopko-czi opened this issue May 12, 2023 · 3 comments
Assignees
Labels
backlog-2023H2 Work considered for 2023 H2 but de-prioritized due to resources P1 Priority 1 - Improvement with wide impact, fix within 1 week python api Related to the API pytorch

Comments

@atolopko-czi
Copy link
Collaborator

atolopko-czi commented May 12, 2023

Census PyTorch ExperimentDataPipe unit test coverage is already substantial. However, there are a few stubbed tests that could implemented.

In addition add tests for:

  • The obs / X data is joined correctly (X rows correspond to their respective obs soma_joinids)
  • The var / X data joined correctly (gene ordering matches var query filter)
@pablo-gar pablo-gar added the python api Related to the API label May 16, 2023
@atolopko-czi
Copy link
Collaborator Author

Added unit tests for various batching cases, sparse/dense output, and encoders

@atolopko-czi atolopko-czi self-assigned this May 20, 2023
@atolopko-czi atolopko-czi added P1 Priority 1 - Improvement with wide impact, fix within 1 week labels Jun 8, 2023
@pablo-gar pablo-gar added backlog-2023H2 Work considered for 2023 H2 but de-prioritized due to resources pytorch labels Oct 11, 2023
@prathapsridharan
Copy link
Contributor

@pablo-gar (cc: @atolopko-czi @ebezzi ) - Is this intentionally left open to serve as a placeholder for unit test tasks for pytorch? I ask because the ticket is was created in May 2023.

If so, should we create tickets for specific test cases for pytorch? For example, we don't have any testing for the scatter-gather-shuffle algorithm - It is a non-deterministic algorithm and testing it needs some thought. Should we make a specific ticket for this case and link it to this ticket?

@atolopko-czi
Copy link
Collaborator Author

This is definitely an old ticket!

There are two stubbed/ignored tests that are intended to test coverage for

if (
self.return_sparse_X
and torch.utils.data.get_worker_info()
and torch.utils.data.get_worker_info().num_workers > 0
):
raise NotImplementedError(
"torch does not work with sparse tensors in multi-processing mode "
"(see https://github.com/pytorch/pytorch/issues/20248)"
)
. But honestly, we need to eliminate the multi-worker support entirely (it's redundant w/TileDB threaded I/O), so IMO those test stubs can be deleted.

Regarding the other two test suggestions, above:

 The obs / X data is joined correctly (X rows correspond to their respective obs soma_joinids)

This appears to be implicitly tested by various methods such as test_batching__all_batches_full_size, test_batching__exactly_one_batch, and test_experiment_dataloader__batched, where both the obs and X contents of a batch are asserted in tandem. Could still be worth adding a test that is named appropriately to explicitly capture the requirement, e.g. test_obs_and_X_are aligned.

 The var / X data joined correctly (gene ordering matches var query filter)

Similarly, this is implicitly tested by tests that assert the ordered values within a row of X, but I would add an explicit test for this. And particularly for sparse tensors, I would test that the expected data at a given var coordinate is correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlog-2023H2 Work considered for 2023 H2 but de-prioritized due to resources P1 Priority 1 - Improvement with wide impact, fix within 1 week python api Related to the API pytorch
Projects
None yet
Development

No branches or pull requests

3 participants