diff --git a/docs/articles/2024/20240709-pytorch-fig-benchmark.png b/docs/articles/2024/20240709-pytorch-fig-benchmark.png
index 76ee30103..83492e385 100644
Binary files a/docs/articles/2024/20240709-pytorch-fig-benchmark.png and b/docs/articles/2024/20240709-pytorch-fig-benchmark.png differ
diff --git a/docs/articles/2024/20240709-pytorch.md b/docs/articles/2024/20240709-pytorch.md
index 477ae981b..3038d09d0 100644
--- a/docs/articles/2024/20240709-pytorch.md
+++ b/docs/articles/2024/20240709-pytorch.md
@@ -2,6 +2,8 @@
 
 *Published:* *July 11th, 2024*
 
+*Updated:* *July 19th, 2024*. Figure 3 has been improved for readability.
+
 *By:* *[Emanuele Bezzi](mailto:ebezzi@chanzuckerberg.com), [Pablo Garcia-Nieto](mailto:pgarcia-nieto@chanzuckerberg.com), [Prathap Sridharan](mailto:psridharan@chanzuckerberg.com), [Ryan Williams](mailto:ryan.williams@tiledb.com)*
 
 The Census team is excited to share the release of Census PyTorch loaders that work out-of-the-box for memory-efficient training across any slice of the >70M cells in Census.
@@ -106,14 +108,14 @@ The balance between memory usage, efficiency, and level of randomness can be adj
 
 We have made improvements to the loaders to reduce the amount of data transformations required from data fetching to model training. One such important change is to encode the expression data as a dense matrix immediately after the data is retrieved from disk/cloud.
 
-In our benchmarks, we found that densifying data increases training speed ~3X while maintaining relatively constant memory usage (Figure 3). For this reason, we have disable the intermediate data processing in sparse format unless Torch Sparse Tensors are requested via the `ExperimentDataPipe` parameter `return_sparse_X`.
+In our benchmarks, we found that densifying data increases training speed while maintaining relatively constant memory usage (Figure 3). For this reason, we have disabled the intermediate data processing in sparse format unless Torch Sparse Tensors are requested via the `ExperimentDataPipe` parameter `return_sparse_X`.
 
 ```{figure} ./20240709-pytorch-fig-benchmark.png
 :alt: Census PyTorch loaders benchmark
 :align: center
 :figwidth: 80%
 
-**Figure 3. Benchmark of memory usage and speed of data processing during modeling, default parameters lead to 3K+ samples/sec with 27GB of memory.** The benchmark was done processing 4M cells out of a 10M-cell Census, data was fetched from the cloud (S3). "Method" indicates the expression matrix encoding, circles are dense (np.array) and squares are sparse (scipy.csr). Size indicates the total number of cells per processing block (max cells materialized at any given time) and color is the number of individual randomly grabbed chunks composing a processing block, higher chunks per block lead to better shuffling. Data was fetched until modeling step, but no model was trained.
+**Figure 3. Benchmark of memory usage and speed of data processing during modeling; default parameters lead to ≈2,500 samples/sec with 27GB of memory use.** The benchmark was done processing 4M cells out of a 10M-cell Census, with data streamed from the cloud (S3). "Method" indicates the expression matrix encoding: circles are dense ("np.array", now the default behavior) and squares are sparse ("scipy.csr"). Size indicates the total number of cells per processing block (max cells materialized at any given time) and color is the number of individual randomly grabbed chunks composing a processing block; more chunks per block lead to better shuffling. Data was fetched up to the modeling step, but no model was trained.
 ```
 
 We repeated the benchmark in Figure 3 in different conditions encompassing varying number of total cells and multiple epochs, please [follow this link for the full benchmark report and code.](https://github.com/ryan-williams/arrayloader-benchmarks).
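For reviewers who want to exercise the dense-by-default behavior this diff describes, below is a minimal sketch following the public `cellxgene_census.experimental.ml` loader API. The tissue filter, `obs` columns, and batch size are illustrative assumptions, not the benchmark configuration from Figure 3:

```python
import cellxgene_census
import cellxgene_census.experimental.ml as census_ml
import tiledbsoma as soma

census = cellxgene_census.open_soma()

# Build a datapipe over a slice of Census; the value_filter and column
# selection below are hypothetical examples, not the benchmark settings.
experiment_datapipe = census_ml.ExperimentDataPipe(
    census["census_data"]["homo_sapiens"],
    measurement_name="RNA",
    X_name="raw",
    obs_query=soma.AxisQuery(value_filter="tissue_general == 'tongue'"),
    obs_column_names=["cell_type"],
    batch_size=128,
    shuffle=True,
    # Dense torch.Tensor batches are the default after this change;
    # set return_sparse_X=True to get torch sparse tensors instead.
    return_sparse_X=False,
)

dataloader = census_ml.experiment_dataloader(experiment_datapipe)
for X_batch, obs_batch in dataloader:
    pass  # feed X_batch (dense by default) to a model here
```

Each iteration yields an expression batch plus the requested `obs` columns; passing `return_sparse_X=True` instead restores the sparse encoding, i.e. the slower "scipy.csr"-style path measured in Figure 3.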