v0.7.0

@snarayan21 released this 06 Nov 01:23
4e8c944

🚀 Streaming v0.7.0

Streaming v0.7.0 is released! Install via pip:

pip install --upgrade mosaicml-streaming==0.7.0

📈 Better Defaults for StreamingDataset (#479)

  • The default values for StreamingDataset have been updated to be more performant and to suit most use cases. The changes are detailed below:
| Parameter | Old Value | New Value | Benefit |
| --- | --- | --- | --- |
| shuffle_algo | py1s | py1e | Better shuffle and balanced downloading |
| num_canonical_nodes | 64 * physical_nodes | if py1s or py2s, 64 * physical_nodes, otherwise physical_nodes | Consistently good shuffle for all shuffle algos |
| shuffle_block_size | 262,144 | 4,000,000 / num_canonical_nodes | Consistently good shuffle for all num_canonical_nodes values |
| predownload | max(batch_size, 256 * batch_size // num_canonical_nodes) | 8 * batch_size | Better balanced downloading |
| partition_algo | orig | relaxed | More flexible deterministic resumptions on nodes |
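
To make the formulas in the table above concrete, here is a minimal sketch that evaluates the old and new defaults for a given cluster size. The helpers `old_defaults` and `new_defaults` are hypothetical names used for illustration only and are not part of the streaming package. The sketch assumes the division in the shuffle_block_size formula is integer division and applies only the rules stated in the table.

```python
def old_defaults(physical_nodes: int, batch_size: int) -> dict:
    """Hypothetical helper: pre-v0.7.0 defaults as listed in the table."""
    num_canonical_nodes = 64 * physical_nodes
    return {
        "shuffle_algo": "py1s",
        "num_canonical_nodes": num_canonical_nodes,
        "shuffle_block_size": 262_144,
        "predownload": max(batch_size, 256 * batch_size // num_canonical_nodes),
        "partition_algo": "orig",
    }


def new_defaults(physical_nodes: int, batch_size: int,
                 shuffle_algo: str = "py1e") -> dict:
    """Hypothetical helper: v0.7.0 defaults as listed in the table."""
    if physical_nodes < 1 or batch_size < 1:
        raise ValueError("physical_nodes and batch_size must be positive")
    # Only py1s and py2s keep the 64x multiplier; every other algorithm
    # uses one canonical node per physical node.
    if shuffle_algo in ("py1s", "py2s"):
        num_canonical_nodes = 64 * physical_nodes
    else:
        num_canonical_nodes = physical_nodes
    return {
        "shuffle_algo": shuffle_algo,
        "num_canonical_nodes": num_canonical_nodes,
        # Assumption: integer division, since a block size is a sample count.
        "shuffle_block_size": 4_000_000 // num_canonical_nodes,
        "predownload": 8 * batch_size,
        "partition_algo": "relaxed",
    }


if __name__ == "__main__":
    # Example: 8 physical nodes, batch size 32.
    for name, cfg in (("old", old_defaults(8, 32)), ("new", new_defaults(8, 32))):
        print(name, cfg)
```

If you relied on the previous behavior, the old values can still be passed explicitly as arguments to StreamingDataset.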

💎 New Features

🤖 Streaming Simulator: Easily simulate the performance of training configurations. (#385)

  • After installing this version of streaming, run the simulator command in your terminal to open the simulation interface.
  • Simulate throughput, network downloads, shuffle quality, and cache limit requirements for configurations.
  • Easily de-risk runs and find performant parameter settings.
  • Check out the docs for more information!

🔢 More flexible deterministic training and resumption (#476)

  • Deterministic training and resumption are now possible on a wider range of node counts!
  • Previously, for determinism, the num_canonical_nodes parameter had to either divide or be a multiple of the number of physical nodes.
  • Now, deterministic training is possible on any number of nodes that evenly divides your run's global batch size.
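
The two conditions above can be compared with a short sketch. The function names below are hypothetical and only encode the rules as stated in these notes. They do not call into the streaming package, and the library may enforce additional constraints not modeled here.

```python
def deterministic_before(num_canonical_nodes: int, physical_nodes: int) -> bool:
    """Old rule: num_canonical_nodes divides, or is a multiple of, physical_nodes."""
    return (num_canonical_nodes % physical_nodes == 0
            or physical_nodes % num_canonical_nodes == 0)


def deterministic_now(global_batch_size: int, physical_nodes: int) -> bool:
    """New rule: the node count evenly divides the global batch size."""
    return global_batch_size % physical_nodes == 0


def valid_node_counts(global_batch_size: int, num_canonical_nodes: int,
                      max_nodes: int) -> tuple[list[int], list[int]]:
    """Return (old, new) lists of node counts in 1..max_nodes allowed by each rule."""
    counts = range(1, max_nodes + 1)
    old = [n for n in counts if deterministic_before(num_canonical_nodes, n)]
    new = [n for n in counts if deterministic_now(global_batch_size, n)]
    return old, new


if __name__ == "__main__":
    # Illustrative values: global batch size 96, 8 canonical nodes.
    old, new = valid_node_counts(global_batch_size=96, num_canonical_nodes=8,
                                 max_nodes=12)
    print("old rule:", old)  # [1, 2, 4, 8]
    print("new rule:", new)  # [1, 2, 3, 4, 6, 8, 12]
```

With these illustrative values, node counts such as 3, 6, and 12 become eligible for deterministic resumption under the new rule.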

🐛 Bug Fixes

  • Check for invalid hash algorithm names (#486)

What's Changed

Full Changelog: v0.6.1...v0.7.0