🍱 Extra data and pre-batch shuffle on train datapipe #14

Merged 3 commits on Jun 1, 2023

Commits on May 30, 2023

  1. 2115641
  2. 👔 Shuffle chips before batching instead of in-batch shuffling

    Randomize the order of the chips before creating mini-batches, because train_eval.hdf5 contains all the non-zero labels while the california_*.hdf5 files contain only all-zero labels. Shuffling within a batch cannot mix chips from the two sources, whereas shuffling before batching can. The shuffling causes a roughly 2x slowdown from 1it/s to 2it/s. Also cherry-picked a9b3b95 to set a buffer_size of -1 in the demux DataPipe.
    weiji14 committed May 30, 2023
    8efb5f2
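
The difference between in-batch shuffling and pre-batch shuffling described in commit 8efb5f2 can be illustrated with a minimal sketch. It uses plain Python lists rather than the project's actual DataPipe, and the `chips` data, the `batch` helper, and the batch size of 4 are all hypothetical stand-ins chosen for illustration.

```python
import random


def batch(items, batch_size):
    """Group consecutive items into mini-batches (lists) of batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def labels_per_batch(batches):
    """Return the sorted set of distinct label values found in each batch."""
    return [sorted({label for _, label in b}) for b in batches]


# Hypothetical stand-in for chips: (source, label) pairs, read one source
# after the other, so non-zero labels come first and zero labels last.
chips = [("train_eval", 1)] * 8 + [("california", 0)] * 8

rng = random.Random(42)

# In-batch shuffling: batch first, then shuffle the order inside each batch.
# Every batch still comes from a single source.
in_batch = [rng.sample(b, len(b)) for b in batch(chips, 4)]

# Pre-batch shuffling: shuffle the chip order first, then batch.
# Batches can now draw chips from both sources.
shuffled = chips[:]
rng.shuffle(shuffled)
pre_batch = batch(shuffled, 4)

print(labels_per_batch(in_batch))
print(labels_per_batch(pre_batch))
```

In this sketch the in-batch variant always yields batches containing a single label value, while the pre-batch variant is free to mix them, at the cost of having to randomize the full stream before batching.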

Commits on Jun 1, 2023

  1. 🔀 Merge branch 'main' into extra-data-and-pre-batch-shuffle

    Commented out the extra california_*.hdf5 data for now.
    weiji14 committed Jun 1, 2023
    e9b7255