Make dataset_processes configurable #651

Merged: 1 commit, Sep 29, 2023

Commits on Sep 28, 2023

  1. Make dataset_processes configurable

    I'm using the Axolotl script to train models on https://modal.com serverless GPUs. Unfortunately, their environment seems to have a bug: if I run `datasets.filter` with too high a `num_proc`, it throws an error and the process dies.
    
    This PR adds a new configuration option, `dataset_processes`, which lets you explicitly set the number of processes used to map/filter the dataset. If the option is omitted, the current behavior is kept and the process count defaults to `os.cpu_count()`.
    corbt committed Sep 28, 2023
    66b3d5f
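
For reviewers, here is a minimal sketch of the fallback described in the commit message. It is not the actual diff: the helper name `resolve_dataset_processes` and the plain-dict config are assumptions made for illustration. Only the `dataset_processes` option, the `os.cpu_count()` default, and the `num_proc` argument come from the description above.

```python
import os
from typing import Any, Mapping, Optional


def resolve_dataset_processes(cfg: Mapping[str, Any]) -> Optional[int]:
    """Hypothetical helper: pick the process count for dataset map/filter.

    Uses the explicit `dataset_processes` value when it is set, otherwise
    falls back to os.cpu_count() (which may itself return None).
    """
    value = cfg.get("dataset_processes")
    if value is None:
        return os.cpu_count()
    return int(value)


# Assumed usage with a Hugging Face `datasets.Dataset` named `ds` and a
# predicate `keep_row` (both hypothetical here):
#   ds = ds.filter(keep_row, num_proc=resolve_dataset_processes(cfg))

explicit = resolve_dataset_processes({"dataset_processes": 4})
default = resolve_dataset_processes({})
print(explicit, default)
```

On an environment like the one described, setting `dataset_processes` to a small value in the config would cap `num_proc` without touching the default for everyone else.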