Feature to automatically choose batch size #1615

Closed
williamFalcon opened this issue Apr 26, 2020 · 4 comments · Fixed by #1638
Labels: feature (Is an improvement or enhancement), help wanted (Open to be worked on)

Comments

@williamFalcon
Contributor

Let's add a flag:

# default False
Trainer(auto_scale_batch_size=True)

This should do binary search on batch size:

  1. Run a few training steps with the current batch size.
  2. If OOM, halve it (batch_size = batch_size // 2).
  3. If no OOM, grow it (batch_size = batch_size * 1.5).

And so on, until we find the optimal batch size. At that point, log it so the user knows (including in TensorBoard), and continue training with the new batch size.

Ideally the user fixes the batch size in future runs to tune the learning rate.
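
A minimal sketch of that heuristic, assuming a hypothetical run_few_train_steps(model, batch_size) helper that raises the usual CUDA out-of-memory RuntimeError when a batch does not fit (the proposal leaves the exact stopping rule open; here we stop at the first OOM after growing):

import torch


def find_batch_size(model, batch_size: int = 32, max_trials: int = 25) -> int:
    """Grow the batch size by 1.5x while a few training steps succeed; back off by half on OOM."""
    for _ in range(max_trials):
        try:
            run_few_train_steps(model, batch_size)  # hypothetical helper, not a Lightning API
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise
            torch.cuda.empty_cache()
            return max(1, batch_size // 2)          # OOM: halve and stop
        batch_size = int(batch_size * 1.5)          # no OOM: try a larger batch
    return batch_size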

williamFalcon added the feature and help wanted labels on Apr 26, 2020
@quinor
Contributor

quinor commented Apr 27, 2020

I'd recommend a proper binsearch instead, so more like:

  1. start with batch_size = 1
  2. double batch_size until OOM
  3. binsearch between batch_size//2 and batch_size

Alternative mode: just use the highest power of 2 batch size that fits.
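
A sketch of that search, where fits(b) stands for a hypothetical probe that runs a few training steps at batch size b and returns False on a CUDA OOM (it assumes a batch of size `start` fits at all):

def binsearch_batch_size(fits, start: int = 1) -> int:
    # 1. double until OOM
    batch_size = start
    while fits(batch_size):
        batch_size *= 2
    # 2. binary search between batch_size // 2 (last size that fit) and batch_size (first that OOMed)
    lo, hi = max(1, batch_size // 2), batch_size
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo  # largest batch size observed to fit

For the alternative "highest power of 2" mode, step 2 can be dropped and batch_size // 2 returned directly after the doubling loop.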

@LoicGrobol
Contributor

This is what I am doing right now:

import logging
import math
import tempfile
from typing import Union

import pytorch_lightning as pl
import torch

import data  # project-specific module providing TextDataset and TextLoader

logger = logging.getLogger(__name__)


def max_gpu_batch_size(
    dataset: data.TextDataset,
    finetuner: pl.LightningModule,
    n_samples: int = 128,
    device: Union[torch.device, int] = 0,
) -> int:
    """
    Tries to find a maximal batch size for a device, assuming only that the memory usage of the
    model and the total available memory are both stable.

    Should be reliable, but slow; you probably only want to run it once.
    """
    device = torch.device(device)  # type: ignore
    device_max_mem = torch.cuda.get_device_properties(device.index).total_memory

    def test_run(batch_size):
        logger.debug(f"Trying a run with batch size {batch_size}")
        with tempfile.TemporaryDirectory() as temp_dir:
            torch.cuda.empty_cache()
            torch.cuda.reset_peak_memory_stats(device)
            loader = data.TextLoader(dataset, batch_size=batch_size)
            trainer = pl.Trainer(
                default_save_path=temp_dir,
                overfit_pct=n_samples / len(loader),
                gpus=[device.index],
                max_epochs=2,
            )
            try:
                trainer.fit(finetuner, train_dataloader=loader)
            except RuntimeError as e:
                if "CUDA out of memory" in str(e):
                    logger.debug("Exceeded memory capacity")
                    return None
                else:
                    raise e
        usage = torch.cuda.max_memory_allocated(device)
        logger.debug(f"Registered usage: {usage} / {device_max_mem} B")
        return usage

    # Find an upper bound on the max batch size as a power of two
    usage_with_min_size = 0
    for exponent in range(math.floor(math.log2(n_samples)) + 1):
        max_size = 2 ** exponent
        usage_with_max_size = test_run(max_size)
        if usage_with_max_size is None:
            break
        # This will only change as long as we don't break out, at which point it will
        # equal the usage for the previous test run
        usage_with_min_size = usage_with_max_size
    if usage_with_max_size is not None:
        logger.warning(
            f"Ran out of examples without finding a match batch size (max tried: {max_size})"
            ", you probably want to try with more examples"
        )

    # Bisect to find the max batch size
    min_size = max_size // 2
    while max_size - min_size > 1:
        try_size = (max_size + min_size) // 2
        usage_with_try_size = test_run(try_size)
        if usage_with_try_size is None:
            max_size = try_size
        else:
            min_size = try_size
            usage_with_min_size = usage_with_try_size
    logger.debug(
        f"Mem usage with inferred batch size: {usage_with_min_size} / {device_max_mem} B"
    )
    return min_size

However, I usually have to downsize the result afterwards, since in distributed mode there is the additional requirement that the batch size be a multiple of the number of devices used.
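
For that last adjustment, something along these lines (an illustration, not part of the function above) does the trick:

def round_down_to_multiple(batch_size: int, n_devices: int) -> int:
    # Largest multiple of n_devices that does not exceed batch_size, but at least n_devices.
    return max(n_devices, (batch_size // n_devices) * n_devices)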

@ma-batita

Sorry to comment on this closed issue, but I am a bit confused about auto_scale_batch_size=True.
It is clear to me that we need to define batch_size in the LightningModule to activate the auto-scaling process, and that at this step batch_size is a variable whose value is determined based on OOM errors.
But what about the batch_size that needs to be declared for the dataloaders (train, val, and test) in the LightningDataModule? How can we let the model decide the batch size itself with auto_scale_batch_size=True while, on the other hand, we still need to define a batch_size for the dataloaders? (Not to mention that we also need the batch size to calculate the warmup steps and the total number of training steps.)
Can you please clarify this, in case I missed anything or am confusing things?

@whlteXbread

@ma-batita I am guessing you've found your answer by now, but if not (and for anybody else), the docs explain this pretty clearly here: https://pytorch-lightning.readthedocs.io/en/latest/advanced/training_tricks.html#batch-size-finder
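
In short (hedging on exact API details, which have moved around across Lightning releases): the finder expects a batch_size attribute on the LightningModule (or its hparams), tunes that attribute, and the dataloaders must read from it instead of hard-coding a value, roughly like this:

import pytorch_lightning as pl
import torch
from torch.utils.data import DataLoader, TensorDataset


class LitModel(pl.LightningModule):
    def __init__(self, batch_size: int = 32):
        super().__init__()
        self.batch_size = batch_size  # the batch size finder reads and overwrites this attribute
        self.layer = torch.nn.Linear(32, 2)

    def train_dataloader(self):
        dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
        # Must use self.batch_size so the tuned value actually takes effect.
        return DataLoader(dataset, batch_size=self.batch_size)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


model = LitModel()
trainer = pl.Trainer(auto_scale_batch_size="binsearch", max_epochs=1)
trainer.tune(model)      # runs the batch size finder and updates model.batch_size in place
print(model.batch_size)  # fix this value for later runs

One way to handle the warmup/total-steps question is to run trainer.tune first and compute those quantities from the tuned model.batch_size.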
