LR finder broken #2101
Comments
Hi! Thanks for your contribution! Great first issue!
@SkafteNicki mind having a look? 🐰
I am not sure this has anything to do with the learning rate finder, judging from the error. Could you provide the full model (or a Colab notebook) so I can look through it?
It's a bit hard to prepare the Colab, so here's an end-to-end model that's exactly what I have, just using a toy dataset. As an aside: is it possible to chain schedulers and have one take precedence, for instance have a cosine annealing scheduler by default but still apply a ReduceLROnPlateau? (See the sketch after the code.)

```python
# Imports assumed for this snippet (they were not part of the original paste):
import logging
from typing import List

import numpy as np
import pandas as pd
import pytorch_lightning as pl
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, Dataset, TensorDataset
from torchvision import transforms

L = logging.getLogger(__name__)


class Normalize(nn.Module):
    def __init__(self, mean, std):
        super(Normalize, self).__init__()
        self.mean = mean
        self.std = std
        assert self.std.min() > 0

    def forward(self, x):
        return (x - self.mean) / self.std


class Swish(nn.Module):
    def forward(self, input_tensor):
        return input_tensor * torch.sigmoid(input_tensor)


class Mish(nn.Module):
    def forward(self, input_tensor):
        return input_tensor * torch.tanh(F.softplus(input_tensor))


def sigmoid_range(x, low, high):
    "Sigmoid function with range `(low, high)`"
    return torch.sigmoid(x) * (high - low) + low


def median_absolute_percentage_error(y_pred, y):
    e = torch.abs(y.view_as(y_pred) - y_pred) / torch.abs(y.view_as(y_pred))
    return 100.0 * torch.median(e).item()


def r2_score(y_pred, y):
    "R2 score (coefficient of determination) between `pred` and `targ`."
    u = torch.sum((y - y_pred) ** 2)
    d = torch.sum((y - y.mean()) ** 2)
    return 1 - u / d


class TestModel(pl.LightningModule):
    def __init__(self, val_pct: float = 0.1, lr=0.001,
                 seed: int = 31, batch_size: int = 2048, scaled=True, enable_batch_norm=False):
        super().__init__()
        self.lr = lr
        self.val_pct: float = val_pct
        self.seed: int = seed
        self.batch_size: int = batch_size
        self.data_raw: pd.DataFrame = None
        self.X_train_raw: torch.Tensor = None
        self.X_train: torch.Tensor = None
        self.X_valid_raw: torch.Tensor = None
        self.X_valid: torch.Tensor = None
        self.Y_train: torch.Tensor = None
        self.Y_valid: torch.Tensor = None
        self.X_names: List[str] = []
        self.Y_names: List[str] = []
        self.train_dataset: Dataset = None
        self.valid_dataset: Dataset = None
        self.train_idx = []
        self.valid_idx = []
        self.tfms: transforms.Compose = None
        self.scaled = scaled
        self.enable_bn = enable_batch_norm
        self.batch_norm = None
        self.mean_target = None  # set in prepare_data(); read by _prepare_model()

    def _prepare_model(self, feat_n):
        if self.enable_bn:
            self.batch_norm = nn.BatchNorm1d(feat_n)
            L.info('Using batchnorm')
        # act = nn.Tanh()
        # act = Mish()
        act = nn.ReLU()
        self.layer1 = nn.Linear(feat_n, 1000)
        self.layer2 = nn.Linear(1000, 500)
        self.layer3 = nn.Linear(500, 250)
        self.layer4 = nn.Linear(250, 1)
        # Set the bias of the last layer to the mean of the target
        if self.mean_target is not None:
            self.layer4.bias.data.fill_(self.mean_target)
        self.layers = nn.Sequential(
            self.layer1,
            act,
            self.layer2,
            act,
            self.layer3,
            act,
            self.layer4,
        )

    def prepare_data(self):
        # Feature engineer and convert to tensors (toy data; Y needs one row
        # per sample -- the original paste had Y with shape (3, 1))
        X = torch.from_numpy(np.random.rand(100000, 3)).float()
        Y = torch.from_numpy(np.random.rand(100000, 1)).float()
        self.X_names, self.Y_names = ['x1', 'x2', 'x3'], ['y1']
        # Train / valid split
        self.train_idx, self.valid_idx = train_test_split(
            np.arange(len(Y)), test_size=self.val_pct, random_state=self.seed, shuffle=True)
        L.info(f'Using {len(self.train_idx)} train samples and {len(self.valid_idx)} valid samples')
        L.info(f'Features: \n{self.X_names}, \n Targets: \n{self.Y_names}')
        self.X_train = X[self.train_idx]
        self.Y_train = Y[self.train_idx]
        self.X_valid = X[self.valid_idx]
        self.Y_valid = Y[self.valid_idx]
        L.info(f'X_train shape: {self.X_train.shape}, Y_train shape: {self.Y_train.shape}')
        L.info(f'X_train means: {self.X_train.mean(dim=0)}, \n X_train stds: {self.X_train.std(dim=0)}')
        L.info(f'X_valid means: {self.X_valid.mean(dim=0)}, \n X_valid stds: {self.X_valid.std(dim=0)}')
        # Normalize and transform
        self.tfms = Normalize(self.X_train.mean(dim=0), self.X_train.std(dim=0))
        L.info(f'Transforms: {self.tfms}')
        self.X_train = self.tfms(self.X_train)
        self.X_valid = self.tfms(self.X_valid)
        L.info(f'X_train means post tfms: {self.X_train.mean(dim=0)},\n X_valid means: {self.X_valid.mean(dim=0)}')
        # Construct datasets
        self.train_dataset = TensorDataset(self.X_train, self.Y_train)
        self.valid_dataset = TensorDataset(self.X_valid, self.Y_valid)
        # Initialize model with scaling properties; mean target becomes the bias of the last linear layer
        if self.scaled:
            self.y_min = self.Y_train.min() * 1.01
            self.y_max = self.Y_train.max() * 1.01
            self.mean_target = self.Y_train.mean().item()
            # (`x != np.nan` is always True, so the original asserts were no-ops)
            assert not torch.isnan(self.y_min)
            assert not torch.isnan(self.y_max)
            L.info(f'Using scaling of outputs: y_min: {self.y_min}, y_max: {self.y_max}. y_pred_bias: {self.mean_target}')
        self._prepare_model(len(self.X_names))

    def forward(self, x):
        if self.batch_norm is not None:
            x = self.batch_norm(x)
        if self.scaled:
            return sigmoid_range(self.layers(x), self.y_min, self.y_max)
        else:
            return self.layers(x)

    def step(self, y_hat, y, mode='train'):
        loss = F.mse_loss(y_hat, y) * 10e5  # scale up the loss (10e5 == 1e6)
        mae = F.l1_loss(y_hat, y)
        mape = median_absolute_percentage_error(y_hat, y)
        r2 = r2_score(y_hat, y)
        out = {'loss': loss, 'mae': mae, 'mape': mape, 'R2': r2}
        if mode == 'train':
            out['log'] = out.copy()
            return out
        elif mode == 'val':
            out = {f'{mode}_{k}': v for k, v in out.items()}
            out['log'] = out.copy()
            return out
        else:
            raise Exception('Unsupported mode')

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        return self.step(y_hat, y, 'train')

    def configure_optimizers(self):
        optim = torch.optim.Adam(self.parameters(), lr=self.lr)
        sched = torch.optim.lr_scheduler.ReduceLROnPlateau(optim, 'min')
        return [optim], [sched]

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size, num_workers=4, shuffle=False)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        return self.step(y_hat, y, 'val')

    def validation_epoch_end(self, outputs):
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        tensorboard_logs = {'val_loss': avg_loss}
        return {'val_loss': avg_loss, 'log': tensorboard_logs}

    def val_dataloader(self):
        return DataLoader(self.valid_dataset, batch_size=self.batch_size, num_workers=4, shuffle=False)


exp_path = './lr_finder_exp'  # placeholder; `exp_path` was undefined in the paste
model = TestModel()
trainer = pl.Trainer(gpus=1, default_save_path=exp_path, max_epochs=100)

# Run learning rate finder
lr_finder = trainer.lr_find(model)

# Results can be found in
lr_finder.results

# Plot with
fig = lr_finder.plot(suggest=True)
fig.show()
```
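On the scheduler-chaining aside, a minimal plain-PyTorch sketch of one way to combine the two on a single optimizer (the model, metric, and all numbers below are illustrative, not from this thread):

```python
import torch
from torch import nn, optim

model = nn.Linear(10, 1)  # stand-in model
opt = optim.Adam(model.parameters(), lr=1e-3)

# Cosine annealing provides the default decay; ReduceLROnPlateau can still
# shrink the LR further whenever the monitored metric stops improving.
cosine = optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)
plateau = optim.lr_scheduler.ReduceLROnPlateau(opt, mode='min', factor=0.5, patience=5)

for epoch in range(100):
    # ... one epoch of training would go here ...
    val_loss = 1.0  # placeholder for a real validation metric
    cosine.step()           # steps unconditionally every epoch
    plateau.step(val_loss)  # cuts the LR only when val_loss plateaus
```

Both schedulers mutate the same `param_group['lr']`, so their effects compose; it is worth logging the LR per epoch to confirm the combined trajectory is what you want.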
I have a feeling that maybe …
Your setup is very atypical; normally model parameters are always initialized in the `__init__` method.
@SkafteNicki What if the model is data-dependent and we want to keep that dependence integrated (instead of passing data-dependent parameters to the constructor)? In my case this is the bias term of the last layer, which is set to the mean of the training set target. This is only available after the training set is loaded and determined. In these situations, do you usually advise moving the training data loader outside of the model and using the …?
I would definitely go for the production approach in your case. My take on your use case is that you want to set your bias term to a constant. Even though you choose that constant in a very specific way (as the mean of the training set target), it boils down to being a hyperparameter of the model and thus should be a value given to the constructor. All that said, I am willing to fix this in a PR; I am just trying to figure out if this is a unique case or a broader observation.
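To make that suggestion concrete, a sketch of the constructor-hyperparameter pattern (`ScaledRegressor` and the toy targets are made up for illustration): compute the data-dependent constant first, then pass it in, so every parameter exists before `lr_find` or `fit` is called.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl

class ScaledRegressor(pl.LightningModule):
    def __init__(self, feat_n: int, mean_target: float, lr: float = 1e-3):
        super().__init__()
        self.lr = lr
        self.net = nn.Sequential(nn.Linear(feat_n, 250), nn.ReLU(), nn.Linear(250, 1))
        # The data-dependent constant arrives as an ordinary hyperparameter.
        self.net[-1].bias.data.fill_(mean_target)

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return {'loss': F.mse_loss(self(x), y)}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

# Compute the statistic from the training targets first, outside the module.
Y_train = torch.rand(1000, 1)  # placeholder for the real training targets
model = ScaledRegressor(feat_n=3, mean_target=Y_train.mean().item())
```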
Well, I guess it's a question of when the model specification happens: before you handle your data or after. I think there are plenty of use cases in which model parameters are set as a function of the data (min, max, biases, etc.). So it's fine to say this is not supported with the …
Normally I would say that model specification and data handling should be independent of each other. However, I do see your point that you may want to adjust the model according to the data. I agree that the behavior of the …
Thanks for the feedback 👍
Hi, I'm running into the same kind of problem here. What would be the production approach?
@Molaire Don't use …
Do you have some code example?
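In case a concrete shape helps, a sketch of keeping the data entirely outside the `LightningModule` (toy data and the `make_loaders` helper are mine, not from the thread); the module then needs no `prepare_data`/`*_dataloader` hooks and the loaders go straight to `fit`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_loaders(batch_size: int = 256):
    # All data preparation happens outside the LightningModule.
    X, Y = torch.rand(10_000, 3), torch.rand(10_000, 1)
    split = int(0.9 * len(X))
    train_dl = DataLoader(TensorDataset(X[:split], Y[:split]), batch_size=batch_size, shuffle=True)
    valid_dl = DataLoader(TensorDataset(X[split:], Y[split:]), batch_size=batch_size)
    return train_dl, valid_dl

train_loader, val_loader = make_loaders()
# The model is now fully constructed before any data hooks run, e.g.:
# model = ScaledRegressor(feat_n=3, mean_target=...)
# trainer.fit(model, train_loader, val_loader)
```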
+1 for a way to handle this use case. Mine is a generic supervised classification model for which the number of classes is not known before loading the data.
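The same pattern should cover the classification case: inspect the labels once, then construct the model with the inferred class count. A tiny sketch with made-up data (`MyClassifier` is hypothetical):

```python
import torch
from torch.utils.data import TensorDataset

labels = torch.randint(0, 7, (1000,))        # stand-in for the real labels
dataset = TensorDataset(torch.rand(1000, 3), labels)

num_classes = int(labels.max().item()) + 1   # inferred from the data, not hard-coded
# model = MyClassifier(feat_n=3, num_classes=num_classes)
```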
@feribg can we close this issue? Mind checking again on master?
Yep, thanks!
#614

🐛 Bug
To Reproduce
Steps to reproduce the behavior:
The following returns consistently:
The regular `.fit` method works as expected.

PL version: '0.7.6'