
[Bug] Cannot override a default config group that has been set up in a Structured Config #2039

Open
2 tasks done
jzazo opened this issue Feb 17, 2022 · 3 comments
Labels
question Hydra usage question

Comments


jzazo commented Feb 17, 2022

🐛 Bug

Description

I set up structured config schemas with nested config groups as attributes. I want to set one config group as a default value in the structured config schema, and override it in the yaml files. I get different errors depending on whether I set the default in the defaults list or directly as an initialized config.

Checklist

  • I checked on the latest version of Hydra
  • I created a minimal repro (See this for tips).

To reproduce

This is my example.py file:

#!/usr/bin/env python3
from dataclasses import dataclass, field
from typing import Any, List

import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import MISSING, DictConfig, OmegaConf

@dataclass
class CriterionConf:
    _target_: str = MISSING

@dataclass
class BCEConf(CriterionConf):
    _target_: str = "torch.nn.BCELoss"

@dataclass
class CrossEntropyConf(CriterionConf):
    _target_: str = "torch.nn.CrossEntropyLoss"

@dataclass
class RegressionConf:
    criterion: CriterionConf = MISSING

@dataclass
class Config:
    model: RegressionConf = MISSING

cs = ConfigStore.instance()
cs.store(name="base_config", node=Config)
cs.store(group="ml", name="regression", node=RegressionConf)
cs.store(group="ml", name="cross_entropy", node=CrossEntropyConf)
cs.store(group="ml", name="bce", node=BCEConf)

@hydra.main(config_path="conf", config_name="config")
def my_app(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg, resolve=True))

if __name__ == "__main__":
    my_app()

This is my conf/config.yaml file:

defaults:
  - base_config
  - /ml/regression@model
  - /ml/bce@model.criterion

This is my conf/criterion/bce.yaml:

defaults:
  - /ml/bce

This is my conf/criterion/cross_entropy.yaml:

defaults:
  - /ml/cross_entropy

If I run ./example.py (no default list yet), everything works and I get:

$ ./example.py
model:
  criterion:
    _target_: torch.nn.BCELoss

If I modify the RegressionConf with an initialized config, e.g.,

@dataclass
class RegressionConf:
    criterion: CriterionConf = CrossEntropyConf()

I get the following error:

$ ./example.py
In 'ml/bce': Validation error while composing config:
Merge error: BCEConf is not a subclass of CrossEntropyConf. value: {'_target_': 'torch.nn.BCELoss'}
    full_key:
    object_type=Config

If I modify RegressionConf with a defaults list, e.g.,

@dataclass
class RegressionConf:
    defaults: List[Any] = field(default_factory=lambda: [{"criterion": "/ml/cross_entropy"}])
    criterion: CriterionConf = MISSING

I get the following error:

$ ./example.py
In 'ml/regression': Could not find 'ml/criterion//ml/cross_entropy'

Config search path:
        provider=hydra, path=pkg://hydra.conf
        provider=main, path=file:///home/javier/repos/immunocam/immunocam-bolts/src/python/immunocam/bolts_sandbox/hydra/conf
        provider=schema, path=structured://

Expected Behavior

In the first case I would expect that the type of criterion remains CriterionConf and does not switch to CrossEntropyConf.
In the second case, I would expect that the config group /ml/cross_entropy is resolved as global rather than relative, i.e., not as ml/criterion//ml/cross_entropy, as the error seems to indicate.

System information

  • Hydra Version : 1.1.1
  • Python version : 3.8.10
  • Virtual environment type and version : poetry 1.1.12
  • Operating system : Ubuntu 20.04
@jzazo jzazo added the bug Something isn't working label Feb 17, 2022
@Jasha10 Jasha10 added question Hydra usage question and removed bug Something isn't working labels Feb 17, 2022
Jasha10 (Collaborator) commented Feb 17, 2022

Hello again @jzazo!

First, let me visualize the tree of input configs that can be accessed by Hydra:

├── config                   # file=conf/config.yaml
├── criterion
│   ├── bce                  # file=conf/criterion/bce.yaml
│   └── cross_entropy        # file=conf/criterion/cross_entropy.yaml
├── base_config              # node=Config
└── ml
    ├── bce                  # node=BCEConf
    ├── cross_entropy        # node=CrossEntropyConf
    └── regression           # node=RegressionConf

Your first example:

If I run ./example.py (no default list yet), everything works and I get:

$ ./example.py
model:
  criterion:
    _target_: torch.nn.BCELoss

Yes indeed. In this case, the following four input configs are being merged:

  • base_config
  • ml/regression (at package model)
  • ml/bce (at package model.criterion)
  • config

This can be confirmed by using the --info=defaults flag:

 $ ./example.py --info=defaults

Defaults List
*************
| Config path                 | Package             | _self_ | Parent       |
------------------------------------------------------------------------------
...
| base_config                 |                     | False  | config       |
| ml/regression               | model               | False  | config       |
| ml/bce                      | model.criterion     | False  | config       |
| config                      |                     | True   | <root>       |
------------------------------------------------------------------------------

Note that the files conf/criterion/bce.yaml and
conf/criterion/cross_entropy.yaml are not being loaded.
Instead, the defaults list from config.yaml is loading the configs ml/regression and ml/bce directly.


Your second example:

If I modify the RegressionConf with an initialized config, e.g.,

@dataclass
class RegressionConf:
    criterion: CriterionConf = CrossEntropyConf()

I get the following error:

$ ./example.py
In 'ml/bce': Validation error while composing config:
Merge error: BCEConf is not a subclass of CrossEntropyConf. value: {'_target_': 'torch.nn.BCELoss'}
    full_key:
    object_type=Config

This is expected. The issue is that an instance of BCEConf is being merged into an instance of CrossEntropyConf.
This error can be reproduced using OmegaConf: below, I've simulated merging the four input configs that appear in the above --info=defaults output:

>>> input1 = Config                             # base_config
>>> input2 = {"model": RegressionConf}          # ml/regression@model
>>> input3 = {"model": {"criterion": BCEConf}}  # ml/bce@model.criterion
>>> input4 = {}                                 # config
>>> OmegaConf.merge(input1, input2, input3, input4)
Traceback (most recent call last):
  ...
omegaconf.errors.ValidationError: Merge error: BCEConf is not a subclass of CrossEntropyConf. value: {'_target_': 'torch.nn.BCELoss'}
    full_key:
    object_type=Config

Note that the error message is exactly the same.
The failure occurs for the same reason that OmegaConf.merge(CrossEntropyConf, BCEConf) would fail: BCEConf is not a subclass of CrossEntropyConf.


Your third example:

If I modify RegressionConf with a defaults list, e.g.,

@dataclass
class RegressionConf:
    defaults: List[Any] = field(default_factory=lambda: [{"criterion": "/ml/cross_entropy"}])
    criterion: CriterionConf = MISSING

I get the following error:

$ ./example.py
In 'ml/regression': Could not find 'ml/criterion//ml/cross_entropy'

This is also expected. Basically, writing {"criterion": "/ml/cross_entropy"} means that Hydra will search for an input config called
"/ml/cross_entropy" inside of a group called "criterion". This is all done relative to the "ml" group where the
RegressionConf node is stored, so the final lookup path is "ml/criterion//ml/cross_entropy".

Instead of searching for a config at "ml/criterion//ml/cross_entropy", you need to search for "ml/cross_entropy":

@dataclass
class RegressionConf:
    defaults: List[Any] = field(default_factory=lambda: [{"/ml@criterion": "cross_entropy"}])
    criterion: CriterionConf = MISSING

The "/ml" part tells Hydra to look for a top-level group called ml; the leading slash makes the address of the "ml" group
absolute (instead of relative). The "cross_entropy" part names the config /ml/cross_entropy that
Hydra should search for, and the @criterion part means that this /ml/cross_entropy node should
be placed in the criterion package.

For this to work, you will also need to change your config.yaml file as follows:

defaults:
  - base_config
  - /ml/regression@model
  - override /ml@model.criterion: bce

Running the example:

$ ./example.py
model:
  criterion:
    _target_: torch.nn.BCELoss

You can find more documentation on how this works in the docs on the defaults list.

Note that once again the files conf/criterion/bce.yaml and
conf/criterion/cross_entropy.yaml are not being loaded. I'd recommend deleting these yaml files to avoid confusion.


A final example:

Below is a strategy for storing RegressionConf directly in the model group (instead of in ml), and
for storing CrossEntropyConf and BCEConf directly in model/criterion:

@dataclass
class RegressionConf:
    defaults: List[Any] = field(default_factory=lambda: [{"criterion": "cross_entropy"}])
    criterion: CriterionConf = MISSING

...

cs = ConfigStore.instance()
cs.store(name="base_config", node=Config)
cs.store(group="model", name="regression", node=RegressionConf)
cs.store(group="model/criterion", name="cross_entropy", node=CrossEntropyConf)
cs.store(group="model/criterion", name="bce", node=BCEConf)

With this setup, the tree of accessible input configs looks like this:

├── config                   # file=conf/config.yaml
├── base_config              # node=Config
└── model
    ├── regression           # node=RegressionConf
    └── criterion
        ├── cross_entropy    # node=CrossEntropyConf
        └── bce              # node=BCEConf

Now, change config.yaml to look like this:

defaults:
  - base_config
  - model: regression
  - override model/criterion: bce

Running the example:

 $ ./example.py
model:
  criterion:
    _target_: torch.nn.BCELoss

jzazo (Author) commented Feb 18, 2022

Thank you @Jasha10 for this awesome answer. Everything is quite clear, although I am still struggling a bit (in my real problem, see message below).

Regarding your proposed fixes, if I try to reproduce the third example with:

@dataclass
class RegressionConf:
    defaults: List[Any] = field(default_factory=lambda: [{"/ml@criterion": "cross_entropy"}])
    criterion: CriterionConf = MISSING
# config.yaml
defaults:
  - base_config
  - /ml/regression@model
  - override /ml@model.criterion: bce

My output becomes:

$ ./example.py
model:
  defaults:
  - /ml@criterion: cross_entropy
  criterion:
    _target_: torch.nn.BCELoss

In your final example, I get the following output as well:

$ ./example.py
model:
  defaults:
  - criterion: cross_entropy
  criterion:
    _target_: torch.nn.BCELoss

It seems that the defaults list persists in the returned config; why does this happen? I cannot automatically instantiate objects that carry such an attribute.

jzazo (Author) commented Feb 18, 2022

I have finally identified the source of the problems I was getting in my original setup. If I change the type of model in Config from RegressionConf to ModelConf, then things break down when the defaults list is in the structured config, but not when it is in the yaml files.

The following works:

@dataclass
class ModelConf:
    _target_: str = MISSING

@dataclass
class RegressionConf(ModelConf):
    criterion: LossConf = MISSING

@dataclass
class Config:
    model: ModelConf = MISSING
# config.yaml
defaults:
  - base_config
  - model: regression
# model/regression.yaml
defaults:
  - /model/base_regression@_here_
  - /model/loss@criterion: bce  # <-- no conflict when merging this config into criterion

resulting in:

$ ./example.py
model:
  _target_: ???
  criterion:
    _target_: torch.nn.BCELoss

The following breaks:

@dataclass
class ModelConf:
    _target_: str = MISSING

@dataclass
class Config:
    model: ModelConf = MISSING  # <-- conflict when assigning criterion to model because it assumes it's ModelConf!

@dataclass
class RegressionConf(ModelConf):
    defaults: List[Any] = field(default_factory=lambda: [{"/model/loss@criterion": "bce"}])
    criterion: LossConf = MISSING
# regression.yaml
defaults:
  - /model/base_regression@_here_

fails with:

omegaconf.errors.ConfigKeyError: Key 'criterion' not in 'ModelConf'
    full_key: model.criterion
    reference_type=ModelConf
    object_type=ModelConf

hydra.errors.ConfigCompositionException

This is the config store in both cases:

cs = ConfigStore.instance()
cs.store(name="base_config", node=Config)
cs.store(group="model", name="base_regression", node=RegressionConf)
cs.store(group="model/loss", name="cross_entropy", node=CrossEntropyConf)
cs.store(group="model/loss", name="bce", node=BCEConf)

It seems to me that the behavior differs depending on whether the defaults list is in yaml or in the structured config. In the yaml case the inferred type is RegressionConf and things work out, but in the structured config case the type is not updated to RegressionConf and remains ModelConf. Is this behavior intended or a bug? Shouldn't both cases behave the same?

Thanks for all the support!
