
Adds hydra_zen.store #331

Merged (52 commits) on Nov 21, 2022

Conversation

@rsokl (Contributor) commented Oct 26, 2022

hydra_zen.ZenStore is an abstraction over Hydra's config store that streamlines the process of generating a config and then storing it in Hydra's config store. It enables users to isolate stores from Hydra's global config store and from one another, so that many stores can be maintained within a project without "clobbering" one another.

It is designed to be highly customizable while having defaults that seldom need to be tinkered with. It also has some nice ergonomics (see the self-partialing feature 😄 ).

Basic Usage

This code:

from hydra_zen import builds
from hydra.core.config_store import ConfigStore

def repeat(x: int, y: str):
    return x * y

cs = ConfigStore.instance()
cs.store(name=repeat.__name__, node=builds(repeat, populate_full_signature=True))

is equivalent to:

from hydra_zen import store

@store
def repeat(x: int, y: str):
    return x * y

store.add_to_hydra_store()  # by default, you must explicitly opt in to populating Hydra's global store

and this:

cs.store(name="repeat1", node=builds(repeat, x=1, populate_full_signature=True))
cs.store(name="repeat2", node=builds(repeat, x=2, populate_full_signature=True))

can be rewritten as:

@store(name="repeat2", x=2)
@store(name="repeat1", x=1)
def repeat(x: int, y: str):
    return x * y

or

store(repeat, name="repeat1", x=1)
store(repeat, name="repeat2", x=2)

Basic Implementation Details

There is a lot more here than meets the eye. It is useful to take a peek at the implementation, which, at its core, is quite simple. Here is a stripped-down implementation:

def store(
    f: Optional[F] = None,
    *,
    name: str | Callable[[Any, Any], str] = lambda f, _: f.__name__,
    group: None | str | Callable[[Any, Any], str] = None,
    package: None | str | Callable[[Any, Any], str] = None,
    provider: Optional[str] = None,
    to_config: Callable[..., Any] = lambda x, **kw: builds(x, **kw, populate_full_signature=True),
    store_fn: HydraStoreFn = ConfigStore.instance().store,
    **to_config_kw: Any,
) -> F:
    
    # this excludes some fancy decorator & self-partialing stuff, as well as annotation overrides

    cfg = to_config(f, **to_config_kw)

    _name = name(f, cfg) if callable(name) else name
    _group = group(f, cfg) if callable(group) else group
    _pkg = package(f, cfg) if callable(package) else package
    store_fn(
            name=_name,
            node=cfg,
            group=_group,
            package=_pkg,
            provider=provider,
        )
    return f

Note that nearly every aspect of store, including how it generates configs, can be customized. Furthermore, the name, group, and package fields can be specified as callables, which can be used to, for instance, auto-generate the node's name based on details like its module's path.
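To make the callable-field behavior concrete, here is a standalone sketch (no hydra-zen import needed); `resolve_field` is a hypothetical helper that mirrors the `name(f, cfg) if callable(name) else name` lines in the stripped-down implementation above:

```python
# Standalone sketch of the callable-field dispatch described above.
def resolve_field(field, target, cfg):
    """Return field(target, cfg) when field is a callable, else field as-is."""
    return field(target, cfg) if callable(field) else field

def add(x, y):
    return x + y

# A plain string is used verbatim:
print(resolve_field("fixed_name", add, None))  # -> fixed_name

# A callable can auto-generate the name from the target, e.g. from its
# module path, as the paragraph above suggests:
auto_name = lambda f, _cfg: f"{f.__module__}.{f.__name__}"
print(resolve_field(auto_name, add, None))
```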

Bells & Whistles

store is both a function and a decorator

As shown above, store can be used either as a decorator or called directly as a function:

# as a decorator
@store(group="math")
def add(x, y): 
    return x + y

# as a function
def add(x, y): 
    return x + y

store(add, group="math")

store is a passthrough

That is, store(f) is f, and extra keywords passed to store are passed to the config-creation function. This means that you can stack @store decorators to store different configs for the same target

@store(name="grid_trainer", device="gpu")  # user has to specify `num_epochs` at launch
@store(name="test_trainer", num_epochs=1, device="cpu")  # all defaults provided here
def trainer(num_epochs, device):
    ...

this also streamlines things like creating partial configs

from torch.optim import Adam

store(Adam, name="adam", zen_partial=True)  # `zen_partial=True` is passed to `builds` 

store is self-partialing

This is really cool: you can override the default arguments of store without using functools.partial

math_store = store(group="math")

@math_store(name="add_it")  # equivalent to  @store(name="add_it", group="math")
def add(x, y): return x + y

You can specify new defaults endlessly

new_store = store(name="a")(group="b")(package="c")

new_store(func)  # equivalent to store(func, name="a", group="b", package="c")

This makes it easy to customize the behavior of store while preserving its signature and type annotations. You can seriously override everything.¹

Specialized support for dataclasses

The default to_config function will create configs that are proper subclasses of a target dataclass. Consider:

from dataclasses import dataclass

from hydra.core.config_store import ConfigStore
from omegaconf import OmegaConf

from hydra_zen import store


@store(name="a2", x=2)
@store(name="a1", x=1)
@store(name="a")
@dataclass
class A:
    x: int

>>> config = OmegaConf.to_object(ConfigStore.instance().repo["a1.yaml"].node)
>>> issubclass(type(config), A)
True

Thus for each application of store, builds produces a new dataclass-type, which is a subtype of A.

Responsive type annotations

pyright impresses once again (unfortunately, mypy can't catch the following true positive -- I will open an issue)

from hydra_zen import store, just, builds

# type checker: error  (you can't pass a dict as a target to `builds`)
# runtime: error
store({"a": 1}, name="some_dict", to_config=builds)  

# type checker: OK
# runtime: OK
store({"a": 1}, name="some_dict", to_config=just)  

Feedback

Does @store encourage an anti-pattern?

Although I like the @store syntax, I am not crazy about having library authors decorate functions throughout their library -- it leads to a "non-locality", where those functions will only be added to the store if they are accessed. E.g. if I have

# contents of my_lib/math.py
@store
def add(x, y): ...

The config for add only becomes available in Hydra's config store if my_lib.math gets imported somewhere along the way. This feels like an anti-pattern as it leads to spooky action-at-a-distance behavior for unwitting users who may wonder why add is suddenly in their config store.²

I realize that library authors could just as easily write:

# contents of my_lib/math.py
def add(x, y): ...

store(add)

but this feels like more of a deliberate design decision, whereas the availability of the decorator syntax feels like it is practically begging for this pattern.

Is there a way to discourage or even prevent this? I have considered having some sort of intermediary function, add_configs_to_store, that users have to explicitly call to actually update Hydra's config store.

I am also concerned with configs silently getting overwritten by configs of the same name.

This is just a preliminary idea, and I'd love to hear other ideas for encouraging best practices with Hydra's config store.

Footnotes

  1. A library author might be interested in patterns similar to the following:

    from collections import defaultdict
    from typing import Any, Optional
    
    from hydra_zen import store as orig_store, builds, get_target
    from hydra.core.config_store import ConfigStore
    
    from my_lib import Dense, Conv2d, Cifar10, MNIST
    
    config_repo = defaultdict(dict)  # makes for easy-access to the generated configs
    
    def override_store(
        name: str,
        node: Any,
        group: Optional[str] = None,
        package: Optional[str] = None,
        provider: Optional[str] = None,
    ):
        ConfigStore.instance().store(name, node, group, package, provider)
        target = get_target(node) 
        config_repo[target][(name, node, group, package, provider)] = node
        return None
    
    store = orig_store(provider="my_lib", store_fn=override_store)
    
    model_store = store(group="model")
    model_store(Dense)
    model_store(Conv2d)
    
    data_store = store(group="data")
    data_store(Cifar10)
    data_store(MNIST)
    
  2. That being said, it is quite nice to be able to do:

    from hydra_zen import store, zen
    
    @store(name="task2", x=1, y=2)
    @store(name="task1", x=1, y=2)
    def task_fn(x, y):
        # stuff
    
    if __name__ == "__main__":
        zen(task_fn).hydra_main(config_path=None, config_name="task1")
    

@rsokl rsokl added this to the hydra-zen 0.9.0 milestone Oct 26, 2022
@rsokl (Contributor, Author) commented Oct 26, 2022

@Jasha10 @addisonklinke @jgbos It'd be great to get your impressions of hydra_zen.store from this PR's (lengthy) descriptions. Any feedback, questions, concerns, etc. would be great when you have time! 😄

@jgbos (Contributor) commented Oct 26, 2022

Awesome. I agree with the issues you raised -- those were my two biggest concerns with storing configs directly on functions.

For add_configs_to_store, do you mean to have store register configs internally and have the user explicitly load configs into the ConfigStore when they need them? This might also allow us to do some checks between registered configs and the existing config store repo.

@Jasha10 (Contributor) commented Oct 26, 2022

I am also concerned with configs silently getting overwritten by configs of the same name.

Perhaps this could be handled on the Hydra side? Hydra's ConfigStore could issue a UserWarning if a previously-stored config node would be overwritten.
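Until Hydra grows such a warning, a user-side guard could be sketched by wrapping the `store_fn` hook from this PR (`warn_on_overwrite` is a hypothetical helper, not a Hydra API):

```python
import warnings

def warn_on_overwrite(store_fn):
    """Wrap a store function so re-storing the same (group, name) warns."""
    seen = set()
    def wrapped(name, node, group=None, package=None, provider=None):
        key = (group, name)
        if key in seen:
            warnings.warn(f"config {key} is being overwritten", UserWarning)
        seen.add(key)
        return store_fn(name=name, node=node, group=group,
                        package=package, provider=provider)
    return wrapped
```

It could then be passed to store as `store_fn=warn_on_overwrite(ConfigStore.instance().store)`.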

Although I like the @store syntax, I am not crazy about having library authors decorate functions throughout their library -- it leads to a "non-locality", where those functions will only be added to the store if they are accessed.

Agreed. In my own applications I've gravitated towards the following pattern:

# contents of my_lib/math.py
def add(x, y): ...

def store_configs():
    store(add)
    ...

Then in my main module I'd call my_lib.math.store_configs(). This allows me to mitigate spooky action at a distance -- pretending that the global state of Hydra's ConfigStore is actually local state of main.

@rsokl (Contributor, Author) commented Oct 27, 2022

Thanks for the insights @Jasha10

I was chatting with @jgbos and we are thinking of modifying this PR to introduce ZenStore, which would look roughly like:

class ZenStore:
    def __init__(self):
        self._internal_repo = {}

    def __call__(self, *a, **kw):
        # same functionality as the current `store` fn/decorator, but
        # registers the constructed node to `self._internal_repo`

        config = ...  # make config
        name = ...    # get name, group, etc.

        assert (name, group, ...) not in self._internal_repo
        self._internal_repo[(name, group, ...)] = config

    def add_to_hydra_store(self):
        for key, config in self._internal_repo.items():
            ConfigStore.instance().store(**key, node=config)

    # plus a nice repr/getattr for inspecting the internal repo and accessing configs

This is meant to get users to think about stored configs in a stateful and local way -- local to specific instances of their ZenStores. We can still provide store, an instance of ZenStore that hydra-zen exposes as a convenient way to store configs across modules. E.g. in hydra_zen/__init__.py we would have store = ZenStore().

So someone's main.py might look like:

from my_lib.math import store as math_store
from my_lib.data import store as data_store

from hydra_zen import store, zen

@store(name="main_task", num_epochs=10)
def task(num_epochs, model, optim, trainer):
    ...

if __name__ == "__main__":
    math_store.add_to_hydra_store()
    data_store.add_to_hydra_store()
    store.add_to_hydra_store()
    zen(task).hydra_main(config_name="main_task", config_path=None, version_base="1.2")

One thing that I like about your store_configs approach is that it also avoids the config-creation process until you actually call the function, rather than users incurring that cost every time they import a given module. That makes me think that ZenStore might implement this sort of lazy config-construction pattern as well.
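A minimal sketch of that lazy variant (all names are illustrative; `to_config` stands in for the builds-based default):

```python
class LazyZenStore:
    """Sketch: record (target, kwargs) now, build configs only on flush."""

    def __init__(self, to_config):
        self._to_config = to_config
        self._deferred = {}  # (name, group) -> (target, kwargs)

    def __call__(self, target, *, name, group=None, **kw):
        self._deferred[(name, group)] = (target, kw)
        return target  # passthrough, like `store`

    def add_to_hydra_store(self, store_fn):
        # config construction is deferred to this point, so merely importing
        # a module that populates the store incurs no config-creation cost
        for (name, group), (target, kw) in self._deferred.items():
            store_fn(name=name, node=self._to_config(target, **kw), group=group)
```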

Lastly, one neat thing that manifests from this ZenStore approach is that people can construct different store instances for different applications.

test_store = ZenStore()
demo_store = ZenStore()
experiment_store = ZenStore()

I have seen library authors resort to some interesting tricks to switch many defaults en-masse based on different contexts. ZenStore offers a lower level of organization that might make it more manageable for people to maintain more sophisticated combinations of configs & config defaults. (These are just musings at this point)

@addisonklinke commented Oct 28, 2022

Summary

This seems like a re-vamp of #93, right? Definitely looking more polished this time around, and I'm excited to have it as part of the Hydra-Zen library! 🎉 As you'll see, I think my custom implementation is much more clunky and I'd be happy to use something more official

Entry Script Patterns

Mine wound up being quite similar to what @Jasha10 and @rsokl propose with dedicated storage calls

import hydra
from hydra.core.config_store import ConfigStore

from lib.config.entry import TrainConf  # The only manual dataclass, for top-level components
from lib.config.registration import register_modules
from lib import datamodule, model

def register_configs():
    register_modules(datamodule, model)
    cs = ConfigStore.instance()
    cs.store(name='train_conf', node=TrainConf)

@hydra.main(config_name='train_conf', config_path=None)
def train(cfg):
    ...

if __name__ == '__main__':
    register_configs()
    train()

In a given module I have something like

@my_store()
class MyModel:
    ...

Where my_store() essentially performs the same logic as the proposed ZenStore.__call__() except instead of using self._internal_repo to store the config group/name/node I add them to a __hydra_cfg__ attribute of the object. Then register_modules() iterates through all members of a module, checks to see which have the special __hydra_cfg__ attribute defined, and for those objects executes ConfigStore.instance().store

Differences / Desired Features

On the whole, I could readily convert my implementations of my_store() and register_modules() to use the interfaces added with this PR. That would be preferable since I can offload the burden of maintaining/testing those to the Hydra Zen library. A couple features that I would suggest adding

  1. Easy registration of multiple modules in one function call. I think it will be quite common to import several ZenStore objects and it'd be nice to avoid typing out the .add_to_hydra_store() for each one. Perhaps a way to merge multiple into a parent ZenStore, and then the "add to Hydra" step gets called once for the parent?
  2. Some default class attribute that gets transferred to the to_config_kw. In my code, I'd called it interpolations as originally discussed here, but I think it would be more appropriate as a dunder
  3. A standard best-practice for registering any associated OmegaConf resolvers along with the config nodes that use them. So far I haven't figured out a great solution for this
  4. Default config group and name for the to_config parameter of store(). If I have lib/model/classification.py:MyModel, the most logical config store placement seems to be group='model', name='classification.MyModel'. I realize I could override this once for my library with the self-partialing feature, so maybe it is best left up to library authors to define their default storage schema rather than code it into Hydra-Zen

Details

If you're curious about my implementation details for auto-detecting config group/name (desired feature 4 above) and/or the register_modules() approach, here's the code I'm currently using. The docstrings are a bit lengthy since I'm working with team members who are new to Hydra and I wanted to give as much detail as possible for their understanding

import importlib
import inspect
import pkgutil

from hydra.core.config_store import ConfigStore
from hydra_zen import builds, just, get_target

HYDRA_CFG_ATTR = '__hydra_config__'


def my_store(*args, **kwargs):
    """Decorator to store the return of ``get_configstore_kwargs()`` in a ``__hydra_config__`` attribute

    Note: to decorate a dataclass, ``@dataclass`` must be the inner decorator

    :param args/kwargs: Matching the signature of ``get_configstore_kwargs()``. To override
        these in multiple uses within a single module, you can shadow the imported ``my_store``
        using ``functools.partial``. For example

            ```python
            from functools import partial
            from bml.config.registration import my_store

            new_my_store = partial(my_store, zen_just=True)
            ```
    """

    def wrapper(cls):
        configstore_kwargs = get_configstore_kwargs(cls, *args, **kwargs)
        setattr(cls, HYDRA_CFG_ATTR, configstore_kwargs)  # TODO read-only attr which can't be modified later
        return cls

    return wrapper


def get_configstore_kwargs(
        obj,
        populate_full_signature=True,
        group_prefix=slice(1, None),
        name_prefix=None,
        zen_just=False,
        **kwargs):
    """Auto-build config node and determine its storage location within the Hydra ConfigStore

    Used both with the ``my_store`` decorator and in standalone calls to register objects
    from 3rd party libraries. Prerequisite knowledge of Hydra structured configs is recommended
    https://hydra.cc/docs/next/tutorials/structured_config/intro/#internaldocs-banner

    To illustrate the default behavior, take the following structure

        package/
            __init__.py
            subpackage/
                __init__.py
                submodule.py  # Contains object ``SomeClass``

    Calling ``get_configstore_kwargs(SomeClass)`` would return the following dict

        {
            'group': 'subpackage/submodule',
            'name': 'SomeClass',
            'node': builds(SomeClass)
        }

    Providing a slice for ``name_prefix`` incorporates various elements of the
    object's ``__module__`` path into the returned ``name`` key. Positive
    indexing starts from the top-level package while negative indexing goes from
    the innermost submodule. Indices operate on a list like the following

        SomeClass.__module__.split('.')  # ['package', 'subpackage', 'submodule']

    Some example slices are

        None            --> 'SomeClass'  # Take the object name as-is without any prefix
        slice(-1, None) --> 'submodule.SomeClass'
        slice(-2, None) --> 'subpackage.submodule.SomeClass'
        slice(0, 1)     --> 'package.SomeClass'
        slice(1, 2)     --> 'subpackage.SomeClass'

    The same logic applies for ``group_prefix`` which affects the returned ``group``
    key. In this case, the default is to remove the top-level ``package``

    This group/name behavior can be overridden by either
        1) Setting a module-level dict called ``__hydra_zen__`` in any script
           on the target object's path. For example, submodule.py, subpackage/__init__.py,
           or package/__init__.py. The dict should have keys matching the parameters
           to ``ConfigStore.store()`` and string values only. If multiple ``__hydra_zen__``
           settings are found, the one closest to the object (i.e. submodule.py) will
           take precedence. For instance, subpackage/__init__.py could override with

                __hydra_zen__ = {'group': 'subpackage'}  # Instead of 'subpackage/submodule'

        2) Specifying any of the ``ConfigStore.store()`` parameters as keyword arguments.
           This takes precedent over any ``__hydra_zen__`` settings at the module level

    Config nodes often rely on OmegaConf interpolations to provide default values based on
    other sections of the config. However, using interpolation strings directly in the
    constructor defaults could make the class impossible to use outside a Hydra context
    (since the required resolvers might not be registered). Therefore, the recommended
    practice (and format expected by this function) is to create a class-attribute
    ``interpolations`` which is a dictionary of resolvers. This allows the class to be
    used in plain Python while also being configurable for Hydra. See the following
    discussion: https://github.com/mit-ll-responsible-ai/hydra-zen/discussions/142

    The ``interpolations`` can also be used to append to Hydra's defaults list. See discussion
    https://github.com/mit-ll-responsible-ai/hydra-zen/discussions/142#discussioncomment-2503456
    https://github.com/facebookresearch/hydra/discussions/2126#discussioncomment-2511512

        ```python

        class MyClass:
            '''Concrete example in datamodule.dataset.classfication.ImageClassification'''

            interpolations = {
                'zen_meta': {
                    'defaults': [
                        {'/config/section@_group_': 'NewChoice'}]}}

            # Need to replace '/config/section' and 'NewChoice' with real values
            # Key requires 'override' prefix if the defaults list already has an entry for that config group
        ```

    Similar work currently has feature request: https://github.com/facebookresearch/hydra/issues/1982
    Reference for class decorator with arguments: https://stackoverflow.com/a/4262943/7446465

    :param Any obj: Class, function, or class method to build config for
    :param bool populate_full_signature: Overriding the default behavior for ``hydra_zen.builds()``
    :param slice group_prefix: What portion(s) of the object's module hierarchy to
        retain in the ConfigStore ``group``. See example above for ``name_prefix``
    :param Optional[slice] name_prefix: What portion(s) of the object's module hierarchy to
        retain in the ConfigStore ``name``. See example above
    :param bool zen_just: If enabled, changes the build function for the config node from
        ``builds`` to ``just``. Useful if the instantiated node needs to return a class
        type rather than a concrete instance. See the Hydra-Zen docs for examples:
        https://mit-ll-responsible-ai.github.io/hydra-zen/generated/hydra_zen.just.html
    :param kwargs: Anything supported by Hydra-Zen's ``builds()`` or Hydra's
        ``ConfigStore.store()``. For example zen_partial, group, name, package, and
        provider. If none are provided, the behavior described above infers an
        appropriate default. Can also include overrides (i.e. interpolations or changes
        in default values) for parameters specific to the object
    :return dict store_kwargs: Intended for use with ``ConfigStore.store()``. See
        usage in ``register_modules()`` for the actual storing execution
    """

    # TODO allow ``__hydra_zen__`` to override any of the boolean parameters
    # TODO refactor ``interpolations`` class-attribute to ``__hydra__`` (or zen)
    # TODO evaluate whether ``*_prefix`` parameters would be better handled with Hydra package overrides

    # Validate/partition the keyword arguments and existing class attributes
    if 'node' in kwargs:
        raise ValueError('node is created dynamically via ``hydra_zen.builds()`` and should not be provided')
    configstore_sig = inspect.signature(ConfigStore.store)
    builds_sig = inspect.signature(builds)
    obj_sig = inspect.signature(obj.__init__) if inspect.isclass(obj) else inspect.signature(obj)
    supports_kwargs = any(p.kind == p.VAR_KEYWORD for p in obj_sig.parameters.values())
    configstore_kwargs = {}
    builds_kwargs = {}
    interps = {}
    for k, v in kwargs.items():
        if k in configstore_sig.parameters:
            configstore_kwargs.update({k: v})
        elif k in builds_sig.parameters:
            builds_kwargs.update({k: v})
        elif k in obj_sig.parameters or supports_kwargs:
            interps.update({k: v})
        else:
            raise ValueError(
                f'kwarg {k} must match signature of {obj.__name__}, Hydra ConfigStore, or Hydra-Zen builds')
    if hasattr(obj, HYDRA_CFG_ATTR):
        config_from_parent = False
        cfg = getattr(obj, HYDRA_CFG_ATTR)['node']
        final_target = get_target(cfg)
        for base_cls in inspect.getmro(obj):
            if base_cls.__name__ == final_target.split('.')[-1]:
                config_from_parent = True
                break
        if not config_from_parent:
            raise RuntimeError(f'{obj.__name__}.{HYDRA_CFG_ATTR} exists, refusing to override from decorator')

    # Determine the group, package, provider parameters to ``ConfigStore.store()``
    # Done in order of increasing precedence so that the latest will overwrite prior defaults
    # The final fallback is storing based on the class/function's place in the package
    parents = obj.__module__.split('.')
    node_name = [] if name_prefix is None else parents[name_prefix]
    if hasattr(obj, '__self__'):
        node_name.append(obj.__self__.__name__)  # Classmethod targets like ``flash.ObjectDetectionData.from_coco``
    else:
        node_name.append(obj.__name__)
    store_kwargs = {
        'group': '/'.join(parents[group_prefix]),
        'name': '.'.join(node_name)}

    # If any modules have a ``__hydra_zen__`` dict, use the one closest to the object
    module_hierarchy = get_module_hierarchy(obj, deepest_first=True)
    for module in module_hierarchy:
        overrides = getattr(module, '__hydra_zen__', None)
        if overrides is not None:
            # FIXME need to allow ``__hydra_zen__`` to override ``group/name_prefix`` parameters
            store_kwargs.update(overrides)
            break

    # Kwargs specified with the decorator have the final say
    store_kwargs.update(configstore_kwargs)

    # Add the dynamic config node from Hydra-Zen
    interps.update(getattr(obj, 'interpolations', {}))
    if zen_just:
        dynamic_cfg = just(obj)
    else:
        dynamic_cfg = builds(obj, populate_full_signature=populate_full_signature, **interps, **builds_kwargs)
    store_kwargs.update({'node': dynamic_cfg})
    return store_kwargs


def get_module_hierarchy(obj, deepest_first=True):
    """Get list of each module within an object's import hierarchy

    :param object obj: The class or function
    :param bool deepest_first: Whether to return the deepest nested module
        first in the list. For example

        ```
        True --> ['pkg.sub_pkg.submod', 'pkg.sub_pkg', 'pkg']
        False --> ['pkg', 'pkg.sub_pkg', 'pkg.sub_pkg.submod']
        ```

    :return List[module] module_hierarchy:
    """
    module_hierarchy = []
    parents = obj.__module__.split('.')
    for depth in range(len(parents), 0, -1):
        module = importlib.import_module('.'.join(parents[:depth]))
        module_hierarchy.append(module)
    if not deepest_first:
        module_hierarchy.reverse()
    return module_hierarchy


def register_modules(*modules):
    """Recursively check for __hydra_config__ attributes and add them to the ConfigStore

    The ``my_store`` decorator above cannot perform this without explicitly
    importing each decorated class, and even then the ConfigStore addition
    would be an import side-effect which is debate-able in terms of best
    practices. This function allows us to be explicit about which configs
    will be available to ``@hydra.main()`` at runtime

    :param Iterable[ModuleType] modules: Imported module(s) whose members will be recursively searched
    :return None: Stores nodes in the ConfigStore
    """

    def inspect_module_hydra(module):
        """Shared logic between packages and modules

        :param ModuleType module: Module whose object members should be searched
        :return None: Adds nodes to the Hydra ConfigStore
        """
        cs = ConfigStore.instance()
        for name, obj in inspect.getmembers(module, predicate=lambda obj: hasattr(obj, HYDRA_CFG_ATTR)):
            store_kwargs = getattr(obj, HYDRA_CFG_ATTR)
            if not isinstance(store_kwargs, dict):
                raise TypeError(
                    f'Expected dict type for {obj.__name__}.{HYDRA_CFG_ATTR}, got {type(store_kwargs)}')
            cs.store(**store_kwargs)

    for mod in modules:
        prefix = mod.__name__ + '.'

        # As mentioned in the Python docs (https://docs.python.org/3/reference/import.html#packages)
        # Packages are just a special kind of module that contains a ``__path__`` attribute
        # In this case, we need to inspect all modules within the package
        if hasattr(mod, '__path__'):
            inspect_module_hydra(mod)  # Don't forget the package itself for content within ``__init__.py``
            for importer, modname, ispkg in pkgutil.walk_packages(mod.__path__, prefix):
                module = importer.find_module(modname).load_module(modname)
                inspect_module_hydra(module)

        # Otherwise, we can inspect the module members directly
        else:
            inspect_module_hydra(mod)

@rsokl (Contributor, Author) commented Oct 31, 2022

This seems like a re-vamp of #93, right?

Yep! The main difference, though, is that it is focused on registering configs in a config-store rather than attaching configs to the target-object (e.g., assigning func.config = builds(func))

Mine wound up being quite similar to what @Jasha10 and @rsokl propose with dedicated storage calls

Great, it is good to see that what I proposed is a pattern that you already have found to be useful in your work. That is definitely encouraging. Your insights here are much appreciated!

A couple features that I would suggest adding

  1. Easy registration of multiple modules in one function call. I think it will be quite common to import several ZenStore objects and it'd be nice to avoid typing out the .add_to_hydra_store() for each one. Perhaps a way to merge multiple into a parent ZenStore, and then the "add to Hydra" step gets called once for the parent?

This is interesting to think about. In terms of registering multiple stores, the easiest pattern to avoid repetition may simply be:

for _store in [store, data_store, model_store]:
    _store.add_to_hydra_store()

that being said, the ability to merge stores may be valuable for other reasons (e.g. to guard against accidentally overriding configs between stores). I'll look into this!
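A sketch of what guarded merging could look like, assuming each store exposes a dict-like `_internal_repo` keyed on `(name, group, ...)` as in the ZenStore sketch earlier in this thread (`merge_stores` is a hypothetical name, not part of this PR):

```python
def merge_stores(*stores):
    """Combine several stores' internal repos, refusing silent key collisions."""
    merged = {}
    for s in stores:
        for key, cfg in s._internal_repo.items():
            if key in merged:
                raise ValueError(f"config {key} is defined in more than one store")
            merged[key] = cfg
    return merged
```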

  2. Some default class attribute that gets transferred to the to_config_kw. In my code, I'd called it interpolations as originally discussed in the hydra-zen Gotchya List (#142), but I think it would be more appropriate as a dunder

This is something that you could implement in a to_config override:

from hydra_zen.store import default_to_config

def my_to_config(target, **kw):
    attr_overrides = getattr(target, "__hydra__", {})
    return default_to_config(target, **{**kw, **attr_overrides})

As you can tell, I've been hesitant to move forward with a standard attribute that hydra-zen looks for to derive config parameters. I have been wanting to introduce functions like zen and store first, which will likely have substantial impacts on common code patterns in hydra-zen user code, and wait for the dust to settle, so to speak, to inform my vision of standardizing on protocols for designing "hydra-centric" library code. In the meantime, I am all for library authors devising their own schemes, and I am eager for hydra-zen to facilitate those schemes as much as possible.

  3. A standard best-practice for registering any associated OmegaConf resolvers along with the config nodes that use them. So far I haven't figured out a great solution for this

I will admit, I avoid using resolvers as much as possible. So I don't have any insights along these lines. But it would be great if store can help with this, in the same way that it encourages users to treat the config store locally. Perhaps @Jasha10 has thoughts on this.

  1. Default config group and name for the to_config parameter of store(). If I have lib/model/classification.py:MyModel, the most logical config store placement seems to be group='model', name='classification.MyModel'. I realize I could override this once for my library with the self-partialing feature, so maybe it is best left up to library authors to define their default storage schema rather than code it into Hydra-Zen

I do think it is best left up to the library authors. Indeed, this is why I implemented the self-partialing feature: to make it trivial/frictionless for library authors to implement their own defaults. This reduces the burden on hydra-zen to document and solicit a particular group/name schema, and enables authors to do what works best for them. (That being said, I would love to capture the system that you have devised and include a toy version of it in our docs, to help guide other library authors.)

Thanks so much for including your implementation of my_store. It will be helpful for me to review this and mine it for insights & feature ideas. Our previous discussions and your inputs on facebookresearch/hydra#1982 greatly inspired this current PR.


Jasha10 commented Nov 3, 2022

standard best-practice for registering any associated OmegaConf resolvers along with the config nodes that use them

I'm curious to see how you've been using resolvers, @addisonklinke. Would you be willing to share an example?

I do use resolvers in my own projects, but I try to limit their scope (e.g. registering just a few utility resolvers at the top level of my @hydra.main function). I've shied away from extensive use of resolvers for the following reasons:

  • portability concerns (which might be resolved if we can develop best-practices per @addisonklinke's comment above)
  • lack of type safety / tooling support. Linters and type checkers won't tell me when I make improper use of a resolver (e.g. if there's a TypeError), and Hydra will not raise an error at the time of config composition.
  • It feels like a code smell to put class-local business logic on the OmegaConf side rather than on the python side. (That being said, I think there's a good argument to be made in favor of registering well-behaved resolvers for global use as a configuration DSL.)

Edit: I take back what I said about not putting business logic into OmegaConf resolvers. Being able to do arithmetic with interpolated values seems super useful.


@rsokl rsokl marked this pull request as ready for review November 4, 2022 03:07
@addisonklinke

@Jasha10 I'm curious to see how you've been using resolvers. Would you be willing to share an example?

Sure, here are a couple of examples where I used a custom resolver:

  1. Setting the appropriate trainer plugins in PyTorch Lightning depending on which accelerator was configured
  2. Turning off dataset shuffling for trainings with limited batches

I believe both cases could almost be done using the built-in oc.select resolver, except that doing value comparison would require another nested resolver with a boolean expression. Here are my implementations:

from omegaconf import OmegaConf

def ddp_unused_false(_root_):
    """Set plugin flags based on the configured accelerator

    In Lightning 1.2.4, this PR set the DDP default for ``find_unused_parameters``
    to true to preserve backwards compatibility. However, in most cases this is
    not recommended due to adverse performance:
    https://github.com/PyTorchLightning/pytorch-lightning/pull/6438

    While there is a trainer plugin flag to force false, we don't want this to
    be our default configuration. If we did, Lightning would issue warnings on
    any machine with fewer than 2 GPUs (since DDP cannot be used in that case).
    Instead, we can use a resolver to check whether the DDP accelerator was
    selected at runtime. If so, set the plugin config; if not, use the
    default trainer value of ``plugins=None``.

    If you want DDP to search for unused parameters, you can still override on
    the CLI with ``lightning.trainer='{accelerator:ddp, plugins:null}'``

    :param omegaconf.BaseContainer _root_: Root config node
    :return Optional[str]: To pass as the plugin kwarg to Trainer
    """
    if OmegaConf.select(_root_, 'lightning.trainer.accelerator') == 'ddp':
        plugin_conf = 'ddp_find_unused_parameters_false'
    else:
        plugin_conf = None
    return plugin_conf


def only_shuffle_unlimited(_root_):
    """Turn off datamodule trainset shuffling if trainer is limiting batches

    Otherwise each epoch will see a different subset of the trainset
    batches which defeats the purpose of limiting to overfit on a
    smaller dataset

    :param omegaconf.BaseContainer _root_: Root config node
    :return bool: To pass as the ``shuffle_train`` kwarg to ``DatasetWrapper``
    """
    if OmegaConf.select(_root_, 'lightning.trainer.limit_train_batches') == 1.0:  # Float --> use 100% of batches
        return True
    return False
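For context, a config consuming these resolvers might look roughly like the fragment below. This assumes the functions are registered before config composition (e.g. via `OmegaConf.register_new_resolver`, with `_root_` wired up through OmegaConf's special resolver arguments), and the group and field names here are made up for illustration:

```yaml
lightning:
  trainer:
    accelerator: ddp
    plugins: ${ddp_unused_false:}   # resolves to the plugin string, or null
datamodule:
  shuffle_train: ${only_shuffle_unlimited:}
```

Because resolution happens lazily at access time, both fields track whatever accelerator and batch-limit values are ultimately composed, including CLI overrides.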


Jasha10 commented Nov 8, 2022

Sure here are a couple examples where I used a custom resolver

I see. Thanks!

@rsokl rsokl merged commit 1024847 into main Nov 21, 2022
@rsokl rsokl deleted the store branch November 21, 2022 23:06

rsokl commented Nov 22, 2022

@jgbos @Jasha10 @addisonklinke hydra_zen.store and hydra_zen.ZenStore are now available to try out via the latest pre-release: https://mit-ll-responsible-ai.github.io/hydra-zen/changes.html#rc4-2022-11-21 🥳

If you get the chance to use these, let me know if you have any feedback on documentation, design, ergonomics, etc.! You can check out the reference docs here
