Adds hydra_zen.store
#331
Conversation
@Jasha10 @addisonklinke @jgbos It'd be great to get your impressions of this proposal.
Awesome. I agree with the issues you raised; they were my two biggest concerns for storing configs directly on functions as well.
Perhaps this could be handled on the Hydra side? Hydra's ConfigStore could issue a UserWarning if a previously-stored config node would be overwritten.
Agreed. In my own applications I've gravitated towards the following pattern:

```python
# contents of my_lib/math.py
def add(x, y): ...

def store_configs():
    store(add)
    ...
```

Then in my entry script, I call `store_configs()` explicitly.
Thanks for the insights @Jasha10! I was chatting with @jgbos, and we are thinking of modifying this PR to introduce a `ZenStore` class:

```python
class ZenStore:
    def __init__(self):
        self._internal_repo = dict()

    def __call__(self, *a, **kw):
        # Same functionality as the current `store` fn/decorator, but
        # registers the constructed node to `self._internal_repo`
        config = ...  # make config
        name = ...    # get name, group, etc.
        assert (name, group) not in self._internal_repo
        self._internal_repo[(name, group)] = config

    def add_to_hydra_store(self):
        for (name, group), config in self._internal_repo.items():
            ConfigStore.instance().store(name=name, group=group, node=config)

    # plus a nice repr/getattr for inspecting the internal repo and accessing configs
```

This is meant to get users to think about stored configs in a stateful and local way -- local to specific instances of their store. So someone's entry script might look like:

```python
from my_lib.math import store as math_store
from my_lib.data import store as data_store
from hydra_zen import store, zen


@store(name="main_task", num_epochs=10)
def task(num_epochs, model, optim, trainer):
    ...


if __name__ == "__main__":
    math_store.add_to_hydra_store()
    data_store.add_to_hydra_store()
    store.add_to_hydra_store()

    zen(task).hydra_main(config_name="main_task", config_path=None, version_base="1.2")
```

One thing that I like about your pattern is that adding configs to Hydra's global store is an explicit step rather than an import side effect. Lastly, one neat thing that manifests from this design is that you can maintain multiple independent stores:

```python
test_store = ZenStore()
demo_store = ZenStore()
experiment_store = ZenStore()
```

I have seen library authors resort to some interesting tricks to switch many defaults en-masse based on different contexts.
**Summary**

This seems like a re-vamp of #93, right? It's definitely looking more polished this time around, and I'm excited to have it as part of the Hydra-Zen library! 🎉 As you'll see, I think my custom implementation is much more clunky, and I'd be happy to use something more official.

**Entry Script Patterns**

Mine wind up being quite similar to what @Jasha10 and @rsokl propose, with dedicated storage calls:

```python
import hydra
from hydra.core.config_store import ConfigStore

from lib.config.entry import TrainConf  # The only manual dataclass, for top-level components
from lib.config.registration import register_modules
from lib import datamodule, model


def register_configs():
    register_modules(datamodule, model)
    cs = ConfigStore.instance()
    cs.store(name='train_conf', node=TrainConf)


@hydra.main(config_name='train_conf', config_path=None)
def train(cfg):
    ...


if __name__ == '__main__':
    register_configs()
    train()
```

In a given module I have something like:

```python
@my_store()
class MyModel:
    ...
```

where `my_store` attaches the config-store kwargs to the decorated class (see the Details section below).

**Differences / Desired Features**

On the whole, I could readily convert my implementations over to the proposed design.
**Details**

If you're curious about my implementation details for auto-detecting the config group/name (desired feature 3 above) and/or the registration helper itself, here is the code:

```python
import importlib
import inspect
import pkgutil

from hydra.core.config_store import ConfigStore
from hydra_zen import builds, just, get_target

HYDRA_CFG_ATTR = '__hydra_config__'


def my_store(*args, **kwargs):
    """Decorator to store the return of ``get_configstore_kwargs()`` in a ``__hydra_config__`` attribute

    Note: to decorate a dataclass, ``@dataclass`` must be the inner decorator

    :param args/kwargs: Matching the signature of ``get_configstore_kwargs()``. To override
        these in multiple uses within a single module, you can shadow the imported ``my_store``
        using ``functools.partial``. For example

            from functools import partial
            from bml.config.registration import my_store
            new_my_store = partial(my_store, zen_just=True)
    """
    def wrapper(cls):
        configstore_kwargs = get_configstore_kwargs(cls, *args, **kwargs)
        setattr(cls, HYDRA_CFG_ATTR, configstore_kwargs)  # TODO read-only attr which can't be modified later
        return cls
    return wrapper


def get_configstore_kwargs(
        obj,
        populate_full_signature=True,
        group_prefix=slice(1, None),
        name_prefix=None,
        zen_just=False,
        **kwargs):
    """Auto-build a config node and determine its storage location within the Hydra ConfigStore

    Used both with the ``my_store`` decorator and in standalone calls to register objects
    from 3rd-party libraries. Prerequisite knowledge of Hydra structured configs is recommended:
    https://hydra.cc/docs/next/tutorials/structured_config/intro/#internaldocs-banner

    To illustrate the default behavior, take the following structure

        package/
            __init__.py
            subpackage/
                __init__.py
                submodule.py  # Contains object ``SomeClass``

    Calling ``get_configstore_kwargs(SomeClass)`` would return the following dict

        {
            'group': 'subpackage/submodule',
            'name': 'SomeClass',
            'node': builds(SomeClass)
        }

    Providing a slice for ``name_prefix`` incorporates various elements of the
    object's ``__module__`` path into the returned ``name`` key. Positive
    indexing starts from the top-level package, while negative indexing goes from
    the innermost submodule. Indices operate on a list like the following

        SomeClass.__module__.split('.')  # ['package', 'subpackage', 'submodule']

    Some example slices are

        None            --> 'SomeClass'  # Take the object name as-is without any prefix
        slice(-1, None) --> 'submodule.SomeClass'
        slice(-2, None) --> 'subpackage.submodule.SomeClass'
        slice(0, 1)     --> 'package.SomeClass'
        slice(1, 2)     --> 'subpackage.SomeClass'

    The same logic applies for ``group_prefix``, which affects the returned ``group``
    key. In this case, the default is to remove the top-level ``package``.

    This group/name behavior can be overridden by either

    1) Setting a module-level dict called ``__hydra_zen__`` in any script
       on the target object's path, e.g. submodule.py, subpackage/__init__.py,
       or package/__init__.py. The dict should have keys matching the parameters
       to ``ConfigStore.store()`` and string values only. If multiple ``__hydra_zen__``
       settings are found, the one closest to the object (i.e. submodule.py) will
       take precedence. For instance, subpackage/__init__.py could override with

           __hydra_zen__ = {'group': 'subpackage'}  # Instead of 'subpackage/submodule'

    2) Specifying any of the ``ConfigStore.store()`` parameters as keyword arguments.
       This takes precedence over any ``__hydra_zen__`` settings at the module level

    Config nodes often rely on OmegaConf interpolations to provide default values based on
    other sections of the config. However, using interpolation strings directly in the
    constructor defaults could make the class impossible to use outside a Hydra context
    (since the required resolvers might not be registered). Therefore, the recommended
    practice (and the format expected by this function) is to create a class attribute
    ``interpolations``, which is a dictionary of resolvers. This allows the class to be
    used in plain Python while also being configurable for Hydra. See the following
    discussion: https://github.com/mit-ll-responsible-ai/hydra-zen/discussions/142

    The ``interpolations`` can also be used to append to Hydra's defaults list. See

        https://github.com/mit-ll-responsible-ai/hydra-zen/discussions/142#discussioncomment-2503456
        https://github.com/facebookresearch/hydra/discussions/2126#discussioncomment-2511512

        class MyClass:
            '''Concrete example in datamodule.dataset.classification.ImageClassification'''
            interpolations = {
                'zen_meta': {
                    'defaults': [
                        {'/config/section@_group_': 'NewChoice'}]}}
            # Need to replace '/config/section' and 'NewChoice' with real values
            # Key requires an 'override' prefix if the defaults list already has an entry
            # for that config group

    A similar feature request exists: https://github.com/facebookresearch/hydra/issues/1982
    Reference for a class decorator with arguments: https://stackoverflow.com/a/4262943/7446465

    :param Any obj: Class, function, or classmethod to build a config for
    :param bool populate_full_signature: Overrides the default behavior of ``hydra_zen.builds()``
    :param slice group_prefix: What portion(s) of the object's module hierarchy to
        retain in the ConfigStore ``group``. See the ``name_prefix`` example above
    :param Optional[slice] name_prefix: What portion(s) of the object's module hierarchy to
        retain in the ConfigStore ``name``. See the example above
    :param bool zen_just: If enabled, changes the build function for the config node from
        ``builds`` to ``just``. Useful if the instantiated node needs to return a class
        type rather than a concrete instance. See the Hydra-Zen docs for examples:
        https://mit-ll-responsible-ai.github.io/hydra-zen/generated/hydra_zen.just.html
    :param kwargs: Anything supported by Hydra-Zen's ``builds()`` or Hydra's
        ``ConfigStore.store()``, e.g. zen_partial, group, name, package, and
        provider. If none are provided, the behavior described above infers an
        appropriate default. Can also include overrides (i.e. interpolations or changes
        in default values) for parameters specific to the object
    :return dict store_kwargs: Intended for use with ``ConfigStore.store()``. See
        usage in ``register_modules()`` for the actual storing execution
    """
    # TODO allow ``__hydra_zen__`` to override any of the boolean parameters
    # TODO refactor the ``interpolations`` class attribute to ``__hydra__`` (or zen)
    # TODO evaluate whether the ``*_prefix`` parameters would be better handled with Hydra package overrides

    # Validate/partition the keyword arguments and existing class attributes
    if 'node' in kwargs:
        raise ValueError('node is created dynamically via ``hydra_zen.builds()`` and should not be provided')
    configstore_sig = inspect.signature(ConfigStore.store)
    builds_sig = inspect.signature(builds)
    obj_sig = inspect.signature(obj.__init__) if inspect.isclass(obj) else inspect.signature(obj)
    supports_kwargs = any(p.kind == p.VAR_KEYWORD for p in obj_sig.parameters.values())
    configstore_kwargs = {}
    builds_kwargs = {}
    interps = {}
    for k, v in kwargs.items():
        if k in configstore_sig.parameters:
            configstore_kwargs[k] = v
        elif k in builds_sig.parameters:
            builds_kwargs[k] = v
        elif k in obj_sig.parameters or supports_kwargs:
            interps[k] = v
        else:
            raise ValueError(
                f'kwarg {k} must match the signature of {obj.__name__}, Hydra ConfigStore, or Hydra-Zen builds')
    if hasattr(obj, HYDRA_CFG_ATTR):
        config_from_parent = False
        cfg = getattr(obj, HYDRA_CFG_ATTR)['node']
        final_target = get_target(cfg)
        for base_cls in inspect.getmro(obj):
            if base_cls.__name__ == final_target.split('.')[-1]:
                config_from_parent = True
                break
        if not config_from_parent:
            raise RuntimeError(f'{obj.__name__}.{HYDRA_CFG_ATTR} exists, refusing to override from decorator')

    # Determine the group, package, and provider parameters to ``ConfigStore.store()``
    # Done in order of increasing precedence so that the latest will overwrite prior defaults
    # The final fallback is storing based on the class/function's place in the package
    parents = obj.__module__.split('.')
    node_name = [] if name_prefix is None else parents[name_prefix]
    if hasattr(obj, '__self__'):
        node_name.append(obj.__self__.__name__)  # Classmethod targets like ``flash.ObjectDetectionData.from_coco``
    else:
        node_name.append(obj.__name__)
    store_kwargs = {
        'group': '/'.join(parents[group_prefix]),
        'name': '.'.join(node_name)}

    # If any modules have a ``__hydra_zen__`` dict, use the one closest to the object
    module_hierarchy = get_module_hierarchy(obj, deepest_first=True)
    for module in module_hierarchy:
        overrides = getattr(module, '__hydra_zen__', None)
        if overrides is not None:
            # FIXME need to allow ``__hydra_zen__`` to override the ``group/name_prefix`` parameters
            store_kwargs.update(overrides)
            break

    # Kwargs specified with the decorator have the final say
    store_kwargs.update(configstore_kwargs)

    # Add the dynamic config node from Hydra-Zen
    interps.update(getattr(obj, 'interpolations', {}))
    if zen_just:
        dynamic_cfg = just(obj)
    else:
        dynamic_cfg = builds(obj, populate_full_signature=populate_full_signature, **interps, **builds_kwargs)
    store_kwargs['node'] = dynamic_cfg
    return store_kwargs


def get_module_hierarchy(obj, deepest_first=True):
    """Get a list of each module within an object's import hierarchy

    :param object obj: The class or function
    :param bool deepest_first: Whether to return the most deeply nested module
        first in the list. For example

            True  --> ['pkg.sub_pkg.submod', 'pkg.sub_pkg', 'pkg']
            False --> ['pkg', 'pkg.sub_pkg', 'pkg.sub_pkg.submod']

    :return List[module] module_hierarchy:
    """
    module_hierarchy = []
    parents = obj.__module__.split('.')
    for depth in range(len(parents), 0, -1):
        module = importlib.import_module('.'.join(parents[:depth]))
        module_hierarchy.append(module)
    if not deepest_first:
        module_hierarchy.reverse()
    return module_hierarchy


def register_modules(*modules):
    """Recursively check for ``__hydra_config__`` attributes and add them to the ConfigStore

    The ``my_store`` decorator above cannot perform this without explicitly
    importing each decorated class, and even then the ConfigStore addition
    would be an import side effect, which is debatable in terms of best
    practices. This function allows us to be explicit about which configs
    will be available to ``@hydra.main()`` at runtime

    :param Iterable[ModuleType] modules: Imported module(s) whose members will be recursively searched
    :return None: Stores nodes in the ConfigStore
    """
    def inspect_module_hydra(module):
        """Shared logic between packages and modules

        :param ModuleType module: Module whose object members should be searched
        :return None: Adds nodes to the Hydra ConfigStore
        """
        cs = ConfigStore.instance()
        for name, obj in inspect.getmembers(module, predicate=lambda obj: hasattr(obj, HYDRA_CFG_ATTR)):
            store_kwargs = getattr(obj, HYDRA_CFG_ATTR)
            if not isinstance(store_kwargs, dict):
                raise TypeError(
                    f'Expected dict type for {obj.__name__}.{HYDRA_CFG_ATTR}, got {type(store_kwargs)}')
            cs.store(**store_kwargs)

    for mod in modules:
        prefix = mod.__name__ + '.'
        # As mentioned in the Python docs (https://docs.python.org/3/reference/import.html#packages),
        # packages are just a special kind of module that contains a ``__path__`` attribute.
        # In this case, we need to inspect all modules within the package
        if hasattr(mod, '__path__'):
            inspect_module_hydra(mod)  # Don't forget the package itself, for content within ``__init__.py``
            for importer, modname, ispkg in pkgutil.walk_packages(mod.__path__, prefix):
                module = importer.find_module(modname).load_module(modname)
                inspect_module_hydra(module)
        # Otherwise, we can inspect the module members directly
        else:
            inspect_module_hydra(mod)
```
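As a self-contained illustration of the module-hierarchy walk used above for group/name auto-detection, here is the same logic applied to a stdlib class (the function is re-stated here so the snippet runs on its own):

```python
import importlib


def get_module_hierarchy(obj, deepest_first=True):
    # Mirrors the helper above: import each package level along obj's
    # ``__module__`` path, from the innermost module up to the top package
    module_hierarchy = []
    parents = obj.__module__.split('.')
    for depth in range(len(parents), 0, -1):
        module_hierarchy.append(importlib.import_module('.'.join(parents[:depth])))
    if not deepest_first:
        module_hierarchy.reverse()
    return module_hierarchy


from json.decoder import JSONDecoder  # JSONDecoder.__module__ == 'json.decoder'

names = [m.__name__ for m in get_module_hierarchy(JSONDecoder)]
assert names == ['json.decoder', 'json']
```

With the default `group_prefix=slice(1, None)`, this hierarchy is what yields a group like `'decoder'` (the top-level `json` package stripped) for such a class.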
Yep! The main difference, though, is that it is focused on registering configs in a config-store rather than attaching configs to the target object (e.g., assigning a `__hydra_config__` attribute to it).
Great, it is good to see that what I proposed is a pattern that you already have found to be useful in your work. That is definitely encouraging. Your insights here are much appreciated!
This is interesting to think about. In terms of registering multiple stores, the easiest pattern to avoid repetition may simply be:

```python
for _store in [store, data_store, model_store]:
    _store.add_to_hydra_store()
```

That being said, the ability to merge stores may be valuable for other reasons (e.g. to guard against accidentally overriding configs between stores). I'll look into this!
This is something that you could implement in a custom `to_config` function:

```python
from hydra_zen.store import default_to_config

def my_to_config(target, **kw):
    attr_overrides = getattr(target, "__hydra__", {})
    return default_to_config(target, **{**kw, **attr_overrides})
```

As you can tell, I've been hesitant to move forward with a standard attribute that hydra-zen looks for to derive config parameters, but I have been wanting to introduce functions like this one.
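To make the override mechanics above concrete, here is a runnable sketch where `default_to_config` is a stand-in dict-builder (not hydra-zen's real function) and `Model.__hydra__` is a hypothetical class-level override:

```python
# Stand-in for hydra-zen's config-creation hook -- illustrative only
def default_to_config(target, **kw):
    return {"_target_": target.__name__, **kw}


def my_to_config(target, **kw):
    # Class-level `__hydra__` entries win over call-site keywords
    attr_overrides = getattr(target, "__hydra__", {})
    return default_to_config(target, **{**kw, **attr_overrides})


class Model:
    __hydra__ = {"lr": 0.1}


cfg = my_to_config(Model, lr=0.5, momentum=0.9)
assert cfg == {"_target_": "Model", "lr": 0.1, "momentum": 0.9}
```

The merge order `{**kw, **attr_overrides}` is what gives the attribute the final say; flip the order if call-site keywords should win instead.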
I will admit, I avoid using resolvers as much as possible, so I don't have any insights along these lines. But it would be great to hear how you use them.
I do think it is best left up to the library authors. Indeed, this is why I implemented the self-partialing feature: to make it trivial/frictionless for library authors to implement their own defaults. This reduces the burden on hydra-zen to document and solicit a particular group/name schema, and enables authors to do what works best for them. (That being said, I would love to capture the system that you have devised and include a toy version of it in our docs, to help guide other library authors. Thanks so much for including your implementation of `my_store`.)
I'm curious to see how you've been using resolvers, @addisonklinke. Would you be willing to share an example? I do use resolvers in my own projects, but I try to limit their scope (e.g. registering just a few utility resolvers at the top level of my project).
Edit: I take back what I said about not putting business logic into OmegaConf resolvers. Being able to do arithmetic with interpolated values seems super useful.
Sure, here are a couple of examples where I used a custom resolver.
I believe both cases could almost (but not quite) be done with the built-in resolvers:

```python
from omegaconf import OmegaConf


def ddp_unused_false(_root_):
    """Set plugin flags based on the configured accelerator

    In Lightning 1.2.4, this PR set the DDP default for ``find_unused_parameters``
    to true to preserve backwards compatibility. However, in most cases this is not
    recommended due to adverse performance:
    https://github.com/PyTorchLightning/pytorch-lightning/pull/6438

    While there is a trainer plugin flag to force false, we don't want this to
    be our default configuration. If we did, Lightning would issue warnings on
    any machine with fewer than 2 GPUs (since DDP cannot be used in that case).
    Instead, we can use a resolver to check whether the DDP accelerator was
    selected at runtime. If so, set the plugin config; if not, use the
    default trainer value of ``plugins=None``.

    If you want DDP to search for unused parameters, you can still override on
    the CLI with ``lightning.trainer='{accelerator:ddp, plugins:null}'``

    :param omegaconf.BaseContainer _root_: Root config node
    :return Optional[str]: To pass as the plugin kwarg to Trainer
    """
    if OmegaConf.select(_root_, 'lightning.trainer.accelerator') == 'ddp':
        plugin_conf = 'ddp_find_unused_parameters_false'
    else:
        plugin_conf = None
    return plugin_conf


def only_shuffle_unlimited(_root_):
    """Turn off datamodule trainset shuffling if the trainer is limiting batches

    Otherwise each epoch will see a different subset of the trainset
    batches, which defeats the purpose of limiting in order to overfit on a
    smaller dataset.

    :param omegaconf.BaseContainer _root_: Root config node
    :return bool: To pass as the ``shuffle_train`` kwarg to ``DatasetWrapper``
    """
    if OmegaConf.select(_root_, 'lightning.trainer.limit_train_batches') == 1.0:  # Float --> use 100% of batches
        return True
    return False
```
I see. Thanks!
@jgbos @Jasha10 @addisonklinke If you get the chance to use these, let me know if you have any feedback on documentation, design, ergonomics, etc.! You can check out the reference docs here.
**hydra_zen.ZenStore**

`hydra_zen.ZenStore` is an abstraction over Hydra's config store that streamlines the process of generating a config and then storing it in Hydra's config store. It enables users to isolate stores from Hydra's global config store and from one another, so that many stores can be maintained within a project without "clobbering" one another. It is designed to be highly customizable, while having defaults that seldom need to be tinkered with. It also has some nice ergonomics (see the self-partialing feature 😄).
**Basic Usage**

In short: a single `store` call is equivalent to generating a config (e.g. via `builds`) and then registering it with Hydra's `ConfigStore`, and `store`'s decorator form can always be rewritten in its function form, or vice versa.
**Basic Implementation Details**

There is a lot more here than meets the eye, so it is useful to take a peek at the implementation, which, at its core, is quite simple.
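A rough, pure-Python mock of that core design -- NOT hydra-zen's actual implementation, and with a dict stand-in for real config creation -- illustrating the three behaviors described below (a local repo of configs, passthrough semantics, and self-partialing):

```python
class MockZenStore:
    def __init__(self, defaults=None):
        self._defaults = dict(defaults or {})
        self._internal_repo = {}

    def __call__(self, target=None, **kw):
        if target is None:
            # Self-partialing: calling with only keywords returns a new store
            # that shares the repo but carries updated default arguments
            new = MockZenStore({**self._defaults, **kw})
            new._internal_repo = self._internal_repo
            return new
        merged = {**self._defaults, **kw}
        name = merged.pop("name", getattr(target, "__name__", str(target)))
        group = merged.pop("group", None)
        # Stand-in for config creation (the real store delegates to `builds` etc.)
        self._internal_repo[(group, name)] = {"_target_": name, **merged}
        return target  # passthrough: store(f) is f

    def add_to_hydra_store(self):
        # The real implementation registers each entry with Hydra's ConfigStore here
        pass


store = MockZenStore()


@store(name="add_config", group="math")
def add(x: int = 1, y: int = 2):
    return x + y


assert add(2, 3) == 5                               # `add` itself is unmodified
assert ("math", "add_config") in store._internal_repo
```

Nothing touches any global state until `add_to_hydra_store()` is called, which is the locality property discussed earlier in the thread.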
Note that nearly every aspect of `store`, including how it generates configs, can be customized. Furthermore, the `name`, `group`, and `package` fields can be specified as callables, which can be used to, for instance, auto-generate a node's name based on details like its module's path.

**Bells & Whistles**

*`store` is both a function and a decorator.* As shown above, `store` can be used as a function or a decorator.

*`store` is a passthrough.* That is, `store(f) is f`, and extra keywords passed to `store` are passed to the config-creation function. This means that you can stack `@store` decorators to store different configs for the same target; this also streamlines things like creating partial configs.

*`store` is self-partialing.* This is really cool: you can overwrite the default arguments of `store` without using `functools.partial`.
You can specify new defaults endlessly.
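The self-partialing idea can be mimicked with `functools.partial` (a sketch of the idea only -- hydra-zen's real `store` preserves its own signature and type annotations rather than returning `partial` objects):

```python
from functools import partial


def store(target=None, /, **defaults):
    if target is None:
        # No target given: return a "new store" with these defaults baked in
        return partial(store, **defaults)
    # Stand-in for "create a config and register it": just report the key
    return (defaults.get("group"), defaults.get("name"), target)


def add(x, y): ...

# Defaults can be layered endlessly; later keywords override earlier ones
key = store(group="math")(name="add")(add)
assert key == ("math", "add", add)
```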
This makes it easy to customize the behavior of `store` while preserving its signature and type annotations. You can seriously override everything.¹

*Specialized support for dataclasses.* The default `to_config` function will create configs that are proper subclasses of a target dataclass: for each application of `store`, `builds` produces a new dataclass type, which is a subtype of `A`.

*Responsive type annotations.* pyright impresses once again (unfortunately, mypy can't catch the following true positive -- I will open an issue).
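The dataclass subtyping described above can be mimicked with the stdlib (illustrative only -- hydra-zen's `builds` does considerably more than this):

```python
from dataclasses import dataclass, field, is_dataclass, make_dataclass


@dataclass
class A:
    x: int = 1


# Each "application of store" yields a fresh dataclass type subclassing A,
# so isinstance/issubclass checks against the target dataclass still hold
Builds_A = make_dataclass("Builds_A", [("x", int, field(default=1))], bases=(A,))

assert is_dataclass(Builds_A)
assert issubclass(Builds_A, A) and Builds_A is not A
assert isinstance(Builds_A(), A)
```

This is why configs produced this way remain compatible with code that type-checks against `A` itself.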
**Feedback**

*Does `@store` encourage an anti-pattern?*

Although I like the `@store` syntax, I am not crazy about having library authors decorate functions throughout their library -- it leads to a "non-locality", where those functions will only be added to the store if they are accessed. E.g. if I have a module `my_lib.math` whose functions are decorated with `@store`, the config for `add` only becomes available in Hydra's config store if `my_lib.math` gets imported somewhere along the way. This feels like an anti-pattern, as it leads to spooky action-at-a-distance behavior for unwitting users who may wonder why `add` is suddenly in their config store.²

I realize that library authors could just as easily make these `store` calls explicit, but that feels like more of a deliberate design decision, whereas the availability of the decorator syntax feels like it is practically begging for this pattern.

Is there a way to discourage or even prevent this? I have considered having some sort of intermediary function, `add_configs_to_store`, that users have to explicitly call to actually update Hydra's config store. I am also concerned with configs silently getting overwritten by configs of the same name.

This is just a preliminary idea, and I'd love to hear other ideas for encouraging best practices with Hydra's config store.
**Footnotes**

1. A library author might be interested in patterns similar to the following:
2. That being said, it is quite nice to be able to do: