-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add strict validation against Hydra's supported primitives #163
Conversation
I'm not sure this is clear enough for beginners. Do you mean that a user should try to build a configuration for the value rather than directly using the value, e.g., Maybe just reorder the sentence, "Consider creating a configuration for the particular value using hydra_zen.builds". |
Your logic on the the "exception vs. warning" is good. I can only think of one use case that might benefit from warnings instead: what if the user doesn't use the Hydra backend but wants to use the configs? I know, not a strong argument against exceptions, just trying to think of any good reason to go to warnings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of changes, nothing pops out as unusual. Excited to have complex support :)
I just realized that we should include omegaconf containers among the supported primitives. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just used the enum support for some configs :)
Finally wrapping up on this. I added some additional discussion and remarks on implementation details in the original post. |
Closes #149
Adding strict validation against Hydra's supported primitives
Prior to this PR, hydra-zen would not check configured values to see if they are compatible with Hydra's supported primitives. This means that you could construct a config via
builds
andmake_config
that are destined to raise whenever Hydra tries to serialize or instantiate the config. This is no good!Before:
Also, our annotations did not provide any feedback along these lines
After:
Now
builds
,make_config
, andhydrated_dataclass
strive to strictly validate all configured values -- whether they are specified by the user or auto-populated from an object's signature. The goal here is to catch mistakes in configs, which would eventually be "snagged" by omegaconf/Hydra, before you even launch your app.I hope that the error message is sufficiently clear
We even peek inside containers and validate their contents
Note that all of our type-checking used for this validation should adhere to the form:
type(x) is in SUPPORTED_TYPE
, and notisinstance(x, SUPPORTED_TYPES)
because we do not handle/permit subclasses of valid primitive types.Providing Support for Additional Primitives
This PR isn't just about narc-ing on bad configs, it also adds new capabilities! We can add support for common types that Hydra doesn't handle. Presently, we add support for:
We're pretty fancy here - we even permit things like sets of complex numbers!
Improved Annotations
The annotation for our config-creating functions are much better now; type-checkers will now flag bad inputs. The following reflects the behavior of pyright/pylance under the "basic" type-checking mode.
Feedback on this PR
Exception vs Warning (for now)
Given that we raise an exception for this validation, upgrading from v0.3 to v0.4 could be a rude awakening for folks who are, for some reason, using hydra-zen to a bunch of ultimately-invalid configs.
I have been considering having this PR raise a warning on bad-configurations, and note that in a future version the warning will become an exception. The reasons why I opted for going the more abrupt route of raising an error is:
builds
andmake_config
in a way that is detached from Hydra... but that really isn't whathydra_zen
is for, and I wouldn't be surprised if literally no one is doing this.The above is my reasoning on this topic. I am open to discussion if there are other opinions.
Any Other Low-Hanging Fruit for Supporting Primitives?
I plan to add support for
numpy.ndarray
andtorch.Tensor
, but in a separate PR because handling the dynamic imports is a bit trickier.That all being said, are there other primitives, beyond
set
,frozenset
,complex
, andpathlib.Path
, that we should add support for? For example,pathlib.Path
jumped out at me as being super useful. Obviously there are plenty of fish in the std-lib sea; support can be added incrementally.Possible Future Capability: Enable Users to Register Their Own Primitives / Conversion Strategies
There are various ways by which we could enable users to register new types and corresponding conversion strategies.
The biggest "Pro" is that if someone is regularly using some custom type in their config, registering a converter makes config-design much more elegantly.
The biggest "Con" that I can think of is that this can introduce "secret" dependencies on dynamic code. E.g. A user's config-generating code can only be run after that converter has been registered. Fortunately, the resulting config (either the dataclass or yaml) should be independent of this dependency. This is an improvement of omegaconf's capability for registering resolvers -- where even yamls can become secretly tethered to third-party code.
Implementation Details
Restricted Primitives for Dictionary Keys
omegaconf does not permit dataclasses as keys in dictionaries, even if they would instantiate to a hashable value. Thus the extended primitive support is not provided for dict-keys
No Support for
attrs
(for now)We currently do not provide any support for the attrs library; it is likely that
attrs
instances will get caught by our validation in its current form. This is not likely to impact most hydra-zen users in the near term, but we should eventually add support for it.Support for Enum
Although Omegaconf provides support for
Enum
, the yaml serialization process only retains the enum-member's name:It appears that the enum-member can only be restored if a schema is provided that contains the
Enum
subclass in the annotation. Thus hydra-zen instead provides specialized support forEnum
:this way, one can roundtrip from a yaml and restore the actual enum-member: