Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include support to bytes built-in type. #845

Closed
wants to merge 0 commits into from

Conversation

pedropgusmao
Copy link
Contributor

Implements support for bytes in response to #844 .

@Jasha10 Jasha10 linked an issue Dec 22, 2021 that may be closed by this pull request
@Jasha10
Copy link
Collaborator

Jasha10 commented Dec 22, 2021

Hi @pedropgusmao, thanks for getting this started!
The implementation looks good.

Here are a few changes that I think should be made:

  • base.py:DictKeyType should accept bytes. Also, the DictConfig._s_validate_and_normalize_key method should be updated to accept a key_type of bytes. (done in commit aa0dd54)
  • Update the OmegaConf docs

Edit (2021-01-30):

This PR is now ready for review. Here are the enabled use-cases:

  • bytes type annotations are now legal in structured config definitions:
@dataclass
class Config:
    data: bytes = b"binary_data"
    s: str = "string data"

cfg = OmegaConf.create(Config)
assert type(cfg.data) is bytes
cfg.data = b"updated_binary_data"

# For safety's sake, there is no implicit conversion:
cfg.data = "string data"  # raises ValidationError: can't convert str -> bytes
cfg.data = 123  # raises ValidationError: can't convert int -> bytes
cfg.s = b"bin_data"  # raises ValidationError: can't convert bytes -> str
  • creation from dictionaries works too
>>> conf = OmegaConf.create({"key": b"value"})
>>> print(conf)
{'key': b'value'}
  • binary data can be round-tripped through yaml (which makes use of yaml's native support for binary data):
>>> yml = OmegaConf.to_yaml(conf)
>>> print(yml)
key: !!binary |
  dmFsdWU=

>>> OmegaConf.create(yml)
{'key': b'value'}
>>> # This yaml support is compatible with Hydra.
  • Binary keys work too:
>>> OmegaConf.create({b"binary_key": 123})
{b'binary_key': 123}

Note that OmegaConf is not tuned for working with big data: a config's contents may be copied multiple times during config manipulation (which would be bad if the data is big), and storing big data in a yaml file is strongly discouraged. Support for bytes is provided as a convenience for working with small amounts of binary data.

@Jasha10
Copy link
Collaborator

Jasha10 commented Dec 22, 2021

There are several places in OmegaConf's tests where each supported type is tested. To keep consistency with the other tests, it would be good to add bytes test cases in these places.

  • tests/test_omegaconf.py:test_is_none
  • tests/test_omegaconf.py:test_is_interpolation
  • tests/test_utils.py:test_node_wrap
  • tests/test_utils.py:test_get_ref_type
  • tests/test_utils.py:test_is_optional
  • tests/test_nodes.py:test_accepts_mandatory_missing
  • tests/test_nodes.py:test_legal_assignment
  • tests/test_nodes.py:test_eq
  • tests/test_pydev_resolver_plugin.py:test_get_dictionary_node``tests/test_pydev_resolver_plugin.py:test_can_provide
  • tests/test_matrix.py:TestNodeTypesMatrix
  • tests/__init__.py:SubscriptedDict
  • tests/test_basic_ops_dict.py:TestDictKeyTypes
  • tests/test_basic_ops_dict.py:test_in_dict
  • tests/test_to_yaml.py:test_to_yaml
  • tests/test_to_yaml.py:test_to_yaml_string_primitive_types_list``tests/test_to_yaml.py:test_to_yaml_string_primitive_types_dict
  • tests/interpolation/test_interpolation.py:test_type_inherit_type
  • tests/test_compare_dictconfig_vs_dict.py:TestUntypedDictConfig
  • tests/test_compare_dictconfig_vs_dict.py:TestUntypeTestPrimitiveTypeDunderMethods
  • tests/structured_conf
  • tests/examples/test_dataclass_example.py:SimpleTypes
  • tests/examples/test_dataclass_example.py:simple_types_class
  • comments in tests/examples/test_dataclass_example.py
  • tests/test_serialization.py:test_pickle_untyped

@Jasha10
Copy link
Collaborator

Jasha10 commented Dec 22, 2021

An open questions:
Are there any values that OmegaConf's interpolation parser should interpret as bytes?
If so, we will need to modify the following:

  • omegaconf/grammar and omegaconf/grammar_parser.py, etc
  • tests/test_grammar.py
  • tests/interpolation/built_in_resolvers/test_oc_decode.py:test_decode

omegaconf/nodes.py Outdated Show resolved Hide resolved
omegaconf/nodes.py Outdated Show resolved Hide resolved
@Jasha10
Copy link
Collaborator

Jasha10 commented Dec 22, 2021

I have pushed two commits to this PR branch.

@pedropgusmao
Copy link
Contributor Author

Thanks @Jasha10 ,
I will start working on: #845 (comment) (I might need help here though) and update the documentation for #845 (comment) .

Sorry, I'm not very familiar with OmegaConf's interpolation parser. Where could I learn more about it? #845 (comment)

@pedropgusmao
Copy link
Contributor Author

pedropgusmao commented Dec 22, 2021

working on tests:

  • tests/test_omegaconf.py:test_is_none
  • tests/test_omegaconf.py:test_is_interpolation
  • tests/test_omegaconf.py:test_get_type
  • tests/test_omegaconf.py:test_get_type_on_raw
  • tests/test_utils.py:test_node_wrap
  • tests/test_utils.py:test_get_ref_type
  • tests/test_utils.py:test_is_optional
  • tests/test_nodes.py:test_accepts_mandatory_missing (974c595)
  • tests/test_nodes.py:test_legal_assignment
  • tests/test_nodes.py:test_eq
  • tests/test_pydev_resolver_plugin.py:test_get_dictionary_node
  • tests/test_pydev_resolver_plugin.py:test_can_provide
  • tests/test_matrix.py:TestNodeTypesMatrix
  • tests/init.py:SubscriptedDict
  • tests/test_basic_ops_dict.py:TestDictKeyTypes
  • tests/test_basic_ops_dict.py:TestDelitemKeyTypes
  • tests/test_basic_ops_dict.py:test_dict_pop
  • tests/test_basic_ops_dict.py:test_dict_pop_error
  • tests/test_basic_ops_dict.py:test_in_dict
  • tests/test_to_yaml.py:test_to_yaml
  • tests/test_to_yaml.py:test_to_yaml_string_primitive_types_list
  • tests/test_to_yaml.py:test_to_yaml_string_primitive_types_dict
  • tests/interpolation/test_interpolation.py:test_type_inherit_type
  • tests/test_compare_dictconfig_vs_dict.py:TestUntypedDictConfig
  • tests/test_compare_dictconfig_vs_dict.py:TestUntypeTestPrimitiveTypeDunderMethods
  • tests/structured_conf
  • tests/examples/test_dataclass_example.py:SimpleTypes
  • tests/examples/test_dataclass_example.py:test_simple_types_class
  • comments in tests/examples/test_dataclass_example.py
  • tests/test_serialization.py:test_pickle_untyped

Edit: I've added a few more items to the list above
-- Jasha

@Jasha10
Copy link
Collaborator

Jasha10 commented Dec 22, 2021

Don't worry about the parser for now -- we can publish a followup PR if we decide to change that later.
The idea is that arguments to omegaconf's interpolations and resolvers get parsed according to a grammar that is defined in the files omegaconf/grammar/*.g4.
For example, the string ${oc.decode:abc} would be parsed and decoded into a str, while ${oc.decode:123.456} would be decoded into a float. My question above was about if there is anything that should be decoded into bytes... But this is a question that can be dealt with later.

tests/test_errors.py Outdated Show resolved Hide resolved
@pedropgusmao
Copy link
Contributor Author

pedropgusmao commented Dec 30, 2021

Hi @Jasha10 , I hope you are doing fine.
Please take a look at the tests I've created so far and let me know if those need improving.
Also, when creating Bytes tests for tests/test_to_yaml.py:test_to_yaml, I have noticed that pyyaml doesn't like to use bytes as keys.

Any ideas on how we should adapt this, or if bytes as keys are really necessary?
It works.

@pedropgusmao
Copy link
Contributor Author

Hi @Jasha10 ,
Hope you are doing well and Happy New Year!
I was wondering if you had time to go over the tests I've written so far.
Thanks!

@Jasha10
Copy link
Collaborator

Jasha10 commented Jan 8, 2022

Apologies for the late reply!
I will take a look at it shortly.
Happy New Year to you too :)

tests/test_utils.py Outdated Show resolved Hide resolved
@Jasha10
Copy link
Collaborator

Jasha10 commented Jan 28, 2022

Looking good, @pedropgusmao!
After some consideration, @pixelb and I feel best about not doing any conversion between str and bytes. Not doing automatic conversion will help avoid surprising behavior and bugs.

@Jasha10
Copy link
Collaborator

Jasha10 commented Jan 28, 2022

FYI I'm planning to push a few more commits to round-out the testing.

tests/test_utils.py Outdated Show resolved Hide resolved
@Jasha10
Copy link
Collaborator

Jasha10 commented Jan 30, 2022

I've rebased to incorporate some changes from the master branch.

Additionally, I've added two commits.

The first commit (32c0f9d d6e66b7) removes all conversion between strings and bytes.

from dataclasses import dataclass
from omegaconf import OmegaConf

@dataclass
class Conf:
    s: str = "abc"
    x: bytes = b"binary"

cfg = OmegaConf.create(Conf)

cfg.s = b"bin"  # previously performed conversion, now this throws ValidationError
cfg.x = "string"  # previously performed conversion, now this throws ValidationError

The second commit (64eeeab 86b1506) adds more tests for BytesNode and polishes the tests and documentation.

@Jasha10 Jasha10 requested a review from pixelb January 30, 2022 08:59
@pedropgusmao
Copy link
Contributor Author

Thanks for this, @Jasha10
Is there anything else I can do on my side to help with this?
Also, just to clarify things a bit for me and to others who might read this PR; after we apply these changes, does it mean that :

  • Calls to instantiate/ call that require Bytes to be passed as arguments will be allowed?
  • Bytes will be supported, but not from yaml files. Or at least not encouraged?
  • Conversions between str and Bytes will not be supported.
    Thanks

@Jasha10
Copy link
Collaborator

Jasha10 commented Jan 30, 2022

Thanks for this, @Jasha10 Is there anything else I can do on my side to help with this?

Nope, I am happy with the shape of the PR.


Also, just to clarify things a bit for me and to others who might read this PR; after we apply these changes, does it mean that :

  • Calls to instantiate/ call that require Bytes to be passed as arguments will be allowed?

Yes, Hydra+instantiate will support binary (bytes) values.

>>> instantiate({"_target_": "builtins.dict", "abc": b"123"})
{'abc': b'123'}
>>> instantiate({"_target_": "builtins.dict"}, abc=b"123")
{'abc': b'123'}

Binary data can be passed as a positional argument using _args_:

>>> instantiate({"_target_": "builtins.print", "_args_": [b"binary_arg", "foo"]})
b'binary_arg' foo

Binary keys are not supported by instantiate:

>>> instantiate({"_target_": "builtins.dict", b"binary_key": "foo"})
...
TypeError: keywords must be strings

  • Bytes will be supported, but not from yaml files.

Bytes will be supported in yaml files via yaml's !!binary tag:

>>> cfg = OmegaConf.create({"x": b"binary_data"})
>>> yml = OmegaConf.to_yaml(cfg)
>>> print(yml)
x: !!binary |
  YmluYXJ5X2RhdGE=

>>> OmegaConf.save(cfg, "x.yaml")
>>> print(open("x.yaml", "r").read())
x: !!binary |
  YmluYXJ5X2RhdGE=

>>> OmegaConf.load("x.yaml")
{'x': b'binary_data'}

This will work when Hydra loads yaml files too.


Or at least not encouraged?

Storing large binary data will not be encouraged.

The Hydra+OmegaConf stack is primarily intended as a human interface rather than as an arbitrary data store. In particular, OmegaConf and Hydra are not tuned to work with big data; they may make copies of data when performing operations such as config creation and merging. Also, yaml files use base64 to encode binary data, which is less memory-efficient than something like pickle or numpy.save.

Note that you can always use Hydra's instantiate API to invoke e.g. numpy.load when you need to get big data from disk.

big_data = instantiate({"_target_": "numpy.load", "file": "my_big_data.npy"})

For small blocks of binary data, there is BytesNode :)


  • Conversions between str and Bytes will not be supported.

Yes, automatic conversion will not be supported. Users will need to manually convert using python code or using a custom resolver.

Thanks

Thank you!!

@Jasha10 Jasha10 requested a review from jieru-hu January 31, 2022 03:35
@Jasha10
Copy link
Collaborator

Jasha10 commented Jan 31, 2022

I've force-pushed again after doing an interactive rebase to drop commit f4d4ec5 from this PR. (The dropped commit would undo the diff from PR #853 which is now merged into master).

@Jasha10
Copy link
Collaborator

Jasha10 commented Mar 4, 2022

Whoops, I closed this PR on accident, and now it seems I'm unable to reopen it. I've opened PR #872 instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Have bytes as supported value type.
3 participants