Convenience Methods #183

Merged
merged 8 commits into from
Mar 7, 2020

@rlizzo (Member) commented Mar 5, 2020

Motivation and Context

Why is this change required? What problem does it solve?:

Adds several convenience methods for general usage, and makes major changes to the checkout column data reader API so that it supports arbitrary column layouts.

Description

Describe your changes in detail:

  • Added log method to checkout instances
  • Added diff method to Repository class
  • Added CLI diff command
API Changes to Checkout __getitem__() and get() methods.

A checkout object can be thought of as a "dataset" ("dset") that maps a
view of samples across columns.

>>> dset = repo.checkout(branch='master')
>>>
# Get a column contained in the checkout.
>>> dset['foo']
ColumnDataReader
>>>
# Get a specific sample from ``'foo'`` (returns a single array)
>>> dset['foo', '1']
np.array([1])
>>>
# Get multiple samples from ``'foo'`` (returns a list of arrays, in order
# of input keys)
>>> dset[['foo', '1'], ['foo', '2'], ['foo', '324']]
[np.array([1]), np.array([2]), np.array([324])]
>>>
# Get a sample from multiple columns; results are returned in the same
# order as the input keys.
>>> dset[['foo', '1'], ['bar', '1'], ['baz', '1']]
[np.array([1]), np.array([1, 1]), np.array([1, 1, 1])]
>>>
# Get multiple samples from multiple columns
>>> keys = [(col, str(samp)) for samp in range(2) for col in ['foo', 'bar']]
>>> keys
[('foo', '0'), ('bar', '0'), ('foo', '1'), ('bar', '1')]
>>> dset[keys]
[np.array([1]), np.array([1, 1]), np.array([2]), np.array([2, 2])]
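
The key forms above can be told apart by the type of the object Python passes to __getitem__. The following is a minimal sketch of that dispatch, not Hangar's implementation: ToyCheckout and its _get_one helper are hypothetical names, and plain lists stand in for NumPy arrays so the example has no dependencies.

```python
class ToyCheckout:
    """Hypothetical stand-in for a checkout; not Hangar's implementation."""

    def __init__(self, columns):
        # columns maps column name -> {sample name -> data}
        self._columns = columns

    def _get_one(self, key):
        # Walk one key left to right: column first, then sample.
        column, *rest = key
        node = self._columns[column]
        for part in rest:
            node = node[part]
        return node

    def __getitem__(self, index):
        # dset['foo'] arrives as a plain str: return the whole column.
        if isinstance(index, str):
            return self._get_one((index,))
        # dset['foo', '1'] arrives as a tuple of str: a single key.
        if isinstance(index, tuple) and all(isinstance(k, str) for k in index):
            return self._get_one(index)
        # Anything else is treated as a sequence of keys; results keep
        # the order of the input keys.
        return [self._get_one(tuple(k)) for k in index]


# Plain lists stand in for arrays in this sketch.
dset = ToyCheckout({
    'foo': {'0': [1], '1': [2]},
    'bar': {'0': [1, 1], '1': [2, 2]},
})
keys = [(col, str(samp)) for samp in range(2) for col in ['foo', 'bar']]
single = dset['foo', '1']
many = dset[keys]
```

Because dset[a, b] is sugar for dset.__getitem__((a, b)), a single two-part key and a pair of keys both arrive as tuples; checking whether every member is a string is one simple way to separate the two cases.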

Arbitrary column layouts are supported by simply adding additional members
to the keys for each piece of data. For example, getting data from a column
with a nested layout:

>>> dset['nested_col', 'sample_1', 'subsample_0']
np.array([1, 0])
>>>
# a sample accessor object can be retrieved at will...
>>> dset['nested_col', 'sample_1']
<class 'FlatSubsampleReader'>(column_name='nested_col', sample_name='sample_1')
>>>
# to get all subsamples in a nested sample use the Ellipsis operator
>>> dset['nested_col', 'sample_1', ...]
{'subsample_0': np.array([1, 0]),
 'subsample_1': np.array([1, 1]),
 ...
 'subsample_n': np.array([1, 255])}
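
One way to picture the longer keys and the Ellipsis form is as a walk down nested mappings that stops early when it meets `...`. This is a rough sketch under that assumption only: ToyNestedCheckout is a hypothetical name, plain dicts and lists replace Hangar's reader objects and arrays, and a partial key returns the inner dict rather than a FlatSubsampleReader.

```python
class ToyNestedCheckout:
    """Hypothetical sketch of nested key lookup; not Hangar's implementation."""

    def __init__(self, columns):
        # columns maps column -> {sample -> {subsample -> data}}
        self._columns = columns

    def __getitem__(self, index):
        if isinstance(index, str):
            index = (index,)
        column, *rest = index
        node = self._columns[column]
        for part in rest:
            if part is Ellipsis:
                # `...` is expected as the final member of the key:
                # return a copy of everything stored at this level.
                return dict(node)
            node = node[part]
        return node


nested = ToyNestedCheckout({
    'nested_col': {
        'sample_1': {'subsample_0': [1, 0], 'subsample_1': [1, 1]},
    },
})
one = nested['nested_col', 'sample_1', 'subsample_0']
everything = nested['nested_col', 'sample_1', ...]
```

Each extra member of the key simply descends one more level, which is why the same indexing syntax can serve both flat and nested layouts.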

Retrieval of data from different column types can be mixed and combined
as desired. For example, retrieving data from both flat and nested columns
simultaneously:

>>> dset[('nested_col', 'sample_1', 'subsample_0'), ('foo', '0')]
[np.array([1, 0]), np.array([0])]
>>> dset[('nested_col', 'sample_1', ...), ('foo', '0')]
[{'subsample_0': np.array([1, 0]), 'subsample_1': np.array([1, 1])},
 np.array([0])]
>>> dset[('foo', '0'), ('nested_col', 'sample_1')]
[np.array([0]),
 <class 'FlatSubsampleReader'>(column_name='nested_col', sample_name='sample_1')]

If a column or data key does not exist, this method raises a KeyError.
As an alternative, missing keys can be handled gracefully by calling :meth:`get`
instead. By default, that method does not raise an error when a key is missing;
a (configurable) default value is inserted in its place.

>>> dset['foo', 'DOES_NOT_EXIST']
-------------------------------------------------------------------
KeyError                           Traceback (most recent call last)
<ipython-input-40-731e6ea62fb8> in <module>
----> 1 dset['foo', 'DOES_NOT_EXIST']
KeyError: 'DOES_NOT_EXIST'
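
The difference between indexing and get() described above can be sketched as follows. This is an illustration only: ToyCheckoutWithGet is a hypothetical class, the get(keys, default=None) signature is an assumption rather than Hangar's documented one, and the opt-in "raise on missing" behaviour the description hints at is left out.

```python
class ToyCheckoutWithGet:
    """Hypothetical sketch of strict vs. lenient lookup; not Hangar's code."""

    def __init__(self, columns):
        # columns maps column name -> {sample name -> data}
        self._columns = columns

    def __getitem__(self, key):
        # Strict lookup: a missing column or sample raises KeyError.
        column, sample = key
        return self._columns[column][sample]

    def get(self, keys, default=None):
        # Lenient lookup (assumed signature): a missing key yields
        # `default` in the same position instead of raising.
        results = []
        for key in keys:
            try:
                results.append(self[tuple(key)])
            except KeyError:
                results.append(default)
        return results


toy = ToyCheckoutWithGet({'foo': {'0': [1], '1': [2]}})
found = toy.get([('foo', '0'), ('foo', 'DOES_NOT_EXIST'), ('foo', '1')])
```

Keeping the placeholder in position means the output list always lines up with the input keys, so callers can zip the two together and filter out the defaults afterwards.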

Screenshots (if appropriate):

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Documentation update
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Is this PR ready for review, or a work in progress?

  • Ready for review
  • Work in progress

How Has This Been Tested?

Put an x in the boxes that apply:

  • Current tests cover modifications made
  • New tests have been added to the test suite
  • Modifications were made to existing tests to support these changes
  • Tests may be needed, but they are not included when the PR was proposed
  • I don't know. Help!

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have signed (or will sign when prompted) the tensorwerk CLA.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@rlizzo added the enhancement (New feature or request) label Mar 5, 2020
@rlizzo added this to the v0.5.0 milestone Mar 5, 2020
@rlizzo self-assigned this Mar 5, 2020

codecov bot commented Mar 5, 2020

Codecov Report

Merging #183 into master will decrease coverage by 0.19%.
The diff coverage is 91.72%.

@@            Coverage Diff             @@
##           master     #183      +/-   ##
==========================================
- Coverage   95.25%   95.06%   -0.19%     
==========================================
  Files          97       98       +1     
  Lines       16175    15954     -221     
  Branches     1547     1539       -8     
==========================================
- Hits        15407    15166     -241     
- Misses        525      537      +12     
- Partials      243      251       +8
Impacted Files                      Coverage Δ
src/hangar/typesystem/__init__.py   100% <ø> (ø) ⬆️
src/hangar/columns/constructors.py  90.59% <ø> (ø) ⬆️
src/hangar/diff.py                  96.07% <0%> (-0.85%) ⬇️
src/hangar/constants.py             100% <100%> (ø) ⬆️
src/hangar/columns/__init__.py      100% <100%> (ø) ⬆️
src/hangar/records/summarize.py     93.94% <100%> (+0.61%) ⬆️
src/hangar/columns/column.py        100% <100%> (ø) ⬆️
src/hangar/columns/common.py        95.08% <100%> (ø) ⬆️
tests/test_diff.py                  99.74% <100%> (+0.05%) ⬆️
src/hangar/utils.py                 95.83% <100%> (+0.09%) ⬆️
... and 13 more
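
The headline percentages in the coverage diff can be reproduced from the Hits and Lines rows. The snippet below is a small check, assuming coverage is hits divided by total lines and rounded to two decimals; coverage_percent is a name made up for this illustration.

```python
def coverage_percent(hits, lines):
    # Assumed formula: covered lines as a percentage of all tracked lines.
    return round(100.0 * hits / lines, 2)


# Figures copied from the Hits and Lines rows of the report.
master = coverage_percent(15407, 16175)
pr = coverage_percent(15166, 15954)
delta = round(pr - master, 2)

# In both columns, hits + misses + partials equals the line count.
master_total = 15407 + 525 + 243
pr_total = 15166 + 537 + 251
```

Under that assumption the results are 95.25 for master, 95.06 for the PR, and a delta of -0.19, matching the report.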

@rlizzo requested a review from hhsecond March 6, 2020 08:36

@rlizzo (Member, Author) commented Mar 6, 2020

@hhsecond please review.

@rlizzo merged commit 27c66f7 into tensorwerk:master Mar 7, 2020