Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose _to_temp_dataset / _from_temp_dataset as semi-public API? #4837

Open
keewis opened this issue Jan 21, 2021 · 5 comments · May be fixed by #4863
Open

expose _to_temp_dataset / _from_temp_dataset as semi-public API? #4837

keewis opened this issue Jan 21, 2021 · 5 comments · May be fixed by #4863

Comments

@keewis
Copy link
Collaborator

keewis commented Jan 21, 2021

When writing accessors which behave the same for both Dataset and DataArray, it would be incredibly useful to be able to use DataArray._to_temp_dataset / DataArray._from_temp_dataset to deduplicate code. Is it safe to use those in external packages (like pint-xarray)?

Otherwise I guess it would be possible to use

name = da.name if da.name is None else "__temp"
temp_ds = da.to_dataset(name=name)
new_da = temp_ds[name]
if da.name is None:
    new_da = new_da.rename(da.name)
assert_identical(da, new_da)

but that seems less efficient.

@dcherian
Copy link
Contributor

Related: There's some weirdness about _to_temp_dataset not preserving name. I had to work around that for map_blocks

def dataset_to_dataarray(obj: Dataset) -> DataArray:
if not isinstance(obj, Dataset):
raise TypeError("Expected Dataset, got %s" % type(obj))
if len(obj.data_vars) > 1:
raise TypeError(
"Trying to convert Dataset with more than one data variable to DataArray"
)
return next(iter(obj.data_vars.values()))
def dataarray_to_dataset(obj: DataArray) -> Dataset:
# only using _to_temp_dataset would break
# func = lambda x: x.to_dataset()
# since that relies on preserving name.
if obj.name is None:
dataset = obj._to_temp_dataset()
else:
dataset = obj.to_dataset()
return dataset

@keewis
Copy link
Collaborator Author

keewis commented Jan 21, 2021

that's true. I guess for that there's the name parameter to _from_temp_dataset, which of course can't be used if _to_temp_dataset and _from_temp_dataset are called in different functions.

@max-sixty
Copy link
Collaborator

That seems very reasonable.

To the extent the goal is "apply a function that takes a dataset to this dataarray", we could make a function that does exactly that and use _to_temp_dataset within that. Does that make sense?

@keewis
Copy link
Collaborator Author

keewis commented Jan 21, 2021

To the extent the goal is "apply a function that takes a dataset to this dataarray", we could make a function that does exactly that and use _to_temp_dataset within that.

yes, I think that would work, too.

@shoyer
Copy link
Member

shoyer commented Jan 22, 2021

Related: There's some weirdness about _to_temp_dataset not preserving name.

It used to preserve name, but now name is always set to the _THIS_ARRAY object. This is for two reasons:

  1. It isn't always possible to convert a DataArray into a Dataset preserving name, if name is also found on a coordinate.
  2. To ensure that xarray functions always work, even in this case, it was more reliable to always replace the name. If we only replaced name in case (1), then it's likely that some functions wouldn't handle names of that form and would produce an error only in that case.

To the extent the goal is "apply a function that takes a dataset to this dataarray", we could make a function that does exactly that and use _to_temp_dataset within that. Does that make sense?

This sounds like a fine idea to me. It's kind of the opposite of Dataset.map.

@keewis keewis linked a pull request Feb 5, 2021 that will close this issue
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants