reduce() got multiple values for keyword argument 'dim' #3501

Closed
linlintamu opened this issue Nov 8, 2019 · 9 comments · Fixed by #3513

@linlintamu

MCVE Code Sample

import glob
import xarray as xr

# dir_path is the local data directory (its value is not shown in this report)
geo5 = xr.open_dataset(dir_path + '500GPH/hgt.mon.mean.nc').sel(
    time=slice('1981-12-01', '2010-02-01'), lat=slice(40., -40.), lon=slice(140, 290), level=500.)

### the problem probably originates here
filenames = sorted(glob.glob(dir_path + 't2m/air.2m.gauss.*'))
t2m2 = xr.open_mfdataset(filenames, concat_dim='time', combine='by_coords').sel(
    time=slice('1981-12-01', '2010-02-28'), lat=slice(40., -40.), lon=slice(140, 290)
).resample(time='1M').mean(dim='time')

print(geo5)
<xarray.Dataset>
Dimensions:  (lat: 33, lon: 61, time: 339)
Coordinates:
    level    float32 500.0
  * lat      (lat) float32 40.0 37.5 35.0 32.5 30.0 ... -32.5 -35.0 -37.5 -40.0
  * lon      (lon) float32 140.0 142.5 145.0 147.5 ... 282.5 285.0 287.5 290.0
  * time     (time) datetime64[ns] 1981-12-01 1982-01-01 ... 2010-02-01
Data variables:
    hgt      (time, lat, lon) float32 ...
Attributes:
    description:     Data from NCEP initialized reanalysis (4x/day).  These a...
    platform:       Model
    Conventions:    COARDS
    NCO:            20121012
    history:        Created by NOAA-CIRES Climate Diagnostics Center (SAC) fr...
    title:          monthly mean hgt from the NCEP Reanalysis
    References:     http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reana...
    dataset_title:  NCEP-NCAR Reanalysis 1


print(t2m2)
<xarray.Dataset>
Dimensions:    (lat: 42, lon: 80, nbnds: 2, time: 339)
Coordinates:
  * time       (time) datetime64[ns] 1981-12-31 1982-01-31 ... 2010-02-28
  * lon        (lon) float32 140.625 142.5 144.375 ... 285.0 286.875 288.75
  * lat        (lat) float32 39.047 37.1422 35.2375 ... -37.1422 -39.047
Dimensions without coordinates: nbnds
Data variables:
    air        (time, lat, lon) float32 dask.array<chunksize=(1, 42, 80), meta=np.ndarray>
    time_bnds  (time, nbnds) float64 dask.array<chunksize=(1, 2), meta=np.ndarray>


geo5_res= geo5.where(geo5['time.season']=='DJF')
geo5_djf= geo5_res.rolling(min_periods=3, center=True, time=3).mean(dim='time').dropna('time')

t2m2_res = t2m2.where(t2m2['time.season']=='DJF')
t2m2_djf = t2m2_res.rolling(time=3, min_periods=3,center=True).mean(dim='time').dropna('time')

Expected Output

geo5_djf
<xarray.Dataset>
Dimensions:  (lat: 33, lon: 61, time: 29)
Coordinates:
    level    float32 500.0
  * lat      (lat) float32 40.0 37.5 35.0 32.5 30.0 ... -32.5 -35.0 -37.5 -40.0
  * lon      (lon) float32 140.0 142.5 145.0 147.5 ... 282.5 285.0 287.5 290.0
  * time     (time) datetime64[ns] 1982-01-01 1983-01-01 ... 2010-01-01
Data variables:
    hgt      (time, lat, lon) float32 5347.3706 5347.1294 ... 5696.4443

Problem Description

When I reduce t2m2 to a seasonal Dec-Jan-Feb mean, I get a very strange error.
The error is associated with 'dim'.

My guess is that 't2m2' differs from 'geo5' only in that 't2m2' was originally daily mean data, which I resampled to monthly data, as you can see in the code above.

That probably leads to the error, but I cannot tell why, since the data structure is almost the same as that of 'geo5'.
I apply exactly the same method to 't2m2' as I do to 'geo5'.
Weirdly, the method works fine for 'geo5' but fails for 't2m2'.

Can anyone help with this super annoying error?
The error seems like the mean(dim='time') has been set implicitly and I wrote mean(dim='time') again.

For reference, this is the documentation page for rolling:
http://xarray.pydata.org/en/stable/generated/xarray.DataArray.rolling.html


TypeError Traceback (most recent call last)
in
15 t2m2_res = t2m2.where(t2m2['time.season']=='DJF')
16
---> 17 t2m2_djf = t2m2_res.air.rolling(time=3, min_periods=3,center=True).mean(dim='time').dropna('time')

//anaconda3/lib/python3.7/site-packages/xarray/core/rolling.py in method(self, **kwargs)
127 def method(self, **kwargs):
128 return self._numpy_or_bottleneck_reduce(
--> 129 array_agg_func, bottleneck_move_func, **kwargs
130 )
131

//anaconda3/lib/python3.7/site-packages/xarray/core/rolling.py in _numpy_or_bottleneck_reduce(self, array_agg_func, bottleneck_move_func, **kwargs)
381 return self._bottleneck_reduce(bottleneck_move_func, **kwargs)
382 else:
--> 383 return self.reduce(array_agg_func, **kwargs)
384
385

//anaconda3/lib/python3.7/site-packages/xarray/core/rolling.py in reduce(self, func, **kwargs)
297 rolling_dim = utils.get_temp_dimname(self.obj.dims, "_rolling_dim")
298 windows = self.construct(rolling_dim)
--> 299 result = windows.reduce(func, dim=rolling_dim, **kwargs)
300
301 # Find valid windows based on count.

TypeError: reduce() got multiple values for keyword argument 'dim'
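
The last frame of the traceback forwards the caller's keyword arguments while also passing its own dim. A minimal pure-Python sketch of that pattern, with no xarray involved, reproduces the same kind of TypeError. The functions `reduce` and `rolling_reduce` below are hypothetical stand-ins written for illustration, not xarray's actual implementation:

```python
def reduce(func, dim=None, **kwargs):
    """Hypothetical stand-in for a reduce method that accepts a dim keyword."""
    return func, dim, kwargs


def rolling_reduce(func, **kwargs):
    """Hypothetical stand-in for the frame at rolling.py line 299 in the traceback."""
    rolling_dim = "_rolling_dim"
    # dim is passed explicitly here, so a caller-supplied dim inside kwargs collides.
    return reduce(func, dim=rolling_dim, **kwargs)


# Without a caller-supplied dim, the call goes through.
ok = rolling_reduce(sum)

# With dim="time" forwarded through **kwargs, Python refuses the duplicate keyword.
try:
    rolling_reduce(sum, dim="time")
    message = None
except TypeError as err:
    message = str(err)

print(ok[1])
print(message)
```

This only shows where the duplicate keyword comes from; it does not explain why the numpy-backed dataset takes a different code path and succeeds.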

Output of xr.show_versions()

# Paste the output here xr.show_versions() here
@dcherian
Contributor

dcherian commented Nov 9, 2019

The error seems like the mean(dim='time') has been set implicitly and I wrote mean(dim='time') again.

If you make it rolling(...).mean() instead of rolling().mean(dim="time"), it should work

@linlintamu
Author

Thank you very much!!

Could you please tell me where exactly did the rolling(...) set the mean dimension implicitly?

Thank you again!

@max-sixty
Collaborator

Should raise an error on rolling()? Is there ever a use case for nothing being passed in there?

@dcherian
Contributor

dcherian commented Nov 9, 2019

Could you please tell me where exactly did the rolling(...) set the mean dimension implicitly?

You've asked it to create the rolling object along dimension time (.rolling(time=3, ...)) so that's the dimension the reduction operation acts on.

Should raise an error on rolling()

I hope it does ;) . I skipped a few characters typing out my response...
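
To illustrate the point about the implicit dimension, here is a small sketch with made-up, numpy-backed data (the array `da` and its values are assumptions for illustration, not data from this report). It calls mean() on the rolling object without a dim argument, and contrasts that with a plain mean over time, which removes the dimension altogether:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: six daily values 0..5 along a "time" dimension.
time = pd.date_range("2000-01-01", periods=6, freq="D")
da = xr.DataArray(np.arange(6.0), coords={"time": time}, dims="time")

# The rolling dimension is fixed by rolling(time=3, ...), so mean() takes no dim.
rolled = da.rolling(time=3, min_periods=3, center=True).mean()

# Edge windows hold fewer than min_periods values and become NaN; drop them.
trimmed = rolled.dropna("time")

# By contrast, a plain mean over "time" collapses the dimension entirely.
collapsed = da.mean(dim="time")

print(rolled.dims, trimmed.sizes["time"], collapsed.dims)
```

The rolling result keeps its time dimension, one value per window position, while the plain mean returns a single value with no time dimension.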

@linlintamu
Copy link
Author

I agree that this looks like a silly question at first glance.
I ask because I applied exactly the same method to the other variables I process, such as 'geo5'; please refer to the code attached.
The method worked fine for those variables even though I wrote rolling(time=3,...).mean(dim='time'). It failed only for 't2m2', where the error says 'dim' was set implicitly.

So why is 'dim' not set implicitly in rolling(time=3,...).mean(dim='time') for the other variables?

Thank you very much,
Lin

@dcherian
Contributor

You are right. If you could make a simpler reproducible example with dummy data, I could look at it.

@dcherian dcherian reopened this Nov 11, 2019
@keewis
Collaborator

keewis commented Nov 11, 2019

I just tried to reproduce this, which leads me to consider this a problem with dask:

import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("1982-01-31", "2010-02-28", freq="M") 
lat = np.linspace(-40, 40, 42) 
lon = np.linspace(140, 290, 80) 
air = np.random.rand(len(time), len(lat), len(lon)) 
time_nbds = np.random.rand(len(time), 2)

ds = xr.Dataset(
    {
        "air": (("time", "lat", "lon"), air),
        "time_nbds": (("time", "nbds"), time_nbds),
    },
    coords={"lat": lat, "lon": lon, "time": time},
)
ds_dask = ds.chunk({"time": 1, "lat": 42, "lon": 80, "nbds": 2})

def rolling(ds):
    return ( 
        ds.where(ds["time.season"] == "DJF") 
        .rolling(time=3, min_periods=3, center=True) 
        .mean(dim='time') 
        .dropna('time') 
    )

print(rolling(ds))
print("---")
print(rolling(ds_dask))

prints as

<xarray.Dataset>
Dimensions:    (lat: 42, lon: 80, nbds: 2, time: 28)
Coordinates:
  * lat        (lat) float64 -40.0 -38.05 -36.1 -34.15 ... 34.15 36.1 38.05 40.0
  * lon        (lon) float64 140.0 141.9 143.8 145.7 ... 284.3 286.2 288.1 290.0
  * time       (time) datetime64[ns] 1983-01-31 1984-01-31 ... 2010-01-31
Dimensions without coordinates: nbds
Data variables:
    air        (time, lat, lon) float64 0.576 0.2662 0.7418 ... 0.5447 0.6263
    time_nbds  (time, nbds) float64 0.7143 0.397 0.7276 ... 0.577 0.8224 0.6742
---
TypeError: reduce() got multiple values for keyword argument 'dim'

@dcherian dcherian added the bug label Nov 11, 2019
@linlintamu
Author

Moreover, when I do something like rolling(time=3, min_periods=3, center=True).mean(), without passing anything to mean(),
sometimes I get the seasonal mean, i.e. the Dec-Jan-Feb mean of every year (time dim != 0), but sometimes I get the mean over the whole time dimension (time = 0), meaning the Dec-Jan-Feb values of all years are averaged together.

I am so confused about how to use xarray and the corresponding functions...

Thanks,
Lin

@keewis
Collaborator

keewis commented Nov 11, 2019

Is that the same situation as above, where one dataset works as expected and the other doesn't, or does it differ between runs of the same code with the same data?

Edit: what I would be interested in is an example for the second issue because I honestly can't figure out how to reproduce it.
