Full xarray.Dataset support #3

Closed
smithara opened this issue Oct 15, 2018 · 2 comments
Labels
enhancement New feature or request

Comments

@smithara
Member

smithara commented Oct 15, 2018

Currently only scalars and 3-vectors are handled by the translation from CDF to xarray.Dataset (here). 3-vectors are given the generic dimension label "dim", which I implemented only to handle the MAG-B_NEC data.
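A minimal sketch of that translation, assuming the CDF variables have already been read into NumPy arrays. The variable names `Timestamp`, `F` and `B_NEC`, the helper `cdf_vars_to_dataset`, and the dummy data are all illustrative, not the actual viresclient code. Scalars get the time dimension only, while 3-vectors get the extra generic `dim` dimension:

```python
import numpy as np
import xarray as xr


def cdf_vars_to_dataset(time, variables):
    """Build an xarray.Dataset from {name: array} read out of a CDF file.

    Hypothetical helper. Scalars (one value per timestamp) get the dims
    ("Timestamp",); 3-vectors (shape (N, 3)) get ("Timestamp", "dim"),
    mirroring the generic "dim" label. Any other shape is rejected.
    """
    data_vars = {}
    for name, values in variables.items():
        values = np.asarray(values)
        if values.ndim == 1:
            data_vars[name] = (("Timestamp",), values)
        elif values.ndim == 2 and values.shape[1] == 3:
            data_vars[name] = (("Timestamp", "dim"), values)
        else:
            raise NotImplementedError(f"{name}: shape {values.shape} is not handled")
    return xr.Dataset(data_vars, coords={"Timestamp": time})


# Dummy data: 4 samples, one second apart
n = 4
time = np.datetime64("2018-10-15T00:00:00", "ns") + np.arange(n) * np.timedelta64(1, "s")
ds = cdf_vars_to_dataset(time, {"F": np.full(n, 30000.0), "B_NEC": np.zeros((n, 3))})
print(ds["B_NEC"].dims)  # ('Timestamp', 'dim')
```

Any variable with more than one extra dimension, or a non-3 vector length, falls into the `NotImplementedError` branch, which is the gap this issue is about.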

The proper solution is to give every dimension in the data an appropriate label. I am not sure whether this information is in the original CDF files; if not, it will have to be hard-coded for every variable. It would make more sense to do this on the server, building a netCDF file there and sending that instead.
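If the labels do have to be hard-coded, a per-variable lookup table is one way to do it. The sketch below assumes a hypothetical table `FRAME_LABELS` mapping each vector variable to a dimension name and its component labels; only an illustrative `B_NEC` entry is shown, and every other multi-dimensional variable would need its own entry:

```python
import numpy as np
import xarray as xr

# Hypothetical lookup table: variable name -> (dimension name, component labels).
FRAME_LABELS = {
    "B_NEC": ("NEC", ["N", "E", "C"]),
}


def labelled_dataarray(name, time, values):
    """Wrap a CDF variable in a DataArray with a named, labelled vector dimension."""
    values = np.asarray(values)
    if values.ndim == 1:
        return xr.DataArray(values, dims=("Timestamp",),
                            coords={"Timestamp": time}, name=name)
    frame, labels = FRAME_LABELS[name]  # KeyError if no labels are known for this variable
    return xr.DataArray(
        values,
        dims=("Timestamp", frame),
        coords={"Timestamp": time, frame: labels},
        name=name,
    )


n = 3
time = np.datetime64("2018-10-15T00:00:00", "ns") + np.arange(n) * np.timedelta64(1, "s")
b = labelled_dataarray("B_NEC", time, np.arange(3 * n, dtype=float).reshape(n, 3))
print(b.sel(NEC="N").values)  # [0. 3. 6.]
```

With named coordinates, components can be selected by label (`b.sel(NEC="N")`) instead of by position.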

The same applies to adding metadata: variable attributes such as units (e.g. cdf.varattsget("F") -> {'DESCRIPTION': 'Magnetic field intensity', 'UNITS': 'nT'}) and global attributes such as ORIGINAL_PRODUCT_NAMES, MAGNETIC_MODELS, etc. This is particularly useful because xarray uses this metadata for plotting: http://xarray.pydata.org/en/stable/plotting.html#one-dimension
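A sketch of copying those attributes across, assuming the variable attributes have already been read into dicts shaped like the `varattsget("F")` output quoted above. The helper name `attach_cdf_metadata`, the lowercase `units`/`description` target keys, and the `"placeholder"` global value are illustrative assumptions. xarray's plotting uses the `units` attribute when labelling axes:

```python
import numpy as np
import xarray as xr


def attach_cdf_metadata(ds, var_attrs, global_attrs):
    """Copy CDF variable attributes and global attributes onto an xarray.Dataset.

    Hypothetical helper: maps the CDF keys UNITS/DESCRIPTION to lowercase
    units/description and copies global attributes unchanged. Mutates ds.
    """
    for name, attrs in var_attrs.items():
        if name not in ds:
            continue  # skip attributes of variables that were not loaded
        new_attrs = {}
        if "UNITS" in attrs:
            new_attrs["units"] = attrs["UNITS"]
        if "DESCRIPTION" in attrs:
            new_attrs["description"] = attrs["DESCRIPTION"]
        ds[name] = ds[name].assign_attrs(new_attrs)
    ds.attrs.update(global_attrs)
    return ds


n = 3
ds = xr.Dataset({"F": (("Timestamp",), np.full(n, 30000.0))},
                coords={"Timestamp": np.arange(n)})
attach_cdf_metadata(
    ds,
    {"F": {"DESCRIPTION": "Magnetic field intensity", "UNITS": "nT"}},
    {"MAGNETIC_MODELS": "placeholder"},  # placeholder value, not real model names
)
print(ds["F"].attrs)  # {'units': 'nT', 'description': 'Magnetic field intensity'}
```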

The xarray.Dataset/netCDF output (an xarray.Dataset maps directly onto a netCDF file) should probably follow the netCDF-CF conventions; I think this is in line with Aeolus.
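For illustration only, CF-style metadata could be derived from the CDF attributes roughly as below. `Conventions` (global) and `long_name`/`units` (per variable) are standard CF attribute names, but the `CDF_TO_CF` mapping, both helper names, and the default `"CF-1.7"` version string are assumptions:

```python
# Hypothetical mapping from CDF attribute names to CF attribute names
CDF_TO_CF = {"DESCRIPTION": "long_name", "UNITS": "units"}


def cf_variable_attrs(cdf_attrs):
    """Rename CDF variable attributes that have a CF equivalent; keep the rest as-is."""
    return {CDF_TO_CF.get(key, key): value for key, value in cdf_attrs.items()}


def cf_global_attrs(cdf_global_attrs, version="CF-1.7"):
    """Return a copy of the global attributes with the CF 'Conventions' marker added."""
    attrs = dict(cdf_global_attrs)
    attrs["Conventions"] = version
    return attrs


print(cf_variable_attrs({"DESCRIPTION": "Magnetic field intensity", "UNITS": "nT"}))
# {'long_name': 'Magnetic field intensity', 'units': 'nT'}
```

When a Dataset is written with `Dataset.to_netcdf`, xarray itself encodes datetime64 coordinates with CF-style `units` and `calendar` attributes, so the time axis would not need separate handling.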

See also: http://xarray.pydata.org/en/stable/faq.html#what-is-your-approach-to-metadata

It would also be good to make the xarray.Dataset creation faster. The main slowdown is probably the pandas.to_datetime() call (the same applies to the pandas.DataFrame conversion). For very large datasets, where xarray.concat() is used, creation is also very slow: a file of a few GB took longer than 30 minutes to turn into an xarray.Dataset. This is further justification for building the dataset on the server instead.
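One possible way to avoid `pandas.to_datetime()` is pure NumPy arithmetic. This sketch assumes the times are stored as `CDF_EPOCH` values (float milliseconds since 0000-01-01T00:00:00), for which the offset to the Unix epoch is 62167219200000 ms. The helper name is hypothetical, and the code has not been benchmarked against the current implementation:

```python
import numpy as np

# Milliseconds from 0000-01-01T00:00:00 (CDF_EPOCH zero) to 1970-01-01T00:00:00
CDF_EPOCH_1970_MS = 62167219200000


def cdf_epoch_to_datetime64(epoch_ms):
    """Vectorised CDF_EPOCH (float ms since year 0) -> numpy datetime64[ms]."""
    ms_since_1970 = np.rint(np.asarray(epoch_ms, dtype="float64") - CDF_EPOCH_1970_MS)
    return ms_since_1970.astype("int64").astype("datetime64[ms]")


epochs = np.array([62167219200000.0, 63113904000000.0])
print(cdf_epoch_to_datetime64(epochs))
# ['1970-01-01T00:00:00.000' '2000-01-01T00:00:00.000']
```

For the `xarray.concat()` slowdown, collecting the per-file pieces in a list and calling `xarray.concat()` once, rather than concatenating inside a loop, avoids repeatedly copying the growing dataset. Whether that is the actual bottleneck here is untested.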

In the (probably far) future, we could make use of sparse arrays in xarray so that the Lat/Lon/Rad dimensions (empty except for one point per sample) can be filled in, instead of just using a "flat" time series. This would build a "data cube" on which 2D plotting and other operations can be done directly. (I could be wrong here, or, more likely, there is some other way to achieve this.)
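As a dense stand-in for that idea (a swapped-in technique, not the sparse-array approach proposed above), a flat time series can be averaged onto a latitude/longitude grid with `numpy.histogram2d`. The helper name, bin edges, and sample values are arbitrary illustrations:

```python
import numpy as np


def bin_to_grid(lat, lon, values, lat_edges, lon_edges):
    """Mean of `values` in each (lat, lon) bin; NaN where a bin has no samples."""
    sums, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges], weights=values)
    counts, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # 0/0 -> NaN for empty bins


lat = np.array([10.0, 12.0, -40.0])
lon = np.array([20.0, 25.0, 100.0])
F = np.array([30000.0, 32000.0, 50000.0])
# 30-degree latitude bins and 60-degree longitude bins -> a 6 x 6 grid
grid = bin_to_grid(lat, lon, F, np.arange(-90, 91, 30), np.arange(-180, 181, 60))
```

The resulting array could then be wrapped in an `xarray.DataArray` with bin-centre latitude/longitude coordinates for 2D plotting. Unlike a sparse cube, it loses the time dimension within each bin.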

@smithara smithara added the enhancement New feature or request label Oct 15, 2018
@smithara
Member Author

smithara commented Nov 29, 2018

viresclient currently requires xarray < 0.11, as there may be some incompatible changes that I haven't checked. Update to allow v0.11; see http://xarray.pydata.org/en/stable/whats-new.html

Update: higher versions of xarray (<1.0) are allowed since viresclient v0.4.

@smithara
Member Author

Metadata is now included in the produced xarray.Dataset.
Global attributes (accessible as ds.attrs): "Sources", "MagneticModels", "RangeFilters"
Variable attributes (ds[x].attrs): "units", "description"

Multi-dimensional variables are now set up with appropriate xarray dimensions and coordinate labels (see the linked code: "# Frame names to use as xarray dimension names").

I have not yet made improvements to the loading speed.
