From 7e33010af5f970c85573dae0bc98678c609ddca2 Mon Sep 17 00:00:00 2001
From: Aureliana Barghini
Date: Thu, 14 Jan 2021 15:38:52 +0100
Subject: [PATCH 01/52] documentation first draft

---
 doc/internals.rst | 91 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/doc/internals.rst b/doc/internals.rst
index 60d32128c60..379c4d1b525 100644
--- a/doc/internals.rst
+++ b/doc/internals.rst
@@ -231,3 +231,94 @@ re-open it directly with Zarr:
     zgroup = zarr.open("rasm.zarr")
     print(zgroup.tree())
     dict(zgroup["Tair"].attrs)
+
+
+How to add a new backend
+------------------------
+
+Adding a new backend for read support to Xarray is easy, and you don't need to integrate your code in Xarray.
+All you need to do is:
+
+- Implement a function that returns an ``xarray.Dataset``
+
+- Create a ``BackendEntrypoint`` instance with your function as input.
+
+- Declare such an instance as an external plugin in your setup.py.
+
+The ``BackendEntrypoint`` class is the main interface with Xarray;
+it is a container of attributes and functions to be implemented by the backend:
+
+- ``open_dataset``
+
+- [``open_dataset_parameters``]
+
+- [``guess_can_open``]
+
+While ``open_dataset`` is mandatory, ``open_dataset_parameters`` and ``guess_can_open`` are optional.
+
+
+BackendEntrypoint.open_dataset
+++++++++++++++++++++++++++++++
+
+**Inputs**
+
+The ``BackendEntrypoint.open_dataset`` function shall take as input one argument, ``filename``, and one keyword argument, ``drop_variables``:
+
+- ``filename`` may be a string containing a relative path, or an instance of ``pathlib.Path``.
+- ``drop_variables`` may be ``None`` or an iterable containing the variable names to be dropped when reading the data.
+
+It may also take as input a set of keyword arguments that will be passed from Xarray :py:func:`open_dataset`
+directly to the backend ``BackendEntrypoint.open_dataset``.
+Currently, in Xarray :py:func:`open_dataset` there are two groups of arguments that will be passed to the backend.
+The first group are the **decoders**, explicitly defined in the Xarray :py:func:`open_dataset` signature:
+
+- ``mask_and_scale=None``
+- ``decode_times=None``
+- ``decode_timedelta=None``
+- ``use_cftime=None``
+- ``concat_characters=None``
+- ``decode_coords=None``
+
+They will be passed to the backend only if the user explicitly passes a value different from ``None``.
+These parameters can be enabled/disabled by the user, setting the keyword ``decode_cf`` managed by Xarray.
+The backend can implement these specific decoder keyword arguments,
+and it is desirable to do so if this makes sense for the specific backend. For more details see the **decoders** sub-section.
+
+
+The second group can be passed by the user in a dictionary inside ``backend_kwargs`` or explicitly as keyword arguments ``**kwargs``.
+They will be grouped together and passed to the backend as keyword arguments.
+
+**Output**
+
+The ``BackendEntrypoint.open_dataset`` output shall be an instance of Xarray :py:class:`Dataset`
+that implements an additional method ``close``, used by Xarray to ensure that the related files are closed.
+If you don't want to support lazy loading and writing, then your work is almost done.
+
+BackendEntrypoint.open_dataset_parameters
++++++++++++++++++++++++++++++++++++++++++
+``open_dataset_parameters`` is the list of ``BackendEntrypoint.open_dataset`` parameters.
+It is needed to enable/disable the decoders supported by the backend when the user explicitly sets ``decode_cf``. For this
+reason all the decoders supported by the backend must be explicitly declared in the signature.
+``open_dataset_parameters`` is not mandatory, and if it is not provided xarray will inspect the signature of
+``BackendEntrypoint.open_dataset`` and create ``open_dataset_parameters`` from it.
+However, the signature inspection does not support ``**kwargs`` and ``*args`` in the signature, and in this case it will
+raise an error.
+
+BackendEntrypoint.guess_can_open
++++++++++++++++++++++++++++++++++++++++++
+
+How to support Lazy Loading
++++++++++++++++++++++++++++
+
+Decoders
+++++++++
+- strings.CharacterArrayCoder()
+- strings.EncodedStringCoder()
+- variables.UnsignedIntegerCoder()
+- variables.CFMaskCoder()
+- variables.CFScaleOffsetCoder()
+- times.CFTimedeltaCoder()
+- times.CFDatetimeCoder(use_cftime=use_cftime)
+
+How to register a backend
++++++++++++++++++++++++++

From 75544c9cbedc07c2457ad72f0f6e6364a899b4c1 Mon Sep 17 00:00:00 2001
From: Aureliana Barghini
Date: Thu, 14 Jan 2021 16:07:58 +0100
Subject: [PATCH 02/52] documentation update

---
 doc/internals.rst | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/doc/internals.rst b/doc/internals.rst
index 379c4d1b525..d8015b772b8 100644
--- a/doc/internals.rst
+++ b/doc/internals.rst
@@ -239,7 +239,7 @@ How to add a new backend
 Adding a new backend for read support to Xarray is easy, and you don't need to integrate your code in Xarray.
 All you need to do is:
 
-- Implement a function that returns an ``xarray.Dataset``
+- Implement a function that returns an instance of :py:class:`Dataset`
 
 - Create a ``BackendEntrypoint`` instance with your function as input.
 
@@ -249,9 +249,7 @@ All you need to do is:
 it is a container of attributes and functions to be implemented by the backend:
 
 - ``open_dataset``
-
 - [``open_dataset_parameters``]
-
 - [``guess_can_open``]
 
 While ``open_dataset`` is mandatory, ``open_dataset_parameters`` and ``guess_can_open`` are optional.
@@ -292,7 +290,7 @@ They will be grouped together and passed to the backend as keyword arguments.
 
 The ``BackendEntrypoint.open_dataset`` output shall be an instance of Xarray :py:class:`Dataset`
 that implements an additional method ``close``, used by Xarray to ensure that the related files are closed.
-If you don't want to support lazy loading and writing, then your work is almost done.
+If you don't want to support lazy loading, then the :py:class:`Dataset` shall contain numpy arrays and your work is almost done.
 
 BackendEntrypoint.open_dataset_parameters
 +++++++++++++++++++++++++++++++++++++++++
@@ -307,6 +305,8 @@ raise an error.
 
 BackendEntrypoint.guess_can_open
 +++++++++++++++++++++++++++++++++++++++++
+
+
 How to support Lazy Loading
 +++++++++++++++++++++++++++
 
@@ -322,3 +322,10 @@ Decoders
 
 How to register a backend
 +++++++++++++++++++++++++
+Define in your setup.py (or setup.cfg) a new entrypoint with:
+
+- group: ``xarray.backend``
+- name: the name to be passed to :py:func:`open_dataset` as ``engine``.
+- object reference: the reference to the instance of ``BackendEntrypoint``
+
+See https://packaging.python.org/specifications/entry-points/#data-model for more information.
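The registration step described just above is easiest to see with a concrete snippet. The following is only a sketch: the package name ``my_backend``, the engine name ``my_engine`` and the class ``MyBackendEntrypoint`` are hypothetical placeholders, and the entry-point group simply follows the ``xarray.backend`` group named in the patch.

.. code-block:: python

    # setup.py -- illustrative sketch only; names are placeholders
    from setuptools import setup

    setup(
        name="my_backend",
        packages=["my_backend"],
        entry_points={
            "xarray.backend": [
                "my_engine = my_backend.entrypoint:MyBackendEntrypoint",
            ],
        },
    )

With such a declaration installed, ``open_dataset(filename, engine="my_engine")`` would be routed to the plugin.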
From c6f64cc521321b170e189bf7046d6a19b7833867 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 14 Jan 2021 16:38:32 +0100 Subject: [PATCH 03/52] documentation update --- doc/internals.rst | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index d8015b772b8..31fbf508472 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -306,19 +306,9 @@ BackendEntrypoint.guess_can_open +++++++++++++++++++++++++++++++++++++++++ - How to support Lazy Loading +++++++++++++++++++++++++++ -Decoders -++++++++ -- strings.CharacterArrayCoder() -- strings.EncodedStringCoder() -- variables.UnsignedIntegerCoder() -- variables.CFMaskCoder() -- variables.CFScaleOffsetCoder() -- times.CFTimedeltaCoder() -- times.CFDatetimeCoder(use_cftime=use_cftime) How to register a backend +++++++++++++++++++++++++ @@ -329,3 +319,23 @@ Define in your setup.py (or setup.cfg) an new entrypoint with: - object reference: the reference to the instance of ``BackendEntrypoint`` See https://packaging.python.org/specifications/entry-points/#data-model for more information. + +Decoders +++++++++ + +The decoders implement the specific operations to transform data on-disk representation +to Xarray representation. + +Example of decode-time... + +Xarray uses internally a set of decoders needed to transform netCDF4 on disk data into a :py:class:`Dataset` following +Xarray standards. + +- strings.CharacterArrayCoder() +- strings.EncodedStringCoder() +- variables.UnsignedIntegerCoder() +- variables.CFMaskCoder() +- variables.CFScaleOffsetCoder() +- times.CFTimedeltaCoder() +- times.CFDatetimeCoder(use_cftime=use_cftime) + From 11fc28379b69c607c4050f0b3cf34c6610c0993f Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Tue, 26 Jan 2021 14:57:09 +0100 Subject: [PATCH 04/52] update documentation --- doc/internals.rst | 77 +++++++++++++++++++++++------------------------ 1 file changed, 38 insertions(+), 39 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index 31fbf508472..4114401fd10 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -239,14 +239,13 @@ How to add a new backend Adding a new backend for read support to Xarray is easy, and you don't need to integrate your code in Xarray. All you need to do is to: -- Implement a function that returns an instance :py:class:``Dataset`` +- Create a class that inherits from BackendEntrypoint -- Create a `BackendEntrypoint`` instance with your function as input. +- Implement the method ``open_dataset`` that returns an instance :py:class:``Dataset`` -- Declare such instance as external plugin in your setup.py. +- Declare such class as an external plugin in your setup.py. -``BackendEntrypoint` class is the main interface with Xarray, -it's a container of attributes and functions to be implemented by the backend: +Your ``BackendEntrypoint` sub-class is the main interface with Xarray, and it should implement the following attributes and functions: - ``open_dataset`` - [``open_dataset_parameters``] @@ -255,20 +254,17 @@ it's a container of attributes and functions to be implemented by the backend: While ``open_dataset`` is mandatory, ``open_dataset_parameters`` and ``guess_can_open`` are optional. 
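Before the individual attributes are described below, a bare-bones sketch of such a subclass may help fix ideas. Everything here is illustrative: ``MyBackendEntrypoint`` is a hypothetical name, and the body of ``open_dataset`` stands in for whatever file reading the backend actually does.

.. code-block:: python

    # illustrative skeleton of a backend entrypoint; not a real backend
    from xarray.backends.common import BackendEntrypoint


    class MyBackendEntrypoint(BackendEntrypoint):
        open_dataset_parameters = ["filename", "drop_variables"]

        def open_dataset(self, filename, drop_variables=None):
            # read the data from ``filename`` and build an xarray.Dataset,
            # dropping the variables listed in ``drop_variables``
            ...

The two optional members, ``open_dataset_parameters`` and ``guess_can_open``, are discussed in their own subsections.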
-BackendEntrypoint.open_dataset +open_dataset ++++++++++++++++++++++++++++++ **Inputs** -``BackendEntrypoint.open_dataset`` function shall take in input one argument, ``filename`` and one keyword argument ``drop_variables``: +The backend ``open_dataset`` method shall take in input one argument, ``filename`` and one keyword argument ``drop_variables``: -- ``filename`` may be a string containg a relative path, or the an instance of ``pathlib.Path``. -- ``drop_variables`` may be `None` or a iterable containing the variables names to be dropped in reading the data. +- ``filename`` may be a string containing a relative path, or an instance of ``pathlib.Path``. +- ``drop_variables`` may be `None` or an iterable containing the variables names to be dropped in reading the data. -It may also take in input a set of keyword arguments, that will be passed from Xarray :py:func:`open_dataset` -directly to the backend ``BackendEntrypoint.open_dataset``. -Currently in Xarray :py:func:`open_dataset` there are two group of arguments will be passed to the backend. -The first one are the **decoders**, explicity defined in Xarray :py:func:`open_dataset` signature: +It is desiderable for the backend ``open_dataset`` to implement in its interface all the folowing boolean keyword arguments, the **decoders** (if it make sense for the backend): - ``mask_and_scale=None`` - ``decode_times=None`` @@ -277,34 +273,39 @@ The first one are the **decoders**, explicity defined in Xarray :py:func:`open_d - ``concat_characters=None`` - ``decode_coords=None`` -They will be passed to the backend only if the user will pass explicity a value different from `None`. -These parameters can be enabled/disabled by by the User, setting the keyword ``decode_cf`` managed by Xarray. -The backend can implement these specific decoders keywords arguments, -and it is desiderable if this makes sense for the specific backend. For more details see **decoders** sub-section. +These keyword arguments are explicitly defined in Xarray :py:func:`open_dataset` signature and they will be passed to the backend only if the User passes from them a value different from `None`. For more details see **decoders** sub-section. +Your backend can also take in input a set of backend-specific keyword arguments. All the kwargs passed to Xarray :py:func:`open_dataset`, in the parameter ``backend_kwargs`` or explicitly as keyword arguments (``**kwargs``), will be grouped and passed to the backend as keyword arguments. -The second one can be passed by the user in a dictionary inside ``backend_kwargs`` or explicity as keyword arguments ``**kwargs``. -They will be grouped together and passed to the backend as keyword arguments. **Output** -```BackendEntrypoint`.open_dataset`` output shall be an instance of Xarray :py:class:`Dataset` +Backend ``open_dataset`` output shall be an instance of Xarray :py:class:`Dataset` that implements an additional method ``close``, used by Xarray to ensure that the related files are closed. -If don't want to support the lazy loading, then the :py:class:`Dataset` shall contain numpy.arrays and your work is almost done. +If don't want to support the lazy loading, then the :py:class:`Dataset` shall contain NumPy.arrays and your work is almost done. -BackendEntrypoint.open_dataset_parameters -+++++++++++++++++++++++++++++++++++++++++ -``open_dataset_parameters``is the list of ``BackendEntrypoint.open_dataset`` parameters. -It is needed to enable/disable the decoders supported by the backend when the User set explicity ``decode_cf``. 
For this -reason all the decoders supported by the backend must be explicity declared in the signature. -``open_dataset_parameters`` it is no mandatory and if it is not provided xarray will inspect the signature of -``BackendEntrypoint.open_dataset` and it will create ``open_dataset_parameters``. -However, the signature inspection will not support `**kwargs` and `*args` are in the signature and in this case it will -raise an error. - -BackendEntrypoint.guess_can_open +Dask Chunking +++++++++++++ + +Xarray manege the dataset chunking, so the backend should deal with it. If your on-disk format is already chunked and you want to suggest the preferred chunking, then you can use the attribute ``preferred_chunks`` in each variable encoding: ``ds[variable].encoding[“preferred_chunks”] = my_preferred_chunks `. +The preferred chunking will be used in two cases. +The first one is if the User does not select specific chunking for some dimension, for example: +``chunks={}``, xarray will use the preferred_chunks for all the dimension. +``chunks={“x”: 1000}``, xarray will use the preferred chunking for all the dimensions except “x” +The second one if ``chunks=”auto”``. In this case, ``preferred_chunks`` is used to estimate the “best” chunking with dask.core.normalize_chunks. + +open_dataset_parameters +++++++++++++++++++++++++++++++++++++++++ +``open_dataset_parameters`` is the list of backend ``open_dataset`` parameters. It is not mandatory and if the backend doesn’t provide it explicity, Xarray will create it by inspecting backend signature. +Xarray uses ``open_dataset_parameters`` when needs to select only the decoders supported by the backend. +If ``open_dataset_parameters`` is not defined, Xarray will raise an error during the inspection if it finds `**kwargs` and `*args` in the signature. +If the backend will provide ``open_dataset_parameters``, then it will be possible to use `**kwargs` and `*args` in the signature. This is discouraged unless there are good reasons for using `**kwargs` or `*args`. + +guess_can_open ++++++++++++++++++++++++++++++++++++++++++ +``guess_can_open`` is used for the automatic discovering of the engines able to open the file, in case the User does not specify explicity the engine. If you are not intrested to support this feature, you can skip this step since `BackendEntrypoint`` already provide a default ``guess_can_open`` that returns always `False`. +Backend ``guess_can_open`` shall take in input the ``filename_or_obj`` :py:func:`open_dataset` parameter, and it shall return a boolean. How to support Lazy Loading +++++++++++++++++++++++++++ @@ -312,11 +313,11 @@ How to support Lazy Loading How to register a backend +++++++++++++++++++++++++ -Define in your setup.py (or setup.cfg) an new entrypoint with: +Define in your setup.py (or setup.cfg) a new entrypoint with: - group: ``xarray.backend`` - name: the name to be passed to :py:func:`open_dataset` as `engine``.` -- object reference: the reference to the instance of ``BackendEntrypoint`` +- object reference: the reference of the class that you have implemented. See https://packaging.python.org/specifications/entry-points/#data-model for more information. @@ -325,11 +326,7 @@ Decoders The decoders implement the specific operations to transform data on-disk representation to Xarray representation. - -Example of decode-time... - -Xarray uses internally a set of decoders needed to transform netCDF4 on disk data into a :py:class:`Dataset` following -Xarray standards. 
+Xarray uses internally a set of decoders needed to transform netCDF4 on-disk data into a :py:class:`Dataset` following Xarray standards (For more details see :py:function:`open_dataset`): - strings.CharacterArrayCoder() - strings.EncodedStringCoder() @@ -339,3 +336,5 @@ Xarray standards. - times.CFTimedeltaCoder() - times.CFDatetimeCoder(use_cftime=use_cftime) +They are grouped in higher-level function conventions.decode_cf_variables. It takes in input the list of variables to decode and the decoders to activate/deactivate. +Some transformations can be common to more backends, so before implementing a decoder, you should to be sure that is not already implemented by xarray. From 1b8ac135ce97d7c2b30e7fc6ab55f3628753c595 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Tue, 2 Feb 2021 02:17:34 +0100 Subject: [PATCH 05/52] update backend documentation --- doc/internals.rst | 84 +++++++++++++++++------------------------------ 1 file changed, 30 insertions(+), 54 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index 4114401fd10..3f1a12453ca 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -236,35 +236,33 @@ re-open it directly with Zarr: How to add a new backend ------------------------ -Adding a new backend for read support to Xarray is easy, and you don't need to integrate your code in Xarray. -All you need to do is to: +Adding a new backend for read support to Xarray is easy, and does not require to integrate any code in Xarray; all you need to do is approaching the following steps: -- Create a class that inherits from BackendEntrypoint +- Create a class that inherits from ``BackendEntrypoint`` -- Implement the method ``open_dataset`` that returns an instance :py:class:``Dataset`` +- Implement the method :py:func:``open_dataset`` that returns an instance of :py:class:``Dataset`` -- Declare such class as an external plugin in your setup.py. +- Declare such a class as an external plugin in your setup.py. -Your ``BackendEntrypoint` sub-class is the main interface with Xarray, and it should implement the following attributes and functions: +Your ``BackendEntrypoint`` sub-class is the ~~main~~ primary interface with Xarray, and it should implement the following attributes and functions: -- ``open_dataset`` -- [``open_dataset_parameters``] -- [``guess_can_open``] - -While ``open_dataset`` is mandatory, ``open_dataset_parameters`` and ``guess_can_open`` are optional. +- ``open_dataset`` (mandatory) +- [``open_dataset_parameters``] (optional) +- [``guess_can_open``] (optional) +These are detailed in the following. open_dataset -++++++++++++++++++++++++++++++ +++++++++++++ **Inputs** -The backend ``open_dataset`` method shall take in input one argument, ``filename`` and one keyword argument ``drop_variables``: +The backend ``open_dataset`` method takes as input one argument (``filename``), and one keyword argument (``drop_variables``): -- ``filename`` may be a string containing a relative path, or an instance of ``pathlib.Path``. -- ``drop_variables`` may be `None` or an iterable containing the variables names to be dropped in reading the data. +- ``filename``: can be a string containing a relative path or an instance of ``pathlib.Path``. +- ``drop_variables``: can be `None` or an iterable containing the variable names to be dropped when reading the data. 
-It is desiderable for the backend ``open_dataset`` to implement in its interface all the folowing boolean keyword arguments, the **decoders** (if it make sense for the backend): +If it makes sense for your backend, :py:func:`open_dataset` method should implement in its interface all the following boolean keyword arguments, called **decoders** which default to ``None``: - ``mask_and_scale=None`` - ``decode_times=None`` @@ -273,39 +271,30 @@ It is desiderable for the backend ``open_dataset`` to implement in its interface - ``concat_characters=None`` - ``decode_coords=None`` -These keyword arguments are explicitly defined in Xarray :py:func:`open_dataset` signature and they will be passed to the backend only if the User passes from them a value different from `None`. For more details see **decoders** sub-section. - -Your backend can also take in input a set of backend-specific keyword arguments. All the kwargs passed to Xarray :py:func:`open_dataset`, in the parameter ``backend_kwargs`` or explicitly as keyword arguments (``**kwargs``), will be grouped and passed to the backend as keyword arguments. +These keyword arguments are explicitly defined in Xarray :py:func:`open_dataset` signature. Xarray will pass them to the backend only if the User sets a value different from ``None`` explicitly. +Your backend can also take as input a set of backend-specific keyword arguments. All these keyword arguments can be passed to Xarray :py:func`open_dataset` grouped either via the ``backend_kwarg`` parameter or explicitly using the syntax ``**kwargs``. **Output** -Backend ``open_dataset`` output shall be an instance of Xarray :py:class:`Dataset` -that implements an additional method ``close``, used by Xarray to ensure that the related files are closed. -If don't want to support the lazy loading, then the :py:class:`Dataset` shall contain NumPy.arrays and your work is almost done. +The output of the backend :py:func:`open_dataset` shall be an instance of Xarray :py:class:`Dataset` that implements the additional method ``close``, used by Xarray to ensure the related files are eventually closed. +If you don't want to support the lazy loading, then the :py:class:`Dataset` shall contain ``NumPy.arrays`` and your work is almost done. -Dask Chunking -++++++++++++ +Dask chunking ++++++++++++++ -Xarray manege the dataset chunking, so the backend should deal with it. If your on-disk format is already chunked and you want to suggest the preferred chunking, then you can use the attribute ``preferred_chunks`` in each variable encoding: ``ds[variable].encoding[“preferred_chunks”] = my_preferred_chunks `. -The preferred chunking will be used in two cases. -The first one is if the User does not select specific chunking for some dimension, for example: -``chunks={}``, xarray will use the preferred_chunks for all the dimension. -``chunks={“x”: 1000}``, xarray will use the preferred chunking for all the dimensions except “x” -The second one if ``chunks=”auto”``. In this case, ``preferred_chunks`` is used to estimate the “best” chunking with dask.core.normalize_chunks. open_dataset_parameters -+++++++++++++++++++++++++++++++++++++++++ -``open_dataset_parameters`` is the list of backend ``open_dataset`` parameters. It is not mandatory and if the backend doesn’t provide it explicity, Xarray will create it by inspecting backend signature. -Xarray uses ``open_dataset_parameters`` when needs to select only the decoders supported by the backend. 
-If ``open_dataset_parameters`` is not defined, Xarray will raise an error during the inspection if it finds `**kwargs` and `*args` in the signature. - -If the backend will provide ``open_dataset_parameters``, then it will be possible to use `**kwargs` and `*args` in the signature. This is discouraged unless there are good reasons for using `**kwargs` or `*args`. ++++++++++++++++++++++++ +``open_dataset_parameters`` is the list of backend ``open_dataset`` parameters. Its use is not mandatory, and if the backend does not provide it explicitly, Xarray creates a list of them automatically by inspecting the backend signature. +Xarray uses ``open_dataset_parameters`` only when it needs to select the **decoders** supported by the backend. +If ``open_dataset_parameters`` is not defined, but `**kwargs` and `*args` have been passed to the signature, Xarray raises an error. +On the other hand, if the backend provides the ``open_dataset_parameters``, then `**kwargs` and `*args` can be used in the signature. However, this practice is discouraged unless there is a good reasons for using `**kwargs` or `*args`. guess_can_open -+++++++++++++++++++++++++++++++++++++++++ -``guess_can_open`` is used for the automatic discovering of the engines able to open the file, in case the User does not specify explicity the engine. If you are not intrested to support this feature, you can skip this step since `BackendEntrypoint`` already provide a default ``guess_can_open`` that returns always `False`. -Backend ``guess_can_open`` shall take in input the ``filename_or_obj`` :py:func:`open_dataset` parameter, and it shall return a boolean. +++++++++++++++ +``guess_can_open`` is used to identify the proper engine to open your data file automatically in case the engine is not specified explicitly. If you are not interested in supporting this feature, you can skip this step since `BackendEntrypoint` already provides a default ``guess_can_open`` that always returns `False`. +Backend ``guess_can_open`` takes as input the ``filename_or_obj`` parameter of :py:func:`open_dataset`, and returns a boolean. How to support Lazy Loading +++++++++++++++++++++++++++ @@ -319,22 +308,9 @@ Define in your setup.py (or setup.cfg) a new entrypoint with: - name: the name to be passed to :py:func:`open_dataset` as `engine``.` - object reference: the reference of the class that you have implemented. -See https://packaging.python.org/specifications/entry-points/#data-model for more information. +See https://packaging.python.org/specifications/entry-points/#data-model for more information + Decoders ++++++++ -The decoders implement the specific operations to transform data on-disk representation -to Xarray representation. -Xarray uses internally a set of decoders needed to transform netCDF4 on-disk data into a :py:class:`Dataset` following Xarray standards (For more details see :py:function:`open_dataset`): - -- strings.CharacterArrayCoder() -- strings.EncodedStringCoder() -- variables.UnsignedIntegerCoder() -- variables.CFMaskCoder() -- variables.CFScaleOffsetCoder() -- times.CFTimedeltaCoder() -- times.CFDatetimeCoder(use_cftime=use_cftime) - -They are grouped in higher-level function conventions.decode_cf_variables. It takes in input the list of variables to decode and the decoders to activate/deactivate. -Some transformations can be common to more backends, so before implementing a decoder, you should to be sure that is not already implemented by xarray. 
From e471f3a8cf36c3ff3b7549c309fa6f603f688645 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Tue, 2 Feb 2021 16:31:31 +0100 Subject: [PATCH 06/52] incompletre draft: update Backend Documentation --- doc/internals.rst | 155 ++++++++++++++++++++++++++++++++++++---------- 1 file changed, 124 insertions(+), 31 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index 3f1a12453ca..7a137de54bf 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -234,17 +234,18 @@ re-open it directly with Zarr: How to add a new backend ------------------------- +------------------------------------ -Adding a new backend for read support to Xarray is easy, and does not require to integrate any code in Xarray; all you need to do is approaching the following steps: - -- Create a class that inherits from ``BackendEntrypoint`` - -- Implement the method :py:func:``open_dataset`` that returns an instance of :py:class:``Dataset`` +Adding a new backend for read support to Xarray is easy, and does not require +to integrate any code in Xarray; all you need to do is approaching the +following steps: +- Create a class that inherits from Xarray py:class:`~xarray.backend.commonBackendEntrypoint` +- Implement the method ``open_dataset`` that returns an instance of :py:class:`~xarray.Dataset` - Declare such a class as an external plugin in your setup.py. -Your ``BackendEntrypoint`` sub-class is the ~~main~~ primary interface with Xarray, and it should implement the following attributes and functions: +Your ``BackendEntrypoint`` sub-class is the primary interface with Xarray, and +it should implement the following attributes and functions: - ``open_dataset`` (mandatory) - [``open_dataset_parameters``] (optional) @@ -255,14 +256,18 @@ These are detailed in the following. open_dataset ++++++++++++ -**Inputs** +Inputs +^^^^^^ -The backend ``open_dataset`` method takes as input one argument (``filename``), and one keyword argument (``drop_variables``): +The backend ``open_dataset`` method takes as input one argument +(``filename``), and one keyword argument (``drop_variables``): - ``filename``: can be a string containing a relative path or an instance of ``pathlib.Path``. - ``drop_variables``: can be `None` or an iterable containing the variable names to be dropped when reading the data. -If it makes sense for your backend, :py:func:`open_dataset` method should implement in its interface all the following boolean keyword arguments, called **decoders** which default to ``None``: +If it makes sense for your backend, your ``open_dataset`` method should +implement in its interface all the following boolean keyword arguments, called +**decoders** which default to ``None``: - ``mask_and_scale=None`` - ``decode_times=None`` @@ -271,46 +276,134 @@ If it makes sense for your backend, :py:func:`open_dataset` method should implem - ``concat_characters=None`` - ``decode_coords=None`` -These keyword arguments are explicitly defined in Xarray :py:func:`open_dataset` signature. Xarray will pass them to the backend only if the User sets a value different from ``None`` explicitly. -Your backend can also take as input a set of backend-specific keyword arguments. All these keyword arguments can be passed to Xarray :py:func`open_dataset` grouped either via the ``backend_kwarg`` parameter or explicitly using the syntax ``**kwargs``. +These keyword arguments are explicitly defined in Xarray +:py:meth:`~xarray.open_dataset` signature. 
Xarray will pass them to the +backend only if the User sets a value different from ``None`` explicitly. +Your backend can also take as input a set of backend-specific keyword +arguments. All these keyword arguments can be passed to +:py:meth:`~xarray.open_dataset` grouped either via the ``backend_kwarg`` +parameter or explicitly using the syntax ``**kwargs``. +Output +^^^^^^ +The output of the backend `open_dataset` shall be an instance of +Xarray py:class:`~xarray.Dataset` that implements the additional method ``close``, +used by Xarray to ensure the related files are eventually closed. -**Output** +If you don't want to support the lazy loading, then the :py:class:`~xarray.Dataset` +shall contain ``NumPy.arrays`` and your work is almost done. -The output of the backend :py:func:`open_dataset` shall be an instance of Xarray :py:class:`Dataset` that implements the additional method ``close``, used by Xarray to ensure the related files are eventually closed. -If you don't want to support the lazy loading, then the :py:class:`Dataset` shall contain ``NumPy.arrays`` and your work is almost done. +open_dataset_parameters ++++++++++++++++++++++++ +``open_dataset_parameters`` is the list of backend ``open_dataset`` parameters. +It is not a mandatory parameter, and if the backend does not provide it +explicitly, Xarray creates a list of them automatically by inspecting the +backend signature. -Dask chunking -+++++++++++++ +Xarray uses ``open_dataset_parameters`` only when it needs to select +the **decoders** supported by the backend. +If ``open_dataset_parameters`` is not defined, but ``**kwargs`` and ``*args`` have +been passed to the signature, Xarray raises an error. +On the other hand, if the backend provides the ``open_dataset_parameters``, +then ``**kwargs`` and `*args`` can be used in the signature. -open_dataset_parameters -+++++++++++++++++++++++ -``open_dataset_parameters`` is the list of backend ``open_dataset`` parameters. Its use is not mandatory, and if the backend does not provide it explicitly, Xarray creates a list of them automatically by inspecting the backend signature. -Xarray uses ``open_dataset_parameters`` only when it needs to select the **decoders** supported by the backend. -If ``open_dataset_parameters`` is not defined, but `**kwargs` and `*args` have been passed to the signature, Xarray raises an error. -On the other hand, if the backend provides the ``open_dataset_parameters``, then `**kwargs` and `*args` can be used in the signature. However, this practice is discouraged unless there is a good reasons for using `**kwargs` or `*args`. +However, this practice is discouraged unless there is a good reasons for using +`**kwargs` or `*args`. guess_can_open ++++++++++++++ -``guess_can_open`` is used to identify the proper engine to open your data file automatically in case the engine is not specified explicitly. If you are not interested in supporting this feature, you can skip this step since `BackendEntrypoint` already provides a default ``guess_can_open`` that always returns `False`. -Backend ``guess_can_open`` takes as input the ``filename_or_obj`` parameter of :py:func:`open_dataset`, and returns a boolean. +``guess_can_open`` is used to identify the proper engine to open your data +file automatically in case the engine is not specified explicitly. 
If you are +not interested in supporting this feature, you can skip this step since +py:class:`~xarray.backend.common.BackendEntrypoint` already provides a default +py:meth:`~xarray.backend.common BackendEntrypoint.guess_engine` that always returns ``False``. -How to support Lazy Loading -+++++++++++++++++++++++++++ +Backend ``guess_can_open`` takes as input the ``filename_or_obj`` parameter of +Xarray :py:meth:`~xarray.open_dataset`, and returns a boolean. How to register a backend -+++++++++++++++++++++++++ ++++++++++++++++++++++++++++ + Define in your setup.py (or setup.cfg) a new entrypoint with: - group: ``xarray.backend`` -- name: the name to be passed to :py:func:`open_dataset` as `engine``.` +- name: the name to be passed to :py:meth:`~xarray.open_dataset` as ``engine``. - object reference: the reference of the class that you have implemented. -See https://packaging.python.org/specifications/entry-points/#data-model for more information +See https://packaging.python.org/specifications/entry-points/#data-model +for more information + +How to support Lazy Loading ++++++++++++++++++++++++++++ + + +Dask chunking ++++++++++++++ +The backend is not directly involved in `Dask `__ chunking, since it is managed +internally by Xarray. However, the backend can define the preferred chunk size +inside the variable’s encoding ``var.encoding["preferred_chunks"]``. +The ``preferred_chunks`` may be useful to improve performances with lazy loading. +``preferred_chunks`` shall be a dictionary specifying chunk size per dimension +like ``{“dim1”: 1000, “dim2”: 2000}`` or +``{“dim1”: [1000, 100], “dim2”: [2000, 2000, 2000]]}``. + +The ``preferred_chunks`` is used by Xarray to define the chunk size in some +special cases: + +- If ``chunks`` along a dimension is None or not defined +- If ``chunks`` is “auto” + +In the first case Xarray uses the chunks size specified in +``preferred_chunks``. +In the second case Xarray accommodates ideal chunk sizes, preserving if +possible the "preferred_chunks". The ideal chunk size is computed with using +``dask.core.normalize function``, setting ``previus_chunks = preferred_chunks``. Decoders ++++++++ - +The decoders implement the specific operations to transform data on-disk +representation to Xarray representation. + +A classic example is the decoding of the time. In the NetCDF, the variable +time is stored as integers witha time unit that contains an origin (for +example: "seconds since 1970-1-1"), Xarray transforms the pair integer and +unit in a ``NumPy.datetimes``. + +The standard decoders implemented by Xarray are: +- strings.CharacterArrayCoder() +- strings.EncodedStringCoder() +- variables.UnsignedIntegerCoder() +- variables.CFMaskCoder() +- variables.CFScaleOffsetCoder() +- times.CFTimedeltaCoder() +- times.CFDatetimeCoder() + +Some transformations can be common to more backends, so before implementing a +new decoder, please be sure that is not already implemented by Xarray. + +Xarray’s decoders can be reused by the backends, either instantiating directly +the decoders or using the higher-level function +:py:func:`~xarray.conventions.decode_cf_variables` that groups Xarray decoders. + +In some cases the transformation to apply strongly depends on the on-disk data +format, therefore you may need to implement your own decoder. + +An example is the time format in grib files. grib format is very different +from the NetCDF one: the time is stored in two attributes dataDate and +dataTime as strings. 
Therefore, in this case it is not possible to reuse the +Xarray time decoder, but a new one shall be implemented. + +Decoders can be activated or deactivated using the boolean keywords of Xarray +:py:meth:`~xarray.open_dataset` signature: ``mask_and_scale``, +``decode_times``, ``decode_timedelta``, ``use_cftime``, ``concat_characters``, +``decode_coords``. + +Such keywords are passed to the backend only if the User sets a value +different from ``None``. Note that the backend does not necessarily have to +implement all the decoders, but it shall declare in its ``open_dataset`` +interface only the boolean keywords related to the supported decoders. The +deactivation and activation of the supported decoders shall be implemented by +the backend. From 65c339dd9c30200028cdd52a388264222280afad Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Wed, 3 Feb 2021 09:38:21 +0100 Subject: [PATCH 07/52] fix --- doc/internals.rst | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index 7a137de54bf..04b93ad6259 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -337,7 +337,10 @@ for more information How to support Lazy Loading +++++++++++++++++++++++++++ - +If you want to make your backend effective with big data, then you should support +the lazy loading. Basically, you shall replace the :py:meth:`numpy.array` inside the variables +the py:class:`~xarray.Dataset` with a custom class that inherits from +py:class:`~xarray.backend.common.BackendArray`. Dask chunking +++++++++++++ @@ -358,7 +361,7 @@ special cases: In the first case Xarray uses the chunks size specified in ``preferred_chunks``. In the second case Xarray accommodates ideal chunk sizes, preserving if -possible the "preferred_chunks". The ideal chunk size is computed with using +possible the "preferred_chunks". The ideal chunk size is computed using ``dask.core.normalize function``, setting ``previus_chunks = preferred_chunks``. From 113197d50ad11fd35b6a57781d44a2d35cb4b85d Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Wed, 3 Feb 2021 10:52:41 +0100 Subject: [PATCH 08/52] fix syle --- doc/internals.rst | 68 ++++++++++++++++++++++++++++++----------------- 1 file changed, 43 insertions(+), 25 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index 04b93ad6259..963989f0f33 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -337,10 +337,27 @@ for more information How to support Lazy Loading +++++++++++++++++++++++++++ -If you want to make your backend effective with big data, then you should support -the lazy loading. Basically, you shall replace the :py:meth:`numpy.array` inside the variables -the py:class:`~xarray.Dataset` with a custom class that inherits from -py:class:`~xarray.backend.common.BackendArray`. +If you want to make your backend effective with big datasets, then you should support +the lazy loading. +Basically, you shall replace the :py:class:`numpy.array` inside the variables with +a custom class: + +.. ipython:: python + backend_array = YourBackendArray() + data = indexing.LazilyOuterIndexedArray(backend_array) + variable = Variable(..., data, ...) + +Where ``YourBackendArray``is a class that inherit from +:py:class:`~xarray.backends.common.BackendArray` and +:py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` is a +class of Xarray that wraps an array to make basic and outer indexing lazy. 
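The ``BackendArray`` subclass referred to above is only stubbed out in the subsections that follow in this draft, so a sketch of one possible shape may help. It is an assumption-laden illustration: ``MyBackendArray`` and ``_raw_indexing_method`` are invented names, and ``indexing.explicit_indexing_adapter`` with ``indexing.IndexingSupport.BASIC`` is used here as the usual way to translate xarray's indexer objects into plain tuple indices.

.. code-block:: python

    # illustrative sketch of a lazily-indexed backend array
    import numpy as np
    from xarray.backends.common import BackendArray
    from xarray.core import indexing


    class MyBackendArray(BackendArray):
        def __init__(self, filename, shape, dtype):
            self.filename = filename
            self.shape = shape
            self.dtype = dtype

        def __getitem__(self, key):
            # convert the ExplicitIndexer ``key`` into a basic tuple index
            return indexing.explicit_indexing_adapter(
                key, self.shape, indexing.IndexingSupport.BASIC, self._raw_indexing_method
            )

        def _raw_indexing_method(self, key):
            # placeholder: a real backend would read only the requested slice
            # from disk; here the data is stubbed with zeros
            return np.zeros(self.shape, dtype=self.dtype)[key]

The important point is that ``__getitem__`` returns plain NumPy data for the requested ``key`` only, which is what makes the wrapping in ``LazilyOuterIndexedArray`` effective.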
+ +BackendArray +^^^^^^^^^^^^ + +CachingFileManager +^^^^^^^^^^^^^^^^^^ + Dask chunking +++++++++++++ @@ -367,15 +384,15 @@ possible the "preferred_chunks". The ideal chunk size is computed using Decoders ++++++++ -The decoders implement the specific operations to transform data on-disk +The decoders implement specific operations to transform data from on-disk representation to Xarray representation. -A classic example is the decoding of the time. In the NetCDF, the variable -time is stored as integers witha time unit that contains an origin (for -example: "seconds since 1970-1-1"), Xarray transforms the pair integer and -unit in a ``NumPy.datetimes``. +A classic example is the “time” variable decoding operation. In NetCDF, the +elements of the “time” variable are stored as integers, and the unit contains +an origin (for example: "seconds since 1970-1-1"). In this case, Xarray +transforms the pair integer-unit in a ``np.datetimes``. -The standard decoders implemented by Xarray are: +The standard decoders implemented in Xarray are: - strings.CharacterArrayCoder() - strings.EncodedStringCoder() - variables.UnsignedIntegerCoder() @@ -384,29 +401,30 @@ The standard decoders implemented by Xarray are: - times.CFTimedeltaCoder() - times.CFDatetimeCoder() -Some transformations can be common to more backends, so before implementing a -new decoder, please be sure that is not already implemented by Xarray. +Some of the transformations can be common to more backends, so before +implementing a new decoder, be sure Xarray does not already implement that one. -Xarray’s decoders can be reused by the backends, either instantiating directly -the decoders or using the higher-level function +The backends can reuse Xarray’s decoders, either instantiating the decoders +directly or using the higher-level function :py:func:`~xarray.conventions.decode_cf_variables` that groups Xarray decoders. -In some cases the transformation to apply strongly depends on the on-disk data -format, therefore you may need to implement your own decoder. +In some cases, the transformation to apply strongly depends on the on-disk +data format. Therefore, you may need to implement your decoder. -An example is the time format in grib files. grib format is very different -from the NetCDF one: the time is stored in two attributes dataDate and -dataTime as strings. Therefore, in this case it is not possible to reuse the -Xarray time decoder, but a new one shall be implemented. +An example of such a case is when you have to deal with the time format of a +grib file. grib format is very different from the NetCDF one: in grib, the +time is stored in two attributes dataDate and dataTime as strings. Therefore, +it is not possible to reuse the Xarray time decoder, and implementing a new +one is mandatory. -Decoders can be activated or deactivated using the boolean keywords of Xarray +Decoders can be activated or deactivated using the boolean keywords of :py:meth:`~xarray.open_dataset` signature: ``mask_and_scale``, -``decode_times``, ``decode_timedelta``, ``use_cftime``, ``concat_characters``, -``decode_coords``. +``decode_times``, ``decode_timedelta``, ``use_cftime``, +``concat_characters``, ``decode_coords``. Such keywords are passed to the backend only if the User sets a value different from ``None``. Note that the backend does not necessarily have to implement all the decoders, but it shall declare in its ``open_dataset`` interface only the boolean keywords related to the supported decoders. 
The -deactivation and activation of the supported decoders shall be implemented by -the backend. +backend shall implement the deactivation and activation of the supported +decoders. From f9ed1d43b888af37fe34b32e45d7cf65626668b1 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:27:13 +0100 Subject: [PATCH 09/52] Update doc/internals.rst Co-authored-by: Deepak Cherian --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index 963989f0f33..15f305d71e8 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -281,7 +281,7 @@ These keyword arguments are explicitly defined in Xarray backend only if the User sets a value different from ``None`` explicitly. Your backend can also take as input a set of backend-specific keyword arguments. All these keyword arguments can be passed to -:py:meth:`~xarray.open_dataset` grouped either via the ``backend_kwarg`` +:py:meth:`~xarray.open_dataset` grouped either via the ``backend_kwargs`` parameter or explicitly using the syntax ``**kwargs``. Output From 6b2ecd569a806fab663b20340348d4dc8f20176b Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:28:25 +0100 Subject: [PATCH 10/52] Update doc/internals.rst Co-authored-by: Deepak Cherian --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index 15f305d71e8..dd5c6a09004 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -291,7 +291,7 @@ Xarray py:class:`~xarray.Dataset` that implements the additional method ``close` used by Xarray to ensure the related files are eventually closed. If you don't want to support the lazy loading, then the :py:class:`~xarray.Dataset` -shall contain ``NumPy.arrays`` and your work is almost done. +shall contain :py:class:`numpy.ndarray` and your work is almost done. open_dataset_parameters +++++++++++++++++++++++ From 195816364a14ec90930221e003d3bda556c3c4d9 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:28:57 +0100 Subject: [PATCH 11/52] Update doc/internals.rst Co-authored-by: Deepak Cherian --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index dd5c6a09004..e20f027cab6 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -287,7 +287,7 @@ parameter or explicitly using the syntax ``**kwargs``. Output ^^^^^^ The output of the backend `open_dataset` shall be an instance of -Xarray py:class:`~xarray.Dataset` that implements the additional method ``close``, +Xarray :py:class:`~xarray.Dataset` that implements the additional method ``close``, used by Xarray to ensure the related files are eventually closed. 
If you don't want to support the lazy loading, then the :py:class:`~xarray.Dataset` From bba32e4050611f2942333d3c763e0717006d33d0 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:29:11 +0100 Subject: [PATCH 12/52] Update doc/internals.rst Co-authored-by: Deepak Cherian --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index e20f027cab6..99dc280e4b9 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -372,7 +372,7 @@ like ``{“dim1”: 1000, “dim2”: 2000}`` or The ``preferred_chunks`` is used by Xarray to define the chunk size in some special cases: -- If ``chunks`` along a dimension is None or not defined +- If ``chunks`` along a dimension is ``None`` or not defined - If ``chunks`` is “auto” In the first case Xarray uses the chunks size specified in From cb8d716c6cd0aa26ca4fafdd72ce967eaf83c7c9 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:29:40 +0100 Subject: [PATCH 13/52] Update doc/internals.rst Co-authored-by: Mathias Hauser --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index 99dc280e4b9..b2f674fc8cb 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -316,7 +316,7 @@ guess_can_open ``guess_can_open`` is used to identify the proper engine to open your data file automatically in case the engine is not specified explicitly. If you are not interested in supporting this feature, you can skip this step since -py:class:`~xarray.backend.common.BackendEntrypoint` already provides a default +py:class:`~xarray.backends.common.BackendEntrypoint` already provides a default py:meth:`~xarray.backend.common BackendEntrypoint.guess_engine` that always returns ``False``. Backend ``guess_can_open`` takes as input the ``filename_or_obj`` parameter of From 2adf3556241d71f911a3ff275ca71d0d3a78cfb1 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:30:03 +0100 Subject: [PATCH 14/52] Update doc/internals.rst Co-authored-by: keewis --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index b2f674fc8cb..b0afa802235 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -317,7 +317,7 @@ guess_can_open file automatically in case the engine is not specified explicitly. If you are not interested in supporting this feature, you can skip this step since py:class:`~xarray.backends.common.BackendEntrypoint` already provides a default -py:meth:`~xarray.backend.common BackendEntrypoint.guess_engine` that always returns ``False``. +:py:meth:`~xarray.backend.common.BackendEntrypoint.guess_engine` that always returns ``False``. Backend ``guess_can_open`` takes as input the ``filename_or_obj`` parameter of Xarray :py:meth:`~xarray.open_dataset`, and returns a boolean. 
From 7ec3238cfc9779fa4ce9708dae0fdd1e01d2d1ce Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:30:33 +0100 Subject: [PATCH 15/52] Update doc/internals.rst Co-authored-by: Deepak Cherian --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index b0afa802235..a69455f2773 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -306,7 +306,7 @@ the **decoders** supported by the backend. If ``open_dataset_parameters`` is not defined, but ``**kwargs`` and ``*args`` have been passed to the signature, Xarray raises an error. On the other hand, if the backend provides the ``open_dataset_parameters``, -then ``**kwargs`` and `*args`` can be used in the signature. +then ``**kwargs`` and ``*args`` can be used in the signature. However, this practice is discouraged unless there is a good reasons for using `**kwargs` or `*args`. From fb0349384290546a019bd968e63d9fec5ff8dc3f Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:31:11 +0100 Subject: [PATCH 16/52] Update doc/internals.rst Co-authored-by: Deepak Cherian --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index a69455f2773..72065a7923f 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -379,7 +379,7 @@ In the first case Xarray uses the chunks size specified in ``preferred_chunks``. In the second case Xarray accommodates ideal chunk sizes, preserving if possible the "preferred_chunks". The ideal chunk size is computed using -``dask.core.normalize function``, setting ``previus_chunks = preferred_chunks``. +:py:func:`dask.core.normalize_chunks`, setting ``previous_chunks = preferred_chunks``. Decoders From 22794d2d7211d35114bc6871645bc4329f69cbd8 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:31:42 +0100 Subject: [PATCH 17/52] Update doc/internals.rst Co-authored-by: Mathias Hauser --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index 72065a7923f..c80703a4fa8 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -309,7 +309,7 @@ On the other hand, if the backend provides the ``open_dataset_parameters``, then ``**kwargs`` and ``*args`` can be used in the signature. However, this practice is discouraged unless there is a good reasons for using -`**kwargs` or `*args`. +``**kwargs`` or ``*args``. 
guess_can_open ++++++++++++++ From 67d2c1f6ad9cb187b41a0a88a6ad9486a108cfe5 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:31:59 +0100 Subject: [PATCH 18/52] Update doc/internals.rst Co-authored-by: Mathias Hauser --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index c80703a4fa8..afe7d2ee7c6 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -236,7 +236,7 @@ re-open it directly with Zarr: How to add a new backend ------------------------------------ -Adding a new backend for read support to Xarray is easy, and does not require +Adding a new backend for read support to Xarray does not require to integrate any code in Xarray; all you need to do is approaching the following steps: From f58f16b0d9c3079a1e95f7b6491d926ce925a6bc Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:32:12 +0100 Subject: [PATCH 19/52] Update doc/internals.rst Co-authored-by: Mathias Hauser --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index afe7d2ee7c6..27770b70c32 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -242,7 +242,7 @@ following steps: - Create a class that inherits from Xarray py:class:`~xarray.backend.commonBackendEntrypoint` - Implement the method ``open_dataset`` that returns an instance of :py:class:`~xarray.Dataset` -- Declare such a class as an external plugin in your setup.py. +- Declare this class as an external plugin in your setup.py. Your ``BackendEntrypoint`` sub-class is the primary interface with Xarray, and it should implement the following attributes and functions: From 6a07a7c29a1f24138a1cd96b596b7979277bbb76 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:38:00 +0100 Subject: [PATCH 20/52] Update doc/internals.rst Co-authored-by: Mathias Hauser --- doc/internals.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index 27770b70c32..d4b1e688e4a 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -237,8 +237,7 @@ How to add a new backend ------------------------------------ Adding a new backend for read support to Xarray does not require -to integrate any code in Xarray; all you need to do is approaching the -following steps: +to integrate any code in Xarray; all you need to do is follow these steps: - Create a class that inherits from Xarray py:class:`~xarray.backend.commonBackendEntrypoint` - Implement the method ``open_dataset`` that returns an instance of :py:class:`~xarray.Dataset` From 1285874ec39255ef340f49e285fca3b69915d901 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:38:16 +0100 Subject: [PATCH 21/52] Update doc/internals.rst Co-authored-by: Mathias Hauser --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index d4b1e688e4a..c1f9bf78c95 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -239,7 +239,7 @@ How to add a new backend Adding a new backend for read support to Xarray does not require to integrate any code in Xarray; all you need to do is follow these steps: -- Create a class that inherits from Xarray py:class:`~xarray.backend.commonBackendEntrypoint` +- Create a class that inherits 
from Xarray py:class:`~xarray.backends.common.BackendEntrypoint` - Implement the method ``open_dataset`` that returns an instance of :py:class:`~xarray.Dataset` - Declare this class as an external plugin in your setup.py. From 112837d77ea58a2a7f1bd331cce003f76e9e662e Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:39:11 +0100 Subject: [PATCH 22/52] Update doc/internals.rst Co-authored-by: keewis --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index c1f9bf78c95..eed047b02cf 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -261,7 +261,7 @@ Inputs The backend ``open_dataset`` method takes as input one argument (``filename``), and one keyword argument (``drop_variables``): -- ``filename``: can be a string containing a relative path or an instance of ``pathlib.Path``. +- ``filename``: can be a string containing a path or an instance of :py:class:`pathlib.Path`. - ``drop_variables``: can be `None` or an iterable containing the variable names to be dropped when reading the data. If it makes sense for your backend, your ``open_dataset`` method should From bdc46aa0ec5ac4c1ec3644295153538d2a467611 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:42:49 +0100 Subject: [PATCH 23/52] Update doc/internals.rst Co-authored-by: keewis --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index eed047b02cf..5645b106324 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -276,7 +276,7 @@ implement in its interface all the following boolean keyword arguments, called - ``decode_coords=None`` These keyword arguments are explicitly defined in Xarray -:py:meth:`~xarray.open_dataset` signature. Xarray will pass them to the +:py:func:`~xarray.open_dataset` signature. Xarray will pass them to the backend only if the User sets a value different from ``None`` explicitly. Your backend can also take as input a set of backend-specific keyword arguments. All these keyword arguments can be passed to From fe2204827f47be46406d0f4e09496579d4d26030 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 04:43:46 +0100 Subject: [PATCH 24/52] Update doc/internals.rst Co-authored-by: Mathias Hauser --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index 5645b106324..f00c59b87d3 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -337,7 +337,7 @@ for more information How to support Lazy Loading +++++++++++++++++++++++++++ If you want to make your backend effective with big datasets, then you should support -the lazy loading. +lazy loading. 
Basically, you shall replace the :py:class:`numpy.array` inside the variables with a custom class: From f492136824835c170756e57b4ba75e210adb2eb8 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Wed, 3 Feb 2021 11:45:32 +0100 Subject: [PATCH 25/52] update section lazy laoding --- doc/internals.rst | 37 ++++++++++++++++++++++++++----------- 1 file changed, 26 insertions(+), 11 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index 963989f0f33..7f18d80ae6f 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -339,25 +339,40 @@ How to support Lazy Loading +++++++++++++++++++++++++++ If you want to make your backend effective with big datasets, then you should support the lazy loading. -Basically, you shall replace the :py:class:`numpy.array` inside the variables with -a custom class: +Basically, when you instantiate the variables, instead of using a :py:class:`numpy.array`, +you need to use custom class that support lazy loading indexing: -.. ipython:: python - backend_array = YourBackendArray() +.. code-block:: python + + backend_array = CustomBackendArray() data = indexing.LazilyOuterIndexedArray(backend_array) - variable = Variable(..., data, ...) + var = Variable(..., data, ...) + + +Xarray implements the wrapper class that manages the the lazy loading: +:py:class:`~xarray.core.indexing.LazilyOuterIndexedArray`. +While the backend must implement ``CustomBackendArray`` that inherit from +:py:class:`~xarray.backends.common.BackendArray` that implements the +method `__getitem__`. -Where ``YourBackendArray``is a class that inherit from -:py:class:`~xarray.backends.common.BackendArray` and -:py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` is a -class of Xarray that wraps an array to make basic and outer indexing lazy. +BackendArray subclassing +^^^^^^^^^^^^^^^^^^^^^^^^ + +In your sub-class you need to implement two methods in addition to the +``__init__`` one: + +- ``__getitem`` +- ``__getitem__`` + +Where ``__get_item`` -BackendArray -^^^^^^^^^^^^ CachingFileManager ^^^^^^^^^^^^^^^^^^ +Type of Indexing +^^^^^^^^^^^^^^^^ + Dask chunking +++++++++++++ From c50a95c0e6c4c6554a510b25fb70c28d19ee3aeb Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 08:56:33 +0100 Subject: [PATCH 26/52] Update doc/internals.rst Co-authored-by: Deepak Cherian --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index f00c59b87d3..5c765e4a2cc 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -372,7 +372,7 @@ The ``preferred_chunks`` is used by Xarray to define the chunk size in some special cases: - If ``chunks`` along a dimension is ``None`` or not defined -- If ``chunks`` is “auto” +- If ``chunks`` is ``"auto"`` In the first case Xarray uses the chunks size specified in ``preferred_chunks``. From ab62bebbbfb88a2b066cbcf2f84fc491b7128237 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 08:58:49 +0100 Subject: [PATCH 27/52] Update doc/internals.rst Co-authored-by: Deepak Cherian --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index 5c765e4a2cc..11e6f441722 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -325,7 +325,7 @@ Xarray :py:meth:`~xarray.open_dataset`, and returns a boolean. 
How to register a backend +++++++++++++++++++++++++++ -Define in your setup.py (or setup.cfg) a new entrypoint with: +Define in your ``setup.py`` (or ``setup.cfg``) a new entrypoint with: - group: ``xarray.backend`` - name: the name to be passed to :py:meth:`~xarray.open_dataset` as ``engine``. From e470f361b73b18990b2b11f12de66d2750773329 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Feb 2021 08:59:32 +0100 Subject: [PATCH 28/52] Update doc/internals.rst Co-authored-by: keewis --- doc/internals.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/doc/internals.rst b/doc/internals.rst index 11e6f441722..be1efbf49d1 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -342,6 +342,7 @@ Basically, you shall replace the :py:class:`numpy.array` inside the variables wi a custom class: .. ipython:: python + backend_array = YourBackendArray() data = indexing.LazilyOuterIndexedArray(backend_array) variable = Variable(..., data, ...) From a777445a988ffed43ed932a4c28c228cef153905 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 4 Feb 2021 09:08:41 +0100 Subject: [PATCH 29/52] update internals.rst backend --- doc/internals.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index 0e05ca25b78..a5c26299872 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -241,7 +241,7 @@ to integrate any code in Xarray; all you need to do is follow these steps: - Create a class that inherits from Xarray py:class:`~xarray.backends.common.BackendEntrypoint` - Implement the method ``open_dataset`` that returns an instance of :py:class:`~xarray.Dataset` -- Declare this class as an external plugin in your setup.py. +- Declare this class as an external plugin in your ``setup.py``. Your ``BackendEntrypoint`` sub-class is the primary interface with Xarray, and it should implement the following attributes and functions: @@ -347,8 +347,9 @@ a custom class you need to use custom class that supports lazy loading indexing: data = indexing.LazilyOuterIndexedArray(backend_array) variable = Variable(..., data, ...) -Xarray implements the wrapper class that manages the the lazy loading: -:py:class:`~xarray.core.indexing.LazilyOuterIndexedArray`. +Xarray implements the wrapper classes that manages the lazy loading: +:py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` and +:py:class:`~xarray.core.indexing.LazilyVectorizedIndexedArray` While the backend must implement ``YourBackendArray`` that inherit from :py:class:`~xarray.backends.common.BackendArray` and implements the method ``__getitem__``. 
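
To make the relationship between these pieces concrete, the following is a minimal
sketch of such a subclass. ``MyBackendArray`` and ``my_read_block`` are hypothetical
names, and a real implementation would also need to handle locking and file access:

.. code-block:: python

    import numpy as np

    from xarray.backends.common import BackendArray
    from xarray.core import indexing


    class MyBackendArray(BackendArray):
        """A hypothetical lazy array backed by a file on disk."""

        def __init__(self, path, shape, dtype):
            self.path = path
            self.shape = shape
            self.dtype = np.dtype(dtype)

        def __getitem__(self, key):
            # let Xarray translate basic/outer/vectorized keys into a plain tuple
            return indexing.explicit_indexing_adapter(
                key,
                self.shape,
                indexing.IndexingSupport.BASIC,
                self._raw_indexing_method,
            )

        def _raw_indexing_method(self, key):
            # hypothetical low-level reader returning a numpy array for ``key``
            return my_read_block(self.path, key)
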
From 1f0870e6f0de8520a36ba389f7b18aefc9431737 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Wed, 10 Feb 2021 17:43:35 +0100 Subject: [PATCH 30/52] add lazy loading documentation --- doc/api-hidden.rst | 9 ++ doc/internals.rst | 334 +++++++++++++++++++++++++++++++++------------ 2 files changed, 252 insertions(+), 91 deletions(-) diff --git a/doc/api-hidden.rst b/doc/api-hidden.rst index e5492ec73a4..d4f9201c76a 100644 --- a/doc/api-hidden.rst +++ b/doc/api-hidden.rst @@ -809,3 +809,12 @@ backends.DummyFileManager.acquire backends.DummyFileManager.acquire_context backends.DummyFileManager.close + + backends.common.BackendArray + + core.indexing.IndexingSupport + core.indexing.explicit_indexing_adapter + core.indexing.BasicIndexer + core.indexing.OuterIndexer + core.indexing.VectorizedIndexer + core.indexing.LazilyOuterIndexedArray diff --git a/doc/internals.rst b/doc/internals.rst index 65c482687a0..6b443e7265c 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -237,26 +237,58 @@ How to add a new backend ------------------------------------ Adding a new backend for read support to Xarray does not require -to integrate any code in Xarray; all you need to do is follow these steps: +to integrate any code in Xarray; all you need to do is: -- Create a class that inherits from Xarray py:class:`~xarray.backends.common.BackendEntrypoint` -- Implement the method ``open_dataset`` that returns an instance of :py:class:`~xarray.Dataset` -- Declare this class as an external plugin in your ``setup.py``. +- Create a class that inherits from Xarray :py:class:`~xarray.backends.common.BackendEntrypoint` and implements the method ``open_dataset`` (:ref:`RST backend_entrypoint`) + +- Declare this class as an external plugin in your ``setup.py`` (:ref:`RST backend_registration`) + +If you want to support also lazy loading and dask see :ref:`RST lazy_loading` and :ref:`RST dask`. + +.. _RST backend_entrypoint: + +BackendEntrypoint subclassing +++++++++++++++++++++++++++++++ Your ``BackendEntrypoint`` sub-class is the primary interface with Xarray, and -it should implement the following attributes and functions: +it should implement the following attributes and methods: - ``open_dataset`` (mandatory) -- [``open_dataset_parameters``] (optional) -- [``guess_can_open``] (optional) +- ``open_dataset_parameters`` (optional) +- ``guess_can_open`` (optional) + +This is what a ``BackendEntrypoint`` subclass should look like: -These are detailed in the following. +.. code-block:: python + + class YourBackendEntrypoint(BackendEntrypoint): + + def open_dataset( + self, + filename_or_obj, + decode_coords=None, + ... + ): + ... + return ds + + open_dataset_parameters = ... + + def guess_can_open(self, filename_or_obj): + try: + _, ext = os.path.splitext(filename_or_obj) + except TypeError: + return False + return ext in {...} + +``BackendEntrypoint`` subclass methods and attributes are detailed in the following. + +.. _RST open_dataset: open_dataset -++++++++++++ +^^^^^^^^^^^^ -Inputs -^^^^^^ +**Inputs** The backend ``open_dataset`` method takes as input one argument (``filename``), and one keyword argument (``drop_variables``): @@ -265,26 +297,29 @@ The backend ``open_dataset`` method takes as input one argument - ``drop_variables``: can be `None` or an iterable containing the variable names to be dropped when reading the data. 
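
As a rough illustration of how these two standard arguments might be handled inside
the method (assuming ``import xarray as xr``; ``my_reader`` is a hypothetical helper
returning a mapping of variable names to variables):

.. code-block:: python

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        # hypothetical low-level reader returning a dict of variables
        variables = my_reader(filename_or_obj)
        if drop_variables is not None:
            for name in drop_variables:
                # silently ignore names that are not present in the file
                variables.pop(name, None)
        return xr.Dataset(variables)
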
If it makes sense for your backend, your ``open_dataset`` method should -implement in its interface all the following boolean keyword arguments, called -**decoders** which default to ``None``: +implement in its interface the following boolean keyword arguments, called +**decoders**, which default to ``None``: -- ``mask_and_scale=None`` -- ``decode_times=None`` -- ``decode_timedelta=None`` -- ``use_cftime=None`` -- ``concat_characters=None`` -- ``decode_coords=None`` +- ``mask_and_scale`` +- ``decode_times`` +- ``decode_timedelta`` +- ``use_cftime`` +- ``concat_characters`` +- ``decode_coords`` These keyword arguments are explicitly defined in Xarray -:py:func:`~xarray.open_dataset` signature. Xarray will pass them to the -backend only if the User sets a value different from ``None`` explicitly. +:py:func:`~xarray.open_dataset` signature. Xarray will pass them to the +backend only if the User explicitly sets a value different from ``None``. + +For more details on decoders see :ref:`RST decoders`. + Your backend can also take as input a set of backend-specific keyword arguments. All these keyword arguments can be passed to -:py:meth:`~xarray.open_dataset` grouped either via the ``backend_kwargs`` +:py:func:`~xarray.open_dataset` grouped either via the ``backend_kwargs`` parameter or explicitly using the syntax ``**kwargs``. -Output -^^^^^^ +**Output** + The output of the backend `open_dataset` shall be an instance of Xarray :py:class:`~xarray.Dataset` that implements the additional method ``close``, used by Xarray to ensure the related files are eventually closed. @@ -292,8 +327,11 @@ used by Xarray to ensure the related files are eventually closed. If you don't want to support the lazy loading, then the :py:class:`~xarray.Dataset` shall contain :py:class:`numpy.ndarray` and your work is almost done. +.. _RST open_dataset_parameters: + open_dataset_parameters -+++++++++++++++++++++++ +^^^^^^^^^^^^^^^^^^^^^^^ + ``open_dataset_parameters`` is the list of backend ``open_dataset`` parameters. It is not a mandatory parameter, and if the backend does not provide it explicitly, Xarray creates a list of them automatically by inspecting the @@ -302,16 +340,19 @@ backend signature. Xarray uses ``open_dataset_parameters`` only when it needs to select the **decoders** supported by the backend. -If ``open_dataset_parameters`` is not defined, but ``**kwargs`` and ``*args`` have -been passed to the signature, Xarray raises an error. +If ``open_dataset_parameters`` is not defined, but ``**kwargs`` and ``*args`` are +in backend ``open_dataset`` signature, Xarray raises an error. On the other hand, if the backend provides the ``open_dataset_parameters``, then ``**kwargs`` and ``*args`` can be used in the signature. However, this practice is discouraged unless there is a good reasons for using ``**kwargs`` or ``*args``. +.. _RST guess_can_open: + guess_can_open -++++++++++++++ +^^^^^^^^^^^^^^ + ``guess_can_open`` is used to identify the proper engine to open your data file automatically in case the engine is not specified explicitly. If you are not interested in supporting this feature, you can skip this step since @@ -321,6 +362,57 @@ py:class:`~xarray.backends.common.BackendEntrypoint` already provides a default Backend ``guess_can_open`` takes as input the ``filename_or_obj`` parameter of Xarray :py:meth:`~xarray.open_dataset`, and returns a boolean. +.. _RST decoders: + +Decoders +^^^^^^^^ +The decoders implement specific operations to transform data from on-disk +representation to Xarray representation. 
+ +A classic example is the “time” variable decoding operation. In NetCDF, the +elements of the “time” variable are stored as integers, and the unit contains +an origin (for example: "seconds since 1970-1-1"). In this case, Xarray +transforms the pair integer-unit in a :py:class:`numpy.datetime64`. + +The standard decoders implemented in Xarray are: + +- strings.CharacterArrayCoder() +- strings.EncodedStringCoder() +- variables.UnsignedIntegerCoder() +- variables.CFMaskCoder() +- variables.CFScaleOffsetCoder() +- times.CFTimedeltaCoder() +- times.CFDatetimeCoder() + +Some of the transformations can be common to more backends, so before +implementing a new decoder, be sure Xarray does not already implement that one. + +The backends can reuse Xarray’s decoders, either instantiating the decoders +directly or using the higher-level function +:py:func:`~xarray.conventions.decode_cf_variables` that groups Xarray decoders. + +In some cases, the transformation to apply strongly depends on the on-disk +data format. Therefore, you may need to implement your decoder. + +An example of such a case is when you have to deal with the time format of a +grib file. grib format is very different from the NetCDF one: in grib, the +time is stored in two attributes dataDate and dataTime as strings. Therefore, +it is not possible to reuse the Xarray time decoder, and implementing a new +one is mandatory. + +Decoders can be activated or deactivated using the boolean keywords of +:py:meth:`~xarray.open_dataset` signature: ``mask_and_scale``, +``decode_times``, ``decode_timedelta``, ``use_cftime``, +``concat_characters``, ``decode_coords``. + +Such keywords are passed to the backend only if the User sets a value +different from ``None``. Note that the backend does not necessarily have to +implement all the decoders, but it shall declare in its ``open_dataset`` +interface only the boolean keywords related to the supported decoders. The +backend shall implement the deactivation and activation of the supported +decoders. + +.. _RST backend_registration: How to register a backend +++++++++++++++++++++++++++ @@ -331,15 +423,37 @@ Define in your ``setup.py`` (or ``setup.cfg``) a new entrypoint with: - name: the name to be passed to :py:meth:`~xarray.open_dataset` as ``engine``. - object reference: the reference of the class that you have implemented. +You can declare the entrypoint in ``setup.cfg`` using the following syntax: + +.. code-block:: + + [options.entry_points] + xarray.backends = + engine_name = your_package.your_module:your_backendentrypoint + +in ``setup.py``: + +.. code-block:: + + setuptools.setup( + ... + entry_points={ + 'xarray.backends': ['engine_name=your_package.your_module:your_backendentrypoint'], + }, + ) + + See https://packaging.python.org/specifications/entry-points/#data-model for more information +.. _RST lazy_loading: + How to support Lazy Loading +++++++++++++++++++++++++++ If you want to make your backend effective with big datasets, then you should support lazy loading. -Basically, you shall replace the :py:class:`numpy.array` inside the variables with -custom class that supports lazy loading indexing: +Basically, you shall replace the :py:class:`numpy.ndarray` inside the variables with a +custom class that supports lazy loading indexing. See the example below: .. code-block:: python @@ -347,26 +461,111 @@ custom class that supports lazy loading indexing: data = indexing.LazilyOuterIndexedArray(backend_array) variable = Variable(..., data, ...) 
-Xarray implements the wrapper classes that manages the lazy loading: -:py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` and -:py:class:`~xarray.core.indexing.LazilyVectorizedIndexedArray` -While the backend must implement ``YourBackendArray`` that inherit from -:py:class:`~xarray.backends.common.BackendArray` and implements the -method ``__getitem__``. +Where: + +- :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` is class provided by Xarray that manages the lazy loading. +- ``YourBackendArray`` shall be implemented by the backend and it shall inherit from :py:class:`~xarray.backends.common.BackendArray`. BackendArray subclassing ^^^^^^^^^^^^^^^^^^^^^^^^ -Type of Indexing -^^^^^^^^^^^^^^^^ +BackendArray subclass shall implement the following method and attributes: + +- the ``__getitem__`` method that takes in input an index and returns a `NumPy `__ array. +- the ``shape`` attribute +- the ``dtype`` attribute + + +Xarray supports different type of `indexing `__, that can be +grouped in three type of indexes +:py:class:`~xarray.core.indexing.BasicIndexer`, +:py:class:`~xarray.core.indexing.OuterIndexer` and +:py:class:`~xarray.core.indexing.VectorizedIndexer`. +This implies that the implementation of the method ``__getitem__`` can be tricky. +In oder to simplify this task, Xarray provides an helper function, +:py:func:`~xarray.core.indexing.explicit_indexing_adapter`, that transforms all the input ``indexer`` +types (`basic`, `outer`, `vectorized`) in a tuple which is interpreted correctly by your backend. + +This is an example ``BackendArray`` subclass implementation: + +.. code-block:: python + + class YourBackendArray(BackendArray): + + def __init__(self, ...): + self.shape = ... + self.dtype = ... + self.lock = ... + + def __getitem__(self, key): + return indexing.explicit_indexing_adapter( + key, self.shape, indexing.IndexingSupport.BASIC, self._raw_indexing_method + ) + + def _raw_indexing_method(self, key): + # thread safe method that access to data on disk + with self.lock: + ... + return item + +Note that ``BackendArray`` ``__getitem__`` must be thread safe to support multi-thread processing. + +The :py:func:`~xarray.core.indexing.explicit_indexing_adapter` takes in input in addition to the +``key`` and the array ``shape``, the following parameters: + +- ``indexing_support``: the type of index supported by ``raw_indexing_method``. +- ``raw_indexing_method``: a method that shall take in input a key in the form of a tuple and return an indexed :py:class:`numpy.ndarray`. + +For more details see +:py:class:`~xarray.core.indexing.IndexingSupport` and :ref:`RST indexing` + +In order to support `Dask `__ distributed and :py:mod:`multiprocessing`, +``BackendArray`` subclass should be serializable either with +:ref:`io.pickle` or `cloudpickle `__. That implies +that all the reference to open files should be dropped. For opening files, we therefore +suggest to use the helper class provided by Xarray +:py:class:`~xarray.backends.CachingFileManager`. + +.. _RST indexing: -CachingFileManager -^^^^^^^^^^^^^^^^^^ +Indexing Examples +^^^^^^^^^^^^^^^^^ +**BASIC** -Dask chunking -+++++++++++++ -The backend is not directly involved in `Dask `__ chunking, since it is managed -internally by Xarray. However, the backend can define the preferred chunk size +In the ``BASIC`` indexing support, numbers and slices are supported. +In this the behaviour is the same as NumPy. + +Example: + +.. 
code-block:: python + + # () shall return the full array + >>> backend_array._raw_indexing_method(()) + array([[ 0, 1, 2, 3], + [ 4, 5, 6, 7], + [ 8, 9, 10, 11]]) + + # shall support integers + >>> backend_array._raw_indexing_method(1, 1) + 5 + + # shall support slices + >>> backend_array._raw_indexing_method(slice(0, 3), slice(2, 4)) + array([[ 2, 3], + [ 6, 7], + [10, 11]]) + +- OUTER: supports number, slices and vectors +- OUTER_1VECTOR: supports number, slices and at most one vectors +- VECTORIZED: supports vectorized index + +.. _RST dask: + +Backend preferred chunks +^^^^^^^^^^^^^^^^^^^^^^^^ + +The backend is not directly involved in `Dask `__ chunking, since it is internally +managed by Xarray. However, the backend can define the preferred chunk size inside the variable’s encoding ``var.encoding["preferred_chunks"]``. The ``preferred_chunks`` may be useful to improve performances with lazy loading. ``preferred_chunks`` shall be a dictionary specifying chunk size per dimension @@ -385,50 +584,3 @@ In the second case Xarray accommodates ideal chunk sizes, preserving if possible the "preferred_chunks". The ideal chunk size is computed using :py:func:`dask.core.normalize_chunks`, setting ``previous_chunks = preferred_chunks``. - -Decoders -++++++++ -The decoders implement specific operations to transform data from on-disk -representation to Xarray representation. - -A classic example is the “time” variable decoding operation. In NetCDF, the -elements of the “time” variable are stored as integers, and the unit contains -an origin (for example: "seconds since 1970-1-1"). In this case, Xarray -transforms the pair integer-unit in a ``np.datetimes``. - -The standard decoders implemented in Xarray are: -- strings.CharacterArrayCoder() -- strings.EncodedStringCoder() -- variables.UnsignedIntegerCoder() -- variables.CFMaskCoder() -- variables.CFScaleOffsetCoder() -- times.CFTimedeltaCoder() -- times.CFDatetimeCoder() - -Some of the transformations can be common to more backends, so before -implementing a new decoder, be sure Xarray does not already implement that one. - -The backends can reuse Xarray’s decoders, either instantiating the decoders -directly or using the higher-level function -:py:func:`~xarray.conventions.decode_cf_variables` that groups Xarray decoders. - -In some cases, the transformation to apply strongly depends on the on-disk -data format. Therefore, you may need to implement your decoder. - -An example of such a case is when you have to deal with the time format of a -grib file. grib format is very different from the NetCDF one: in grib, the -time is stored in two attributes dataDate and dataTime as strings. Therefore, -it is not possible to reuse the Xarray time decoder, and implementing a new -one is mandatory. - -Decoders can be activated or deactivated using the boolean keywords of -:py:meth:`~xarray.open_dataset` signature: ``mask_and_scale``, -``decode_times``, ``decode_timedelta``, ``use_cftime``, -``concat_characters``, ``decode_coords``. - -Such keywords are passed to the backend only if the User sets a value -different from ``None``. Note that the backend does not necessarily have to -implement all the decoders, but it shall declare in its ``open_dataset`` -interface only the boolean keywords related to the supported decoders. The -backend shall implement the deactivation and activation of the supported -decoders. 
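
As a rough sketch, a backend that only supports time decoding could declare just
``decode_times`` in its ``open_dataset`` signature and apply its decoder unless the
user explicitly disabled it (``my_reader`` and ``my_decode_time_variable`` are
hypothetical helpers, and ``import xarray as xr`` is assumed):

.. code-block:: python

    def open_dataset(self, filename_or_obj, *, drop_variables=None, decode_times=None):
        variables, attrs = my_reader(filename_or_obj, drop_variables=drop_variables)
        if decode_times is None or decode_times:
            # decoding is applied by default and can be switched off with decode_times=False
            variables = {
                name: my_decode_time_variable(var) for name, var in variables.items()
            }
        return xr.Dataset(variables, attrs=attrs)
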
From 885a6bdd9d60637d2ea0d3d147df741d80b3c5e1 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 11 Feb 2021 11:31:05 +0100 Subject: [PATCH 31/52] update example on indexing type --- doc/api-hidden.rst | 14 +++ doc/internals.rst | 231 ++++++++++++++++++++++++++++++--------------- 2 files changed, 168 insertions(+), 77 deletions(-) diff --git a/doc/api-hidden.rst b/doc/api-hidden.rst index d4f9201c76a..e4f69fc1278 100644 --- a/doc/api-hidden.rst +++ b/doc/api-hidden.rst @@ -818,3 +818,17 @@ core.indexing.OuterIndexer core.indexing.VectorizedIndexer core.indexing.LazilyOuterIndexedArray + core.indexing.LazilyVectorizedIndexedArray + + conventions.decode_cf_variables + + coding.variables.UnsignedIntegerCoder + coding.variables.CFMaskCoder + coding.variables.CFScaleOffsetCoder + + coding.strings.CharacterArrayCoder + coding.strings.EncodedStringCoder + + coding.times.CFTimedeltaCoder + coding.times.CFDatetimeCoder + diff --git a/doc/internals.rst b/doc/internals.rst index 6b443e7265c..d34774071d4 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -234,7 +234,7 @@ re-open it directly with Zarr: How to add a new backend ------------------------------------- +------------------------ Adding a new backend for read support to Xarray does not require to integrate any code in Xarray; all you need to do is: @@ -243,19 +243,19 @@ to integrate any code in Xarray; all you need to do is: - Declare this class as an external plugin in your ``setup.py`` (:ref:`RST backend_registration`) -If you want to support also lazy loading and dask see :ref:`RST lazy_loading` and :ref:`RST dask`. +If you also want to support lazy loading and dask see :ref:`RST lazy_loading` and :ref:`RST dask`. .. _RST backend_entrypoint: BackendEntrypoint subclassing -++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++ Your ``BackendEntrypoint`` sub-class is the primary interface with Xarray, and it should implement the following attributes and methods: -- ``open_dataset`` (mandatory) -- ``open_dataset_parameters`` (optional) -- ``guess_can_open`` (optional) +- the ``open_dataset`` method (mandatory) +- the ``open_dataset_parameters`` attribute (optional) +- the ``guess_can_open`` method (optional). This is what a ``BackendEntrypoint`` subclass should look like: @@ -293,8 +293,10 @@ open_dataset The backend ``open_dataset`` method takes as input one argument (``filename``), and one keyword argument (``drop_variables``): -- ``filename``: can be a string containing a path or an instance of :py:class:`pathlib.Path`. -- ``drop_variables``: can be `None` or an iterable containing the variable names to be dropped when reading the data. +- ``filename``: can be a string containing a path or an instance of + :py:class:`pathlib.Path`. +- ``drop_variables``: can be `None` or an iterable containing the variable + names to be dropped when reading the data. If it makes sense for your backend, your ``open_dataset`` method should implement in its interface the following boolean keyword arguments, called @@ -321,11 +323,12 @@ parameter or explicitly using the syntax ``**kwargs``. **Output** The output of the backend `open_dataset` shall be an instance of -Xarray :py:class:`~xarray.Dataset` that implements the additional method ``close``, -used by Xarray to ensure the related files are eventually closed. +Xarray :py:class:`~xarray.Dataset` that implements the additional method +``close``, used by Xarray to ensure the related files are eventually closed. 
-If you don't want to support the lazy loading, then the :py:class:`~xarray.Dataset` -shall contain :py:class:`numpy.ndarray` and your work is almost done. +If you don't want to support the lazy loading, then the +:py:class:`~xarray.Dataset` shall contain values as a :py:class:`numpy.ndarray` +and your work is almost done. .. _RST open_dataset_parameters: @@ -340,8 +343,8 @@ backend signature. Xarray uses ``open_dataset_parameters`` only when it needs to select the **decoders** supported by the backend. -If ``open_dataset_parameters`` is not defined, but ``**kwargs`` and ``*args`` are -in backend ``open_dataset`` signature, Xarray raises an error. +If ``open_dataset_parameters`` is not defined, but ``**kwargs`` and ``*args`` +are in the backend ``open_dataset`` signature, Xarray raises an error. On the other hand, if the backend provides the ``open_dataset_parameters``, then ``**kwargs`` and ``*args`` can be used in the signature. @@ -356,8 +359,9 @@ guess_can_open ``guess_can_open`` is used to identify the proper engine to open your data file automatically in case the engine is not specified explicitly. If you are not interested in supporting this feature, you can skip this step since -py:class:`~xarray.backends.common.BackendEntrypoint` already provides a default -:py:meth:`~xarray.backend.common.BackendEntrypoint.guess_engine` that always returns ``False``. +:py:class:`~xarray.backends.common.BackendEntrypoint` already provides a +default :py:meth:`~xarray.backend.common.BackendEntrypoint.guess_engine` +that always returns ``False``. Backend ``guess_can_open`` takes as input the ``filename_or_obj`` parameter of Xarray :py:meth:`~xarray.open_dataset`, and returns a boolean. @@ -374,25 +378,48 @@ elements of the “time” variable are stored as integers, and the unit contain an origin (for example: "seconds since 1970-1-1"). In this case, Xarray transforms the pair integer-unit in a :py:class:`numpy.datetime64`. -The standard decoders implemented in Xarray are: +The standard coders implemented in Xarray are: -- strings.CharacterArrayCoder() -- strings.EncodedStringCoder() -- variables.UnsignedIntegerCoder() -- variables.CFMaskCoder() -- variables.CFScaleOffsetCoder() -- times.CFTimedeltaCoder() -- times.CFDatetimeCoder() +- :py:class:`xarray.coding.strings.CharacterArrayCoder()` +- :py:class:`xarray.coding.strings.EncodedStringCoder()` +- :py:class:`xarray.coding.variables.UnsignedIntegerCoder()` +- :py:class:`xarray.coding.variables.CFMaskCoder()` +- :py:class:`xarray.coding.variables.CFScaleOffsetCoder()` +- :py:class:`xarray.coding.times.CFTimedeltaCoder()` +- :py:class:`xarray.coding.times.CFDatetimeCoder()` + +Xarray coders have all the same interface. They have two methods: ``decode`` +and ``encode``. The method ``decode`` applies a transformation from on-disk +format to Xarray format using the :py:class:`~xarray.Variable` attributes. +The attributes used for the encoding are dropped and saved inside the +``Variable.encoding``. The method ``encode`` perform the +inverse transformation using the ``encoding`` instead of the attributes. + +In the following an example on how to use the coders ``decode`` method: + +.. 
ipython:: python + + var = xr.Variable( + dims=("x",), + data=np.arange(10.0), + attrs={"scale_factor": 10, "add_offset": 2} + ) + var + + coder = xr.coding.variables.CFScaleOffsetCoder() + decoded_var = coder.decode(var) + decoded_var + decoded_var.encoding Some of the transformations can be common to more backends, so before implementing a new decoder, be sure Xarray does not already implement that one. -The backends can reuse Xarray’s decoders, either instantiating the decoders -directly or using the higher-level function +The backends can reuse Xarray’s decoders, either instantiating the coders +and using the method ``decode`` directly or using the higher-level function :py:func:`~xarray.conventions.decode_cf_variables` that groups Xarray decoders. In some cases, the transformation to apply strongly depends on the on-disk -data format. Therefore, you may need to implement your decoder. +data format. Therefore, you may need to implement your own decoder. An example of such a case is when you have to deal with the time format of a grib file. grib format is very different from the NetCDF one: in grib, the @@ -417,21 +444,13 @@ decoders. How to register a backend +++++++++++++++++++++++++++ -Define in your ``setup.py`` (or ``setup.cfg``) a new entrypoint with: +Define a new entrypoint in your ``setup.py`` (or ``setup.cfg``) with: - group: ``xarray.backend`` -- name: the name to be passed to :py:meth:`~xarray.open_dataset` as ``engine``. +- name: the name to be passed to :py:meth:`~xarray.open_dataset` as ``engine`` - object reference: the reference of the class that you have implemented. -You can declare the entrypoint in ``setup.cfg`` using the following syntax: - -.. code-block:: - - [options.entry_points] - xarray.backends = - engine_name = your_package.your_module:your_backendentrypoint - -in ``setup.py``: +You can declare the entrypoint in ``setup.py`` using the following syntax: .. code-block:: @@ -442,6 +461,14 @@ in ``setup.py``: }, ) +in ``setup.cfg``: + +.. code-block:: + + [options.entry_points] + xarray.backends = + engine_name = your_package.your_module:your_backendentrypoint + See https://packaging.python.org/specifications/entry-points/#data-model for more information @@ -450,10 +477,11 @@ for more information How to support Lazy Loading +++++++++++++++++++++++++++ -If you want to make your backend effective with big datasets, then you should support -lazy loading. -Basically, you shall replace the :py:class:`numpy.ndarray` inside the variables with a -custom class that supports lazy loading indexing. See the example below: +If you want to make your backend effective with big datasets, then you should +support lazy loading. +Basically, you shall replace the :py:class:`numpy.ndarray` inside the +variables with a custom class that supports lazy loading indexing. +See the example below: .. code-block:: python @@ -463,28 +491,37 @@ custom class that supports lazy loading indexing. See the example below: Where: -- :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` is class provided by Xarray that manages the lazy loading. -- ``YourBackendArray`` shall be implemented by the backend and it shall inherit from :py:class:`~xarray.backends.common.BackendArray`. +- :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` is a class + provided by Xarray that manages the lazy loading. 
Note, if your backend + support only vectorized index, you must use + :py:class:`~xarray.core.indexing.LazilyVectorizedIndexedArray` instead of + :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` (for more details + on indexing support see the following sections) +- ``YourBackendArray`` shall be implemented by the backend and shall inherit + from :py:class:`~xarray.backends.common.BackendArray`. BackendArray subclassing ^^^^^^^^^^^^^^^^^^^^^^^^ -BackendArray subclass shall implement the following method and attributes: +The BackendArray subclass shall implement the following method and attributes: -- the ``__getitem__`` method that takes in input an index and returns a `NumPy `__ array. +- the ``__getitem__`` method that takes in input an index and returns a + `NumPy `__ array - the ``shape`` attribute -- the ``dtype`` attribute +- the ``dtype`` attribute. -Xarray supports different type of `indexing `__, that can be -grouped in three type of indexes +Xarray supports different type of +`indexing `__, that can be +grouped in three types of indexes :py:class:`~xarray.core.indexing.BasicIndexer`, :py:class:`~xarray.core.indexing.OuterIndexer` and :py:class:`~xarray.core.indexing.VectorizedIndexer`. This implies that the implementation of the method ``__getitem__`` can be tricky. -In oder to simplify this task, Xarray provides an helper function, -:py:func:`~xarray.core.indexing.explicit_indexing_adapter`, that transforms all the input ``indexer`` -types (`basic`, `outer`, `vectorized`) in a tuple which is interpreted correctly by your backend. +In oder to simplify this task, Xarray provides a helper function, +:py:func:`~xarray.core.indexing.explicit_indexing_adapter`, that transforms +all the input ``indexer`` types (`basic`, `outer`, `vectorized`) in a tuple +which is interpreted correctly by your backend. This is an example ``BackendArray`` subclass implementation: @@ -508,22 +545,25 @@ This is an example ``BackendArray`` subclass implementation: ... return item -Note that ``BackendArray`` ``__getitem__`` must be thread safe to support multi-thread processing. +Note that ``BackendArray.__getitem__`` must be thread safe to support +multi-thread processing. -The :py:func:`~xarray.core.indexing.explicit_indexing_adapter` takes in input in addition to the -``key`` and the array ``shape``, the following parameters: +The :py:func:`~xarray.core.indexing.explicit_indexing_adapter` method takes in +input the ``key``, the array ``shape`` and the following parameters: -- ``indexing_support``: the type of index supported by ``raw_indexing_method``. -- ``raw_indexing_method``: a method that shall take in input a key in the form of a tuple and return an indexed :py:class:`numpy.ndarray`. +- ``indexing_support``: the type of index supported by ``raw_indexing_method`` +- ``raw_indexing_method``: a method that shall take in input a key in the form + of a tuple and return an indexed :py:class:`numpy.ndarray`. For more details see -:py:class:`~xarray.core.indexing.IndexingSupport` and :ref:`RST indexing` - -In order to support `Dask `__ distributed and :py:mod:`multiprocessing`, -``BackendArray`` subclass should be serializable either with -:ref:`io.pickle` or `cloudpickle `__. That implies -that all the reference to open files should be dropped. For opening files, we therefore -suggest to use the helper class provided by Xarray +:py:class:`~xarray.core.indexing.IndexingSupport` and :ref:`RST indexing`. 
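
One practical consequence of the serialization requirement discussed in the next
paragraph is that the backend should avoid holding open file handles directly.
A possible sketch of using :py:class:`~xarray.backends.CachingFileManager` for this,
assuming a simple binary file opened with the built-in ``open`` (the file name is
illustrative):

.. code-block:: python

    from xarray.backends import CachingFileManager

    # the manager stores how to (re)open the file rather than an open handle,
    # so an object that keeps a reference to it can still be pickled
    manager = CachingFileManager(open, "my_file.bin", mode="rb")

    with manager.acquire_context() as f:
        header = f.read(128)
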
+ +In order to support `Dask `__ distributed and +:py:mod:`multiprocessing`, ``BackendArray`` subclass should be serializable +either with :ref:`io.pickle` or +`cloudpickle `__. +That implies that all the reference to open files should be dropped. For +opening files, we therefore suggest to use the helper class provided by Xarray :py:class:`~xarray.backends.CachingFileManager`. .. _RST indexing: @@ -533,7 +573,7 @@ Indexing Examples **BASIC** In the ``BASIC`` indexing support, numbers and slices are supported. -In this the behaviour is the same as NumPy. +The behaviour is the same as `NumPy `__ . Example: @@ -555,32 +595,69 @@ Example: [ 6, 7], [10, 11]]) -- OUTER: supports number, slices and vectors -- OUTER_1VECTOR: supports number, slices and at most one vectors -- VECTORIZED: supports vectorized index +**OUTER** + +The ``OUTER`` indexing shall support number, slices and in addition it shall +support also lists of integers. The the outer indexing is equivalent to +combining multiple input list with ``itertools.product()``. + +Example: + +.. code-block:: python + + >>> backend_array._raw_indexing_method([0, 1], [0, 1, 2]) + array([[ 0, 1, 2], + [ 4, 5, 6]]) + + # shall support integers + >>> backend_array._raw_indexing_method(1, 1) + 5 + + +**OUTER_1VECTOR** + +The ``OUTER_1VECTOR`` indexing shall supports number, slices and at least one +list. The behaviour shall be the same of ``OUTER`` indexing. + +**VECTORIZED** + +``VECTORIZED`` shall take in input lists of integers. This +indexing is equivalent to combining multiple input list with zip(). + +.. code-block:: python + + >>> backend_array._raw_indexing_method([0, 1, 2], [0, 1, 2]) + array([ 0, 5, 10]) + +Note, if your need to use this type of indexing support, you shall use +:py:class:`~xarray.core.indexing.LazilyVectorizedIndexedArray` instead of +:py:class:`~xarray.core.indexing.LazilyOuterIndexedArray`. + .. _RST dask: Backend preferred chunks ^^^^^^^^^^^^^^^^^^^^^^^^ -The backend is not directly involved in `Dask `__ chunking, since it is internally -managed by Xarray. However, the backend can define the preferred chunk size -inside the variable’s encoding ``var.encoding["preferred_chunks"]``. -The ``preferred_chunks`` may be useful to improve performances with lazy loading. -``preferred_chunks`` shall be a dictionary specifying chunk size per dimension -like ``{“dim1”: 1000, “dim2”: 2000}`` or +The backend is not directly involved in `Dask `__ +chunking, since it is internally managed by Xarray. However, the backend can +define the preferred chunk size inside the variable’s encoding +``var.encoding["preferred_chunks"]``. The ``preferred_chunks`` may be useful +to improve performances with lazy loading. ``preferred_chunks`` shall be a +dictionary specifying chunk size per dimension like +``{“dim1”: 1000, “dim2”: 2000}`` or ``{“dim1”: [1000, 100], “dim2”: [2000, 2000, 2000]]}``. The ``preferred_chunks`` is used by Xarray to define the chunk size in some special cases: -- If ``chunks`` along a dimension is ``None`` or not defined -- If ``chunks`` is ``"auto"`` +- if ``chunks`` along a dimension is ``None`` or not defined +- if ``chunks`` is ``"auto"``. In the first case Xarray uses the chunks size specified in ``preferred_chunks``. In the second case Xarray accommodates ideal chunk sizes, preserving if possible the "preferred_chunks". The ideal chunk size is computed using -:py:func:`dask.core.normalize_chunks`, setting ``previous_chunks = preferred_chunks``. 
+:py:func:`dask.core.normalize_chunks`, setting +``previous_chunks = preferred_chunks``. From 138133635aa585fba6fcd766a33d63ed6bbf161e Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 11 Feb 2021 13:31:54 +0100 Subject: [PATCH 32/52] style --- doc/api-hidden.rst | 1 - doc/internals.rst | 57 ++++++++++++++++++++++------------------------ 2 files changed, 27 insertions(+), 31 deletions(-) diff --git a/doc/api-hidden.rst b/doc/api-hidden.rst index e4f69fc1278..b76bbd508f4 100644 --- a/doc/api-hidden.rst +++ b/doc/api-hidden.rst @@ -831,4 +831,3 @@ coding.times.CFTimedeltaCoder coding.times.CFDatetimeCoder - diff --git a/doc/internals.rst b/doc/internals.rst index d34774071d4..6378371d629 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -262,12 +262,11 @@ This is what a ``BackendEntrypoint`` subclass should look like: .. code-block:: python class YourBackendEntrypoint(BackendEntrypoint): - def open_dataset( self, filename_or_obj, decode_coords=None, - ... + # other backend specific keyword arguments ): ... return ds @@ -400,9 +399,7 @@ In the following an example on how to use the coders ``decode`` method: .. ipython:: python var = xr.Variable( - dims=("x",), - data=np.arange(10.0), - attrs={"scale_factor": 10, "add_offset": 2} + dims=("x",), data=np.arange(10.0), attrs={"scale_factor": 10, "add_offset": 2} ) var @@ -455,15 +452,16 @@ You can declare the entrypoint in ``setup.py`` using the following syntax: .. code-block:: setuptools.setup( - ... - entry_points={ - 'xarray.backends': ['engine_name=your_package.your_module:your_backendentrypoint'], + entry_points={ + "xarray.backends": [ + "engine_name=your_package.your_module:your_backendentrypoint" + ], }, ) in ``setup.cfg``: -.. code-block:: +.. code-block:: cfg [options.entry_points] xarray.backends = @@ -528,7 +526,6 @@ This is an example ``BackendArray`` subclass implementation: .. code-block:: python class YourBackendArray(BackendArray): - def __init__(self, ...): self.shape = ... self.dtype = ... @@ -536,7 +533,10 @@ This is an example ``BackendArray`` subclass implementation: def __getitem__(self, key): return indexing.explicit_indexing_adapter( - key, self.shape, indexing.IndexingSupport.BASIC, self._raw_indexing_method + key, + self.shape, + indexing.IndexingSupport.BASIC, + self._raw_indexing_method, ) def _raw_indexing_method(self, key): @@ -579,21 +579,17 @@ Example: .. code-block:: python - # () shall return the full array - >>> backend_array._raw_indexing_method(()) - array([[ 0, 1, 2, 3], - [ 4, 5, 6, 7], - [ 8, 9, 10, 11]]) + # () shall return the full array + >> backend_array._raw_indexing_method(()) + array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]) # shall support integers - >>> backend_array._raw_indexing_method(1, 1) + >> backend_array._raw_indexing_method(1, 1) 5 - # shall support slices - >>> backend_array._raw_indexing_method(slice(0, 3), slice(2, 4)) - array([[ 2, 3], - [ 6, 7], - [10, 11]]) + # shall support slices + >> backend_array._raw_indexing_method(slice(0, 3), slice(2, 4)) + array([[2, 3], [6, 7], [10, 11]]) **OUTER** @@ -605,12 +601,11 @@ Example: .. code-block:: python - >>> backend_array._raw_indexing_method([0, 1], [0, 1, 2]) - array([[ 0, 1, 2], - [ 4, 5, 6]]) + >> backend_array._raw_indexing_method([0, 1], [0, 1, 2]) + array([[0, 1, 2], [4, 5, 6]]) # shall support integers - >>> backend_array._raw_indexing_method(1, 1) + >> backend_array._raw_indexing_method(1, 1) 5 @@ -621,13 +616,16 @@ list. The behaviour shall be the same of ``OUTER`` indexing. 
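
Continuing with the same hypothetical 3x4 ``backend_array`` used above, an
``OUTER_1VECTOR`` request could look like:

.. code-block:: python

    # at most one of the indexers is a list; the others are integers or slices
    >>> backend_array._raw_indexing_method(slice(0, 3), [0, 2])
    array([[0, 2], [4, 6], [8, 10]])
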
**VECTORIZED** -``VECTORIZED`` shall take in input lists of integers. This -indexing is equivalent to combining multiple input list with zip(). +``VECTORIZED`` shall support integers, slices and lists of integers. +The indexing with lists in this case is equivalent to combining multiple +input list with zip(). + +Example: .. code-block:: python >>> backend_array._raw_indexing_method([0, 1, 2], [0, 1, 2]) - array([ 0, 5, 10]) + array([0, 5, 10]) Note, if your need to use this type of indexing support, you shall use :py:class:`~xarray.core.indexing.LazilyVectorizedIndexedArray` instead of @@ -660,4 +658,3 @@ In the second case Xarray accommodates ideal chunk sizes, preserving if possible the "preferred_chunks". The ideal chunk size is computed using :py:func:`dask.core.normalize_chunks`, setting ``previous_chunks = preferred_chunks``. - From 0ef410a25bf676963630d5a5847be0c85901a932 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 11 Feb 2021 14:40:12 +0100 Subject: [PATCH 33/52] fix --- doc/internals.rst | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index 6378371d629..07391259cd2 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -526,7 +526,10 @@ This is an example ``BackendArray`` subclass implementation: .. code-block:: python class YourBackendArray(BackendArray): - def __init__(self, ...): + def __init__( + self, + # other backend specific keyword arguments + ): self.shape = ... self.dtype = ... self.lock = ... @@ -580,7 +583,7 @@ Example: .. code-block:: python # () shall return the full array - >> backend_array._raw_indexing_method(()) + >>> backend_array._raw_indexing_method(()) array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]) # shall support integers @@ -588,7 +591,7 @@ Example: 5 # shall support slices - >> backend_array._raw_indexing_method(slice(0, 3), slice(2, 4)) + >>> backend_array._raw_indexing_method(slice(0, 3), slice(2, 4)) array([[2, 3], [6, 7], [10, 11]]) **OUTER** @@ -601,11 +604,11 @@ Example: .. code-block:: python - >> backend_array._raw_indexing_method([0, 1], [0, 1, 2]) + >>> backend_array._raw_indexing_method([0, 1], [0, 1, 2]) array([[0, 1, 2], [4, 5, 6]]) # shall support integers - >> backend_array._raw_indexing_method(1, 1) + >>> backend_array._raw_indexing_method(1, 1) 5 From dc3613844f90b2fb2b61fcb207f22286111412e5 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 11 Feb 2021 15:24:26 +0100 Subject: [PATCH 34/52] modify backend indexing doc --- doc/internals.rst | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index 07391259cd2..a3c9e4347e5 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -387,12 +387,12 @@ The standard coders implemented in Xarray are: - :py:class:`xarray.coding.times.CFTimedeltaCoder()` - :py:class:`xarray.coding.times.CFDatetimeCoder()` -Xarray coders have all the same interface. They have two methods: ``decode`` -and ``encode``. The method ``decode`` applies a transformation from on-disk -format to Xarray format using the :py:class:`~xarray.Variable` attributes. -The attributes used for the encoding are dropped and saved inside the -``Variable.encoding``. The method ``encode`` perform the -inverse transformation using the ``encoding`` instead of the attributes. +Xarray coders all have the same interface. They have two methods: ``decode`` +and ``encode``. 
The method ``decode`` takes a ``Variable`` in on-disk +format and returns a ``Variable`` in Xarray format. Variable +attributes no more applicable after the decoding, are dropped and stored in the +``Variable.encoding`` to make them available to the ``encode`` method, which +performs the inverse transformation. In the following an example on how to use the coders ``decode`` method: @@ -490,11 +490,11 @@ See the example below: Where: - :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` is a class - provided by Xarray that manages the lazy loading. Note, if your backend - support only vectorized index, you must use - :py:class:`~xarray.core.indexing.LazilyVectorizedIndexedArray` instead of - :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` (for more details - on indexing support see the following sections) + provided by Xarray that manages the lazy loading. Note, that + :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` supports `basic` + and `outer` indexing. While `outer` is supported by + :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray`. For more details + see the following sections. - ``YourBackendArray`` shall be implemented by the backend and shall inherit from :py:class:`~xarray.backends.common.BackendArray`. From 23e24230b46541f07ee88f98346cc89fcb29c75c Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 11 Feb 2021 15:28:44 +0100 Subject: [PATCH 35/52] fix --- doc/internals.rst | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index a3c9e4347e5..d15e0cf26dc 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -491,9 +491,8 @@ Where: - :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` is a class provided by Xarray that manages the lazy loading. Note, that - :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` supports `basic` - and `outer` indexing. While `outer` is supported by - :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray`. For more details + it supports `basic` and `outer` indexing. While `vectorized` indexing is supported by + :py:class:`~xarray.core.indexing.LazilyVectorizedIndexedArray`. For more details see the following sections. - ``YourBackendArray`` shall be implemented by the backend and shall inherit from :py:class:`~xarray.backends.common.BackendArray`. From 99ca49eb0a4b7c85fc6df4e475c154efbcfb48f6 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 11 Feb 2021 15:34:30 +0100 Subject: [PATCH 36/52] removed LazilyVectorizedIndexedArray from backend doc --- doc/internals.rst | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index d15e0cf26dc..1cd12d8a1bf 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -490,10 +490,7 @@ See the example below: Where: - :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` is a class - provided by Xarray that manages the lazy loading. Note, that - it supports `basic` and `outer` indexing. While `vectorized` indexing is supported by - :py:class:`~xarray.core.indexing.LazilyVectorizedIndexedArray`. For more details - see the following sections. + provided by Xarray that manages the lazy loading. - ``YourBackendArray`` shall be implemented by the backend and shall inherit from :py:class:`~xarray.backends.common.BackendArray`. 
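
The point of this wrapping, as a rough sketch of the intended behaviour, is that no
data is read from disk until values are actually requested:

.. code-block:: python

    backend_array = YourBackendArray(...)  # nothing is read from disk yet
    data = indexing.LazilyOuterIndexedArray(backend_array)
    var = xr.Variable(("x", "y"), data)

    subset = var[:10, :10]  # still lazy: only the requested window is recorded
    values = subset.values  # now YourBackendArray.__getitem__ is called for that window
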
@@ -629,10 +626,6 @@ Example: >>> backend_array._raw_indexing_method([0, 1, 2], [0, 1, 2]) array([0, 5, 10]) -Note, if your need to use this type of indexing support, you shall use -:py:class:`~xarray.core.indexing.LazilyVectorizedIndexedArray` instead of -:py:class:`~xarray.core.indexing.LazilyOuterIndexedArray`. - .. _RST dask: From b1eb0771f53fcb14cd77d0d71df2ac7547bfcfab Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 11 Feb 2021 15:41:39 +0100 Subject: [PATCH 37/52] small fix in doc --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index 1cd12d8a1bf..a358cf2c457 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -611,7 +611,7 @@ Example: **OUTER_1VECTOR** The ``OUTER_1VECTOR`` indexing shall supports number, slices and at least one -list. The behaviour shall be the same of ``OUTER`` indexing. +list. The behaviour with the list shall be the same of ``OUTER`` indexing. **VECTORIZED** From 39bf16b322003460abf6f3da6249c97b1e7b4c73 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 11 Feb 2021 16:44:17 +0100 Subject: [PATCH 38/52] small fixes in backend doc --- doc/internals.rst | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index a358cf2c457..2ed352b3138 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -594,9 +594,7 @@ Example: The ``OUTER`` indexing shall support number, slices and in addition it shall support also lists of integers. The the outer indexing is equivalent to -combining multiple input list with ``itertools.product()``. - -Example: +combining multiple input list with ``itertools.product()``: .. code-block:: python @@ -617,9 +615,8 @@ list. The behaviour with the list shall be the same of ``OUTER`` indexing. ``VECTORIZED`` shall support integers, slices and lists of integers. The indexing with lists in this case is equivalent to combining multiple -input list with zip(). - -Example: +input list with ``zip()``. This is the same semantic used by +`NumPy `__ for lists: .. code-block:: python From 121c06038283ec678fbcf7e36fb99e0ab824c8f6 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Fri, 12 Feb 2021 07:55:58 +0100 Subject: [PATCH 39/52] removed exmple vectorized indexing --- doc/internals.rst | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index 2ed352b3138..e1850fc76c6 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -611,18 +611,6 @@ combining multiple input list with ``itertools.product()``: The ``OUTER_1VECTOR`` indexing shall supports number, slices and at least one list. The behaviour with the list shall be the same of ``OUTER`` indexing. -**VECTORIZED** - -``VECTORIZED`` shall support integers, slices and lists of integers. -The indexing with lists in this case is equivalent to combining multiple -input list with ``zip()``. This is the same semantic used by -`NumPy `__ for lists: - -.. code-block:: python - - >>> backend_array._raw_indexing_method([0, 1, 2], [0, 1, 2]) - array([0, 5, 10]) - .. 
_RST dask: From e838d405f0ee0c8ed3155bdc2fcdd071c7704bd3 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 25 Feb 2021 14:08:00 +0100 Subject: [PATCH 40/52] update documentation --- doc/api-hidden.rst | 1 + doc/internals.rst | 113 +++++++++++++++++++++++++------------- xarray/backends/common.py | 36 ++++++++++-- 3 files changed, 108 insertions(+), 42 deletions(-) diff --git a/doc/api-hidden.rst b/doc/api-hidden.rst index b76bbd508f4..11cbadd04a0 100644 --- a/doc/api-hidden.rst +++ b/doc/api-hidden.rst @@ -811,6 +811,7 @@ backends.DummyFileManager.close backends.common.BackendArray + backends.common.BackendEntrypoint core.indexing.IndexingSupport core.indexing.explicit_indexing_adapter diff --git a/doc/internals.rst b/doc/internals.rst index e1850fc76c6..acd54709a69 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -239,11 +239,12 @@ How to add a new backend Adding a new backend for read support to Xarray does not require to integrate any code in Xarray; all you need to do is: -- Create a class that inherits from Xarray :py:class:`~xarray.backends.common.BackendEntrypoint` and implements the method ``open_dataset`` (:ref:`RST backend_entrypoint`) +- Create a class that inherits from Xarray :py:class:`~xarray.backends.common.BackendEntrypoint` + and implements the method ``open_dataset`` see :ref:`RST backend_entrypoint` -- Declare this class as an external plugin in your ``setup.py`` (:ref:`RST backend_registration`) +- Declare this class as an external plugin in your ``setup.py``, see :ref:`RST backend_registration` -If you also want to support lazy loading and dask see :ref:`RST lazy_loading` and :ref:`RST dask`. +If you also want to support lazy loading and dask see :ref:`RST lazy_loading`. .. _RST backend_entrypoint: @@ -261,17 +262,18 @@ This is what a ``BackendEntrypoint`` subclass should look like: .. code-block:: python - class YourBackendEntrypoint(BackendEntrypoint): + class MyBackendEntrypoint(BackendEntrypoint): def open_dataset( self, filename_or_obj, - decode_coords=None, + *, + drop_variables=None, # other backend specific keyword arguments ): ... return ds - open_dataset_parameters = ... + open_dataset_parameters = ["filename_or_obj", "drop_variables"] def guess_can_open(self, filename_or_obj): try: @@ -287,10 +289,46 @@ This is what a ``BackendEntrypoint`` subclass should look like: open_dataset ^^^^^^^^^^^^ -**Inputs** +The backend ``open_dataset`` shall implement reading from file, the variables +decoding and it shall instantiate the output Xarray class :py:class:`~xarray.Dataset`. -The backend ``open_dataset`` method takes as input one argument -(``filename``), and one keyword argument (``drop_variables``): +The following is an example of the high level processing steps: + +.. code-block:: python + + def open_dataset( + self, + filename_or_obj, + *, + drop_variables=None, + decode_times=True, + decode_timedelta=True, + decode_coords=True, + my_backend_param=None, + ): + vars, attrs, coords = my_reader( + filename_or_obj, + drop_variables=drop_variables, + my_backend_param=my_backend_param, + ) + vars, attrs, coords = my_decode_variables( + vars, attrs, decode_times, decode_timedelta, decode_coords + ) # see also conventions.decode_cf_variables + + ds = xr.Dataset(vars, attrs=attrs) + ds = ds.set_coords(coords) + ds.set_close(store.close) + + return ds + + +The output :py:class:`~xarray.Dataset` shall implement the additional custom method +``close``, used by Xarray to ensure the related files are eventually closed. 
This +method shall be set by using :py:meth:`~xarray.Dataset.set_close`. + + +The input of ``open_dataset`` method are one argument +(``filename``) and one keyword argument (``drop_variables``): - ``filename``: can be a string containing a path or an instance of :py:class:`pathlib.Path`. @@ -319,11 +357,6 @@ arguments. All these keyword arguments can be passed to :py:func:`~xarray.open_dataset` grouped either via the ``backend_kwargs`` parameter or explicitly using the syntax ``**kwargs``. -**Output** - -The output of the backend `open_dataset` shall be an instance of -Xarray :py:class:`~xarray.Dataset` that implements the additional method -``close``, used by Xarray to ensure the related files are eventually closed. If you don't want to support the lazy loading, then the :py:class:`~xarray.Dataset` shall contain values as a :py:class:`numpy.ndarray` @@ -483,15 +516,15 @@ See the example below: .. code-block:: python - backend_array = YourBackendArray() + backend_array = MyBackendArray() data = indexing.LazilyOuterIndexedArray(backend_array) - variable = Variable(..., data, ...) + var = xr.Variable(dims, data, attrs=attrs, encoding=encoding) Where: - :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` is a class provided by Xarray that manages the lazy loading. -- ``YourBackendArray`` shall be implemented by the backend and shall inherit +- ``MyBackendArray`` shall be implemented by the backend and shall inherit from :py:class:`~xarray.backends.common.BackendArray`. BackendArray subclassing @@ -521,14 +554,17 @@ This is an example ``BackendArray`` subclass implementation: .. code-block:: python - class YourBackendArray(BackendArray): + class MyBackendArray(BackendArray): def __init__( self, + shape, + dtype, + lock, # other backend specific keyword arguments ): - self.shape = ... - self.dtype = ... - self.lock = ... + self.shape = shape + self.dtype = lock + self.lock = dtype def __getitem__(self, key): return indexing.explicit_indexing_adapter( @@ -572,23 +608,23 @@ Indexing Examples **BASIC** In the ``BASIC`` indexing support, numbers and slices are supported. -The behaviour is the same as `NumPy `__ . Example: -.. code-block:: python +.. ipython:: + :verbatim: - # () shall return the full array - >>> backend_array._raw_indexing_method(()) - array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]) + In [1]: # () shall return the full array + ...: backend_array._raw_indexing_method(()) + Out[1]: array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]) - # shall support integers - >> backend_array._raw_indexing_method(1, 1) - 5 + In [2]: # shall support integers + ...: backend_array._raw_indexing_method(1, 1) + Out[2]: 5 - # shall support slices - >>> backend_array._raw_indexing_method(slice(0, 3), slice(2, 4)) - array([[2, 3], [6, 7], [10, 11]]) + In [3]: # shall support slices + ...: backend_array._raw_indexing_method(slice(0, 3), slice(2, 4)) + Out[3]: array([[2, 3], [6, 7], [10, 11]]) **OUTER** @@ -596,14 +632,15 @@ The ``OUTER`` indexing shall support number, slices and in addition it shall support also lists of integers. The the outer indexing is equivalent to combining multiple input list with ``itertools.product()``: -.. code-block:: python +.. 
ipython:: + :verbatim: - >>> backend_array._raw_indexing_method([0, 1], [0, 1, 2]) - array([[0, 1, 2], [4, 5, 6]]) + In [1]: backend_array._raw_indexing_method([0, 1], [0, 1, 2]) + Out[1]: array([[0, 1, 2], [4, 5, 6]]) # shall support integers - >>> backend_array._raw_indexing_method(1, 1) - 5 + In [2]: backend_array._raw_indexing_method(1, 1) + Out[2]: 5 **OUTER_1VECTOR** @@ -612,7 +649,7 @@ The ``OUTER_1VECTOR`` indexing shall supports number, slices and at least one list. The behaviour with the list shall be the same of ``OUTER`` indexing. -.. _RST dask: +.. _RST preferred_chunks: Backend preferred chunks ^^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/xarray/backends/common.py b/xarray/backends/common.py index e2905d0866b..226ec245d7b 100644 --- a/xarray/backends/common.py +++ b/xarray/backends/common.py @@ -1,7 +1,7 @@ import logging import time import traceback -from typing import Dict, Tuple, Type, Union +from typing import Dict, Tuple, Type, Union, Any import numpy as np @@ -9,7 +9,6 @@ from ..core import indexing from ..core.pycompat import is_duck_dask_array from ..core.utils import FrozenDict, NdimSizeLenMixin - # Create a logger object, but don't add any handlers. Leave that to user code. logger = logging.getLogger(__name__) @@ -344,12 +343,41 @@ def encode(self, variables, attributes): class BackendEntrypoint: + """ + ``BackendEntrypoint`` is a class container and it is the main interface + for the backend plugins, see :ref:`RST backend_entrypoint`. + It shall implement: + + - ``open_dataset`` method: it shall implement reading from file, variables + decoding and it returns an instance of :py:class:`~xarray.Dataset`. + It shall take in input at least ``filename_or_obj`` argument and + ``drop_variables`` keyword argument. + For more details see :ref:`RST open_dataset`. + - ``guess_can_open`` method: it shall return ``True`` if the backend is able to open + ``filename_or_obj``, ``False`` otherwise. The implementation of this + method is not mandatory. + """ + open_dataset_parameters: Union[Tuple, None] = None + """list of ``open_dataset`` method parameters""" + + def open_dataset( + self, + filename_or_obj: str, + drop_variables: Tuple[str] = None, + **kwargs: Any, + ): + """ + Backend open_dataset method used by Xarray in :py:func:`~xarray.open_dataset`. + """ - def open_dataset(self): raise NotImplementedError - def guess_can_open(self, store_spec): + def guess_can_open(self, filename_or_obj): + """ + Backend open_dataset method used by Xarray in :py:func:`~xarray.open_dataset`. 
+ """ + return False From 8633e08e0314e2360762d6b42c7feaf62f2ef0b5 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 25 Feb 2021 14:08:00 +0100 Subject: [PATCH 41/52] update documentation --- doc/api-hidden.rst | 3 + doc/internals.rst | 121 ++++++++++++++++++++++++++------------ xarray/backends/common.py | 36 ++++++++++-- 3 files changed, 117 insertions(+), 43 deletions(-) diff --git a/doc/api-hidden.rst b/doc/api-hidden.rst index b76bbd508f4..0fbb31b6f4c 100644 --- a/doc/api-hidden.rst +++ b/doc/api-hidden.rst @@ -811,6 +811,9 @@ backends.DummyFileManager.close backends.common.BackendArray + backends.common.BackendEntrypoint + backends.common.BackendEntrypoint.guess_can_open + backends.common.BackendEntrypoint.open_dataset core.indexing.IndexingSupport core.indexing.explicit_indexing_adapter diff --git a/doc/internals.rst b/doc/internals.rst index e1850fc76c6..7be1a734bc4 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -239,11 +239,15 @@ How to add a new backend Adding a new backend for read support to Xarray does not require to integrate any code in Xarray; all you need to do is: -- Create a class that inherits from Xarray :py:class:`~xarray.backends.common.BackendEntrypoint` and implements the method ``open_dataset`` (:ref:`RST backend_entrypoint`) +- Create a class that inherits from Xarray :py:class:`~xarray.backends.common.BackendEntrypoint` + and implements the method ``open_dataset`` see :ref:`RST backend_entrypoint` -- Declare this class as an external plugin in your ``setup.py`` (:ref:`RST backend_registration`) +- Declare this class as an external plugin in your ``setup.py``, see :ref:`RST backend_registration` -If you also want to support lazy loading and dask see :ref:`RST lazy_loading` and :ref:`RST dask`. +If you also want to support lazy loading and dask see :ref:`RST lazy_loading`. + +Note that the new interface for backends is available from xarray +version >= 0.18 onwards. .. _RST backend_entrypoint: @@ -261,17 +265,18 @@ This is what a ``BackendEntrypoint`` subclass should look like: .. code-block:: python - class YourBackendEntrypoint(BackendEntrypoint): + class MyBackendEntrypoint(BackendEntrypoint): def open_dataset( self, filename_or_obj, - decode_coords=None, + *, + drop_variables=None, # other backend specific keyword arguments ): ... return ds - open_dataset_parameters = ... + open_dataset_parameters = ["filename_or_obj", "drop_variables"] def guess_can_open(self, filename_or_obj): try: @@ -287,10 +292,46 @@ This is what a ``BackendEntrypoint`` subclass should look like: open_dataset ^^^^^^^^^^^^ -**Inputs** +The backend ``open_dataset`` shall implement reading from file, the variables +decoding and it shall instantiate the output Xarray class :py:class:`~xarray.Dataset`. + +The following is an example of the high level processing steps: + +.. 
code-block:: python + + def open_dataset( + self, + filename_or_obj, + *, + drop_variables=None, + decode_times=True, + decode_timedelta=True, + decode_coords=True, + my_backend_param=None, + ): + vars, attrs, coords = my_reader( + filename_or_obj, + drop_variables=drop_variables, + my_backend_param=my_backend_param, + ) + vars, attrs, coords = my_decode_variables( + vars, attrs, decode_times, decode_timedelta, decode_coords + ) # see also conventions.decode_cf_variables + + ds = xr.Dataset(vars, attrs=attrs) + ds = ds.set_coords(coords) + ds.set_close(store.close) + + return ds -The backend ``open_dataset`` method takes as input one argument -(``filename``), and one keyword argument (``drop_variables``): + +The output :py:class:`~xarray.Dataset` shall implement the additional custom method +``close``, used by Xarray to ensure the related files are eventually closed. This +method shall be set by using :py:meth:`~xarray.Dataset.set_close`. + + +The input of ``open_dataset`` method are one argument +(``filename``) and one keyword argument (``drop_variables``): - ``filename``: can be a string containing a path or an instance of :py:class:`pathlib.Path`. @@ -319,11 +360,6 @@ arguments. All these keyword arguments can be passed to :py:func:`~xarray.open_dataset` grouped either via the ``backend_kwargs`` parameter or explicitly using the syntax ``**kwargs``. -**Output** - -The output of the backend `open_dataset` shall be an instance of -Xarray :py:class:`~xarray.Dataset` that implements the additional method -``close``, used by Xarray to ensure the related files are eventually closed. If you don't want to support the lazy loading, then the :py:class:`~xarray.Dataset` shall contain values as a :py:class:`numpy.ndarray` @@ -359,7 +395,7 @@ guess_can_open file automatically in case the engine is not specified explicitly. If you are not interested in supporting this feature, you can skip this step since :py:class:`~xarray.backends.common.BackendEntrypoint` already provides a -default :py:meth:`~xarray.backend.common.BackendEntrypoint.guess_engine` +default :py:meth:`~xarray.backend.common.BackendEntrypoint.guess_can_open` that always returns ``False``. Backend ``guess_can_open`` takes as input the ``filename_or_obj`` parameter of @@ -483,15 +519,15 @@ See the example below: .. code-block:: python - backend_array = YourBackendArray() + backend_array = MyBackendArray() data = indexing.LazilyOuterIndexedArray(backend_array) - variable = Variable(..., data, ...) + var = xr.Variable(dims, data, attrs=attrs, encoding=encoding) Where: - :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` is a class provided by Xarray that manages the lazy loading. -- ``YourBackendArray`` shall be implemented by the backend and shall inherit +- ``MyBackendArray`` shall be implemented by the backend and shall inherit from :py:class:`~xarray.backends.common.BackendArray`. BackendArray subclassing @@ -521,14 +557,17 @@ This is an example ``BackendArray`` subclass implementation: .. code-block:: python - class YourBackendArray(BackendArray): + class MyBackendArray(BackendArray): def __init__( self, + shape, + dtype, + lock, # other backend specific keyword arguments ): - self.shape = ... - self.dtype = ... - self.lock = ... + self.shape = shape + self.dtype = lock + self.lock = dtype def __getitem__(self, key): return indexing.explicit_indexing_adapter( @@ -572,23 +611,23 @@ Indexing Examples **BASIC** In the ``BASIC`` indexing support, numbers and slices are supported. -The behaviour is the same as `NumPy `__ . 
Example: -.. code-block:: python +.. ipython:: + :verbatim: - # () shall return the full array - >>> backend_array._raw_indexing_method(()) - array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]) + In [1]: # () shall return the full array + ...: backend_array._raw_indexing_method(()) + Out[1]: array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]) - # shall support integers - >> backend_array._raw_indexing_method(1, 1) - 5 + In [2]: # shall support integers + ...: backend_array._raw_indexing_method(1, 1) + Out[2]: 5 - # shall support slices - >>> backend_array._raw_indexing_method(slice(0, 3), slice(2, 4)) - array([[2, 3], [6, 7], [10, 11]]) + In [3]: # shall support slices + ...: backend_array._raw_indexing_method(slice(0, 3), slice(2, 4)) + Out[3]: array([[2, 3], [6, 7], [10, 11]]) **OUTER** @@ -596,14 +635,15 @@ The ``OUTER`` indexing shall support number, slices and in addition it shall support also lists of integers. The the outer indexing is equivalent to combining multiple input list with ``itertools.product()``: -.. code-block:: python +.. ipython:: + :verbatim: - >>> backend_array._raw_indexing_method([0, 1], [0, 1, 2]) - array([[0, 1, 2], [4, 5, 6]]) + In [1]: backend_array._raw_indexing_method([0, 1], [0, 1, 2]) + Out[1]: array([[0, 1, 2], [4, 5, 6]]) # shall support integers - >>> backend_array._raw_indexing_method(1, 1) - 5 + In [2]: backend_array._raw_indexing_method(1, 1) + Out[2]: 5 **OUTER_1VECTOR** @@ -611,8 +651,11 @@ combining multiple input list with ``itertools.product()``: The ``OUTER_1VECTOR`` indexing shall supports number, slices and at least one list. The behaviour with the list shall be the same of ``OUTER`` indexing. +If you support more complex indexing as `explicit indexing` or +`numpy indexing`, you can have a look to the implemetation of Zarr backend and Scipy backend, +currently available in :py:mod:`~xarray.backends` module. -.. _RST dask: +.. _RST preferred_chunks: Backend preferred chunks ^^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/xarray/backends/common.py b/xarray/backends/common.py index e2905d0866b..226ec245d7b 100644 --- a/xarray/backends/common.py +++ b/xarray/backends/common.py @@ -1,7 +1,7 @@ import logging import time import traceback -from typing import Dict, Tuple, Type, Union +from typing import Dict, Tuple, Type, Union, Any import numpy as np @@ -9,7 +9,6 @@ from ..core import indexing from ..core.pycompat import is_duck_dask_array from ..core.utils import FrozenDict, NdimSizeLenMixin - # Create a logger object, but don't add any handlers. Leave that to user code. logger = logging.getLogger(__name__) @@ -344,12 +343,41 @@ def encode(self, variables, attributes): class BackendEntrypoint: + """ + ``BackendEntrypoint`` is a class container and it is the main interface + for the backend plugins, see :ref:`RST backend_entrypoint`. + It shall implement: + + - ``open_dataset`` method: it shall implement reading from file, variables + decoding and it returns an instance of :py:class:`~xarray.Dataset`. + It shall take in input at least ``filename_or_obj`` argument and + ``drop_variables`` keyword argument. + For more details see :ref:`RST open_dataset`. + - ``guess_can_open`` method: it shall return ``True`` if the backend is able to open + ``filename_or_obj``, ``False`` otherwise. The implementation of this + method is not mandatory. 
+ """ + open_dataset_parameters: Union[Tuple, None] = None + """list of ``open_dataset`` method parameters""" + + def open_dataset( + self, + filename_or_obj: str, + drop_variables: Tuple[str] = None, + **kwargs: Any, + ): + """ + Backend open_dataset method used by Xarray in :py:func:`~xarray.open_dataset`. + """ - def open_dataset(self): raise NotImplementedError - def guess_can_open(self, store_spec): + def guess_can_open(self, filename_or_obj): + """ + Backend open_dataset method used by Xarray in :py:func:`~xarray.open_dataset`. + """ + return False From e3eb56d03c6f41f8c70d23cf38422359d3c31370 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 25 Feb 2021 15:06:37 +0100 Subject: [PATCH 42/52] isort --- xarray/backends/common.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/backends/common.py b/xarray/backends/common.py index d22ea136e55..aa902602278 100644 --- a/xarray/backends/common.py +++ b/xarray/backends/common.py @@ -1,7 +1,7 @@ import logging import time import traceback -from typing import Dict, Tuple, Type, Union, Any +from typing import Any, Dict, Tuple, Type, Union import numpy as np From a45647828dfaf1e18871cdc82d65d41fcd63df83 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 25 Feb 2021 16:44:32 +0100 Subject: [PATCH 43/52] rename store_spec in filename_or_obj in guess_can_open --- xarray/backends/cfgrib_.py | 4 ++-- xarray/backends/h5netcdf_.py | 6 +++--- xarray/backends/netCDF4_.py | 6 +++--- xarray/backends/pydap_.py | 4 ++-- xarray/backends/scipy_.py | 6 +++--- xarray/backends/store.py | 4 ++-- 6 files changed, 15 insertions(+), 15 deletions(-) diff --git a/xarray/backends/cfgrib_.py b/xarray/backends/cfgrib_.py index 65c5bc2a02b..a8e26b49dde 100644 --- a/xarray/backends/cfgrib_.py +++ b/xarray/backends/cfgrib_.py @@ -87,9 +87,9 @@ def get_encoding(self): class CfgribfBackendEntrypoint(BackendEntrypoint): - def guess_can_open(self, store_spec): + def guess_can_open(self, filename_or_obj): try: - _, ext = os.path.splitext(store_spec) + _, ext = os.path.splitext(filename_or_obj) except TypeError: return False return ext in {".grib", ".grib2", ".grb", ".grb2"} diff --git a/xarray/backends/h5netcdf_.py b/xarray/backends/h5netcdf_.py index aa892c4f89c..2d6f662f822 100644 --- a/xarray/backends/h5netcdf_.py +++ b/xarray/backends/h5netcdf_.py @@ -329,14 +329,14 @@ def close(self, **kwargs): class H5netcdfBackendEntrypoint(BackendEntrypoint): - def guess_can_open(self, store_spec): + def guess_can_open(self, filename_or_obj): try: - return read_magic_number(store_spec).startswith(b"\211HDF\r\n\032\n") + return read_magic_number(filename_or_obj).startswith(b"\211HDF\r\n\032\n") except TypeError: pass try: - _, ext = os.path.splitext(store_spec) + _, ext = os.path.splitext(filename_or_obj) except TypeError: return False diff --git a/xarray/backends/netCDF4_.py b/xarray/backends/netCDF4_.py index e3d87aaf83f..cad56e70571 100644 --- a/xarray/backends/netCDF4_.py +++ b/xarray/backends/netCDF4_.py @@ -513,11 +513,11 @@ def close(self, **kwargs): class NetCDF4BackendEntrypoint(BackendEntrypoint): - def guess_can_open(self, store_spec): - if isinstance(store_spec, str) and is_remote_uri(store_spec): + def guess_can_open(self, filename_or_obj): + if isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj): return True try: - _, ext = os.path.splitext(store_spec) + _, ext = os.path.splitext(filename_or_obj) except TypeError: return False return ext in {".nc", ".nc4", ".cdf"} diff --git a/xarray/backends/pydap_.py 
b/xarray/backends/pydap_.py index 7f8622ca66e..462a1cc322d 100644 --- a/xarray/backends/pydap_.py +++ b/xarray/backends/pydap_.py @@ -108,8 +108,8 @@ def get_dimensions(self): class PydapBackendEntrypoint(BackendEntrypoint): - def guess_can_open(self, store_spec): - return isinstance(store_spec, str) and is_remote_uri(store_spec) + def guess_can_open(self, filename_or_obj): + return isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj) def open_dataset( self, diff --git a/xarray/backends/scipy_.py b/xarray/backends/scipy_.py index ddc157ed8e4..49215843397 100644 --- a/xarray/backends/scipy_.py +++ b/xarray/backends/scipy_.py @@ -233,14 +233,14 @@ def close(self): class ScipyBackendEntrypoint(BackendEntrypoint): - def guess_can_open(self, store_spec): + def guess_can_open(self, filename_or_obj): try: - return read_magic_number(store_spec).startswith(b"CDF") + return read_magic_number(filename_or_obj).startswith(b"CDF") except TypeError: pass try: - _, ext = os.path.splitext(store_spec) + _, ext = os.path.splitext(filename_or_obj) except TypeError: return False return ext in {".nc", ".nc4", ".cdf", ".gz"} diff --git a/xarray/backends/store.py b/xarray/backends/store.py index d57b3ab9df8..860a0254b64 100644 --- a/xarray/backends/store.py +++ b/xarray/backends/store.py @@ -4,8 +4,8 @@ class StoreBackendEntrypoint(BackendEntrypoint): - def guess_can_open(self, store_spec): - return isinstance(store_spec, AbstractDataStore) + def guess_can_open(self, filename_or_obj): + return isinstance(filename_or_obj, AbstractDataStore) def open_dataset( self, From abf60e06e120a1ba89b01c2a693756f0ce3757d9 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 25 Feb 2021 18:00:10 +0100 Subject: [PATCH 44/52] small update in backend documentation --- doc/internals.rst | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index 7be1a734bc4..a4dfbb48566 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -461,16 +461,13 @@ it is not possible to reuse the Xarray time decoder, and implementing a new one is mandatory. Decoders can be activated or deactivated using the boolean keywords of -:py:meth:`~xarray.open_dataset` signature: ``mask_and_scale``, +Xarray :py:meth:`~xarray.open_dataset` signature: ``mask_and_scale``, ``decode_times``, ``decode_timedelta``, ``use_cftime``, ``concat_characters``, ``decode_coords``. - Such keywords are passed to the backend only if the User sets a value different from ``None``. Note that the backend does not necessarily have to implement all the decoders, but it shall declare in its ``open_dataset`` -interface only the boolean keywords related to the supported decoders. The -backend shall implement the deactivation and activation of the supported -decoders. +interface only the boolean keywords related to the supported decoders. .. _RST backend_registration: From e72ce9b3d76a6fb59acc5c909a96b3554bdb8629 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 25 Feb 2021 19:47:40 +0100 Subject: [PATCH 45/52] small update in backend documentation --- doc/internals.rst | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index a4dfbb48566..c8a5f21413e 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -246,7 +246,7 @@ to integrate any code in Xarray; all you need to do is: If you also want to support lazy loading and dask see :ref:`RST lazy_loading`. 
-Note that the new interface for backends is available from xarray +Note that the new interface for backends is available from Xarray version >= 0.18 onwards. .. _RST backend_entrypoint: @@ -338,8 +338,8 @@ The input of ``open_dataset`` method are one argument - ``drop_variables``: can be `None` or an iterable containing the variable names to be dropped when reading the data. -If it makes sense for your backend, your ``open_dataset`` method should -implement in its interface the following boolean keyword arguments, called +If it makes sense for your backend, your ``open_dataset`` method +should implement in its interface the following boolean keyword arguments, called **decoders**, which default to ``None``: - ``mask_and_scale`` @@ -349,10 +349,12 @@ implement in its interface the following boolean keyword arguments, called - ``concat_characters`` - ``decode_coords`` +Note: all the supported decoders shall be declared explicitly +in backend ``open_dataset`` signature. + These keyword arguments are explicitly defined in Xarray :py:func:`~xarray.open_dataset` signature. Xarray will pass them to the backend only if the User explicitly sets a value different from ``None``. - For more details on decoders see :ref:`RST decoders`. Your backend can also take as input a set of backend-specific keyword @@ -375,14 +377,10 @@ It is not a mandatory parameter, and if the backend does not provide it explicitly, Xarray creates a list of them automatically by inspecting the backend signature. -Xarray uses ``open_dataset_parameters`` only when it needs to select -the **decoders** supported by the backend. - If ``open_dataset_parameters`` is not defined, but ``**kwargs`` and ``*args`` are in the backend ``open_dataset`` signature, Xarray raises an error. On the other hand, if the backend provides the ``open_dataset_parameters``, then ``**kwargs`` and ``*args`` can be used in the signature. - However, this practice is discouraged unless there is a good reasons for using ``**kwargs`` or ``*args``. From 7108f808e22acf109f1406b4a70d715600a24f95 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Wed, 3 Mar 2021 21:56:30 +0100 Subject: [PATCH 46/52] Update doc/internals.rst Co-authored-by: Stephan Hoyer --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index c8a5f21413e..6ea15d13be3 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -564,7 +564,7 @@ This is an example ``BackendArray`` subclass implementation: self.dtype = lock self.lock = dtype - def __getitem__(self, key): + def __getitem__(self, key: xarray.core.indexing.ExplicitIndexer) -> np.typing.ArrayLike: return indexing.explicit_indexing_adapter( key, self.shape, From e8499cd882cbc1ed6a96968bc900ed2c994d1b49 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Thu, 4 Mar 2021 12:21:17 +0100 Subject: [PATCH 47/52] Update doc/internals.rst Co-authored-by: Stephan Hoyer --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index 6ea15d13be3..702a4143180 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -572,7 +572,7 @@ This is an example ``BackendArray`` subclass implementation: self._raw_indexing_method, ) - def _raw_indexing_method(self, key): + def _raw_indexing_method(self, key: tuple) -> np.typing.ArrayLike: # thread safe method that access to data on disk with self.lock: ... 
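The two patches above add type hints to the ``BackendArray`` example used in the backend
documentation. For readers following along, a self-contained sketch of the same pattern is
shown below. The class name ``MyBackendArray``, the in-memory NumPy array standing in for a
real file handle, and the ``threading.Lock`` are illustrative assumptions, not part of the
patches; the ``np.typing`` annotations assume NumPy >= 1.20.

.. code-block:: python

    import threading

    import numpy as np
    import numpy.typing  # makes ``np.typing`` available for the annotations below

    from xarray.backends.common import BackendArray
    from xarray.core import indexing


    class MyBackendArray(BackendArray):
        """Illustrative lazy array; a real backend would wrap a file handle."""

        def __init__(self, array: np.ndarray):
            self._array = array
            self.shape = array.shape
            self.dtype = array.dtype
            self.lock = threading.Lock()

        def __getitem__(
            self, key: indexing.ExplicitIndexer
        ) -> np.typing.ArrayLike:
            # Translate xarray's explicit indexer into the indexing style
            # declared here (BASIC: integers and slices only).
            return indexing.explicit_indexing_adapter(
                key,
                self.shape,
                indexing.IndexingSupport.BASIC,
                self._raw_indexing_method,
            )

        def _raw_indexing_method(self, key: tuple) -> np.typing.ArrayLike:
            # Thread-safe access to the underlying data.
            with self.lock:
                return self._array[key]

In a backend's ``open_dataset`` such an array would then be wrapped, e.g. with
``indexing.LazilyOuterIndexedArray`` (renamed to ``LazilyIndexedArray`` later in this series),
before being placed into an ``xr.Variable``.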
From 0955c16bc6ec2104a538e88d6816572383c9bca4 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 4 Mar 2021 12:40:31 +0100 Subject: [PATCH 48/52] fix backend documentation --- doc/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals.rst b/doc/internals.rst index 702a4143180..12ddd0a90ac 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -643,7 +643,7 @@ combining multiple input list with ``itertools.product()``: **OUTER_1VECTOR** -The ``OUTER_1VECTOR`` indexing shall supports number, slices and at least one +The ``OUTER_1VECTOR`` indexing shall supports number, slices and at most one list. The behaviour with the list shall be the same of ``OUTER`` indexing. If you support more complex indexing as `explicit indexing` or From 54c202c9c4e81740090d3edd07f5d844dc9c3611 Mon Sep 17 00:00:00 2001 From: Aureliana Barghini Date: Thu, 4 Mar 2021 14:39:33 +0100 Subject: [PATCH 49/52] replace LazilyOuterIndexedArray with LazilyIndexedArray --- doc/api-hidden.rst | 2 +- doc/internals.rst | 8 +++++--- xarray/backends/cfgrib_.py | 2 +- xarray/backends/h5netcdf_.py | 2 +- xarray/backends/netCDF4_.py | 2 +- xarray/backends/pseudonetcdf_.py | 2 +- xarray/backends/pydap_.py | 2 +- xarray/backends/pynio_.py | 2 +- xarray/backends/rasterio_.py | 4 +--- xarray/backends/zarr.py | 2 +- xarray/conventions.py | 2 +- xarray/core/indexing.py | 6 +++--- xarray/core/variable.py | 2 +- xarray/tests/test_dataset.py | 2 +- xarray/tests/test_indexing.py | 20 +++++++++----------- xarray/tests/test_variable.py | 20 +++++++++----------- 16 files changed, 38 insertions(+), 42 deletions(-) diff --git a/doc/api-hidden.rst b/doc/api-hidden.rst index 0fbb31b6f4c..301fc53e0fa 100644 --- a/doc/api-hidden.rst +++ b/doc/api-hidden.rst @@ -820,7 +820,7 @@ core.indexing.BasicIndexer core.indexing.OuterIndexer core.indexing.VectorizedIndexer - core.indexing.LazilyOuterIndexedArray + core.indexing.LazilyIndexedArray core.indexing.LazilyVectorizedIndexedArray conventions.decode_cf_variables diff --git a/doc/internals.rst b/doc/internals.rst index 12ddd0a90ac..10ffa83451c 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -515,12 +515,12 @@ See the example below: .. code-block:: python backend_array = MyBackendArray() - data = indexing.LazilyOuterIndexedArray(backend_array) + data = indexing.LazilyIndexedArray(backend_array) var = xr.Variable(dims, data, attrs=attrs, encoding=encoding) Where: -- :py:class:`~xarray.core.indexing.LazilyOuterIndexedArray` is a class +- :py:class:`~xarray.core.indexing.LazilyIndexedArray` is a class provided by Xarray that manages the lazy loading. - ``MyBackendArray`` shall be implemented by the backend and shall inherit from :py:class:`~xarray.backends.common.BackendArray`. 
@@ -564,7 +564,9 @@ This is an example ``BackendArray`` subclass implementation: self.dtype = lock self.lock = dtype - def __getitem__(self, key: xarray.core.indexing.ExplicitIndexer) -> np.typing.ArrayLike: + def __getitem__( + self, key: xarray.core.indexing.ExplicitIndexer + ) -> np.typing.ArrayLike: return indexing.explicit_indexing_adapter( key, self.shape, diff --git a/xarray/backends/cfgrib_.py b/xarray/backends/cfgrib_.py index a8e26b49dde..7ebbc246f55 100644 --- a/xarray/backends/cfgrib_.py +++ b/xarray/backends/cfgrib_.py @@ -62,7 +62,7 @@ def open_store_variable(self, name, var): data = var.data else: wrapped_array = CfGribArrayWrapper(self, var.data) - data = indexing.LazilyOuterIndexedArray(wrapped_array) + data = indexing.LazilyIndexedArray(wrapped_array) encoding = self.ds.encoding.copy() encoding["original_shape"] = var.data.shape diff --git a/xarray/backends/h5netcdf_.py b/xarray/backends/h5netcdf_.py index 2d6f662f822..532c63089bb 100644 --- a/xarray/backends/h5netcdf_.py +++ b/xarray/backends/h5netcdf_.py @@ -182,7 +182,7 @@ def open_store_variable(self, name, var): import h5py dimensions = var.dimensions - data = indexing.LazilyOuterIndexedArray(H5NetCDFArrayWrapper(name, self)) + data = indexing.LazilyIndexedArray(H5NetCDFArrayWrapper(name, self)) attrs = _read_attributes(var) # netCDF4 specific encoding diff --git a/xarray/backends/netCDF4_.py b/xarray/backends/netCDF4_.py index cad56e70571..1c82f1975a7 100644 --- a/xarray/backends/netCDF4_.py +++ b/xarray/backends/netCDF4_.py @@ -388,7 +388,7 @@ def ds(self): def open_store_variable(self, name, var): dimensions = var.dimensions - data = indexing.LazilyOuterIndexedArray(NetCDF4ArrayWrapper(name, self)) + data = indexing.LazilyIndexedArray(NetCDF4ArrayWrapper(name, self)) attributes = {k: var.getncattr(k) for k in var.ncattrs()} _ensure_fill_value_valid(data, attributes) # netCDF4 specific encoding; save _FillValue for later diff --git a/xarray/backends/pseudonetcdf_.py b/xarray/backends/pseudonetcdf_.py index 80485fce459..3faa42b1b12 100644 --- a/xarray/backends/pseudonetcdf_.py +++ b/xarray/backends/pseudonetcdf_.py @@ -74,7 +74,7 @@ def ds(self): return self._manager.acquire() def open_store_variable(self, name, var): - data = indexing.LazilyOuterIndexedArray(PncArrayWrapper(name, self)) + data = indexing.LazilyIndexedArray(PncArrayWrapper(name, self)) attrs = {k: getattr(var, k) for k in var.ncattrs()} return Variable(var.dimensions, data, attrs) diff --git a/xarray/backends/pydap_.py b/xarray/backends/pydap_.py index 462a1cc322d..69d93299381 100644 --- a/xarray/backends/pydap_.py +++ b/xarray/backends/pydap_.py @@ -92,7 +92,7 @@ def open(cls, url, session=None): return cls(ds) def open_store_variable(self, var): - data = indexing.LazilyOuterIndexedArray(PydapArrayWrapper(var)) + data = indexing.LazilyIndexedArray(PydapArrayWrapper(var)) return Variable(var.dimensions, data, _fix_attributes(var.attributes)) def get_variables(self): diff --git a/xarray/backends/pynio_.py b/xarray/backends/pynio_.py index 41c99efd076..dfc0efbd6da 100644 --- a/xarray/backends/pynio_.py +++ b/xarray/backends/pynio_.py @@ -74,7 +74,7 @@ def ds(self): return self._manager.acquire() def open_store_variable(self, name, var): - data = indexing.LazilyOuterIndexedArray(NioArrayWrapper(name, self)) + data = indexing.LazilyIndexedArray(NioArrayWrapper(name, self)) return Variable(var.dimensions, data, var.attributes) def get_variables(self): diff --git a/xarray/backends/rasterio_.py b/xarray/backends/rasterio_.py index d776b116ea8..51f0599e8e0 
100644 --- a/xarray/backends/rasterio_.py +++ b/xarray/backends/rasterio_.py @@ -335,9 +335,7 @@ def open_rasterio(filename, parse_coordinates=None, chunks=None, cache=None, loc else: attrs[k] = v - data = indexing.LazilyOuterIndexedArray( - RasterioArrayWrapper(manager, lock, vrt_params) - ) + data = indexing.LazilyIndexedArray(RasterioArrayWrapper(manager, lock, vrt_params)) # this lets you write arrays loaded with rasterio data = indexing.CopyOnWriteArray(data) diff --git a/xarray/backends/zarr.py b/xarray/backends/zarr.py index 04fdeac6450..ca5c2a51fa4 100644 --- a/xarray/backends/zarr.py +++ b/xarray/backends/zarr.py @@ -326,7 +326,7 @@ def __init__( self._write_region = write_region def open_store_variable(self, name, zarr_array): - data = indexing.LazilyOuterIndexedArray(ZarrArrayWrapper(name, self)) + data = indexing.LazilyIndexedArray(ZarrArrayWrapper(name, self)) dimensions, attributes = _get_zarr_dims_and_attrs(zarr_array, DIMENSION_KEY) attributes = dict(attributes) encoding = { diff --git a/xarray/conventions.py b/xarray/conventions.py index 93e765e5622..7b467d3ee2e 100644 --- a/xarray/conventions.py +++ b/xarray/conventions.py @@ -354,7 +354,7 @@ def decode_cf_variable( data = BoolTypeArray(data) if not is_duck_dask_array(data): - data = indexing.LazilyOuterIndexedArray(data) + data = indexing.LazilyIndexedArray(data) return Variable(dimensions, data, attributes, encoding=encoding) diff --git a/xarray/core/indexing.py b/xarray/core/indexing.py index dff6d75d5b7..0c180fdc9f7 100644 --- a/xarray/core/indexing.py +++ b/xarray/core/indexing.py @@ -513,7 +513,7 @@ def __getitem__(self, key): return result -class LazilyOuterIndexedArray(ExplicitlyIndexedNDArrayMixin): +class LazilyIndexedArray(ExplicitlyIndexedNDArrayMixin): """Wrap an array to make basic and outer indexing lazy.""" __slots__ = ("array", "key") @@ -619,10 +619,10 @@ def _updated_key(self, new_key): return _combine_indexers(self.key, self.shape, new_key) def __getitem__(self, indexer): - # If the indexed array becomes a scalar, return LazilyOuterIndexedArray + # If the indexed array becomes a scalar, return LazilyIndexedArray if all(isinstance(ind, integer_types) for ind in indexer.tuple): key = BasicIndexer(tuple(k[indexer.tuple] for k in self.key.tuple)) - return LazilyOuterIndexedArray(self.array, key) + return LazilyIndexedArray(self.array, key) return type(self)(self.array, self._updated_key(indexer)) def transpose(self, order): diff --git a/xarray/core/variable.py b/xarray/core/variable.py index 45553eb9b1e..5081f1dbda1 100644 --- a/xarray/core/variable.py +++ b/xarray/core/variable.py @@ -169,7 +169,7 @@ def _maybe_wrap_data(data): Put pandas.Index and numpy.ndarray arguments in adapter objects to ensure they can be indexed properly. - NumpyArrayAdapter, PandasIndexAdapter and LazilyOuterIndexedArray should + NumpyArrayAdapter, PandasIndexAdapter and LazilyIndexedArray should all pass through unmodified. 
""" if isinstance(data, pd.Index): diff --git a/xarray/tests/test_dataset.py b/xarray/tests/test_dataset.py index db47faa8d2b..9bc7a1b8566 100644 --- a/xarray/tests/test_dataset.py +++ b/xarray/tests/test_dataset.py @@ -187,7 +187,7 @@ def get_variables(self): def lazy_inaccessible(k, v): if k in self._indexvars: return v - data = indexing.LazilyOuterIndexedArray(InaccessibleArray(v.values)) + data = indexing.LazilyIndexedArray(InaccessibleArray(v.values)) return Variable(v.dims, data, v.attrs) return {k: lazy_inaccessible(k, v) for k, v in self._variables.items()} diff --git a/xarray/tests/test_indexing.py b/xarray/tests/test_indexing.py index 4ef7536e1f2..10641ff54e9 100644 --- a/xarray/tests/test_indexing.py +++ b/xarray/tests/test_indexing.py @@ -224,7 +224,7 @@ def test_lazily_indexed_array(self): original = np.random.rand(10, 20, 30) x = indexing.NumpyIndexingAdapter(original) v = Variable(["i", "j", "k"], original) - lazy = indexing.LazilyOuterIndexedArray(x) + lazy = indexing.LazilyIndexedArray(x) v_lazy = Variable(["i", "j", "k"], lazy) arr = ReturnItem() # test orthogonally applied indexers @@ -244,9 +244,7 @@ def test_lazily_indexed_array(self): ]: assert expected.shape == actual.shape assert_array_equal(expected, actual) - assert isinstance( - actual._data, indexing.LazilyOuterIndexedArray - ) + assert isinstance(actual._data, indexing.LazilyIndexedArray) # make sure actual.key is appropriate type if all( @@ -282,18 +280,18 @@ def test_lazily_indexed_array(self): actual._data, ( indexing.LazilyVectorizedIndexedArray, - indexing.LazilyOuterIndexedArray, + indexing.LazilyIndexedArray, ), ) - assert isinstance(actual._data, indexing.LazilyOuterIndexedArray) + assert isinstance(actual._data, indexing.LazilyIndexedArray) assert isinstance(actual._data.array, indexing.NumpyIndexingAdapter) def test_vectorized_lazily_indexed_array(self): original = np.random.rand(10, 20, 30) x = indexing.NumpyIndexingAdapter(original) v_eager = Variable(["i", "j", "k"], x) - lazy = indexing.LazilyOuterIndexedArray(x) + lazy = indexing.LazilyIndexedArray(x) v_lazy = Variable(["i", "j", "k"], lazy) arr = ReturnItem() @@ -306,7 +304,7 @@ def check_indexing(v_eager, v_lazy, indexers): actual._data, ( indexing.LazilyVectorizedIndexedArray, - indexing.LazilyOuterIndexedArray, + indexing.LazilyIndexedArray, ), ) assert_array_equal(expected, actual) @@ -364,19 +362,19 @@ def test_index_scalar(self): class TestMemoryCachedArray: def test_wrapper(self): - original = indexing.LazilyOuterIndexedArray(np.arange(10)) + original = indexing.LazilyIndexedArray(np.arange(10)) wrapped = indexing.MemoryCachedArray(original) assert_array_equal(wrapped, np.arange(10)) assert isinstance(wrapped.array, indexing.NumpyIndexingAdapter) def test_sub_array(self): - original = indexing.LazilyOuterIndexedArray(np.arange(10)) + original = indexing.LazilyIndexedArray(np.arange(10)) wrapped = indexing.MemoryCachedArray(original) child = wrapped[B[:5]] assert isinstance(child, indexing.MemoryCachedArray) assert_array_equal(child, np.arange(5)) assert isinstance(child.array, indexing.NumpyIndexingAdapter) - assert isinstance(wrapped.array, indexing.LazilyOuterIndexedArray) + assert isinstance(wrapped.array, indexing.LazilyIndexedArray) def test_setitem(self): original = np.arange(10) diff --git a/xarray/tests/test_variable.py b/xarray/tests/test_variable.py index e1ae3e1f258..90dfaa9c121 100644 --- a/xarray/tests/test_variable.py +++ b/xarray/tests/test_variable.py @@ -15,7 +15,7 @@ BasicIndexer, CopyOnWriteArray, DaskIndexingAdapter, - 
LazilyOuterIndexedArray, + LazilyIndexedArray, MemoryCachedArray, NumpyIndexingAdapter, OuterIndexer, @@ -1095,9 +1095,9 @@ def test_repr(self): assert expected == repr(v) def test_repr_lazy_data(self): - v = Variable("x", LazilyOuterIndexedArray(np.arange(2e5))) + v = Variable("x", LazilyIndexedArray(np.arange(2e5))) assert "200000 values with dtype" in repr(v) - assert isinstance(v._data, LazilyOuterIndexedArray) + assert isinstance(v._data, LazilyIndexedArray) def test_detect_indexer_type(self): """ Tests indexer type was correctly detected. """ @@ -2169,7 +2169,7 @@ def test_coarsen_2d(self): class TestAsCompatibleData: def test_unchanged_types(self): - types = (np.asarray, PandasIndexAdapter, LazilyOuterIndexedArray) + types = (np.asarray, PandasIndexAdapter, LazilyIndexedArray) for t in types: for data in [ np.arange(3), @@ -2340,19 +2340,19 @@ def test_NumpyIndexingAdapter(self): dims=("x", "y"), data=NumpyIndexingAdapter(NumpyIndexingAdapter(self.d)) ) - def test_LazilyOuterIndexedArray(self): - v = Variable(dims=("x", "y"), data=LazilyOuterIndexedArray(self.d)) + def test_LazilyIndexedArray(self): + v = Variable(dims=("x", "y"), data=LazilyIndexedArray(self.d)) self.check_orthogonal_indexing(v) self.check_vectorized_indexing(v) # doubly wrapping v = Variable( dims=("x", "y"), - data=LazilyOuterIndexedArray(LazilyOuterIndexedArray(self.d)), + data=LazilyIndexedArray(LazilyIndexedArray(self.d)), ) self.check_orthogonal_indexing(v) # hierarchical wrapping v = Variable( - dims=("x", "y"), data=LazilyOuterIndexedArray(NumpyIndexingAdapter(self.d)) + dims=("x", "y"), data=LazilyIndexedArray(NumpyIndexingAdapter(self.d)) ) self.check_orthogonal_indexing(v) @@ -2361,9 +2361,7 @@ def test_CopyOnWriteArray(self): self.check_orthogonal_indexing(v) self.check_vectorized_indexing(v) # doubly wrapping - v = Variable( - dims=("x", "y"), data=CopyOnWriteArray(LazilyOuterIndexedArray(self.d)) - ) + v = Variable(dims=("x", "y"), data=CopyOnWriteArray(LazilyIndexedArray(self.d))) self.check_orthogonal_indexing(v) self.check_vectorized_indexing(v) From 3cc18d5a37ab84fef37c0bddc6bc7f7f589f347c Mon Sep 17 00:00:00 2001 From: Aureliana Barghini <35919497+aurghs@users.noreply.github.com> Date: Fri, 5 Mar 2021 22:23:53 +0100 Subject: [PATCH 50/52] Update doc/internals.rst Co-authored-by: Julia Dark --- doc/internals.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/internals.rst b/doc/internals.rst index 10ffa83451c..1e0212faedd 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -485,7 +485,7 @@ You can declare the entrypoint in ``setup.py`` using the following syntax: setuptools.setup( entry_points={ "xarray.backends": [ - "engine_name=your_package.your_module:your_backendentrypoint" + "engine_name=your_package.your_module:YourBackendEntryClass" ], }, ) @@ -496,7 +496,7 @@ in ``setup.cfg``: [options.entry_points] xarray.backends = - engine_name = your_package.your_module:your_backendentrypoint + engine_name = your_package.your_module:YourBackendEntryClass See https://packaging.python.org/specifications/entry-points/#data-model From 9faf5e6d182072cc8cf883c2eb6fc35aa9bbe1ec Mon Sep 17 00:00:00 2001 From: Alessandro Amici Date: Mon, 8 Mar 2021 19:22:51 +0100 Subject: [PATCH 51/52] Update doc/internals.rst Co-authored-by: Julia Dark --- doc/internals.rst | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/doc/internals.rst b/doc/internals.rst index 1e0212faedd..899b5dee9ae 100644 --- a/doc/internals.rst +++ b/doc/internals.rst @@ -502,6 +502,14 @@ in ``setup.cfg``: 
See https://packaging.python.org/specifications/entry-points/#data-model for more information
+If you are using `Poetry <https://python-poetry.org/>`_ for your build system, you can accomplish the same thing using "plugins". In this case you would need to add the following to your ``pyproject.toml`` file:
+
+.. code-block:: toml
+
+    [tool.poetry.plugins."xarray.backends"]
+    "engine_name" = "your_package.your_module:YourBackendEntryClass"
+
+See https://python-poetry.org/docs/pyproject/#plugins for more information on Poetry plugins.
 .. _RST lazy_loading:
 
 How to support Lazy Loading

From 06371dfaa0317d5f2b16e10e677fa5a8f483e535 Mon Sep 17 00:00:00 2001
From: Alessandro Amici
Date: Mon, 8 Mar 2021 19:47:29 +0100
Subject: [PATCH 52/52] Fix broken doc merge

---
 doc/internals.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/internals.rst b/doc/internals.rst
index 899b5dee9ae..a461a12ec2e 100644
--- a/doc/internals.rst
+++ b/doc/internals.rst
@@ -508,8 +508,9 @@ If you are using `Poetry <https://python-poetry.org/>`_ for your build system, you
 
    [tool.poetry.plugins."xarray.backends"]
    "engine_name" = "your_package.your_module:YourBackendEntryClass"
-
+
 See https://python-poetry.org/docs/pyproject/#plugins for more information on Poetry plugins.
+
 .. _RST lazy_loading:
 
 How to support Lazy Loading
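Once one of the registration methods above is in place and the package is installed, the
backend becomes selectable through the ``engine`` keyword of :py:func:`~xarray.open_dataset`.
A minimal usage sketch follows; the engine name ``engine_name`` and the file path are
placeholders taken from the examples above, not a real backend.

.. code-block:: python

    import xarray as xr

    # ``engine`` must match the name declared in the entry point
    # (``engine_name`` in the setup.py / setup.cfg / pyproject.toml examples).
    ds = xr.open_dataset(
        "path/to/your/data.fmt",
        engine="engine_name",
        drop_variables=["unwanted_var"],  # forwarded to the backend's open_dataset
    )

In recent Xarray versions, ``xarray.backends.list_engines()`` can be used to check that the
plugin was discovered.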