312 bug fix update geostationary readers to support multiple scan times #427

Open: wants to merge 84 commits into base: main
Changes from 60 commits (84 commits total)
78e1148
Changes from Chris
jsolbrig Sep 1, 2023
b6eda2b
Update ABI reader to ingest multiple scan times
Sep 1, 2023
a5edb42
Use time_dims
jsolbrig Sep 1, 2023
56a0238
Merge branch '312-bug-fix-update-abi-netcdf-reader-to-read-in-multipl…
jsolbrig Sep 1, 2023
b08a4a4
1.12.0 release NRLMMD-GEOIPS/geoips#418
mindyls Dec 28, 2023
75648fc
Removing unused comments
mindyls Dec 28, 2023
17864aa
Update installation.rst
mindyls Jan 3, 2024
e93d23e
Update installation.rst
mindyls Jan 3, 2024
0010507
Update installation.rst
mindyls Jan 3, 2024
2839f4a
Add leading '/' to flake8 per-file-ignores
mindyls Jan 4, 2024
c4594b4
Merge branch 'v1.12.0-release' into 312-bug-fix-update-geostationary-…
Srikant-Kumar-Gampa Feb 1, 2024
fad08f3
multiple scan times support added
Srikant-Kumar-Gampa Feb 1, 2024
1151bd9
regular
srikanth-kumar Feb 5, 2024
ebbd04b
seviri procflow script updated with new test filenames; check beofre …
srikanth-kumar Feb 5, 2024
f3f5b07
multiple scan times support added for SEVIRI reader
srikanth-kumar Feb 5, 2024
8067341
Update black.yaml to use GEOIPS_ACTIVE_BRANCH
mindyls Feb 6, 2024
0336ec2
variables -> vars
mindyls Feb 6, 2024
d3d60ab
Use v4 checkout
mindyls Feb 6, 2024
8950f3a
Update black.yaml
mindyls Feb 6, 2024
5b24231
Update black.yaml
mindyls Feb 6, 2024
7434f45
Update black.yaml
mindyls Feb 6, 2024
45d85d9
Update flake8.yaml for GEOIPS_ACTIVE_BRANCH
mindyls Feb 6, 2024
7b064d0
Update black.yaml to checkout v3
mindyls Feb 6, 2024
2ecbc8c
Add .config to sparse-checkout
mindyls Feb 6, 2024
df8e9f3
Add .config to sparse checkout
mindyls Feb 6, 2024
ead077f
Turn on build docs html
mindyls Feb 6, 2024
01ba586
Use org vars for runner, org
mindyls Feb 6, 2024
6182ab3
Turn off build-html-docs
mindyls Feb 6, 2024
f34e487
Turn off pytest-short.yaml
mindyls Feb 6, 2024
a7ab6d7
Turn off test-interfaces.yaml
mindyls Feb 6, 2024
a6c85b3
Updating black formatting from 1.12.0 release (#430)
mindyls Feb 6, 2024
f991efb
Merge branch 'v1.12.0-release' into 312-bug-fix-update-geostationary-…
mindyls Feb 6, 2024
6e6520c
removed IPython module requirement
srikanth-kumar Feb 13, 2024
b146b67
Merge branch 'main' into 312-bug-fix-update-geostationary-readers-to-…
jsolbrig Feb 21, 2024
8d0e179
Merge branch 'main' into 312-bug-fix-update-geostationary-readers-to-…
jsolbrig Apr 3, 2024
50c0565
merged this branch with main as it was very out of date
evrose54 Jun 10, 2024
c497c9e
adjusted readers to return metadata from all files, not just a single…
evrose54 Jun 10, 2024
9216b34
made some changes to get the source_name attr out of a dict which inc…
evrose54 Jul 9, 2024
68e578f
merged this branch with main to update it with most recent changes
evrose54 Jul 9, 2024
f4baee4
refactored structure of metadata for multiple source files
evrose54 Jul 10, 2024
dc4faa9
refactored source_name retrieval into function and fixed config based…
evrose54 Jul 10, 2024
8c76484
Merge branch 'main' into 312-bug-fix-update-geostationary-readers-to-…
evrose54 Jul 17, 2024
7ce0ecb
made mindy's requested updates to the structure of the metadata
evrose54 Jul 17, 2024
d4eeb28
updated xobj metadata to adhere to xarray_standards.rst
evrose54 Jul 17, 2024
c2c5058
added release notes to this branch
evrose54 Jul 17, 2024
ec4217d
removed old release note
evrose54 Jul 17, 2024
5f39caf
updated geostationary readers to perform concatenate_metadata no matt…
evrose54 Jul 17, 2024
893d413
small bug fix to readers and updated output of seviri test script
evrose54 Jul 17, 2024
7af9368
updated changelog in release notes for this branch
evrose54 Jul 17, 2024
ad1e046
fixed final bugs related to metadata and a tiny change of 2 pixels
evrose54 Jul 17, 2024
f579831
Merge branch 'main' into 312-bug-fix-update-geostationary-readers-to-…
evrose54 Jul 18, 2024
82778f2
Reverted change to abi output image. Matches what main has now
evrose54 Jul 18, 2024
c1aa3f3
updated release notes files changed
evrose54 Jul 18, 2024
ffe10bf
Merge branch 'main' into 312-bug-fix-update-geostationary-readers-to-…
evrose54 Jul 18, 2024
47984ff
reverted release notes and moved changes to brassy yaml file for this…
evrose54 Jul 18, 2024
2ab63c4
updated yamls to reflect most recent version of brassy
evrose54 Jul 18, 2024
0fedb32
merged with main and fixed conflicts
evrose54 Aug 27, 2024
91cfd51
updated metadata output for readers which have multiple scan times as…
evrose54 Aug 27, 2024
235ccba
added abi_netcdf too
evrose54 Aug 27, 2024
76b3c17
Merge branch 'dev-staging' into 312-bug-fix-update-geostationary-read…
jsolbrig Sep 19, 2024
97d3473
refactored abi and ahi readers, made ami support multi scan times.
evrose54 Oct 8, 2024
913810b
updated ewsg reader to support multiple scan times
evrose54 Oct 9, 2024
058d83b
updated a docstring in readers.py
evrose54 Oct 9, 2024
5d34990
updated seviri_hrit.py to use abstracted functionality placed on read…
evrose54 Oct 9, 2024
9771a76
added release notes for this PR
evrose54 Oct 10, 2024
00413bf
reduced code duplication by adding a new function to the readers inte…
evrose54 Oct 10, 2024
79d0fe9
removed old logging line used for testing
evrose54 Oct 10, 2024
764a629
addressed incorrect/missing source file metadata
evrose54 Oct 10, 2024
c6203c0
updated a comment
evrose54 Oct 10, 2024
6e616f2
added a new korea sector for testing ami updates
evrose54 Oct 11, 2024
d0091ca
added new sector to release notes
evrose54 Oct 11, 2024
044a258
updated ami reader to handle different observation areas, such as LA
evrose54 Oct 15, 2024
1ab00b7
Merge branch 'dev-staging' into 312-bug-fix-update-geostationary-read…
evrose54 Oct 16, 2024
0ec5df3
updated code to support navigation for octopy code
evrose54 Oct 22, 2024
da3dfb3
updated release notes for new changes
evrose54 Oct 22, 2024
9556b08
Merge branch '312-bug-fix-update-geostationary-readers-to-support-mul…
evrose54 Oct 22, 2024
1d9302d
Merge branch '312-bug-fix-update-geostationary-readers-to-support-mul…
evrose54 Oct 22, 2024
b7a6fa8
Merge branch 'dev-staging' into 312-bug-fix-update-geostationary-read…
jsolbrig Oct 22, 2024
7caa057
Merge branch 'main' into 312-bug-fix-update-geostationary-readers-to-…
jsolbrig Oct 22, 2024
90700c5
Merge branch '312-bug-fix-update-geostationary-readers-to-support-mul…
jsolbrig Oct 22, 2024
7c437f5
Test commit for CI
biosafetylvl5 Oct 22, 2024
bf68e98
AMI and EWSG Refactor: Multi-scan time functionality (#796)
jsolbrig Oct 22, 2024
18afd13
Merge branch 'main' into 312-bug-fix-update-geostationary-readers-to-…
evrose54 Oct 23, 2024
ad549e2
updated readers.py to collect both start and end datetimes for multi-…
evrose54 Oct 23, 2024
@@ -0,0 +1,22 @@
bug fix:
- description: |
*From GEOIPS#312: Bug Fix: Update geostationary readers to support multiple scan times*

    Geostationary readers previously could only handle one scan time per run.
    This limited these readers, and scientists at CIRA needed them to handle
    multiple scan times per use. We've updated the requested geostationary
    readers so they can read multiple scan times. If this PR goes well, it may
    be worth supporting this functionality in other readers in the future.
related-issue:
number: 312
repo_url: 'https://github.com/NRLMMD-GEOIPS/geoips/'
title: 'Update Geostationary readers to support multiple scan times'
files:
modified:
- geoips/interfaces/module_based/readers.py
- geoips/plugins/modules/readers/abi_netcdf.py
- geoips/plugins/modules/readers/ahi_hsd.py
- geoips/plugins/modules/readers/seviri_hrit.py
- tests/scripts/seviri.WV-Upper.unprojected_image.sh
- tests/outputs/seviri.WV-Upper.unprojected_image/20231211.080000.msg-2.seviri.WV-Upper.self_register.69p07.nesdisstar.10p0.png
46 changes: 46 additions & 0 deletions geoips/interfaces/module_based/readers.py
@@ -3,6 +3,9 @@

"""Readers interface module."""

from os.path import basename
from xarray import Dataset

from geoips.interfaces.base import BaseModuleInterface


@@ -19,5 +22,48 @@ class ReadersInterface(BaseModuleInterface):
"standard": ["metadata_only", "chans", "area_def", "self_register"]
}

def concatenate_metadata(self, all_metadata):
"""Merge together metadata sourced from a list of files into one dictionary.

Where the structure of the merged metadata is a nested dictionary of metadata.

Ie. (xarray_obj has no data and is merely just a container for metadata):
{"METADATA": xobj.source_file_attributes: {fname: xobj, ..., "fnamex": xobj}}

Parameters
----------
all_metadata: list of xarray.Datasets
- The incoming metadata, one Dataset per input file

Returns
-------
md: dict of xarray Datasets
- All metadata merged into a dictionary of xarray Datasets
"""
md = {"METADATA": Dataset()}

for md_idx in range(len(all_metadata)):
# Set required attributes of the top-level metadata when this loop is
# started
if md_idx == 0:
md["METADATA"].attrs = all_metadata[md_idx].attrs
md["METADATA"].attrs["source_file_names"] = []
md["METADATA"].attrs["source_file_attributes"] = {}
md["METADATA"].attrs["source_file_datetimes"] = []
md["METADATA"].attrs["end_datetime"] = all_metadata[-1].end_datetime

# Add to optional attributes of the top-level metadata for each xobj
# provided
fname = all_metadata[md_idx].attrs["source_file_names"][0]
md["METADATA"].attrs["source_file_names"].append(basename(fname))
md["METADATA"].attrs["source_file_attributes"][basename(fname)] = (
all_metadata[md_idx]
)
md["METADATA"].attrs["source_file_datetimes"].append(
all_metadata[md_idx].start_datetime
)

return md


readers = ReadersInterface()
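The merge performed by `concatenate_metadata` above can be sketched with plain dictionaries standing in for the xarray Datasets. This is a hedged illustration only: the real interface operates on `xarray.Dataset` objects and their `.attrs`, and the helper name and toy inputs below are hypothetical.

```python
from os.path import basename

def concatenate_metadata_sketch(all_metadata):
    """Toy version of ReadersInterface.concatenate_metadata using dicts.

    Each element of all_metadata mimics one file's metadata Dataset: a dict
    with 'source_file_names', 'start_datetime', and 'end_datetime' keys.
    """
    md = {"METADATA": {}}
    for idx, file_md in enumerate(all_metadata):
        if idx == 0:
            # Seed top-level attributes from the first file's metadata
            md["METADATA"] = dict(file_md)
            md["METADATA"]["source_file_names"] = []
            md["METADATA"]["source_file_attributes"] = {}
            md["METADATA"]["source_file_datetimes"] = []
            # End time comes from the last file in the list
            md["METADATA"]["end_datetime"] = all_metadata[-1]["end_datetime"]
        # Accumulate per-file attributes keyed by base file name
        fname = file_md["source_file_names"][0]
        md["METADATA"]["source_file_names"].append(basename(fname))
        md["METADATA"]["source_file_attributes"][basename(fname)] = file_md
        md["METADATA"]["source_file_datetimes"].append(file_md["start_datetime"])
    return md

# Hypothetical per-file metadata, in time order
files = [
    {"source_file_names": ["/data/a.nc"], "start_datetime": 1, "end_datetime": 2},
    {"source_file_names": ["/data/b.nc"], "start_datetime": 3, "end_datetime": 4},
]
merged = concatenate_metadata_sketch(files)
# merged["METADATA"]["source_file_names"] is ["a.nc", "b.nc"]
```

Note that, as in the diff, `end_datetime` is taken from the last element, so the input list is assumed to be sorted by time.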
98 changes: 98 additions & 0 deletions geoips/plugins/modules/readers/abi_netcdf.py
@@ -12,6 +12,7 @@

from scipy.ndimage import zoom

from geoips.interfaces import readers
mindyls marked this conversation as resolved.
from geoips.utils.context_managers import import_optional_dependencies
from geoips.plugins.modules.readers.utils.geostationary_geolocation import (
get_geolocation_cache_filename,
@@ -509,6 +510,100 @@ def call(fnames, metadata_only=False, chans=None, area_def=None, self_register=F
"""
Read ABI NetCDF data from a list of filenames.

Parameters
----------
fnames : list
* List of strings, full paths to files
metadata_only : bool, default=False
* Return before actually reading data if True
chans : list of str, default=None
* List of desired channels (skip unneeded variables as needed).
* Include all channels if None.
area_def : pyresample.AreaDefinition, default=None
* Specify region to read
* Read all data if None.
self_register : str or bool, default=False
* Register all data to the specified dataset id (as specified in the
return dictionary keys).
* Read multiple resolutions of data if False.

Returns
-------
dict of xarray.Datasets
* dictionary of xarray.Dataset objects with required Variables and
Attributes.
* Dictionary keys can be any descriptive dataset ids.

See Also
--------
:ref:`xarray_standards`
Additional information regarding required attributes and variables
for GeoIPS-formatted xarray Datasets.
"""
all_metadata = readers.concatenate_metadata(
[call_single_time([x], metadata_only=True)["METADATA"] for x in fnames]
)
if metadata_only:
return all_metadata

start_times = [dt for dt in all_metadata["METADATA"].attrs["source_file_datetimes"]]
times = list(set(start_times))
import collections

ingested_xarrays = collections.defaultdict(list)
for time in times:
scan_time_files = [dt == time for dt in start_times]
data_dict = call_single_time(
np.array(fnames)[scan_time_files],
metadata_only=metadata_only,
chans=chans,
area_def=area_def,
self_register=self_register,
)
for (
dname,
dset,
) in data_dict.items():
ingested_xarrays[dname].append(dset)

if len(times) == 1:
# No need to stack if we are only reading in one scan time
# This is likely temporary to maintain backwards compatibility

# This is not hit if we are provided multiple scan times.
return data_dict

import xarray

# Now that we've ingested all scan times, stack along time dimension
dict_xarrays = {}
for dname, list_xarrays in ingested_xarrays.items():
if dname == "METADATA":
continue
merged_dset = xarray.concat(list_xarrays, dim="time_dim")
mindyls marked this conversation as resolved.
merged_dset.attrs["start_datetime"] = min(times)
merged_dset.attrs["end_datetime"] = max(times)
merged_dset = merged_dset.assign_coords({"time_dim": times})
dict_xarrays[dname] = merged_dset
mindyls marked this conversation as resolved.

metadata = all_metadata["METADATA"]
metadata.attrs["source_file_names"] = [os.path.basename(fname) for fname in fnames]
metadata.attrs["start_datetime"] = min(times)
metadata.attrs["end_datetime"] = max(times)
dict_xarrays["METADATA"] = metadata
return dict_xarrays
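The per-scan-time loop above boils down to bucketing file names by their start time and then reading each bucket with `call_single_time`. A minimal stand-alone sketch of that grouping step, using hypothetical file names and times (not from the PR) and `collections.defaultdict` in place of the boolean-mask selection (`np.array(fnames)[scan_time_files]`):

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (fname, start_time) pairs, e.g. parsed from ABI file names
files_with_times = [
    ("OR_ABI_b01_t0000.nc", datetime(2023, 12, 11, 8, 0)),
    ("OR_ABI_b02_t0000.nc", datetime(2023, 12, 11, 8, 0)),
    ("OR_ABI_b01_t0010.nc", datetime(2023, 12, 11, 8, 10)),
]

# Bucket file names by scan time; each bucket corresponds to one
# call_single_time(...) invocation in the reader
by_time = defaultdict(list)
for fname, start in files_with_times:
    by_time[start].append(fname)

for scan_time in sorted(by_time):
    print(scan_time, by_time[scan_time])
```

With a single distinct scan time this reduces to one bucket, matching the `len(times) == 1` early-return path in the reader.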


def call_single_time(
fnames, metadata_only=False, chans=None, area_def=None, self_register=False
):
"""
Read ABI NetCDF data from a list of filenames.

Parameters
----------
fnames : list
@@ -629,6 +724,9 @@ def call(fnames, metadata_only=False, chans=None, area_def=None, self_register=F
xarray_obj.attrs["end_datetime"] = edt
xarray_obj.attrs["source_name"] = "abi"
xarray_obj.attrs["data_provider"] = "noaa"
xarray_obj.attrs["source_file_names"] = [
os.path.basename(fname) for fname in fnames
]

# G16 -> goes-16
xarray_obj.attrs["platform_name"] = highest_md["file_info"]["platform_ID"].replace(
102 changes: 102 additions & 0 deletions geoips/plugins/modules/readers/ahi_hsd.py
@@ -15,6 +15,8 @@
import xarray
from scipy.ndimage import zoom

# GeoIPS Libraries
from geoips.interfaces import readers
from geoips.utils.memusg import print_mem_usage
from geoips.utils.context_managers import import_optional_dependencies
from geoips.plugins.modules.readers.utils.geostationary_geolocation import (
@@ -946,6 +948,103 @@ def call(
"""
Read AHI HSD data from a list of filenames.

Parameters
----------
fnames : list
* List of strings, full paths to files
metadata_only : bool, default=False
* Return before actually reading data if True
chans : list of str, default=None
* List of desired channels (skip unneeded variables as needed).
* Include all channels if None.
area_def : pyresample.AreaDefinition, default=None
* Specify region to read
* Read all data if None.
self_register : str or bool, default=False
* Register all data to the specified dataset id (as specified in the
return dictionary keys).
* Read multiple resolutions of data if False.

Returns
-------
dict of xarray.Datasets
* dictionary of xarray.Dataset objects with required Variables and
Attributes.
* Dictionary keys can be any descriptive dataset ids.

See Also
--------
:ref:`xarray_standards`
Additional information regarding required attributes and variables
for GeoIPS-formatted xarray Datasets.
"""
LOG.interactive("AHI reader test_arg: %s", test_arg)
Contributor comment: This line should probably be removed.

Contributor comment (reply): This is removed in #796, which branches off of this PR.

all_metadata = readers.concatenate_metadata(
[call_single_time([x], metadata_only=True)["METADATA"] for x in fnames]
)
if metadata_only:
return all_metadata

start_times = [dt for dt in all_metadata["METADATA"].attrs["source_file_datetimes"]]
times = list(set(start_times))
import collections

ingested_xarrays = collections.defaultdict(list)
for time in times:
scan_time_files = [dt == time for dt in start_times]
data_dict = call_single_time(
np.array(fnames)[scan_time_files],
metadata_only=metadata_only,
chans=chans,
area_def=area_def,
self_register=self_register,
)
for (
dname,
dset,
) in data_dict.items():
ingested_xarrays[dname].append(dset)

if len(times) == 1:
# No need to stack if we are only reading in one scan time
# This is likely temporary to maintain backwards compatibility

# This is not hit if we are provided multiple scan times.
return data_dict

import xarray

# Now that we've ingested all scan times, stack along time dimension
dict_xarrays = {}
for dname, list_xarrays in ingested_xarrays.items():
if dname == "METADATA":
continue
merged_dset = xarray.concat(list_xarrays, dim="time_dim")
merged_dset.attrs["start_datetime"] = min(times)
merged_dset.attrs["end_datetime"] = max(times)
merged_dset = merged_dset.assign_coords({"time_dim": times})
dict_xarrays[dname] = merged_dset

metadata = all_metadata["METADATA"]
metadata.attrs["source_file_names"] = [os.path.basename(fname) for fname in fnames]
metadata.attrs["start_datetime"] = min(times)
metadata.attrs["end_datetime"] = max(times)
dict_xarrays["METADATA"] = metadata
return dict_xarrays
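One subtlety in the stacking code above: `list(set(start_times))` deduplicates the scan times but does not guarantee chronological order, so the `time_dim` coordinate values may come out unsorted. Sorting the deduplicated times first (a suggested tweak, not what the diff does) keeps the coordinate monotonic; the `min`/`max` calls used for `start_datetime`/`end_datetime` are order-independent either way:

```python
from datetime import datetime

# Hypothetical per-file start times, with a duplicate and out of order
start_times = [
    datetime(2023, 12, 11, 8, 10),
    datetime(2023, 12, 11, 8, 0),
    datetime(2023, 12, 11, 8, 10),
]

# set() deduplicates, but its iteration order is arbitrary;
# sorted() restores chronological order for the time_dim coordinate
times = sorted(set(start_times))

# min/max give the same start/end datetimes regardless of ordering
assert min(times) == times[0]
assert max(times) == times[-1]
```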


def call_single_time(
fnames,
metadata_only=False,
chans=None,
area_def=None,
self_register=False,
test_arg="AHI Default Test Arg",
):
"""
Read AHI HSD data from a list of filenames.

Parameters
----------
fnames : list
@@ -1097,6 +1196,9 @@ def call(
xarray_obj.attrs["data_provider"] = "jma"
xarray_obj.attrs["platform_name"] = highest_md["block_01"]["satellite_name"].lower()
xarray_obj.attrs["area_definition"] = area_def
xarray_obj.attrs["source_file_names"] = [
os.path.basename(fname) for fname in fnames
]

# If metadata_only requested, return here.
if metadata_only: