Basic ingest testing #184

Open
4 tasks
anayeaye opened this issue Jul 9, 2024 · 2 comments

anayeaye commented Jul 9, 2024

Purpose

Describe how to test our most basic VEDA ingest use cases, using two representative VEDA production collections, whenever changes are made to airflow and/or the workflows endpoints.

Representative collections

  1. A single asset cog_default collection
  2. A multi-asset collection that will cause the discovery step to group s3 objects within a single item record sharing the same datetime instant

Representative properties for most VEDA collections

Most/all VEDA collections have these properties

  • providers
  • renders
  • assets (for a thumbnail that is displayed in the openveda.cloud stac-browser)

Not all VEDA collections have these properties but they eventually should

  • item_assets that exactly represent the assets available in each item. For example, if every item has a cog_default asset, item_assets should include the key cog_default for an asset of the cog type; if there are multiple assets and/or unique keys are used, those need to be described in the collection metadata too.
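
The "exactly represent" requirement can be checked mechanically. The sketch below is only an illustration: the function name compare_item_assets is hypothetical, and it assumes the collection and items are already loaded as plain STAC dictionaries (for example from the STAC API).

```python
def compare_item_assets(collection: dict, items: list) -> tuple:
    """Compare a collection's item_assets keys with the asset keys found on its items.

    Returns (undocumented, unused):
      undocumented: {item id: asset keys present on the item but missing from item_assets}
      unused: item_assets keys that no item actually carries
    """
    described = set(collection.get("item_assets", {}))
    seen = set()
    undocumented = {}
    for item in items:
        keys = set(item.get("assets", {}))
        seen |= keys
        extra = keys - described
        if extra:
            undocumented[item["id"]] = extra
    unused = described - seen
    return undocumented, unused
```

Both results being empty means the collection metadata and the ingested items agree on asset keys.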

Missing from this issue (a very incomplete list)

  • Single and multi asset collection with custom datetime configuration
  • Workflows api datasets/ endpoint tests. Multi asset configuration is not yet supported there: the endpoint truncates the additional collection properties and the multi asset configuration before the request triggers a pipeline in airflow. There is work in progress to address this.
  • Tests that should fail (to prove validation is working)
  • The veda_collection_pipeline DAG

Overview

  1. Choose a production collection in veda-data/ingestion-data/production/collections and create a local copy with a new ID and title for testing (append something obvious, such as -temp-test-copy, to the collection id and update the title so it is easy to pick out in the browser).
  2. Find the associated config in veda-data/ingestion-data/production/discovery-items, create a local copy, and update the collection id to match the test collection you just prepared. For many collections, there may also be a staging dataset-config associated with the collection you are copying. If so:
     • create a local copy
     • update the collection id to match your test copy collection
     • you may need to change the discovery bucket from veda-data-store to veda-data-store-staging, depending on which mwaa, veda-data-airflow, and veda-backend environment you are testing.
  3. Document what is under test so you can keep track of the various urls of both the backend catalog and the ingest systems under test.
     • STAC_API_URL = ___________
     • INGEST_API_URL = ___________
     • WORKFLOWS_API_URL = ___________
     • AIRFLOW UI URL = ___________
  4. Test out the most commonly used ingest patterns and delete your test collection between tests. Make sure the delete operation has finished updating the items partitions table: when you recreate the test collection in your next test, watch for items that are added to the collection too quickly.
     • Pattern 1: manually create the collection via the ingest-api/collections endpoint, then trigger a discovery workflow via the workflows-api/discovery endpoint.
     • Pattern 2: manually create the collection via the ingest-api/collections endpoint, then trigger the veda_discover DAG via the veda-data-airflow UI.
     • Pattern 3: generate a composite dataset config with s3 discovery+pseudo-STAC collection object and submit it to the workflows-api/dataset endpoint, which manages the collection creation and then triggers a veda_discover DAG. This cannot yet handle multi asset collections or additional properties.
     • Pattern 4: generate a composite dataset config with s3 discovery+pseudo-STAC collection object and manually trigger veda_dataset_pipeline via the airflow UI, which handles the collection and then discovery as a subdag operation.
  5. Clean-up: delete the copied test collections.
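
Steps 1 and 2 can be scripted so the production files are never edited in place. This is a minimal sketch, assuming the collection and discovery config have already been loaded from JSON into dictionaries; the helper names make_test_collection and make_test_discovery and the "DELETE ME" title prefix are illustrative choices, not part of any VEDA tooling.

```python
import copy
from typing import Optional

TEST_SUFFIX = "-temp-test-copy"  # the obvious suffix suggested in step 1


def make_test_collection(collection: dict, suffix: str = TEST_SUFFIX) -> dict:
    """Return a copy of a production collection with a test id and an obvious title."""
    test = copy.deepcopy(collection)
    test["id"] = f"{collection['id']}{suffix}"
    test["title"] = f"DELETE ME {collection.get('title', collection['id'])}"
    return test


def make_test_discovery(discovery: dict, test_collection_id: str,
                        bucket: Optional[str] = None) -> dict:
    """Return a copy of a discovery config that targets the test collection.

    Pass bucket (for example "veda-data-store-staging") only when the
    environment under test needs a different discovery bucket.
    """
    test = copy.deepcopy(discovery)
    test["collection"] = test_collection_id
    if bucket is not None:
        test["bucket"] = bucket
    return test
```

Deriving both files from the same test id keeps the collection and its discovery config from drifting apart between test runs.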

Examples

Single-asset collection

collection.json
{
  "id": "OMI_trno2-COG-deleteme",
  "type": "Collection",
  "links": [],
  "title": "DELETE ME OMI_trno2",
  "extent": {
    "spatial": {
      "bbox": [
        [-180, -90, 180, 90]
      ]
    },
    "temporal": {
      "interval": [
        [null, null]
      ]
    }
  },
  "license": "MIT",
  "description": "OMI_trno2 - 0.10 x 0.10 Annual as Cloud-Optimized GeoTIFFs (COGs)",
  "stac_version": "1.0.0",
  "renders": {
    "dashboard": {
      "colormap_name": "reds",
      "rescale": [
        [
          0,
          3000000000000000.0
        ]
      ],
      "assets": [
        "cog_default"
      ],
      "title": "VEDA Dashboard Render Parameters"
    }
  },
  "providers": [
    {
      "name": "NASA VEDA",
      "url": "https://www.earthdata.nasa.gov/dashboard/",
      "roles": [
        "host"
      ]
    }
  ],
  "item_assets": {
    "test_asset": {
      "title": "An item asset description for test",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["test"]
    },
    "cog_default": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": [
        "data",
        "layer"
      ],
      "title": "Default COG Layer",
      "description": "Cloud optimized default layer to display on map"
    }
  },
  "assets": {
    "thumbnail": {
      "title": "Thumbnail",
      "description": "Photo by [Mick Truyts](https://unsplash.com/photos/x6WQeNYJC1w) (Power plant shooting steam at the sky)",
      "href": "https://thumbnails.openveda.cloud/no2--dataset-cover.jpg",
      "type": "image/jpeg",
      "roles": ["thumbnail"]
    }
  }
}
discovery-config.json
{
    "collection": "OMI_trno2-COG-deleteme",
    "bucket": "veda-data-store-staging",
    "datetime_range": "year",
    "discovery": "s3",
    "filename_regex": "^(.*).tif$",
    "prefix": "OMI_trno2-COG/"
}
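
With collection.json and discovery-config.json prepared, Pattern 1 amounts to two POST requests. The sketch below only builds the requests so they can be inspected before anything is sent. Everything about the URLs and auth is an assumption: it guesses that the endpoints live at INGEST_API_URL/collections and WORKFLOWS_API_URL/discovery and that a bearer token is accepted; check the deployed API docs for the real paths and auth scheme.

```python
import json
import urllib.request


def build_post(url: str, payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


def pattern_1_requests(ingest_api_url: str, workflows_api_url: str,
                       collection: dict, discovery: dict, token: str) -> list:
    """Return the two requests for Pattern 1, in the order they should be sent."""
    return [
        # Assumed path: create the collection first.
        build_post(f"{ingest_api_url.rstrip('/')}/collections", collection, token),
        # Assumed path: then trigger discovery for that collection.
        build_post(f"{workflows_api_url.rstrip('/')}/discovery", discovery, token),
    ]
```

Each request can then be sent with urllib.request.urlopen(request) once the URLs recorded in step 3 of the overview have been confirmed.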
dataset-config.json
{
  "assets": {
    "thumbnail": {
      "description": "Photo by [Mick Truyts](https://unsplash.com/photos/x6WQeNYJC1w) (Power plant shooting steam at the sky)",
      "href": "https://thumbnails.openveda.cloud/no2--dataset-cover.jpg",
      "roles": [
        "thumbnail"
      ],
      "title": "Thumbnail",
      "type": "image/jpeg"
    }
  },
  "collection": "OMI_trno2-COG-deleteme",
  "data_type": "cog",
  "description": "OMI_trno2 - 0.10 x 0.10 Annual as Cloud-Optimized GeoTIFFs (COGs)",
  "discovery_items": [
    {
      "bucket": "veda-data-store-staging",
      "datetime_range": "year",
      "discovery": "s3",
      "filename_regex": "^(.*).tif$",
      "prefix": "OMI_trno2-COG/"
    }
  ],
  "is_periodic": true,
  "license": "MIT",
  "providers": [
    {
      "name": "NASA VEDA",
      "roles": [
        "host"
      ],
      "url": "https://www.earthdata.nasa.gov/dashboard/"
    }
  ],
  "renders": {
    "dashboard": {
      "assets": [
        "cog_default"
      ],
      "colormap_name": "reds",
      "rescale": [
        [
          0,
          3000000000000000
        ]
      ],
      "title": "VEDA Dashboard Render Parameters"
    }
  },
  "time_density": "year",
  "title": "DELETE ME OMI_trno2"
}
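
The composite dataset config used by Patterns 3 and 4 can be derived from the two files above rather than written by hand. This is a rough sketch whose field mapping is inferred only from the example in this issue, not from the workflows API schema; build_dataset_config is a hypothetical helper name.

```python
def build_dataset_config(collection: dict, discovery: dict, data_type: str = "cog") -> dict:
    """Combine a STAC collection and one s3 discovery config into a dataset config."""
    # The collection id lives at the top level, so drop it from the discovery item.
    discovery_item = {k: v for k, v in discovery.items() if k != "collection"}
    config = {
        "collection": collection["id"],
        "title": collection["title"],
        "description": collection["description"],
        "license": collection["license"],
        "data_type": data_type,
        "discovery_items": [discovery_item],
    }
    # Copy optional collection properties only when they are present.
    for key in ("providers", "renders", "assets"):
        if key in collection:
            config[key] = collection[key]
    # Assumed mapping of dashboard-prefixed collection fields to dataset fields.
    for source_key, target_key in (
        ("dashboard:is_periodic", "is_periodic"),
        ("dashboard:time_density", "time_density"),
    ):
        if source_key in collection:
            config[target_key] = collection[source_key]
    return config
```

Generating the dataset config from the same inputs used for Patterns 1 and 2 keeps all four patterns testing the same collection metadata.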

Multi-asset collection

collection.json
{
    "id": "climdex-tmaxxf-access-cm2-ssp126-deleteme",
    "type": "Collection",
    "links": [],
    "title": "DELETE THIS TEST CLIMDEX ACCESS CM2 SSP126 tmaxXF",
    "extent": {
        "spatial": {
            "bbox": [
                [
                    -180,
                    -90,
                    180,
                    90
                ]
            ]
        },
        "temporal": {
            "interval": [
                [
                    "2015-01-01T00:00:00+00:00",
                    "2101-12-31T23:59:59+00:00"
                ]
            ]
        }
    },
    "license": "CC-BY-SA-4.0",
    "description": "CLIMDEX ACCESS CM2 SSP126 - variable tmaxXF",
    "item_assets": {
        "cog_default": {
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": [
                "data",
                "layer"
            ],
            "title": "Default COG Layer",
            "description": "Cloud optimized default layer to display on map"
        },
        "tmax_above_86": {
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": [
                "data",
                "layer"
            ],
            "title": "Tmax Above 86",
            "description": "Tmax Above 86"
        },
        "tmax_above_90": {
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": [
                "data",
                "layer"
            ],
            "title": "Tmax Above 90",
            "description": "Tmax Above 90"
        },
        "tmax_above_100": {
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": [
                "data",
                "layer"
            ],
            "title": "Tmax Above 100",
            "description": "Tmax Above 100"
        },
        "tmax_above_110": {
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": [
                "data",
                "layer"
            ],
            "title": "Tmax Above 110",
            "description": "Tmax Above 110"
        },
        "tmax_above_115": {
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": [
                "data",
                "layer"
            ],
            "title": "Tmax Above 115",
            "description": "Tmax Above 115"
        }
    },
    "stac_version": "1.0.0",
    "dashboard:is_periodic": true,
    "dashboard:time_density": "year",
    "providers": [
        {
            "name": "NASA VEDA",
            "url": "https://www.earthdata.nasa.gov/dashboard/",
            "roles": [
                "host"
            ]
        }
    ],
    "assets": {
        "thumbnail": {
            "title": "Thumbnail",
            "description": "Photo by NASA (CMIP6 Climdex TmaxXF Screenshot)",
            "href": "https://thumbnails.openveda.cloud/cmip6-climdex-tmaxxf-access-cm2.png",
            "type": "image/png",
            "roles": ["thumbnail"]
        }
    }
}
discovery-config.json
{
    "collection": "climdex-tmaxxf-access-cm2-ssp126-deleteme",
    "bucket": "veda-data-store-staging",
    "prefix": "climdex-tmaxxf-access-cm2-ssp126/",
    "filename_regex": ".*-ssp126_209(.*)_tmax.*.tif$",
    "id_regex": ".*-ssp126_(.*)_tmax.*.tif$",
    "id_template": "climdex-tmaxxf-access-cm2-ssp126-{}",
    "datetime_range": "year",
    "assets": {
        "tmax_above_86": {
          "title": "Tmax Above 86",
          "description": "Tmax Above 86",
          "regex": ".*-ssp126_(.*)_tmax_above_86.tif"
        },
        "tmax_above_90": {
          "title": "Tmax Above 90",
          "description": "Tmax Above 90",
          "regex": ".*-ssp126_(.*)_tmax_above_90.tif"
        },
        "tmax_above_100": {
          "title": "Tmax Above 100",
          "description": "Tmax Above 100",
          "regex": ".*-ssp126_(.*)_tmax_above_100.tif"
        },
        "tmax_above_110": {
          "title": "Tmax Above 110",
          "description": "Tmax Above 110",
          "regex": ".*-ssp126_(.*)_tmax_above_110.tif"
        },
        "tmax_above_115": {
          "title": "Tmax Above 115",
          "description": "Tmax Above 115",
          "regex": ".*-ssp126_(.*)_tmax_above_115.tif"
        }
    },
    "discovery": "s3",
    "upload": false
}
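
To see how this config groups several s3 objects into one item, the regexes can be exercised locally against a list of object keys. The sketch below is an approximation written for this issue, not the veda-data-airflow discovery code: it keeps keys matching filename_regex, derives the item id from the first capture group of id_regex via id_template, and files each key under the first asset whose regex matches.

```python
import re
from collections import defaultdict


def group_assets(keys: list, config: dict) -> dict:
    """Group s3 keys into {item id: {asset name: key}} using the discovery regexes."""
    filename_re = re.compile(config["filename_regex"])
    id_re = re.compile(config["id_regex"])
    asset_res = {name: re.compile(spec["regex"]) for name, spec in config["assets"].items()}

    items = defaultdict(dict)
    for key in keys:
        if not filename_re.search(key):
            continue  # not selected by this discovery config
        id_match = id_re.search(key)
        if id_match is None:
            continue  # cannot derive an item id for this key
        item_id = config["id_template"].format(id_match.group(1))
        for name, asset_re in asset_res.items():
            if asset_re.search(key):
                items[item_id][name] = key
                break  # a key is assigned to at most one asset
    return dict(items)
```

Keys that pass the filename filter but match no asset regex are silently dropped here; a real test would probably want to report them.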
dataset-config.json
{
  "collection": "climdex-tmaxxf-access-cm2-ssp126-deleteme",
  "data_type": "cog",
  "spatial_extent": {
    "xmin": -180,
    "ymin": -90,
    "xmax": 180,
    "ymax": 90
  },
  "temporal_extent": {
    "startdate": "2015-01-01T00:00:00Z",
    "enddate": "2101-12-31T23:59:59Z"
  },
  "description": "CLIMDEX ACCESS CM2 SSP126 - variable tmaxXF",
  "is_periodic": true,
  "license": "MIT",
  "item_assets": {
    "cog_default": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "roles": [
            "data",
            "layer"
        ],
        "title": "Default COG Layer",
        "description": "Cloud optimized default layer to display on map"
    },
    "tmax_above_86": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "roles": [
            "data",
            "layer"
        ],
        "title": "Tmax Above 86",
        "description": "Tmax Above 86"
    },
    "tmax_above_90": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "roles": [
            "data",
            "layer"
        ],
        "title": "Tmax Above 90",
        "description": "Tmax Above 90"
    },
    "tmax_above_100": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "roles": [
            "data",
            "layer"
        ],
        "title": "Tmax Above 100",
        "description": "Tmax Above 100"
    },
    "tmax_above_110": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "roles": [
            "data",
            "layer"
        ],
        "title": "Tmax Above 110",
        "description": "Tmax Above 110"
    },
    "tmax_above_115": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "roles": [
            "data",
            "layer"
        ],
        "title": "Tmax Above 115",
        "description": "Tmax Above 115"
    }
  },
  "sample_files": ["s3://veda-data-store-staging/climdex-tmaxxf-access-cm2-ssp126/tmaxXF-ACCESS-CM2-ssp126_2099_tmax_above_86.tif"],
  "providers": [
    {
      "name": "NASA VEDA",
      "url": "https://www.earthdata.nasa.gov/dashboard/",
      "roles": [
        "host"
      ]
    }
  ],
  "renders": {
    "dashboard": {
      "assets": [
        "cog_default"
      ],
      "colormap_name": "reds",
      "rescale": [
        [
          0,
          3000000000000000
        ]
      ],
      "title": "VEDA Dashboard Render Parameters"
    }
  },
  "assets": {
    "thumbnail": {
      "title": "Thumbnail",
      "description": "Photo by NASA (CMIP6 Climdex TmaxXF Screenshot)",
      "href": "https://thumbnails.openveda.cloud/cmip6-climdex-tmaxxf-access-cm2.png",
      "type": "image/png",
      "roles": ["thumbnail"]
    }
  },
  "time_density": "year",
  "title": "DELETE ME CLIMDEX",
  "discovery_items": [
    {
      "collection": "climdex-tmaxxf-access-cm2-ssp126-deleteme",
      "bucket": "veda-data-store-staging",
      "prefix": "climdex-tmaxxf-access-cm2-ssp126/",
      "filename_regex": ".*-ssp126_209(.*)_tmax.*.tif$",
      "id_regex": ".*-ssp126_(.*)_tmax.*.tif$",
      "id_template": "climdex-tmaxxf-access-cm2-ssp126-{}",
      "datetime_range": "year",
      "assets": {
          "tmax_above_86": {
            "title": "Tmax Above 86",
            "description": "Tmax Above 86",
            "regex": ".*-ssp126_(.*)_tmax_above_86.tif"
          },
          "tmax_above_90": {
            "title": "Tmax Above 90",
            "description": "Tmax Above 90",
            "regex": ".*-ssp126_(.*)_tmax_above_90.tif"
          },
          "tmax_above_100": {
            "title": "Tmax Above 100",
            "description": "Tmax Above 100",
            "regex": ".*-ssp126_(.*)_tmax_above_100.tif"
          },
          "tmax_above_110": {
            "title": "Tmax Above 110",
            "description": "Tmax Above 110",
            "regex": ".*-ssp126_(.*)_tmax_above_110.tif"
          },
          "tmax_above_115": {
            "title": "Tmax Above 115",
            "description": "Tmax Above 115",
            "regex": ".*-ssp126_(.*)_tmax_above_115.tif"
          }
      },
      "discovery": "s3",
      "upload": false
    }
  ]
}
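
Because a dataset config repeats the collection id and the asset keys in several places, a quick consistency check before submitting it can catch copy errors. This is a sketch under the assumption that a discovery item's optional collection field should equal the top-level collection and that every discovery asset key should be described in item_assets; dataset_config_problems is a hypothetical helper.

```python
def dataset_config_problems(config: dict) -> list:
    """Return human-readable descriptions of inconsistencies in a dataset config."""
    problems = []
    collection_id = config.get("collection")
    item_assets = set(config.get("item_assets", {}))
    for index, discovery in enumerate(config.get("discovery_items", [])):
        # A discovery item without its own collection inherits the top-level one.
        target = discovery.get("collection", collection_id)
        if target != collection_id:
            problems.append(
                f"discovery_items[{index}] targets {target!r}, expected {collection_id!r}"
            )
        for asset_key in discovery.get("assets", {}):
            if asset_key not in item_assets:
                problems.append(
                    f"discovery_items[{index}] asset {asset_key!r} is not described in item_assets"
                )
    return problems
```

An empty list means the ids and asset keys line up; anything else is worth fixing before triggering a pipeline.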
@anayeaye

Results

veda-data-airflow veda-pipeline-sit + veda-backend-dev (stac-api and ingest-api) testing for PRs #183 and #159

single-asset collection

Pattern 1: manually create the collection via the ingest-api/collections endpoint, then trigger a discovery workflow via the workflows-api/discovery endpoint.

  • BUG the veda_discover DAG is successfully triggered by the workflows-api/discovery request but fails because of a change to the expected payload (see PR note)

Pattern 2: manually create the collection via the ingest-api/collections endpoint, then trigger the veda_discover DAG via the veda-data-airflow UI.

  • BUG the same veda_discover DAG failure also occurs when triggered manually via the UI (worth testing just in case the workflows API was causing a change to the discovery config on the way to airflow)

Pattern 3: generate a composite dataset config with s3 discovery+pseudo-STAC collection object and submit workflows-api/dataset endpoint.

  • Not tested

Pattern 4: generate a composite dataset config with s3 discovery+pseudo-STAC collection object and manually trigger veda_dataset_pipeline via the airflow UI.

  • Created collection and discovered items
  • The collection's renders and providers properties were retained
  • BUG the assets collection property could not be used in the dataset dag because the assets property of a collection is not equivalent to the assets property of an s3 discovery config. This bug was not introduced in the PR under test; it just hadn't been tested before.
  • BUG the custom item_assets configuration for the collection was overwritten with the cog_default default. This bug was not introduced in the PR under test; it just hadn't been tested before.
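
Checks like "renders and providers were retained" and "item_assets was overwritten" can be made repeatable by diffing the submitted collection against what the STAC API returns after ingest. A minimal sketch, assuming both are available as dictionaries; changed_properties and its default key list are illustrative.

```python
def changed_properties(submitted: dict, stored: dict,
                       keys=("renders", "providers", "item_assets", "assets")) -> dict:
    """Return the submitted properties whose stored value differs or is missing."""
    changed = {}
    for key in keys:
        if key in submitted and submitted[key] != stored.get(key):
            changed[key] = {"submitted": submitted[key], "stored": stored.get(key)}
    return changed
```

An empty result means every checked property survived the pipeline unchanged.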

multi-asset collection

Generated test metadata for future testing and then confirmed that patterns 1 and 2 fail with the same error as for the single-asset collection. I haven't yet had time to generate the composite dataset config input json for this multi asset test collection.

@anayeaye changed the title from "WIP Basic ingest testing" to "Basic ingest testing" on Jul 10, 2024
@anayeaye

More test cases! I'm going to link examples here until we set up a home base for workflow test data.
#179 (comment)
