Discovery endpoint should not fail when id_template not in request #194

Open
1 task
anayeaye opened this issue Jul 20, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@anayeaye
Contributor

What

The discovery endpoint fails when it is used to add items to an existing collection and id_template is not provided in the request.

Note

The id_template default value is set in the s3_discovery util.

How to reproduce

  1. POST a collection via the ingest-api/collections endpoint (or choose a test collection in the dev catalog)
collection.json
{
  "id": "omi-19-item-collection-deleteme",
  "type": "Collection",
  "links": [],
  "title": "DELETE ME 19 item collection OMI_trno2",
  "extent": {
    "spatial": {
      "bbox": [
        [-180, -90, 180, 90]
      ]
    },
    "temporal": {
      "interval": [
        [null, null]
      ]
    }
  },
  "license": "MIT",
  "description": "OMI_trno2 - 0.10 x 0.10 Annual as Cloud-Optimized GeoTIFFs (COGs)",
  "item_assets": {
    "cog_default": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": [
        "data",
        "layer"
      ],
      "title": "Default COG Layer",
      "description": "Cloud optimized default layer to display on map"
    }
  },
  "stac_version": "1.0.0",
  "renders": {
    "dashboard": {
      "colormap_name": "reds",
      "rescale": [
        [
          0,
          3000000000000000.0
        ]
      ],
      "assets": [
        "cog_default"
      ],
      "title": "VEDA Dashboard Render Parameters"
    }
  },
  "providers": [
    {
      "name": "NASA VEDA",
      "url": "https://www.earthdata.nasa.gov/dashboard/",
      "roles": [
        "host"
      ]
    }
  ],
  "item_assets": {
    "test_asset": {
      "title": "An item asset description for test",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["test"]
    },
    "cog_default": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": [
        "data",
        "layer"
      ],
      "title": "Default COG Layer",
      "description": "Cloud optimized default layer to display on map"
    }
  },
  "assets": {
    "thumbnail": {
      "title": "Thumbnail",
      "description": "Photo by [Mick Truyts](https://unsplash.com/photos/x6WQeNYJC1w) (Power plant shooting steam at the sky)",
      "href": "https://thumbnails.openveda.cloud/no2--dataset-cover.jpg",
      "type": "image/jpeg",
      "roles": ["thumbnail"]
    }
  }
}
  2. Submit a `discovery/` request without providing `id_template` in the config. For the above collection:
{
  "bucket": "veda-data-store-staging",
  "collection": "omi-19-item-collection-deleteme",
  "datetime_range": "year",
  "discovery": "s3",
  "filename_regex": "^(.*).tif$",
  "prefix": "OMI_trno2-COG/"
}

Error log

AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=veda_discover
AIRFLOW_CTX_TASK_ID=subdag_discover.discover_from_s3
AIRFLOW_CTX_EXECUTION_DATE=2024-07-19T15:53:42+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=d222b047-d453-4980-acad-f40d473320c6
[2024-07-19, 15:53:50 UTC] {{logging_mixin.py:137}} INFO - Getting S3 response iterator for bucket: veda-data-store-staging, prefix: OMI_trno2-COG/
[2024-07-19, 15:53:50 UTC] {{taskinstance.py:1768}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 247, in execute
    condition = super().execute(context)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 175, in execute
    return_value = self.execute_callable()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 192, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/veda_data_pipeline/groups/discover_group.py", line 36, in discover_from_s3_task
    return s3_discovery_handler(
  File "/usr/local/airflow/dags/veda_data_pipeline/utils/s3_discovery.py", line 251, in s3_discovery_handler
    item["item_id"] = id_template.format(item["item_id"])
AttributeError: 'NoneType' object has no attribute 'format'
[2024-07-19, 15:53:50 UTC] {{taskinstance.py:1318}} INFO - Marking task as FAILED. dag_id=veda_discover, task_id=subdag_discover.discover_from_s3, execution_date=20240719T155342, start_date=20240719T155349, end_date=20240719T155350
[2024-07-19, 15:53:50 UTC] {{standard_task_runner.py:100}} ERROR - Failed to execute job 2754 for task subdag_discover.discover_from_s3 ('NoneType' object has no attribute 'format'; 24409)
[2024-07-19, 15:53:50 UTC] {{local_task_job.py:208}} INFO - Task exited with return code 1
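
The traceback shows that `id_template` is `None` by the time `.format` is called. One plausible cause (an assumption; the util's source is not shown here) is that the request reaches the handler with the key present but set to `None`, in which case a `dict.get` default never applies. A minimal sketch of that failure mode and a guard against it follows. `DEFAULT_ID_TEMPLATE`, `resolve_id_template`, `apply_id_template`, and the sample item id are hypothetical names for illustration, not the actual code in `s3_discovery.py`.

```python
# Assumed placeholder default; the real default lives in the s3_discovery util.
DEFAULT_ID_TEMPLATE = "{}"


def resolve_id_template(event: dict) -> str:
    """Return a usable template whether the key is missing or set to None."""
    # dict.get's default is only used when the key is absent, so an explicit
    # None would pass straight through. `or` covers both cases.
    return event.get("id_template") or DEFAULT_ID_TEMPLATE


def apply_id_template(item_id: str, event: dict) -> str:
    """Format an item id with the resolved template."""
    return resolve_id_template(event).format(item_id)


# The suspected failing pattern: key present, value None.
event = {"collection": "omi-19-item-collection-deleteme", "id_template": None}

# A plain .get() with a default still returns None here.
assert event.get("id_template", DEFAULT_ID_TEMPLATE) is None

# The guarded version falls back to the default instead of raising.
print(apply_id_template("OMI_trno2_2019", event))
```

If the key turns out to be missing rather than `None`, the same guard still works, so it is safe under either explanation.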

Acceptance criteria

  • The discovery endpoint does not fail when id_template is not provided.
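
Another way to meet this criterion, sketched under the assumption that the API serializes unset optional fields as `None`, is to drop `None`-valued keys from the config before it is handed to the DAG, so that the default in the util can take effect. `drop_none_values` and `dag_conf` are hypothetical names; the config values are the ones from the reproduction steps.

```python
def drop_none_values(config: dict) -> dict:
    """Return a copy of config without keys whose value is None."""
    return {key: value for key, value in config.items() if value is not None}


request_config = {
    "bucket": "veda-data-store-staging",
    "collection": "omi-19-item-collection-deleteme",
    "datetime_range": "year",
    "discovery": "s3",
    "filename_regex": "^(.*).tif$",
    "prefix": "OMI_trno2-COG/",
    "id_template": None,  # assumption: how an unset optional field arrives
}

# Only keys with real values are forwarded; id_template is left out entirely.
dag_conf = drop_none_values(request_config)
```

This keeps the default in one place (the util) instead of duplicating it in the API layer, at the cost of treating an explicit `None` the same as an omitted field.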
@anayeaye anayeaye added the bug Something isn't working label Jul 20, 2024
@ciaransweet

@anayeaye Are you able to give me a quick TL;DR run-through of this (or rather, what I need to set up to replicate it) when you're awake? 🤞
