
Clay over aoi #116

Merged: 5 commits into main from clay-over-aoi on Jan 15, 2024
Conversation

Member

@yellowcap commented Jan 12, 2024

Updated the pipeline and added a notebook to the docs for running Clay over custom AOIs and custom date ranges.

I tested the script extensively locally and it works, but it wouldn't hurt to have more people try it, to catch anything that is missing or that I did not notice in my local runs.

@brunosan
Member

I merged this PR into #118, which will close this one when it is merged.

Contributor

@weiji14 left a comment

Thanks @yellowcap, just some minor recommendations/questions below. I think we're trying to move fast, so I'll pre-approve this, but feel free to handle any of the suggestions below if you have time before merging.

Comment on lines 245 to 267
"source": [
"aoi_src = \"\"\"{\n",
" \"type\": \"FeatureCollection\",\n",
" \"name\": \"puri\",\n",
" \"crs\": {\n",
" \"type\": \"name\",\n",
" \"properties\": { \"name\": \"urn:ogc:def:crs:OGC:1.3:CRS84\" }\n",
" },\n",
" \"features\": [\n",
" {\n",
" \"type\": \"Feature\",\n",
" \"properties\": {\n",
" \"Region\": \"Puri\"\n",
" },\n",
" \"geometry\": {\n",
" \"type\": \"Polygon\",\n",
" \"coordinates\": [ [ [ 85.050328768992927, 19.494998302498729 ], [ 85.169749958884921, 19.441346370958311 ], [ 85.968028516820738, 19.799945705366934 ], [ 86.029742823006544, 20.15322442052609 ], [ 86.104280881127053, 20.516965603502392 ], [ 85.404584916189307, 20.564248936237991 ], [ 84.945334300027469, 20.339711117597215 ], [ 84.816295296184407, 19.799945705366934 ], [ 85.050328768992927, 19.494998302498729 ] ] ]\n",
" }\n",
" }\n",
" ]\n",
"}\"\"\"\n",
"aoi = gpd.read_file(aoi_src, driver=\"GeoJSON\")\n",
"aoi.geometry[0]"
Contributor

Can we show how to specify the AOI via shapely.Point or shapely.box instead? GeoJSON is a little harder to construct manually.

Member Author

Good point, changed to use shapely.box.
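For reference, a minimal sketch of building an AOI with `shapely.box` (the bounding-box values below are rough, illustrative approximations of the Puri polygon's extent, not the coordinates used in the notebook):

```python
from shapely.geometry import box

# Illustrative lon/lat bounds roughly covering the Puri AOI (hypothetical values)
aoi_geom = box(84.82, 19.44, 86.10, 20.56)  # box(minx, miny, maxx, maxy)
print(aoi_geom.bounds)
```

With geopandas, `gpd.GeoDataFrame(geometry=[aoi_geom], crs="EPSG:4326")` then gives the same kind of `aoi` object as the GeoJSON version, without hand-writing any JSON.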

Comment on lines 689 to 690
"! wandb disabled\n",
"! python trainer.py predict --ckpt_path=data/checkpoints/Clay_v0.1_epoch-24_val-loss-0.46.ckpt \\\n",
Contributor

Yeah, we should look to disable wandb by default in trainer.py as mentioned at #109 (comment). Relevant lines are here:

model/trainer.py

Lines 53 to 56 in 0145e55

LearningRateMonitor(logging_interval="step"),
LogIntermediatePredictions(),
],
"logger": [WandbLogger(project="CLAY-v0", log_model=False)],

Member Author

Definitely, we can remove this line once the trainer has been updated.

required=False,
default="",
type=str,
help="Comma separated list of date ranges, each provided as yy-mm-dd/yy-mm-dd.",
Contributor

Is it year as 4 numbers (e.g. 2024) or as 2 numbers (e.g. 24)?

Suggested change
help="Comma separated list of date ranges, each provided as yy-mm-dd/yy-mm-dd.",
help="Comma separated list of date ranges, each provided as yyyy-mm-dd/yyyy-mm-dd.",

Member Author

Good catch, I will update accordingly.
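To make the expected format concrete, here is a small hypothetical parser for the `--dateranges` value (`parse_dateranges` is an illustration, not a function from the actual script):

```python
from datetime import date


def parse_dateranges(value: str) -> list[tuple[date, date]]:
    """Parse 'yyyy-mm-dd/yyyy-mm-dd,...' into (start, end) date pairs."""
    ranges = []
    for chunk in value.split(","):
        start, end = chunk.split("/")
        ranges.append((date.fromisoformat(start), date.fromisoformat(end)))
    return ranges


ranges = parse_dateranges("2020-01-01/2020-04-01,2021-06-01/2021-09-15")
print(ranges)
```

`date.fromisoformat` would reject two-digit years outright, which is one more reason the help string should say `yyyy-mm-dd`.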

Comment on lines 423 to 427
"--subset",
required=False,
default="",
help="For debugging, subset x and y to this pixel window as a comma"
"separated string of 4 integers.",
Contributor

I see that this is for debugging only, but just to clarify, this option is for cropping the tiles/chips to a specified height and width?

Member

I didn't understand what this does either.

Member Author

I see this is creating confusion, I'll try to word it better in the help string.

The goal of this option is to reduce the amount of data when processing in test mode. Normally, the script downloads the entire scene, which is about 10k pixels squared. With this option, one can reduce the number of pixels downloaded and passed to the tiler script. So it cuts a window out of the scene data; the tiler then works on this smaller input but is otherwise unaltered.
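To sketch what the four integers mean, a hypothetical helper that turns the `--subset` string into a pixel window (this is an illustration of the semantics described above, not code from `datacube.py`):

```python
def parse_subset(value: str) -> tuple[int, int, int, int]:
    """Parse '--subset x1,y1,x2,y2' into (col_off, row_off, width, height).

    The window could then be passed to e.g. rasterio.windows.Window to read
    only that crop of the scene instead of the full ~10k x 10k pixels.
    """
    x1, y1, x2, y2 = (int(v) for v in value.split(","))
    return (x1, y1, x2 - x1, y2 - y1)


# The value used later in the notebook: a 1024 x 1024 pixel crop
print(parse_subset("1500,1500,2524,2524"))
```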

Member

Not to make it more complex, but I understand we download the entire MGRS tile, even when many of the chips might be outside of the requested AOI. If we drop chips outside of the AOI, we could just use a point or small polygon for testing.

Member Author

That is correct and would be the best approach, but for that the datacube.py script needs much more refactoring. Currently it is fully focused on downloading the entire MGRS tile. Filtering spatially would require passing the AOI itself and handling spatial intersections, so that is indeed quite a bit more complicated.

For now I can also state more clearly that the AOI approach currently means getting data for all full MGRS tiles that intersect with the AOI.


with rasterio.open(name, "r+") as rst:
rst.colorinterp = color
rst.update_tags(date=date)
Contributor

Do you want to handle #70 here, i.e. use ACQUISITIONDATETIME or TIFFTAG_DATETIME? Or do that in a separate PR?

Member Author

Separate PR, coming up.

"source": [
"import geopandas as gpd\n",
"\n",
"! wget https://clay-mgrs-samples.s3.amazonaws.com/mgrs_full.fgb -O data/mgrs/mgrs_full.fgb"
Contributor

Member Author

Good idea, the full sample file is constructed in the landcover.sh script. There I actually got it from the NASA site, but it's the same file. I'll link to this in the notebook.

"source": [
"for index, row in mgrs_aoi.iterrows():\n",
" print(index, row)\n",
" ! python scripts/datacube.py --sample data/mgrs/mgrs_aoi.fgb --subset 1500,1500,2524,2524 --localpath data/chips --index {index} --dateranges 2020-01-01/2020-04-01,2021-06-01/2021-09-15"
Contributor

This line requires a PC_SDK_SUBSCRIPTION_KEY token to be set, I think; otherwise users will get a requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://planetarycomputer.microsoft.com/api/sas/v1/token/sentinel1euwestrtc/sentinel1-grd-rtc. Need to document this inline perhaps, xref #119.
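For anyone hitting the 403, a minimal sketch of supplying the key from a notebook before the download cell runs (the key string below is a placeholder, not a real key; the planetary-computer SDK also reads this environment variable when signing asset URLs):

```python
import os

# Placeholder value; request a real key via the Planetary Computer account pages
os.environ["PC_SDK_SUBSCRIPTION_KEY"] = "your-subscription-key"
print(os.environ["PC_SDK_SUBSCRIPTION_KEY"])
```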

Member Author

Added a reference to the markdown.

"source": [
"## Create the embeddings for each training chip\n",
"\n",
"The checkpoints can be downloaded from huggingface."
Contributor

Link to HuggingFace page perhaps?

Suggested change
"The checkpoints can be downloaded from huggingface."
"The checkpoints can be downloaded from huggingface at https://huggingface.co/made-with-clay/Clay."

Member Author

Good idea, added.

],
"source": [
"! wandb disabled\n",
"! python trainer.py predict --ckpt_path=data/checkpoints/Clay_v0.1_epoch-24_val-loss-0.46.ckpt \\\n",
Contributor

Lightning supports passing in a checkpoint URL (see https://lightning.ai/docs/pytorch/2.1.2/common/checkpointing_advanced.html#resume-training-from-a-cloud-checkpoint), so we can remove the wget line above and predict directly from the HuggingFace checkpoint URL. The checkpoint will be downloaded automatically to ~/.cache/torch/hub/checkpoints/.

Suggested change
"! python trainer.py predict --ckpt_path=data/checkpoints/Clay_v0.1_epoch-24_val-loss-0.46.ckpt \\\n",
"! python trainer.py predict --ckpt_path=https://huggingface.co/made-with-clay/Clay/resolve/main/Clay_v0.1_epoch-24_val-loss-0.46.ckpt \\\n",

@yellowcap yellowcap merged commit a4377cc into main Jan 15, 2024
4 checks passed
@yellowcap yellowcap deleted the clay-over-aoi branch January 15, 2024 14:23