
Expand README #32

Merged
merged 7 commits on Jul 29, 2020
Conversation

@jpolchlo (Contributor) commented Jul 22, 2020:

This is an effort at making the README a bit more descriptive. I've added usage docs for the main user-facing utilities, attempting to capture all the command line arguments.

Pending:

  • Description of donate mask options to gather

@jpolchlo (Author): @jamesmcclain This is ready for a review!

@jamesmcclain (Contributor) left a comment:

This looks really good. I just have a few minor points around the margins.

@@ -2,30 +2,131 @@

![Cloud Buster](https://user-images.githubusercontent.com/11281373/72922457-f7a3d080-3d44-11ea-9032-fc80166a5389.jpg)

Cloud-Buster is a Python library and command-line utility suite for generating cloud-free mosaics from Sentinel-2 imagery. This package makes use of [RasterFoundry](https://rasterfoundry.azavea.com/) and [GDAL](https://gdal.org) to gather the imagery and assemble the mosaics. Cloud detection is provided through one of the following mechanisms:
1. Built-in Sentinel-2 cloud masks. Results from this method are poor; not recommended.
@jamesmcclain (Contributor): *through one of the following* could be changed to *any or all of the following* (you can use any combination of these). It is also worth providing the caveat that s2cloudless only works on L1C imagery.
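The "any combination" point above can be sketched in a few lines. This is an illustrative fragment, not code from Cloud-Buster; the mask inputs and the union rule (a pixel is cloudy if any enabled detector flags it) are assumptions:

```python
# Hypothetical sketch: combining cloud masks from multiple detectors.
# Treating a pixel as cloudy if ANY enabled detector flags it is the
# conservative choice when the goal is a cloud-free mosaic.

def combine_masks(masks):
    """OR together per-pixel boolean masks (lists of rows of bools)."""
    combined = [[False] * len(row) for row in masks[0]]
    for mask in masks:
        for i, row in enumerate(mask):
            for j, cloudy in enumerate(row):
                if cloudy:
                    combined[i][j] = True
    return combined

builtin_mask = [[True, False], [False, False]]   # e.g. built-in Sentinel-2 QA mask
model_mask   = [[False, False], [False, True]]   # e.g. s2cloudless output (L1C only)
print(combine_masks([builtin_mask, model_mask]))
# -> [[True, False], [False, True]]
```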

```

## Gather ##
Attempts to cover the queried geometry using a selection of imagery from a `query_rf` call. The algorithm will attempt to cover the target area multiple times (`--coverage-count`) to help ensure that the final mosaic will be cloud-free after masking and merging. Allowing some small area of the target geometry to remain uncovered (`--max-uncovered`) may be necessary to guarantee the desired coverage. The total number of images selected may be bounded (`--max-selections`) if, for instance, processing time and/or computational resources are limited.
```

@jamesmcclain (Contributor): Re *processing time and/or computational resources are limited*: this is true, but the main motivation is to limit the amount of imagery downloaded.

@jpolchlo (Author): Is this in order not to overrun the available disk space?

@jamesmcclain (Contributor): Yes, and also the fact that it is a cross-region download, so continent-scale jobs could get very expensive without this.

@jamesmcclain (Contributor): Also, with regard to disk space: the current version of the README has a note that recommends the use of a custom AMI with more disk space than the default (other ways are possible too, but I have found that to be the easiest).

@jpolchlo (Author): Indeed. I preserved that note at the bottom.

```
Unless `--backstop False` is set, a fallback image will be selected to ensure that no holes will be left in the final mosaic, subject to the `--max-selections` constraint. Any selections that serve as a backstop will have a `backstop` field in the output JSON file set to `True`.
```

@jamesmcclain (Contributor): Re *subject to the `--max-selections` constraint*: backstops are counted separately (with an implicit coverage count of 1).
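The selection strategy documented in this hunk (repeated coverage passes, a selection cap, and backstop tagging in the output JSON) can be sketched roughly as follows. This is a simplified illustration, not the actual `gather.py` logic: scenes are modeled as sets of tile ids rather than geometries, and the greedy pass structure is an assumption. Consistent with the review comment, the backstop pick is added outside the `--max-selections` cap:

```python
# Simplified sketch of the gather selection strategy: cover the target
# multiple times, cap total selections, and tag any fallback pick with
# a "backstop" field as described for the output JSON.

def select_scenes(target, scenes, coverage_count=2, max_selections=8,
                  backstop=True):
    """Greedily cover `target` (a set of tile ids) coverage_count times,
    drawing each scene at most once, then optionally add one backstop
    scene (counted separately) for anything never covered."""
    selections, used = [], set()
    never_covered = set(target)
    for _ in range(coverage_count):
        uncovered = set(target)
        for scene in scenes:
            if len(selections) >= max_selections:
                break
            if scene["id"] in used or not (uncovered & scene["tiles"]):
                continue
            selections.append({"id": scene["id"], "backstop": False})
            used.add(scene["id"])
            uncovered -= scene["tiles"]
        # Tiles still uncovered in EVERY pass were never covered at all.
        never_covered &= uncovered
    if backstop and never_covered:
        # Backstop: plug remaining holes, outside the selection cap.
        fallback = max(scenes,
                       key=lambda s: len(never_covered & s["tiles"]))
        selections.append({"id": fallback["id"], "backstop": True})
    return selections

scenes = [{"id": "A", "tiles": {1, 2}},
          {"id": "B", "tiles": {3, 4}},
          {"id": "D", "tiles": {5}}]
print(select_scenes({1, 2, 3, 4, 5}, scenes,
                    coverage_count=1, max_selections=2))
# -> [{'id': 'A', 'backstop': False}, {'id': 'B', 'backstop': False},
#     {'id': 'D', 'backstop': True}]
```

With `max_selections=2`, scene D cannot be selected during the coverage pass, so it is chosen as the backstop and marked accordingly.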

```
Uses AWS Batch jobs to process selected Sentinel-2 imagery, in parallel, to remove clouded areas. Requires `cloudbuster/gather.py` to be uploaded to S3, and this location to be provided to the `meta-gather` process (`--gather`). The Batch job will run in the defined queue (`--jobqueue`) using the specified job definition (`--jobdef`). One may opt to see the Batch job submission command without running it by using `--dryrun`.
```
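The submission described above might be assembled as follows. This is a hypothetical sketch, not the project's actual code: the flag names mirror the README text (`--jobqueue`, `--jobdef`, `--gather`, `--dryrun`), but the container command layout is an assumption.

```python
# Hypothetical sketch of a meta-gather style Batch submission, built as
# a plain dict so a --dryrun can print it instead of submitting.
import json

def build_submission(name, jobqueue, jobdef, gather_uri, gather_args):
    return {
        "jobName": name,
        "jobQueue": jobqueue,
        "jobDefinition": jobdef,
        "containerOverrides": {
            # download_run.sh fetches the gather script (s3:// or
            # http(s):// only, per the review thread) and runs it.
            "command": ["./download_run.sh", gather_uri] + gather_args,
        },
    }

params = build_submission("gather-job", "my-queue", "my-jobdef:1",
                          "s3://bucket/gather.py", ["--kind", "L1C"])
print(json.dumps(params, indent=2))   # --dryrun behavior: show, don't run
# A real submission would hand these to AWS Batch via boto3:
#   boto3.client("batch").submit_job(**params)
```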

@jamesmcclain (Contributor): Re *uploaded to S3*: this can be read from anywhere on the local filesystem, from S3, or from an http(|s) URI. I typically use a "gather" sourced from GitHub.

@jpolchlo (Author): I get the s3 or http(s) target, but the local filesystem part is confusing. If `meta-gather` relies on a container which provides the `download_run` script, would local filesystem targets require that the files be packed into the filesystem of the Docker container that is targeted by the job definition? This is starting to sound like there should be a short section on the construction of Docker containers, including a mention of the `jamesmcclain/aws-batch-ml` base image. Perhaps this provides some additional justification for creating some basic templates a la #33?

@jamesmcclain (Contributor): Actually, I think I got that wrong (I conflated it with some other behavior of some other code). Since the gather script is in fact downloaded by the `./download_run.sh` script, only `s3://` and `http(|s)://` are supported. Thank you for catching that 👍
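Given that `./download_run.sh` only handles `s3://` and `http(|s)://` sources, a caller could validate the `--gather` URI up front. A small sketch; the helper name is made up:

```python
# Hypothetical validation helper: download_run.sh can only fetch the
# gather script over s3:// or http(s)://, so reject anything else early.
from urllib.parse import urlparse

def valid_gather_uri(uri):
    return urlparse(uri).scheme in ("s3", "http", "https")

assert valid_gather_uri("s3://bucket/gather.py")
assert valid_gather_uri("https://raw.githubusercontent.com/.../gather.py")
assert not valid_gather_uri("/home/user/gather.py")  # local paths unsupported
```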

```
The response from `filter.py` must be provided (`--response`), as well as a name to serve as the base of the filenames (`--name`) that will be saved to a specified S3 location (`--output-path`). The process will be based on either `L1C` or `L2A` Sentinel-2 tiles (`--kind`), which can be restricted to a desired bounding box (`--bounds-clip`). That imagery will be downloaded to a local cache, which can be set using the `--tmp` option (defaults to `/tmp`).

Cloud removal takes one of several paths:
1. A PyTorch model can be specified if `--architecture` and `--weights` are set, respectively, with the URI of an architecture file and a weights file. (In order to use this method, the container referenced by the job definition must provide `pytorch`.)
```
@jamesmcclain (Contributor): *one of several* -> *one or more* ...
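The "one or more" point above amounts to building the list of cloud-removal mechanisms from whichever options were supplied. A hypothetical sketch: only the `--architecture`/`--weights` pairing comes from the README text; the other option names and the dispatch itself are assumptions made for illustration.

```python
# Sketch of "one or more" cloud-removal paths: collect every mechanism
# whose options were supplied, rather than picking exactly one.

def removal_paths(args):
    paths = []
    if args.get("architecture") and args.get("weights"):
        paths.append("pytorch-model")       # needs pytorch in the container
    if args.get("use_s2cloudless"):         # assumed flag; L1C imagery only
        paths.append("s2cloudless")
    if args.get("use_builtin_masks"):       # assumed flag
        paths.append("builtin-masks")
    return paths

print(removal_paths({"architecture": "s3://a", "weights": "s3://w",
                     "use_s2cloudless": True}))
# -> ['pytorch-model', 's2cloudless']
```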

@jamesmcclain (Contributor): Looks good to me 👍

@echeipesh merged commit d810a3f into azavea:master on Jul 29, 2020
3 participants