Expand README #32
e0ef1be to fe6c339
@jamesmcclain This is ready for a review!
This looks really good. I just have a few minor points around the margins.
@@ -2,30 +2,131 @@
![Cloud Buster](https://user-images.githubusercontent.com/11281373/72922457-f7a3d080-3d44-11ea-9032-fc80166a5389.jpg)

Cloud-Buster is a Python library and command-line utility suite for generating cloud-free mosaics from Sentinel-2 imagery. This package makes use of [RasterFoundry](https://rasterfoundry.azavea.com/) and [GDAL](https://gdal.org) to gather the imagery and assemble the mosaics. Cloud detection is provided through one of the following mechanisms:
1. Built-in Sentinel-2 cloud masks. Results from this method are poor; not recommended.
> through one of the following

This could be changed to "any or all of the following" (you can use any combination of these). It is also worth providing the caveat that s2cloudless only works on L1C imagery.
## Gather ##
Attempts to cover the queried geometry using a selection of imagery from a `query_rf` call. The algorithm will attempt to cover the target area multiple times (`--coverage-count`) to help ensure the final mosaic will be cloud free after masking and merging. Some small area of the target geometry may be left uncovered (`--max-uncovered`), which may be needed to guarantee the desired coverage. The number of total images selected may be bounded (`--max-selections`) if, for instance, processing time and/or computational resources are limited.

> processing time and/or computational resources are limited

This is true, but the main motivation is to limit the amount of imagery downloaded.

Is this in order not to overrun the available disk space?

Yes, and also the fact that it is a cross-region download, so continent-scale jobs could get very expensive without this.

Also, with regard to disk space: the current version of the README has a note that recommends the use of a custom AMI with more disk space than the default (other ways are possible too, but I have found that to be the easiest).

Indeed. I preserved that note at the bottom.

Unless `--backstop False` is set, a fallback image will be selected to ensure that no holes will be left in the final mosaic, subject to the `--max-selections` constraint. Any selections that serve as a backstop will have a `backstop` field in the output JSON file set to `True`.

> subject to the `--max-selections` constraint

Backstops are counted separately (with an implicit coverage count of 1).
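The selection behaviour described in the Gather section (repeated coverage, a tolerance for under-covered area, a cap on selections, and backstops counted separately with an implicit coverage count of 1) can be illustrated with a toy greedy selector. This is a hypothetical sketch, not the project's selection code: image footprints are modeled as sets of grid-cell IDs instead of geometries, and `greedy_gather`, the "most under-covered cells first" rule, and reading `--max-uncovered` as the fraction of cells still below the target count are all assumptions.

```python
from typing import Dict, List, Set, Tuple


def greedy_gather(
    target: Set[int],
    footprints: Dict[str, Set[int]],
    coverage_count: int = 3,
    max_uncovered: float = 0.0,
    max_selections: int = 10,
    backstop: bool = True,
) -> List[Tuple[str, bool]]:
    """Hypothetical sketch: return (image_id, is_backstop) pairs.

    Cells of `target` stand in for pieces of the query geometry;
    `footprints` maps image IDs to the cells each image covers.
    """
    counts = {cell: 0 for cell in target}
    remaining = dict(footprints)
    selections: List[Tuple[str, bool]] = []

    # Main pass: try to cover every cell `coverage_count` times,
    # using at most `max_selections` images.
    while remaining and len(selections) < max_selections:
        short = {c for c, n in counts.items() if n < coverage_count}
        # Stop once the under-covered fraction is within tolerance.
        if len(short) <= max_uncovered * len(target):
            break
        # Greedy rule (assumption): take the image that helps the most
        # under-covered cells; ties go to the earliest-listed image.
        best = max(remaining, key=lambda k: len(remaining[k] & short))
        if not remaining[best] & short:
            break  # no remaining image helps
        for cell in remaining.pop(best) & target:
            counts[cell] += 1
        selections.append((best, False))

    # Backstop pass: fill holes left with zero coverage. These picks are
    # not charged against `max_selections` and only need coverage 1.
    if backstop:
        holes = {c for c, n in counts.items() if n == 0}
        while holes and remaining:
            best = max(remaining, key=lambda k: len(remaining[k] & holes))
            gained = remaining.pop(best) & holes
            if not gained:
                break
            holes -= gained
            selections.append((best, True))

    return selections
```

For example, with footprints `{"a": {0, 1}, "b": {2, 3}}`, a four-cell target, `coverage_count=1`, and `max_selections=1`, the main pass picks `"a"` and the backstop pass then adds `"b"` flagged as a backstop, mirroring the `backstop` field the README describes.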
Uses AWS Batch jobs to process, in parallel, selected Sentinel-2 imagery to remove clouded areas. Requires `cloudbuster/gather.py` to be uploaded to S3, and this location provided to the `meta-gather` process (`--gather`). The Batch job will run in the defined queue (`--jobqueue`) using the specified job definition (`--jobdef`). One may opt to see the batch job submission command without running it using `--dryrun`.
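For readers unfamiliar with AWS Batch, a submission of this kind can be sketched with boto3's Batch client, including a `--dryrun`-style mode that only prints the request. This is a hypothetical sketch rather than meta-gather's code: `build_batch_job` and `submit` are illustrative names, and the container command (running `./download_run.sh` with the gather script's URI and its arguments) is an assumption based on the review discussion of `./download_run.sh`.

```python
import json
from typing import List, Optional


def build_batch_job(name: str, jobqueue: str, jobdef: str,
                    command: List[str]) -> dict:
    """Keyword arguments for boto3's Batch `submit_job` call."""
    return {
        "jobName": name,
        "jobQueue": jobqueue,        # corresponds to --jobqueue
        "jobDefinition": jobdef,     # corresponds to --jobdef
        # Override the container command from the job definition.
        "containerOverrides": {"command": command},
    }


def submit(job: dict, dryrun: bool = False) -> Optional[str]:
    """Print the request when `dryrun` is set; otherwise submit it."""
    if dryrun:
        print(json.dumps(job, indent=2))
        return None
    import boto3  # only needed for a real submission
    response = boto3.client("batch").submit_job(**job)
    return response["jobId"]
```

Calling `submit(job, dryrun=True)` prints the JSON request and returns `None` without contacting AWS; queue, job-definition, and bucket names passed to `build_batch_job` would be placeholders for your own resources.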
> uploaded to S3

This can be read from anywhere on the local filesystem, from S3, or from an http(|s) URI. I typically use a "gather" sourced from GitHub.
I get the s3 or http(s) target, but the local filesystem part is confusing. If `meta-gather` is relying on a container which provides the `download_run` script, would local filesystem targets require that the files are packed into the filesystem of the Docker container that is targeted by the job definition? This is starting to sound like there should be a short section on the construction of Docker containers, including a mention of the `jamesmcclain/aws-batch-ml` base image. Perhaps this provides some additional justification for creating some basic templates à la #33?
Actually, I think I got that wrong (conflated it with some other behavior of some other code). Since the gather script is in fact downloaded by the `./download_run.sh` script, only `s3://` and `http(|s)://` are supported. Thank you for catching that 👍
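Given that conclusion (only `s3://` and `http(s)://` locations work for `--gather`), a local path could be rejected when arguments are parsed rather than failing later inside the Batch job. A minimal sketch, assuming the check is wired into argparse; `check_gather_uri` is a hypothetical helper, not part of meta-gather.

```python
import argparse
from urllib.parse import urlparse

# Schemes the download script can fetch from, per the discussion above.
SUPPORTED_SCHEMES = {"s3", "http", "https"}


def check_gather_uri(uri: str) -> str:
    """argparse `type=` callable that rejects local paths and other schemes."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"expected an s3:// or http(s):// URI, got {uri!r}")
    return uri


parser = argparse.ArgumentParser()
parser.add_argument("--gather", type=check_gather_uri, required=True)
```

With this in place, `--gather /home/user/gather.py` makes argparse exit with an "invalid check_gather_uri value" error, because argparse converts a `ValueError` raised by a `type=` callable into a usage error.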
The response from `filter.py` must be provided (`--response`), as well as a name to serve as the base of the filenames (`--name`) that will be saved to a specified S3 location (`--output-path`). The process will either be based on `L1C` or `L2A` Sentinel-2 tiles (`--kind`), which can be restricted to a desired bounding box (`--bounds-clip`). That imagery will be downloaded to a local cache, which can be set using the `--tmp` option (defaults to `/tmp`).
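The options in the paragraph above map naturally onto an argparse parser. This is a hypothetical sketch of that interface, not meta-gather's real parser: whether `--kind` is required and the format of `--bounds-clip` (assumed here to be four numbers, xmin ymin xmax ymax) are guesses; only the option names, the `L1C`/`L2A` choice, and the `/tmp` default come from the README text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Sketch of meta-gather's data options")
    p.add_argument("--response", required=True,
                   help="response produced by filter.py")
    p.add_argument("--name", required=True,
                   help="base name for the output files")
    p.add_argument("--output-path", required=True,
                   help="S3 location where results are saved")
    p.add_argument("--kind", required=True, choices=["L1C", "L2A"],
                   help="Sentinel-2 product level to process")
    # Assumed format: four numbers giving the clipping box.
    p.add_argument("--bounds-clip", type=float, nargs=4, default=None,
                   metavar=("XMIN", "YMIN", "XMAX", "YMAX"))
    p.add_argument("--tmp", default="/tmp",
                   help="local cache for downloaded imagery")
    return p
```

argparse turns `--output-path` and `--bounds-clip` into the attributes `output_path` and `bounds_clip`, and rejects any `--kind` other than `L1C` or `L2A`.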
Cloud removal takes one of several paths:
1. A pytorch model can be specified if `--architecture` and `--weights` are set, respectively, with the URI of an architecture and weight file. (In order to use this method, the container referenced by the job definition must provide `pytorch`.)
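Because the detection mechanisms can be combined (the review suggests "any or all of the following") and s2cloudless only works on L1C imagery, the option handling might validate the requested combination before any imagery is downloaded. A hypothetical sketch: `CloudOptions`, `cloud_methods`, and the `use_s2cloudless`/`use_builtin_masks` switches are illustrative names rather than meta-gather options, and requiring `--architecture` and `--weights` together is an assumption drawn from the wording of item 1.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CloudOptions:
    kind: str                            # "L1C" or "L2A" (--kind)
    architecture: Optional[str] = None   # --architecture URI
    weights: Optional[str] = None        # --weights URI
    use_s2cloudless: bool = False        # hypothetical switch
    use_builtin_masks: bool = False      # hypothetical switch


def cloud_methods(opts: CloudOptions) -> List[str]:
    """Return the enabled detection methods, rejecting invalid combinations."""
    methods: List[str] = []
    # Assumption: a PyTorch model needs both an architecture and weights.
    if (opts.architecture is None) != (opts.weights is None):
        raise ValueError("--architecture and --weights must be given together")
    if opts.architecture is not None:
        methods.append("pytorch")
    if opts.use_s2cloudless:
        if opts.kind != "L1C":
            raise ValueError("s2cloudless only works on L1C imagery")
        methods.append("s2cloudless")
    if opts.use_builtin_masks:
        methods.append("builtin")
    return methods
```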
> one of several

-> one or more ...
Looks good to me 👍
This is an effort at making the README a bit more descriptive. I've added usage docs for the main user-facing utilities, attempting to capture all the command-line arguments.
Pending: