
Expand README #32

Merged
merged 7 commits on Jul 29, 2020
Conversation

@jpolchlo (Contributor) commented Jul 22, 2020:

This is an effort at making the README a bit more descriptive. I've added usage docs for the main user-facing utilities, attempting to capture all the command line arguments.

Pending:

  • Description of donate mask options to gather

@jpolchlo (Author): @jamesmcclain This is ready for a review!

@jamesmcclain (Contributor) left a comment:

This looks really good. I just have a few minor points around the margins.

@@ -2,30 +2,131 @@

![Cloud Buster](https://user-images.githubusercontent.com/11281373/72922457-f7a3d080-3d44-11ea-9032-fc80166a5389.jpg)

Cloud-Buster is a Python library and command-line utility suite for generating cloud-free mosaics from Sentinel-2 imagery. This package makes use of [RasterFoundry](https://rasterfoundry.azavea.com/) and [GDAL](https://gdal.org) to gather the imagery and assemble the mosaics. Cloud detection is provided through one of the following mechanisms:
1. Built-in Sentinel-2 cloud masks. Results from this method are poor; not recommended.
@jamesmcclain (Contributor): *through one of the following* could be changed to *any or all of the following* (you can use any combination of these). It is also worth providing the caveat that s2cloudless only works on L1C imagery.
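The "any combination" point above can be sketched in a few lines. This is an illustrative fragment, not code from Cloud-Buster; the mask inputs and the union rule (a pixel is cloudy if any enabled detector flags it) are assumptions:

```python
# Hypothetical sketch: combining cloud masks from multiple detectors.
# Treating a pixel as cloudy if ANY enabled detector flags it is the
# conservative choice when the goal is a cloud-free mosaic.

def combine_masks(masks):
    """OR together per-pixel boolean masks (lists of rows of bools)."""
    combined = [[False] * len(row) for row in masks[0]]
    for mask in masks:
        for i, row in enumerate(mask):
            for j, cloudy in enumerate(row):
                if cloudy:
                    combined[i][j] = True
    return combined

builtin_mask = [[True, False], [False, False]]   # e.g. built-in Sentinel-2 QA mask
model_mask   = [[False, False], [False, True]]   # e.g. s2cloudless output (L1C only)
print(combine_masks([builtin_mask, model_mask]))
# -> [[True, False], [False, True]]
```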

```

## Gather ##
Attempts to cover the queried geometry using a selection of imagery from a `query_rf` call. The algorithm will attempt to cover the target area multiple times (`--coverage-count`) to help ensure that the final mosaic will be cloud-free after masking and merging. Allowing some small area of the target geometry to remain uncovered (`--max-uncovered`) may be necessary to guarantee the desired coverage. The total number of images selected may be bounded (`--max-selections`) if, for instance, processing time and/or computational resources are limited.
```

@jamesmcclain (Contributor): Re *processing time and/or computational resources are limited*: this is true, but the main motivation is to limit the amount of imagery downloaded.

@jpolchlo (Author): Is this in order not to overrun the available disk space?

@jamesmcclain (Contributor): Yes, and also the fact that it is a cross-region download, so continent-scale jobs could get very expensive without this.

@jamesmcclain (Contributor): Also, with regard to disk space: the current version of the README has a note that recommends the use of a custom AMI with more disk space than the default (other ways are possible too, but I have found that to be the easiest).

@jpolchlo (Author): Indeed. I preserved that note at the bottom.

```
Unless `--backstop False` is set, a fallback image will be selected to ensure that no holes will be left in the final mosaic, subject to the `--max-selections` constraint. Any selections that serve as a backstop will have a `backstop` field in the output JSON file set to `True`.
```

@jamesmcclain (Contributor): Re *subject to the `--max-selections` constraint*: backstops are counted separately (with an implicit coverage count of 1).
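The selection strategy documented in this hunk (repeated coverage passes, a selection cap, and backstop tagging in the output JSON) can be sketched roughly as follows. This is a simplified illustration, not the actual `gather.py` logic: scenes are modeled as sets of tile ids rather than geometries, and the greedy pass structure is an assumption. Consistent with the review comment, the backstop pick is added outside the `--max-selections` cap:

```python
# Simplified sketch of the gather selection strategy: cover the target
# multiple times, cap total selections, and tag any fallback pick with
# a "backstop" field as described for the output JSON.

def select_scenes(target, scenes, coverage_count=2, max_selections=8,
                  backstop=True):
    """Greedily cover `target` (a set of tile ids) coverage_count times,
    drawing each scene at most once, then optionally add one backstop
    scene (counted separately) for anything never covered."""
    selections, used = [], set()
    never_covered = set(target)
    for _ in range(coverage_count):
        uncovered = set(target)
        for scene in scenes:
            if len(selections) >= max_selections:
                break
            if scene["id"] in used or not (uncovered & scene["tiles"]):
                continue
            selections.append({"id": scene["id"], "backstop": False})
            used.add(scene["id"])
            uncovered -= scene["tiles"]
        # Tiles still uncovered in EVERY pass were never covered at all.
        never_covered &= uncovered
    if backstop and never_covered:
        # Backstop: plug remaining holes, outside the selection cap.
        fallback = max(scenes,
                       key=lambda s: len(never_covered & s["tiles"]))
        selections.append({"id": fallback["id"], "backstop": True})
    return selections

scenes = [{"id": "A", "tiles": {1, 2}},
          {"id": "B", "tiles": {3, 4}},
          {"id": "D", "tiles": {5}}]
print(select_scenes({1, 2, 3, 4, 5}, scenes,
                    coverage_count=1, max_selections=2))
# -> [{'id': 'A', 'backstop': False}, {'id': 'B', 'backstop': False},
#     {'id': 'D', 'backstop': True}]
```

With `max_selections=2`, scene D cannot be selected during the coverage pass, so it is chosen as the backstop and marked accordingly.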

```
Uses AWS Batch jobs to process selected Sentinel-2 imagery, in parallel, to remove clouded areas. Requires `cloudbuster/gather.py` to be uploaded to S3, and this location to be provided to the `meta-gather` process (`--gather`). The Batch job will run in the defined queue (`--jobqueue`) using the specified job definition (`--jobdef`). One may opt to see the Batch job submission command without running it by using `--dryrun`.
```
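The submission described above might be assembled as follows. This is a hypothetical sketch, not the project's actual code: the flag names mirror the README text (`--jobqueue`, `--jobdef`, `--gather`, `--dryrun`), but the container command layout is an assumption.

```python
# Hypothetical sketch of a meta-gather style Batch submission, built as
# a plain dict so a --dryrun can print it instead of submitting.
import json

def build_submission(name, jobqueue, jobdef, gather_uri, gather_args):
    return {
        "jobName": name,
        "jobQueue": jobqueue,
        "jobDefinition": jobdef,
        "containerOverrides": {
            # download_run.sh fetches the gather script (s3:// or
            # http(s):// only, per the review thread) and runs it.
            "command": ["./download_run.sh", gather_uri] + gather_args,
        },
    }

params = build_submission("gather-job", "my-queue", "my-jobdef:1",
                          "s3://bucket/gather.py", ["--kind", "L1C"])
print(json.dumps(params, indent=2))   # --dryrun behavior: show, don't run
# A real submission would hand these to AWS Batch via boto3:
#   boto3.client("batch").submit_job(**params)
```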

@jamesmcclain (Contributor): Re *uploaded to S3*: this can be read from anywhere on the local filesystem, from S3, or from an http(|s) URI. I typically use a "gather" sourced from GitHub.

@jpolchlo (Author): I get the s3 or http(s) target, but the local filesystem part is confusing. If `meta-gather` relies on a container which provides the `download_run` script, would local filesystem targets require that the files be packed into the filesystem of the Docker container that is targeted by the job definition? This is starting to sound like there should be a short section on the construction of Docker containers, including a mention of the `jamesmcclain/aws-batch-ml` base image. Perhaps this provides some additional justification for creating some basic templates a la #33?

@jamesmcclain (Contributor): Actually, I think I got that wrong (I conflated it with some other behavior of some other code). Since the gather script is in fact downloaded by the `./download_run.sh` script, only `s3://` and `http(|s)://` are supported. Thank you for catching that 👍
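Given that `./download_run.sh` only handles `s3://` and `http(|s)://` sources, a caller could validate the `--gather` URI up front. A small sketch; the helper name is made up:

```python
# Hypothetical validation helper: download_run.sh can only fetch the
# gather script over s3:// or http(s)://, so reject anything else early.
from urllib.parse import urlparse

def valid_gather_uri(uri):
    return urlparse(uri).scheme in ("s3", "http", "https")

assert valid_gather_uri("s3://bucket/gather.py")
assert valid_gather_uri("https://raw.githubusercontent.com/.../gather.py")
assert not valid_gather_uri("/home/user/gather.py")  # local paths unsupported
```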

```
The response from `filter.py` must be provided (`--response`), as well as a name to serve as the base of the filenames (`--name`) that will be saved to a specified S3 location (`--output-path`). The process will be based on either `L1C` or `L2A` Sentinel-2 tiles (`--kind`), which can be restricted to a desired bounding box (`--bounds-clip`). That imagery will be downloaded to a local cache, which can be set using the `--tmp` option (defaults to `/tmp`).

Cloud removal takes one of several paths:
1. A PyTorch model can be specified if `--architecture` and `--weights` are set, respectively, with the URI of an architecture file and a weights file. (In order to use this method, the container referenced by the job definition must provide `pytorch`.)
```
@jamesmcclain (Contributor): *one of several* -> *one or more* ...
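The "one or more" point above amounts to building the list of cloud-removal mechanisms from whichever options were supplied. A hypothetical sketch: only the `--architecture`/`--weights` pairing comes from the README text; the other option names and the dispatch itself are assumptions made for illustration.

```python
# Sketch of "one or more" cloud-removal paths: collect every mechanism
# whose options were supplied, rather than picking exactly one.

def removal_paths(args):
    paths = []
    if args.get("architecture") and args.get("weights"):
        paths.append("pytorch-model")       # needs pytorch in the container
    if args.get("use_s2cloudless"):         # assumed flag; L1C imagery only
        paths.append("s2cloudless")
    if args.get("use_builtin_masks"):       # assumed flag
        paths.append("builtin-masks")
    return paths

print(removal_paths({"architecture": "s3://a", "weights": "s3://w",
                     "use_s2cloudless": True}))
# -> ['pytorch-model', 's2cloudless']
```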

@jamesmcclain (Contributor): Looks good to me 👍

@echeipesh merged commit d810a3f into azavea:master on Jul 29, 2020
3 participants