environment_setup.txt

This README explains how the data is downloaded and formatted for use in our models.
1.) Download the data from a Google Cloud Storage bucket.
ex: `sudo gsutil -m cp -r gs://es262-croptype/Tanzania/* /home/data/tanzania`
This copies files from the bucket called `es262-croptype/Tanzania` into a local
folder called `/home/data/tanzania`. The folder should contain directories
`raster`, `s1`, and `s2`.
`raster` - contains label rasters for the grid IDs. Values in these labels
           correspond to field IDs, which are referred to as `geom_ID`s in the
           CSV files for each country.
`s1` - contains Sentinel-1 images for each grid ID
`s2` - contains Sentinel-2 images for each grid ID
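After the copy finishes, it is worth confirming the folder layout before moving on. A minimal sketch (the `check_layout` helper is hypothetical, not part of the repository):

```python
import os

def check_layout(root):
    """Return the expected subdirectories that are missing under `root`."""
    expected = ["raster", "s1", "s2"]
    return [d for d in expected if not os.path.isdir(os.path.join(root, d))]

# e.g. check_layout("/home/data/tanzania") -> [] when the download is complete
```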
2.) Next, we want to rename the data so that the grid IDs in the filenames have
leading zeros. We will use the script `rename_w_leading_0s.py`, located in the
`scripts` folder of the `crop-type-mapping` repository. To double-check that the
script does what you expect, you can use the `--dry_run` flag.
To call it on labels:
`python rename_w_leading_0s.py --dir /home/data/COUNTRY/raster --tif_content mask`
For our working example, we can run:
`python rename_w_leading_0s.py --dir /home/data/tanzania/raster --tif_content mask`
To call it on data:
`python rename_w_leading_0s.py --dir /home/data/COUNTRY/SOURCE --tif_content data --country COUNTRY`
For our working example, we can run:
`python rename_w_leading_0s.py --dir /home/data/tanzania/s1 --tif_content data --country tanzania`
`python rename_w_leading_0s.py --dir /home/data/tanzania/s2 --tif_content data --country tanzania`
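The idea behind the renaming can be sketched as follows. This is illustrative only: the real padding width and filename patterns live in `scripts/rename_w_leading_0s.py`, and the `width=6` default here is an assumption.

```python
import os
import re

def pad_grid_id(filename, width=6):
    """Zero-pad the first run of digits in a filename to a fixed width."""
    return re.sub(r"\d+", lambda m: m.group(0).zfill(width), filename, count=1)

def rename_with_leading_zeros(directory, width=6, dry_run=True):
    """Rename files in `directory`; with dry_run=True, only report the plan."""
    renames = []
    for name in sorted(os.listdir(directory)):
        padded = pad_grid_id(name, width)
        if padded != name:
            renames.append((name, padded))
            if not dry_run:
                os.rename(os.path.join(directory, name),
                          os.path.join(directory, padded))
    return renames
```

Running with `dry_run=True` first mirrors the `--dry_run` flag of the real script: you see the planned renames without touching any files.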
3.) There may be cases where data exists in the `s1` and `s2` folders while the
corresponding raster in the `raster` folder contains no labels at all. To check
for these cases, run:
`python remove_invalid_grids.py`
Parameters can be specified at the bottom of the script. Be sure to run with
`dryrun = 1` before actually deleting anything, to confirm the script behaves
as expected.
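The core check can be sketched like this: a label raster that is all zeros labels nothing, so its grid is a removal candidate. The helper name and the in-memory `{grid_id: array}` interface are illustrative; `remove_invalid_grids.py` works on the TIFF files directly.

```python
import numpy as np

def empty_label_grids(masks):
    """Return grid IDs whose label array contains no labels (all zeros)."""
    return sorted(gid for gid, arr in masks.items() if not np.any(arr))
```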
4.) We want to create a data stack for each grid, so that one cube per grid
incorporates all of our temporal data for that grid. We save these as `.npy`
files so that they can be easily read in. Alongside each `.npy` file we also
save a corresponding `.json` file that keeps track of the dates of the images
stacked in the array.
To run: `python mk_data_cube.py`, located in the `crop-type-mapping/scripts`
directory. Parameters are specified at the bottom of the script.
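The npy-plus-json pairing can be sketched as below. Assumptions to flag: the axis order (time as the leading axis) and the json schema (`{"dates": [...]}`) are illustrative; the actual layout is defined inside `mk_data_cube.py`.

```python
import json
import numpy as np

def save_data_cube(arrays_by_date, npy_path, json_path):
    """Stack per-date arrays (all the same shape) into one cube per grid.

    Saves the cube as .npy and records the image dates, in stack order,
    in a sidecar .json so the time axis stays interpretable.
    """
    dates = sorted(arrays_by_date)
    cube = np.stack([arrays_by_date[d] for d in dates], axis=0)
    np.save(npy_path, cube)
    with open(json_path, "w") as f:
        json.dump({"dates": dates}, f)
    return cube.shape
```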
5.) We also want to generate cloud masks for each of the Sentinel-2 `.npy`
files created in step 4. To do so, edit the parameters at the bottom of the
file and run:
`python crop-type-mapping/scripts/cloud_classifier.py`
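For intuition only, here is a toy stand-in for what a cloud mask looks like: a binary array flagging bright pixels. The real `cloud_classifier.py` uses its own classifier; the single-band threshold and the 0.3 value here are arbitrary illustrations, not the script's method.

```python
import numpy as np

def simple_cloud_mask(blue_band, threshold=0.3):
    """Flag bright pixels as cloud (1) and the rest as clear (0)."""
    return (blue_band > threshold).astype(np.uint8)
```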
6.) Next, we want to change the values in the raster files so that the labels
correspond to crop type rather than field ID. We also save these as `.npy`
files rather than TIFFs. To do so, run:
`python crop-type-mapping/scripts/mask_tif_npy.py`
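The relabeling step amounts to a lookup from `geom_ID` to crop-type code, as sketched below. The real mapping comes from the per-country CSV files; mapping unknown IDs to 0 is an assumption of this sketch.

```python
import numpy as np

def fields_to_crops(mask, field_to_crop):
    """Replace field IDs (geom_IDs) in a label array with crop-type codes.

    IDs absent from `field_to_crop` become 0 (background) in this sketch.
    """
    out = np.zeros_like(mask)
    for field_id, crop in field_to_crop.items():
        out[mask == field_id] = crop
    return out
```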
7.) Once all the data and labels have been converted to `.npy` files using the
steps above, data splits are created. To do so, run the following command:
`python scripts/data_split.py --save=True --full_seed=14`
This saves a pickled list of grid IDs for each split to `/home/data/COUNTRY/`.
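A seeded split with pickled ID lists might look like the sketch below. The 80/10/10 ratios, filenames, and helper names are assumptions; the real ratios and any stratification live in `scripts/data_split.py`, and `--full_seed=14` suggests only that the shuffle is seeded for reproducibility.

```python
import os
import pickle
import random

def split_grids(grid_ids, seed=14, train=0.8, val=0.1):
    """Shuffle grid IDs with a fixed seed and cut into train/val/test lists."""
    ids = sorted(grid_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * val)
    return {"train": ids[:n_train],
            "val": ids[n_train:n_train + n_val],
            "test": ids[n_train + n_val:]}

def save_splits(splits, out_dir):
    """Pickle one list of grid IDs per split, e.g. under /home/data/COUNTRY/."""
    for name, ids in splits.items():
        with open(os.path.join(out_dir, name + "_ids.pkl"), "wb") as f:
            pickle.dump(ids, f)
```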
8.) We then create an HDF5 file that contains all of the necessary inputs
we've created above. To create the HDF5 file, run:
`python crop-type-mapping/scripts/create_hdf5.py`
If you are not using the defaults, you can specify `--data_dir` and
`--output_dir` in the command above.
9.) Finally, we remove grids that have fewer than the required number of
timestamps: first find them with
`python crop-type-mapping/scripts/bad_list_finder.py`
and then remove them by calling `python remove_bad_timestamps.py`.
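Since step 4 recorded each grid's image dates in a `.json` sidecar, the "bad list" check reduces to a length test on those date lists, as in this sketch (the `{grid_id: metadata}` interface is an assumption; `bad_list_finder.py` reads the files itself):

```python
def grids_with_few_timestamps(json_by_grid, min_timestamps):
    """Return grid IDs whose recorded date list is shorter than required."""
    return sorted(gid for gid, meta in json_by_grid.items()
                  if len(meta["dates"]) < min_timestamps)
```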