Improve date handling for data pipeline #76

Merged 8 commits on Dec 7, 2023

yellowcap
Member

If no match is found for a year, the other years are tried until a match is found or all years have been tested.

Closes #68
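
A minimal sketch of the fallback described above, combined with the seeded shuffle mentioned in the reviewed code comment ("Shuffle years, use index as seed"). The pipeline code itself is not shown in this conversation, so `find_match_across_years` and the `search_year` callable are hypothetical names used only for illustration.

```python
import random


def find_match_across_years(years, index, search_year):
    """Try each year in a seeded random order until one yields a match.

    `search_year` is a hypothetical callable that returns a match for a
    given year, or None when nothing suitable is found.
    """
    # Copy so the caller's list is not mutated, then shuffle with a seed
    # derived from the tile index so reruns visit years in the same order.
    candidates = list(years)
    random.Random(index).shuffle(candidates)

    for year in candidates:
        match = search_year(year)
        if match is not None:
            return year, match

    # All years have been tested without finding a match.
    return None, None


# Illustrative data only: two of five years have a usable scene.
available = {2019: "scene-a", 2021: "scene-b"}
year, match = find_match_across_years(
    [2017, 2018, 2019, 2020, 2021], index=42, search_year=available.get
)
```

Using a dedicated `random.Random(index)` instance keeps the shuffle reproducible per tile without touching the global random state.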

yellowcap and others added 3 commits December 7, 2023 14:10
If no match is found for a year, others are being tried until
a match is found or all years have been tested
@yellowcap
Member Author

Also increased the tile size to 512x512 pixels in this PR, ref #78

Contributor

@weiji14 weiji14 left a comment

Just one typo, otherwise should be good.

pixels = [part.compute() for part in pixels]
print(f"Starting algorithm for MGRS tile {tile['name']} with index {index}")

# Shuffle years, use index as seed for reproducability but no

Suggested change
- # Shuffle years, use index as seed for reproducability but no
+ # Shuffle years, use index as seed for reproducibility but no

@yellowcap
Member Author

Got this to work on batch and it gave good results. Kicking off a new batch run as we speak; if all goes well we'll have 10x the data in a few hours 🤞🏽

@yellowcap yellowcap merged commit c6a8365 into main Dec 7, 2023
2 checks passed
@yellowcap yellowcap deleted the improve-date-handling branch December 7, 2023 22:31
@weiji14
Contributor

weiji14 commented Dec 7, 2023

Good timing! I'm hoping to kick off a new training run with Soumya's code later, and can test things out on the new data batch.

brunosan pushed a commit that referenced this pull request Dec 27, 2023
* Improve date handling for data pipeline

If no match is found for a year, others are being tried until
a match is found or all years have been tested

* Increase tile size to 512x512 pixels.

Closes #78

* Increase dates per location to 3

Closes #79

* Prevent printing s3 sync upload progress logs

* Move counter above cloud filter to ensure index consistency

Like this the tile IDs in the file names should be consistent across dates.

* Fix typo in comment

* Update batch run setup to new bucket name
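
The "Move counter above cloud filter" commit can be illustrated with a small sketch. It assumes the counter is what supplies the tile ID in the file names; `label_tiles` and `is_cloudy` are hypothetical names, not taken from the pipeline.

```python
def label_tiles(tiles, is_cloudy):
    """Return (index, tile) pairs for the tiles that pass the cloud filter.

    The counter is advanced before the filter runs, so a tile keeps the
    same index no matter which other tiles are dropped on a given date.
    """
    kept = []
    counter = 0
    for tile in tiles:
        index = counter
        counter += 1  # counted before filtering, so skipped tiles still use up an index
        if is_cloudy(tile):
            continue
        kept.append((index, tile))
    return kept


# Illustrative data: the same tile grid on two dates with different clouds.
grid = ["t0", "t1", "t2"]
date_a = label_tiles(grid, is_cloudy=lambda t: t == "t1")
date_b = label_tiles(grid, is_cloudy=lambda t: t == "t0")
```

Here `t2` receives index 2 on both dates. If the counter were incremented after the filter instead, `t2` would receive index 1 on both dates in this example, but in general the index would depend on how many earlier tiles happened to be cloudy.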
Development

Successfully merging this pull request may close these issues.

Catch Nodata for S1 and DEM