Generate embeddings from CLAYModule trained with latlon/time encodings #96
Commits on Dec 20, 2023
-
🍻 Implement CLAYModule's predict_step to generate embeddings table
Output embeddings to a geopandas.GeoDataFrame with columns 'source_url', 'date', 'embeddings', and 'geometry'. This essentially copies and adapts the code from a767164 in #73, but changes how the encoder's masking is disabled and how the mean of the embeddings is computed over a slice of the raw embeddings.
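As a rough illustration of the averaging step, the sketch below pools a batch of raw embeddings into one vector per image by taking the mean over a slice of the patch axis. The array shapes, the slice bounds, and the helper name `mean_embeddings` are assumptions made for this example and are not taken from the commit itself.

```python
import numpy as np


def mean_embeddings(raw: np.ndarray, start: int = 0, stop=None) -> np.ndarray:
    """Average raw embeddings over a slice of the patch axis.

    raw has shape (batch, tokens, dim); the result has shape (batch, dim).
    """
    return raw[:, start:stop, :].mean(axis=1)


# Hypothetical shapes: 2 images, 8 tokens each, 4 embedding dimensions.
raw = np.arange(2 * 8 * 4, dtype=np.float32).reshape(2, 8, 4)

# Assumption: the last 2 tokens are not image patches (for example
# latlon/time encodings), so they are left out of the average.
pooled = mean_embeddings(raw, stop=-2)
print(pooled.shape)
```

Each pooled row could then be stored in the 'embeddings' column next to its 'source_url', 'date', and 'geometry' values.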
Commit 218b95e
-
🚚 Rename output file to {MGRS}_{MINDATE}_{MAXDATE}_v{VERSION}.gpq
The output GeoParquet file now has a filename with a format like "{MGRS:5}_{MINDATE:8}_{MAXDATE:8}_v{VERSION:3}.gpq", e.g. "12ABC_20210101_20231231_v001.gpq". This is implemented in model_vit.py, and the same `on_predict_epoch_end` method is copied over to model_clay.py. Also, the index column is no longer saved to the GeoParquet file.
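A minimal sketch of how a filename in this format could be assembled is shown below. The helper name `output_filename` and its parameters are hypothetical, and the code in model_vit.py may build the name differently.

```python
from datetime import date


def output_filename(mgrs: str, min_date: date, max_date: date, version: int = 1) -> str:
    """Build a name like 12ABC_20210101_20231231_v001.gpq (hypothetical helper)."""
    if len(mgrs) != 5:
        raise ValueError(f"expected a 5-character MGRS code, got {mgrs!r}")
    # Dates are formatted as 8 digits (YYYYMMDD), the version as 3 zero-padded digits.
    return f"{mgrs}_{min_date:%Y%m%d}_{max_date:%Y%m%d}_v{version:03d}.gpq"


name = output_filename("12ABC", date(2021, 1, 1), date(2023, 12, 31))
print(name)  # 12ABC_20210101_20231231_v001.gpq
```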
Commit f19cf8f
Commits on Dec 21, 2023
-
✅ Fix failing test by updating to new output filename
Forgot to update the filename in the unit test to conform to the new `{MGRS}_{MINDATE}_{MAXDATE}_v{VERSION}.gpq` format. Patches f19cf8f.
Commit 6030cf7
-
✅ Parametrized test to check CLAYModule's predict loop
Split the previous integration test on the neural network model into separate fit and predict unit tests. Only the prediction loop of CLAYModule is tested, because training/validating the model might be too much for CPU-based Continuous Integration. Also, the CLAYModule test uses 32-true precision instead of bf16-mixed, because `torch.cat` doesn't work with float16 tensors on the CPU, see pytorch/pytorch#100932 (should be fixed with PyTorch 2.2).
Commit 98a39b7
-
⏪ Save index column to GeoParquet file
Decided that the index column might be good to keep for now, since it might help to speed up row counts? But we are resetting the index first before saving it. Partially reverts f19cf8f.
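One plausible reading of "resetting the index first" is sketched below with plain pandas: per-batch tables are concatenated, which leaves repeated index labels, and the index is then replaced with a clean 0..n-1 range before writing. The table contents and the use of `drop=True` are assumptions for illustration; the commit may reset the index differently.

```python
import pandas as pd

# Two hypothetical per-batch tables, each with its own 0-based index.
batch_a = pd.DataFrame({"source_url": ["a.tif", "b.tif"]})
batch_b = pd.DataFrame({"source_url": ["c.tif"]})

# After concatenation the index labels repeat: [0, 1, 0].
combined = pd.concat([batch_a, batch_b])

# Resetting gives a clean, unique 0..n-1 index to save with the table.
combined = combined.reset_index(drop=True)
print(combined.index.tolist())  # [0, 1, 2]
```

With geopandas, the index can then be kept in the output by passing `index=True` to `GeoDataFrame.to_parquet`.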
Commit f1439e3
Commits on Jan 11, 2024
-
✅ Fix unit test to include index column
After f1439e3, the unit test needs to check that the index column is present in the output geodataframe.
Commit 4c71db3
-
Commit 360a522