Generate embeddings from CLAYModule trained with latlon/time encodings #96

Merged: 7 commits merged on Jan 12, 2024

Commits on Dec 20, 2023

  1. 🍻 Implement CLAYModule's predict_step to generate embeddings table

    Output embeddings to a geopandas.GeoDataFrame with columns 'source_url', 'date', 'embeddings', and 'geometry'. This mostly copies and adapts the code from a767164 in #73, but changes how the encoder's masking is disabled and how the mean of the embeddings is computed over a slice of the raw embeddings.
    weiji14 committed Dec 20, 2023
    218b95e
  2. 🚚 Rename output file to {MGRS}_{MINDATE}_{MAXDATE}_v{VERSION}.gpq

    The output GeoParquet file is now named using the format "{MGRS:5}_{MINDATE:8}_{MAXDATE:8}_v{VERSION:3}.gpq", e.g. "12ABC_20210101_20231231_v001.gpq". Implemented this in model_vit.py and copied the same `on_predict_epoch_end` method over to model_clay.py. The index column is also no longer saved to the GeoParquet file.
    weiji14 committed Dec 20, 2023
    f19cf8f
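The averaging step described in 218b95e can be sketched as follows. This is a minimal NumPy illustration, not the project's code: it assumes the raw encoder output has shape (batch, tokens, dim) and that the last two tokens are the lat/lon and time encodings, so they are sliced off before taking the mean. The helper name `mean_embeddings` and its `n_trailing` parameter are hypothetical.

```python
import numpy as np


def mean_embeddings(raw: np.ndarray, n_trailing: int = 2) -> np.ndarray:
    """Average raw encoder tokens into one embedding vector per sample.

    `raw` has shape (batch, tokens, dim). The last `n_trailing` tokens are
    assumed (hypothetically) to be the lat/lon and time encodings, so they
    are excluded before averaging over the token axis.
    """
    if raw.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {raw.shape}")
    n_tokens = raw.shape[1]
    if not 0 <= n_trailing < n_tokens:
        raise ValueError("n_trailing must leave at least one token to average")
    # Slice with an explicit end index so that n_trailing=0 keeps every token.
    return raw[:, : n_tokens - n_trailing, :].mean(axis=1)
```

Each averaged vector would then become one value in the 'embeddings' column of the output GeoDataFrame, alongside that sample's 'source_url', 'date' and 'geometry'.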
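The filename convention from f19cf8f can be illustrated with a small hypothetical helper. It assumes the MGRS tile code and the acquisition dates of the predicted rows are already at hand; how `on_predict_epoch_end` actually gathers them is not shown here.

```python
from datetime import date
from typing import Iterable


def output_filename(mgrs: str, dates: Iterable[date], version: int) -> str:
    """Build a name like '12ABC_20210101_20231231_v001.gpq' (hypothetical helper)."""
    dates = list(dates)
    if len(mgrs) != 5:
        raise ValueError(f"expected a 5-character MGRS code, got {mgrs!r}")
    min_date = min(dates).strftime("%Y%m%d")  # 8-digit MINDATE
    max_date = max(dates).strftime("%Y%m%d")  # 8-digit MAXDATE
    return f"{mgrs}_{min_date}_{max_date}_v{version:03d}.gpq"  # 3-digit VERSION


print(output_filename("12ABC", [date(2023, 12, 31), date(2021, 1, 1)], 1))
# → 12ABC_20210101_20231231_v001.gpq
```

The `%Y%m%d` format gives the fixed eight-digit dates, and the `:03d` format spec zero-pads the version to three digits.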

Commits on Dec 21, 2023

  1. ✅ Fix failing test by updating to new output filename

    Forgot to update the filename in the unit test to conform to the new `{MGRS}_{MINDATE}_{MAXDATE}_v{VERSION}.gpq` format. Patches f19cf8f.
    weiji14 committed Dec 21, 2023
    6030cf7
  2. ✅ Parametrized test to check CLAYModule's predict loop

    Split the previous integration test of the neural network model into separate fit and predict unit tests. Only the prediction loop of CLAYModule is tested, because training/validating the model is likely too computationally expensive for CPU-based continuous integration. Also, CLAYModule is tested with 32-true precision instead of bf16-mixed, because `torch.cat` doesn't work with float16 tensors on the CPU; see pytorch/pytorch#100932 (should be fixed in PyTorch 2.2).
    weiji14 committed Dec 21, 2023
    98a39b7
  3. ⏪ Save index column to GeoParquet file

    Decided to keep the index column for now, since it may help speed up row counts. The index is reset before saving. Partially reverts f19cf8f.
    weiji14 committed Dec 21, 2023
    f1439e3
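A filename check in the spirit of the updated unit test in 6030cf7 could match against a regular expression. The pattern below is an assumption, not the project's test: it treats the 5-character MGRS code as a 2-digit UTM zone followed by three uppercase letters (latitude band plus 2-letter grid square), as in the "12ABC" example.

```python
import re

# Assumed pattern: 2-digit UTM zone + 3 uppercase letters for the MGRS tile,
# two YYYYMMDD dates, and a zero-padded 3-digit version number.
FILENAME_RE = re.compile(r"\d{2}[A-Z]{3}_\d{8}_\d{8}_v\d{3}\.gpq")


def is_valid_output_filename(name: str) -> bool:
    """Return True if `name` follows {MGRS}_{MINDATE}_{MAXDATE}_v{VERSION}.gpq."""
    return FILENAME_RE.fullmatch(name) is not None
```

Using `fullmatch` rather than `search` ensures that no extra characters are allowed before or after the expected name.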
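The precision choice in 98a39b7 boils down to a simple rule, sketched here as a hypothetical helper. The strings "32-true" and "bf16-mixed" are the Lightning `Trainer` precision values named in the commit message; the function itself does not come from the project.

```python
def choose_precision(accelerator: str) -> str:
    """Return a Lightning precision string for a (hypothetical) test Trainer.

    On CPU, fall back to full 32-bit precision because `torch.cat` fails on
    float16 CPU tensors (pytorch/pytorch#100932); otherwise use bf16-mixed.
    """
    return "32-true" if accelerator == "cpu" else "bf16-mixed"
```

In a test, the result would be passed as the `precision` argument of `Trainer`, alongside `accelerator="cpu"`.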
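Resetting the index, as in f1439e3, matters because frames collected per batch each start their index at 0, so concatenating them repeats index values. This pandas sketch uses hypothetical data and assumes `drop=True`; the commit does not say whether the old index is dropped or kept as a column. A GeoDataFrame reset this way could then be written with `to_parquet(path, index=True)` so that the index is stored in the file.

```python
import pandas as pd

# Hypothetical per-batch outputs, each with its own 0-based index.
batch_a = pd.DataFrame({"source_url": ["a1", "a2"]})
batch_b = pd.DataFrame({"source_url": ["b1", "b2"]})

combined = pd.concat([batch_a, batch_b])
# After concatenation the index repeats: [0, 1, 0, 1]
combined = combined.reset_index(drop=True)
# Now a clean 0..n-1 index: [0, 1, 2, 3]
```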

Commits on Jan 11, 2024

  1. ✅ Fix unit test to include index column

    After f1439e3, the unit test needs to check that the index column is present in the output GeoDataFrame.
    weiji14 committed Jan 11, 2024
    4c71db3
  2. 360a522