Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added OSM landmask as a new source #30

Merged
merged 2 commits into from
Sep 11, 2024
Merged

Conversation

TheSylex
Copy link

Left the GSHHG landmask as the default, but I've added a new method in order to be able to choose between both GSHHG and OSM. It could also be possible to add any arbitrary number of providers but I didn't want to change the project too much, so I kept the changes minimal.

Cargo.toml Outdated Show resolved Hide resolved
@gauteh
Copy link
Owner

gauteh commented Aug 28, 2024

Thank you. This seems very useful. The landmask is derived from the GSSHG shapes, with the OSM shapes the mask probably no longer matches the shapes. So if a point is in the ocean in the mask, but not in OSM, it may not be picked up. I think we have to generate a separate mask, or maybe just the difference to save space (there is a limit on the size of distributed packages to pypi/conda/crates).

@TheSylex
Copy link
Author

How did you derive the mask? I'm unsure on both how it works and how it's generated.
I grabbed the shapefile straight out of the OSM website and made an script to convert it from a shapefile into a single multipolygon .wkb.

Would you be able to kindly explain how it works in order for me to understand it and be able finish this PR? Thanks!

@gauteh
Copy link
Owner

gauteh commented Aug 28, 2024

The mask was derived from a Python version, so it is not super easy: https://github.com/gauteh/roaring-landmask/tree/main/src/devel, I will give you a more detailed answer tomorrow.

@gauteh
Copy link
Owner

gauteh commented Aug 29, 2024

Roaring landmask is fast because it can first exclude a great number of points that are certain to be in the ocean. The mask is generated with a set resolution (https://github.com/gauteh/roaring-landmask/blob/main/src/mask.rs#L24). To generate the mask you need to check whether any of the pixels in the mask has any land in it: either just intersecting, completely covered, or contains an island that does not intersect. The mask is then converted to integers which are stored in the a roaring bitmap / treemap: https://github.com/gauteh/roaring-landmask/blob/main/src/mask.rs#L137.

When roaring landmask checks a point it first checks the mask, if the test is negative it is certain that the point is in the ocean and can return. If not, it needs to consult the shapes since the points may be on land or not.

@gauteh
Copy link
Owner

gauteh commented Aug 29, 2024

I was thinking earlier that this project may be of use for that: https://github.com/regionmask/regionmask (#18 )

@TheSylex
Copy link
Author

TheSylex commented Sep 9, 2024

@gauteh I've already resolved the conflicts with your latest changes.
I think I've figured out how to correctly make the bitmask, and have correctly esparated both shapes and masks.
I've added the scripts that are needed to generate both the landmasks and the bitmasks into the roaring_scripts.py file.
Would you kindly check the PR out again? Thanks :)

@gauteh
Copy link
Owner

gauteh commented Sep 9, 2024

Awesome! Looking forward to seeing how you solved this, will take a look tomorrow.

@gauteh
Copy link
Owner

gauteh commented Sep 10, 2024

This looks very promising! I am missing a python test and benchmark: https://github.com/gauteh/roaring-landmask/blob/main/tests/test_roaring.py to see how the provider should be specified from python (or are you planning to do this in the next round?).

It would be useful to plot the two masks + difference between them to see how big the difference is, and it would also be a good test to see if anything stands out.

@gauteh
Copy link
Owner

gauteh commented Sep 10, 2024

There are a few errors in the the tests related to the provider argument: https://github.com/gauteh/roaring-landmask/actions/runs/10789004492/job/29921006432?pr=30#step:7:448. To run the tests requiring nightly rust you do:

cargo test -r --features nightly,static --verbose

@TheSylex
Copy link
Author

I've fixed the tests, although I couldn't run them locally.
I've also added some images depicting both landmasks better.
And finally, I'll leave here an image with the difference between both landmasks:

difference

@gauteh
Copy link
Owner

gauteh commented Sep 10, 2024

That looks very promising, the difference looks as expected (no shift in coordinates). Some new errors in CI. You may need rust nightly to test.

@TheSylex
Copy link
Author

🎉 tests are passing!
Any next steps before merging?

@gauteh
Copy link
Owner

gauteh commented Sep 10, 2024

Great! Are any of the commits duplicating the datafiles? They should be squashed, to prevent the repo from becoming too big. I can try later if you are unsure how to do this. I think the Python bindings + tests need some updates, but we can do that in the next step.

TheSylex and others added 2 commits September 11, 2024 10:55
ci: pypi token

update deps

disable par test

fix deps to main

fmt

ci: also run rust workflows

set correct semver version

updated bitmask and added the creating/conversion scripts

updated bitmask and added the creating/conversion scripts

.

fixed tests and added images for both masks

fix errors related to paths
@gauteh gauteh merged commit ea99fcc into gauteh:main Sep 11, 2024
9 of 15 checks passed
@gauteh
Copy link
Owner

gauteh commented Sep 11, 2024

Merged! I fixed the python tests to now also testing OSM.

@TheSylex
Copy link
Author

any estimate about when we'll get this into a pypi release?

@TheSylex TheSylex deleted the add_osm_landmask branch September 11, 2024 09:12
@TheSylex TheSylex restored the add_osm_landmask branch September 11, 2024 09:12
@gauteh
Copy link
Owner

gauteh commented Sep 11, 2024

If all tests go through it should be on its way: https://github.com/gauteh/roaring-landmask/actions/runs/10808495500

@TheSylex
Copy link
Author

Looks like there's two problems...

The first one with memory in the x86 version:
rustc-LLVM ERROR: out of memory

and then something seemingly related to compiling geos and numpy deps:

      gcc -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3 -Wall -fPIC -I/tmp/pip-build-env-1ll5ewre/overlay/lib/python3.12/site-packages/numpy/_core/include -I/io/.venv/include -I/usr/include/python3.12 -c src/c_api.c -o build/temp.linux-x86_64-cpython-312/src/c_api.o
      error: command 'gcc' failed: No such file or directory

Not really sure why any of these are happening now but didn't happen before. Any insights on what you changed regarding the CI on this PR?

@gauteh
Copy link
Owner

gauteh commented Sep 11, 2024

Memory happens now and then. I had accidentally disabled Python test. The fails are not related to your code changes. I will try to fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants