Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SEDONA-328] Add SedonaPyDeck | [SEDONA-329] Remove geometry_col from SedonaKepler APIs #913

Merged

Conversation

iGN5117
Copy link
Contributor

@iGN5117 iGN5117 commented Jul 23, 2023

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

  • Add SedonaPyDeck
  • Remove geometry_col parameter from SedonaKepler APIs

How was this patch tested?

  • Wrote new tests

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation update.

iGN5117 added 28 commits July 3, 2023 20:33
Added a utilities file that hardcodes map config
…github.com/iGN5117/sedona into develop_Nilesh_1.5.0_NotebookVisualization

# Conflicts:
#	binder/ApacheSedonaSQL_SpatialJoin_AirportsPerCountry.ipynb
…ization

# Conflicts:
#	binder/ApacheSedonaSQL_SpatialJoin_AirportsPerCountry.ipynb
Added comments and documentation on using SedonaKepler
Added keplergl to sedona/python pipfile
Added SedonaKepler import to SedonaContext init
Added basic test cases for map created by SedonaKepler
Reverted jupyter notebook example to sedona 1.4.1 compatible example
Removed unnecessary init.py in python
Refactored method names to follow python convention
Add comprehensive test cases for SedonaKepler
…on 3.7

Updated tests
Refactored naming schemes
Refactor SedonaKepler to remove need for geometry_col in exposed APIs
Add pydeck to Pipfile
Add google buildings and chicago crimes dataset
Add tests for SedonaPyDeck
Update documentation and tests
…ization

# Conflicts:
#	docs/tutorial/sql.md
#	python/Pipfile
#	python/sedona/maps/SedonaKepler.py
#	python/sedona/maps/__init__.py
#	python/sedona/spark/__init__.py
#	python/tests/maps/test_sedonakepler_visualization.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the license of this data file? Please show me the url to its license.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 licenses to this dataset as specified here
License 1
License 2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a reference to the CC BY 4.0 license here: https://github.com/apache/sedona/blob/master/LICENSE#L219

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the license of this data file? Please show me the url to its license.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the size of this test data? We usually only include a very small chunk of data to the source code repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://portal.chicagopolice.org/portal/page/portal/ClearPath
This the link to license given on the homepage, however it is not loading as of now. I have raised an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size of the dataset is 3.1 MB. I wanted to keep a dense dataset so as to render a good scatterplot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the license, I got a reply from the dataset owner pointing me to the terms of use of the dataset. They can be found here:
https://www.chicago.gov/city/en/narr/foia/data_disclaimer.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make a copy of this license file and put it here: https://github.com/apache/sedona/tree/master/licenses

Please add a reference to this license here: https://github.com/apache/sedona/blob/master/LICENSE#L219

@@ -509,26 +509,104 @@ There are lots of other functions can be combined with these queries. Please rea

==Sedona >= 1.5.0==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to add Sedona >= 1.5.0 here. Sedona website has a version selector that shows docs for a specific version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, removed

geometry_col = SedonaMapUtils.__get_geometry_col__(df)
gdf = SedonaPyDeck._prepare_df_(df, geometry_col=geometry_col)
geom_type = gdf[geometry_col][0].geom_type
# if SedonaMapUtils.__is_geom_collection__(geom_type):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove commented code unless really needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Optionally, parameters `initial_view_state`, `map_style`, `map_provider` can be passed to configure the map as per user's liking.
More details on the parameters and their default values can be found on the pydeck website.

#### Creating a Polygon map using SedonaPyDeck
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is SedonaPyDeck supporting all types of geometries? And what is the difference between create_geometry_map and create_scatterplot_map when it comes to point data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, create_geometry_map() has been tested on polygon, point, linestring, multiLineString, multiPolygon and geometryCollection.

I missed updating its heading in the doc, fixed it.
Regarding scatterplot and createGeometryMap, scatter plot is specialized for point datasets and allows for more customization with respect to radius of the points plotted. I realize that scatterPlot API needs more customization options, I will refactor the API and its corresponding doc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you also updated the implementation of SedonaKepler here, do you need to update its doc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the documentation is already updated. Removed the parameter from the docstring now

@jiayuasu
Copy link
Member

@Kontinuation Since @iGN5117 is adding new dependencies to the Pipfile. Does he also need to add keplerGl and pydeck dependecies to setup.py? This way, when the users do pip install, these two libraries will be auto installed.

@Kontinuation
Copy link
Member

@Kontinuation Since @iGN5117 is adding new dependencies to the Pipfile. Does he also need to add keplerGl and pydeck dependecies to setup.py? This way, when the users do pip install, these two libraries will be auto installed.

It is more appropriate to make keplergl and pydeck optional dependencies and add them to extras_require. If we are intentionally making them mandatory, then we should add them to install_requires.

Update SedonaPyDeck scatterplot to add more possible customizations
Updated documentation
@iGN5117 iGN5117 changed the title [SEDONA-328] Add SedonaPyDeck [SEDONA-328] Add SedonaPyDeck | [SEDONA-329] Remove geometry_col from SedonaKepler APIs Jul 24, 2023
Added Chicago Crimes dataset terms of use and referenced them in licenses file
Referenced CC by 4.0 for Google Buildings Dataset in licenses file
LICENSE Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the format of other entries. You need to specify the path of each file that uses this license

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored, the file specifies the path of csv files in test data

@jiayuasu jiayuasu merged commit eb5b8d5 into apache:master Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants