[DOCS] Update Microsoft Fabric tutorial with Spark properties (#1388)
* Add the spark properties

* Refactor the doc

* Update docs/setup/fabric.md

Co-authored-by: John Bampton <jbampton@users.noreply.github.com>

---------

Co-authored-by: John Bampton <jbampton@users.noreply.github.com>
jiayuasu and jbampton committed Apr 30, 2024
1 parent 46db88a commit 1d1608a
Showing 7 changed files with 55 additions and 27 deletions.
Binary file added docs/image/fabric/fabric-10.png
Binary file modified docs/image/fabric/fabric-5.png
Binary file modified docs/image/fabric/fabric-6.png
Binary file modified docs/image/fabric/fabric-7.png
Binary file modified docs/image/fabric/fabric-8.png
Binary file modified docs/image/fabric/fabric-9.png
82 changes: 55 additions & 27 deletions docs/setup/fabric.md
@@ -4,68 +4,67 @@ This tutorial will guide you through the process of installing Sedona on Microso

Go to the [Microsoft Fabric portal](https://app.fabric.microsoft.com/) and choose the `Data Engineering` option.

![](../../image/fabric/fabric-1.png)
![](../image/fabric/fabric-1.png)

## Step 2: Create a Microsoft Fabric Data Engineering environment

On the left side, click `My Workspace` and then click `+ New` to create a new `Environment`. Let's name it `ApacheSedona`.

![](../../image/fabric/fabric-2.png)
![](../image/fabric/fabric-2.png)

## Step 3: Select the Apache Spark version

In the `Environment` page, click the `Home` tab and select the appropriate version of Apache Spark. You will need this version to install the correct version of Apache Sedona.

![](../../image/fabric/fabric-3.png)
![](../image/fabric/fabric-3.png)

## Step 4: Install the Sedona Python package

In the `Environment` page, click the `Public libraries` tab and then type in `apache-sedona`. Please select the appropriate version of Apache Sedona. The source is `PyPI`.

![](../../image/fabric/fabric-4.png)
![](../image/fabric/fabric-4.png)

## Step 5: Save and publish the environment
## Step 5: Set Spark properties

Click the `Save` button and then click the `Publish` button to save and publish the environment. This will create the environment with the Apache Sedona Python package installed. The publishing process will take about 10 minutes.

![](../../image/fabric/fabric-5.png)
In the `Environment` page, click the `Spark properties` tab, then create the following 3 properties:

## Step 6: Download Sedona jars
- `spark.sql.extensions`: `org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions`
- `spark.serializer`: `org.apache.spark.serializer.KryoSerializer`
- `spark.kryo.registrator`: `org.apache.sedona.core.serde.SedonaKryoRegistrator`

1. Learn the Sedona jars you need from our [Sedona maven coordinate](maven-coordinates.md)
2. Download the `sedona-spark-shaded` jars from [Maven Central](https://search.maven.org/search?q=g:org.apache.sedona). Please pay attention to the Spark version and Scala version of the jars. If you select Spark 3.4 in the Fabric environment, you should download the Sedona jars with Spark 3.4 and Scala 2.12 and the jar name should be like `sedona-spark-shaded-3.4_2.12-1.5.1.jar`.
3. Download the `geotools-wrapper` jars from [Maven Central](https://search.maven.org/search?q=g:org.datasyslab). Please pay attention to the Sedona versions of the jar. If you select Sedona 1.5.1, you should download the `geotools-wrapper` jar with version 1.5.1 and the jar name should be like `geotools-wrapper-1.5.1-28.2.jar`.
![](../image/fabric/fabric-5.png)
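
As a rough illustration, the three Spark properties from this step can be checked from inside a notebook once the environment is published. The helper name `missing_sedona_properties` is hypothetical and not part of Sedona or Fabric; it only compares a plain dictionary of settings against the values listed above.

```python
# Expected values, copied from the three Spark properties listed in this step.
EXPECTED_SEDONA_PROPERTIES = {
    "spark.sql.extensions": (
        "org.apache.sedona.viz.sql.SedonaVizExtensions,"
        "org.apache.sedona.sql.SedonaSqlExtensions"
    ),
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
}


def missing_sedona_properties(actual: dict) -> dict:
    """Return the expected properties that are absent or differ in `actual`.

    Hypothetical helper for illustration; `actual` is any mapping of
    Spark property names to their configured values.
    """
    return {
        key: expected
        for key, expected in EXPECTED_SEDONA_PROPERTIES.items()
        if actual.get(key) != expected
    }
```

In a notebook, one way to obtain the dictionary is `dict(spark.sparkContext.getConf().getAll())`, assuming the session is exposed as `spark`. An empty result would suggest that all three properties took effect.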

## Step 7: Upload Sedona jars to the Fabric environment LakeHouse storage
## Step 6: Save and publish the environment

In the notebook page, choose the `Explorer` and click the `LakeHouses` option. If you don't have a LakeHouse, you can create one. Then choose `Files` and upload the 2 jars you downloaded in the previous step.
Click the `Save` button and then click the `Publish` button to save and publish the environment. This will create the environment with the Apache Sedona Python package installed. The publishing process will take about 10 minutes.

After the upload, you should be able to see the 2 jars in the LakeHouse storage. Then please copy the `ABFS` paths of the 2 jars. In this example, the paths are
![](../image/fabric/fabric-6.png)

```angular2html
abfss://9e9d4196-870a-4901-8fa5-e24841492ab8@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar
## Step 7: Find the download links of Sedona jars

abfss://9e9d4196-870a-4901-8fa5-e24841492ab8@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/geotools-wrapper-1.5.1-28.2.jar
```
1. Check which Sedona jars you need on our [Sedona maven coordinate](maven-coordinates.md) page.
2. Find the `sedona-spark-shaded` jar on [Maven Central](https://search.maven.org/search?q=g:org.apache.sedona). Pay attention to the Spark version and Scala version of the jar. If you selected Spark 3.4 in the Fabric environment, you need the Sedona jar built for Spark 3.4 and Scala 2.12, and the jar name should look like `sedona-spark-shaded-3.4_2.12-1.5.1.jar`.
3. Find the `geotools-wrapper` jar on [Maven Central](https://search.maven.org/search?q=g:org.datasyslab). Pay attention to the Sedona version of the jar. If you selected Sedona 1.5.1, you need the `geotools-wrapper` jar for version 1.5.1, and the jar name should look like `geotools-wrapper-1.5.1-28.2.jar`.

![](../../image/fabric/fabric-6.png)
The download links look like this:

![](../../image/fabric/fabric-7.png)
```
https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.4_2.12/1.5.1/sedona-spark-shaded-3.4_2.12-1.5.1.jar
https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.5.1-28.2/geotools-wrapper-1.5.1-28.2.jar
```
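
The two links follow the standard Maven repository layout (`group/artifact/version/artifact-version.jar`), so they can be derived from the versions you picked. The sketch below is only an illustration: `maven_jar_url` and `sedona_jar_urls` are hypothetical helper names, and the `<sedona>-<geotools>` version pattern for `geotools-wrapper` is inferred from the example above rather than guaranteed for every release.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_jar_url(group_id: str, artifact_id: str, version: str) -> str:
    """Build a Maven Central jar URL from coordinates (hypothetical helper)."""
    # Dots in the group id become path separators in the repository layout.
    group_path = group_id.replace(".", "/")
    return (
        f"{MAVEN_CENTRAL}/{group_path}/{artifact_id}/{version}/"
        f"{artifact_id}-{version}.jar"
    )


def sedona_jar_urls(spark: str, scala: str, sedona: str, geotools: str) -> list:
    """Return the URLs of the two jars for the given versions.

    Assumption: the geotools-wrapper version is "<sedona>-<geotools>",
    as in the 1.5.1-28.2 example.
    """
    return [
        maven_jar_url(
            "org.apache.sedona", f"sedona-spark-shaded-{spark}_{scala}", sedona
        ),
        maven_jar_url("org.datasyslab", "geotools-wrapper", f"{sedona}-{geotools}"),
    ]


for url in sedona_jar_urls("3.4", "2.12", "1.5.1", "28.2"):
    print(url)
```

It is worth opening the generated links in a browser before using them, since not every combination of versions is published.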

## Step 8: Start the notebook with the Sedona environment and install the jars

In the notebook page, select the `ApacheSedona` environment you created before.

![](../../image/fabric/fabric-8.png)
![](../image/fabric/fabric-9.png)

In the notebook, you can install the jars by running the following code. Please replace the `spark.jars` with the `ABFS` paths of the 2 jars you uploaded in the previous step.
In the notebook, you can install the jars by running the following code. Replace the entries in `jars` with the download links of the 2 jars from the previous step.

```python
%%configure -f
{
"conf": {
"spark.jars": "abfss://XXX/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar,abfss://XXX/Files/geotools-wrapper-1.5.1-28.2.jar",
}
"jars": ["https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.5.1-28.2/geotools-wrapper-1.5.1-28.2.jar", "https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.4_2.12/1.5.1/sedona-spark-shaded-3.4_2.12-1.5.1.jar"]
}
```

@@ -85,4 +84,33 @@ sedona.sql("SELECT ST_GeomFromEWKT('SRID=4269;POINT(40.7128 -74.0060)')").show()

If you see the output of the point, then the installation is successful.

![](../../image/fabric/fabric-9.png)
![](../image/fabric/fabric-10.png)

## Optional: manually upload Sedona jars to the Fabric environment LakeHouse storage

If your cluster has no internet access or you want to skip the slow on-the-fly download, you can manually upload the Sedona jars to the Fabric environment LakeHouse storage.

In the notebook page, choose the `Explorer` and click the `LakeHouses` option. If you don't have a LakeHouse, you can create one. Then choose `Files` and upload the 2 jars, downloaded beforehand from the links found in Step 7.

After the upload, you should see the 2 jars in the LakeHouse storage. Then copy the `ABFS` paths of the 2 jars. In this example, the paths are:

```angular2html
abfss://9e9d4196-870a-4901-8fa5-e24841492ab8@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar
abfss://9e9d4196-870a-4901-8fa5-e24841492ab8@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/geotools-wrapper-1.5.1-28.2.jar
```

![](../image/fabric/fabric-7.png)

![](../image/fabric/fabric-8.png)

If you use this option, the `%%configure` cell in your notebook should look like this:

```python
%%configure -f
{
"conf": {
"spark.jars": "abfss://XXX/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar,abfss://XXX/Files/geotools-wrapper-1.5.1-28.2.jar"
}
}
```
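
Since the body of the `%%configure` cell is JSON-shaped, generating it with `json.dumps` is one way to avoid syntax slips such as a stray trailing comma. This is a minimal sketch: `configure_body` is a hypothetical helper, and the `abfss://XXX/...` paths are the placeholders from this tutorial, to be replaced with your own `ABFS` paths.

```python
import json


def configure_body(jar_paths: list) -> str:
    """Return a JSON body that sets `spark.jars` to a comma-separated path list.

    Hypothetical helper for illustration only.
    """
    return json.dumps({"conf": {"spark.jars": ",".join(jar_paths)}}, indent=4)


# Placeholder paths from this tutorial; replace XXX with your own values.
paths = [
    "abfss://XXX/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar",
    "abfss://XXX/Files/geotools-wrapper-1.5.1-28.2.jar",
]
print(configure_body(paths))
```

The printed text can then be pasted under `%%configure -f` in the first cell of the notebook.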
