Azure ML and Spark

Introduction

This repo

This is an informal collection of demos of Spark on Azure ML via Azure Synapse. I do not know how to write code, so do not take a production dependency on code I write. Use the official Microsoft repos and documentation instead.

Data overview

The data is a copy of the NOAA Integrated Surface Data (ISD) from Azure Open Datasets, moved to the Azure ML workspace's default storage account.

The data is stored both as compressed Parquet files (~20 GB) and as uncompressed CSV files (~150 GB), spread across more than 1,000 individual files. Loaded into a dataframe, the data is ~750 GB, with ~1.4 billion rows.
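To put those approximate figures in perspective, here is a rough back-of-the-envelope sketch that derives the size ratios between formats and the average bytes per row. It only uses the numbers quoted above; the `DatasetSize` class is a hypothetical helper for illustration, and it assumes decimal gigabytes (10^9 bytes), so treat the results as rough estimates rather than measurements.

```python
from dataclasses import dataclass

# Assumption: "GB" in the figures above means decimal gigabytes (10**9 bytes).
GB = 10**9


@dataclass(frozen=True)
class DatasetSize:
    """Hypothetical helper holding the approximate sizes quoted for the ISD copy."""

    parquet_gb: float
    csv_gb: float
    in_memory_gb: float
    rows: int

    def ratios(self) -> dict:
        # How much larger each representation is than a more compact one.
        return {
            "csv_to_parquet": self.csv_gb / self.parquet_gb,
            "memory_to_csv": self.in_memory_gb / self.csv_gb,
            "memory_to_parquet": self.in_memory_gb / self.parquet_gb,
        }

    def bytes_per_row(self) -> dict:
        # Average bytes each row occupies in each representation.
        return {
            "parquet": self.parquet_gb * GB / self.rows,
            "csv": self.csv_gb * GB / self.rows,
            "in_memory": self.in_memory_gb * GB / self.rows,
        }


# Approximate figures from the data overview above.
isd = DatasetSize(parquet_gb=20, csv_gb=150, in_memory_gb=750, rows=1_400_000_000)

if __name__ == "__main__":
    for name, value in isd.ratios().items():
        print(f"{name}: {value:.1f}x")
    for fmt, size in isd.bytes_per_row().items():
        print(f"{fmt}: ~{size:.0f} bytes/row")
```

With these inputs the CSV files are about 7.5x the size of the Parquet files, and the loaded dataframe is about 37.5x the Parquet size (roughly 14 bytes per row on disk as Parquet versus roughly 536 bytes per row in memory).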

lostmygithubaccount/sparky

Folders and files

Latest commit

History

Repository files navigation

Azure ML and Spark

Introduction

This repo

Data overview

Prerequisites

Create a Synapse Spark Pool

Create and setup compute instance

Launch JupyterLab, Jupyter, or use in inline notebook editor

Clone repository

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages