PySpark-Tutorial provides basic algorithms using PySpark
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Plain Stock Close-Price Prediction via Graves LSTM RNNs
This repository contains Spark, MLlib, PySpark, and DataFrames projects
Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that let data workers efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs written in Scala.
Apache Spark Basics - Java Examples
A library having Java and Scala examples for Spark 2.x
Various data stream/batch processing demos with Apache Spark in Scala 🚀
Creates a data lake on AWS S3 to store dimensional tables after processing data with Spark on an AWS EMR cluster
Repository for Spark structured streaming use case implementations.
Batch-processing demos against various storage backends, such as local filesystems, S3, and HDFS on AWS
This repo contains my learning notes and practice Zeppelin notebooks on Spark using Scala. All the notebooks in the repo can be used as template code for most ML algorithms and can be built upon for more complex problems.
PySpark serves as a Python interface to Apache Spark, enabling the execution of Python and SQL-like instructions for the manipulation and analysis of data within a distributed processing framework.
Assignments in R programming (data analysis, clustering) and Spark within Big Data Programming course in my master's program.
Explains the implementation of Spark concepts using the PySpark API from Jupyter notebooks
This is our final project for SFU's CMPT 353 taught by Greg Baker during Summer 2023
Treat Spark like pandas.