Cloud-based AI / ML workflow and data application development framework
Updated Aug 20, 2024 - Python
Terraform module to create AWS EMR resources 🇺🇦
This project demonstrates data cleaning and processing with Apache Spark and Apache Flink, both locally and on AWS EMR.
A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver
The project uses Airflow to orchestrate the data pipeline, creating and terminating a transient EMR cluster to save on cost. Apache Spark transforms the data, and the final dataset is loaded into Snowflake.
An AWS-based solution that uses AWS CloudWatch and a Python AWS Lambda function to automatically terminate AWS EMR clusters that have been idle for a specified period of time.
Analyzing Spark Cluster Performance in Amazon EMR
Analysis and monitoring system using AWS... Also the comp4442 project
This project aims to perform feature extraction, followed by PCA, on large datasets using Spark in the cloud.
Big data analysis with AWS services, filtering the Wikiticker dataset with Apache Spark on Amazon EMR, storing data in S3, cataloging with AWS Glue, and querying with Amazon Athena. This end-to-end pipeline exemplifies handling and analyzing big data in the cloud.
Uses Apache Spark for ETL to prepare data, then builds a Machine Learning model for Natural Language Processing (NLP) classification and deploys it in a Gradio web application for interactive use.
Technology blog by Siby Abin, covering data engineering, AWS, Spark, Python, Airflow, and more.
A big data project using Hadoop, HBase, and Sqoop to ingest, process, and analyze a large dataset of taxi rides on an AWS EMR cluster. MapReduce jobs perform a variety of tasks, and the results of each job are exported to an RDS instance for visualization and analysis.
Distributed computational problem-solving project, which aims to perform large-scale graph matching using cloud computing technologies. The project allows users to import two directed graphs and analyze the differences between them.
My AWS Playground