
Apache Airflow 2 for local development and CI/CD
=======================================================

  This project is an easy-to-use development environment for Apache Airflow 2. It can be run locally on a variety of OS platforms, with simple steps for spinning up Airflow. The project can also be integrated into an automated continuous integration/continuous delivery (CI/CD) process that uses GCP Cloud Build to build, test, and deploy workflows into GCP Cloud Composer. It is meant to address several "infrastructure challenges" that Airflow developers experience, letting them focus on workflow development rather than platform installation and configuration.

  The environment runs locally using Docker containers and Docker Compose; for most deployment options, these are the only prerequisites. The code has also been successfully tested within the GCP Cloud Shell/Editor, an ephemeral cloud Linux instance accessible from a web browser. This may be beneficial for those who have "local PC restrictions" and cannot install Docker Engine locally.

Main features of local development using Docker & Docker-Compose:

  • Your workspace files are always synchronized with the Docker containers, so with an IDE the development process becomes easier and faster.
  • Unit and Integration tests run within a container built from the same image as the Airflow deployment.

  The project provides an opinionated Cloud Build CI/CD pipeline for the GCP Cloud Composer service. It integrates natively with the local Airflow environment and allows developers to automatically stage, test, and deploy their code into a production environment.

Main features of Cloud Build CI/CD pipeline for Composer environment:

  • Container caching - reusing the cache of already-built images speeds up the overall build process.
  • Unit & integration tests as steps in the CI stage.
  • DAG integrity validation (smoke test); see the sketch after this list.
  • Code linting checks.
  • Custom configuration: environment variables, Airflow configuration, PyPI packages.
  • Plugin and DAG deployment into the Composer environment.
  • Automatic email notification upon a successful build.
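
  For illustration, here is a minimal sketch of such a DAG integrity (smoke) test. It assumes a pytest-style test file and that DAGs live in the dags folder; the file name and paths are illustrative, not necessarily those used by this repo's CI:

    # test_dag_integrity.py -- illustrative smoke test (file name and paths assumed)
    # Import every DAG file and assert the DagBag loads without errors.
    from airflow.models import DagBag

    def test_dagbag_has_no_import_errors():
        dag_bag = DagBag(dag_folder="dags", include_examples=False)
        assert dag_bag.import_errors == {}, f"DAG import errors: {dag_bag.import_errors}"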

Project Structure

    .
    ├── ci-cd                     # CI/CD deployment configuration
    ├── dags                      # Airflow DAGs
    ├── data                      # Airflow data
    ├── docker                    # Docker configuration
    ├── gcp-cloud-shell           # Cloud Shell custom scripts
    ├── helpers                   # Backend scripts
    ├── logs                      # Airflow logs
    ├── plugins                   # Airflow plugins
    ├── tests                     # Tests
    ├── variables                 # Variables for environments
    ├── .gitignore                # Git ignore rules
    ├── pre-commit-config.yaml    # Pre-commit hooks
    ├── LICENSE                   # Project license
    ├── README.md                 # Readme guidelines
    └── docker-compose.yaml       # Docker Compose deployment code

1. Recommended dev tools:

  • OS: macOS, Linux (Ubuntu), or GCP Cloud Shell. Note: Windows requires the Windows Subsystem for Linux (WSL).
  • Code editing/development environment: Visual Studio Code (VS Code)
  • Terminal client: Visual Studio Code terminal

Note:   Before working with your local development environment, fork the repository so you have your own branch for development and custom changes.


2. Dependencies & prerequisites for a local PC or cloud VM:

2.1   GCP Cloud Shell:

Note:   GCP Cloud Shell has several limitations. Every time a shell session expires or is closed, you have to re-run the Airflow initialization steps given in section #4 (step 4.1).

  • 2.1.1   Access GCP Cloud Shell from your browser using your credentials: https://ide.cloud.google.com

  • 2.1.2   Open a terminal session (Menu Terminal -- New Terminal), clone the repo, and change into its directory:

    git clone <'Airflow 2 repository'> && cd airflow2-local
    
  • 2.1.3   In the Cloud Shell UI, click Open Folder and select the airflow2-local folder

  • 2.1.4   Run the following commands to initialize the environment and install prerequisites:

    chmod +x ./helpers/scripts/cloud-shell-init.sh && ./helpers/scripts/cloud-shell-init.sh
    
  • 2.1.5   Proceed with the installation and initialization steps ( section #3 and #4 ).

2.2   Linux OS:

  • 2.2.1   Install the latest available version of Docker: https://docs.docker.com/get-docker/

  • 2.2.2   Install the latest available version of Docker compose: https://docs.docker.com/compose/install/

  • 2.2.3   Disable Docker Compose v2 experimental features via the CLI:

    docker-compose disable-v2

  • 2.2.4   Proceed with the installation and initialization steps ( section #3 and #4 )

2.3   macOS:

  • 2.3.1   Install the latest available version of Docker Desktop: https://docs.docker.com/get-docker/

  • 2.3.2   Disable docker compose v2 experimental features via the CLI, run:

    docker-compose disable-v2
    
  • 2.3.3   Clone the repo:

    git clone <'Airflow 2 repository'>
    
  • 2.3.4   Launch Visual Studio Code and open the folder (Open folder) with the Airflow 2 code

  • 2.3.5   Open a terminal window (Menu Terminal -- New Terminal)

  • 2.3.6   Proceed with the installation and initialization steps ( section #3 and #4 )

2.4   Windows 10 OS:

  • 2.4.1   Install WSL (Windows Subsystem for Linux): https://docs.microsoft.com/en-us/windows/wsl/install-win10

  • 2.4.2   Install the Linux Ubuntu distribution from the Microsoft Store: https://aka.ms/wslstore (this is part of the previous step)

  • 2.4.3   Launch WSL Ubuntu and create a username (airflow) & password (airflow) when prompted

  • 2.4.4   In the WSL terminal window go to /home/airflow and clone the repo:

    cd /home/airflow && git clone <'Airflow 2 repository'>
    
  • 2.4.5   On Windows 10, install the latest available version of Docker Desktop: https://docs.docker.com/get-docker/

  • 2.4.6   Once installed, launch Docker Desktop, go to Settings --> Resources --> WSL Integration and toggle "Ubuntu". Once done, click the "Apply & Restart" button

  • 2.4.7   Open a command line in Windows (CMD) and execute the following command to make sure that Ubuntu has been set as the default WSL distribution:

    wsl --setdefault Ubuntu
    
  • 2.4.8   Install (if not already installed) and launch Visual Studio Code

  • 2.4.9   From the VS Code extensions tab, search for and install the Remote - WSL plugin

  • 2.4.10   In Visual Studio Code you will now see a green WSL indicator in the bottom-left corner; click it and choose Open Folder in WSL. Windows will prompt you to select a folder; provide the following path: \\wsl$\Ubuntu\home\airflow and choose the folder with the Airflow code ( airflow2-local-cicd )

  • 2.4.11   Open a terminal session in VS Code (Menu Terminal -- New Terminal) and run the WSL Docker installation script:

    chmod +x ./helpers/scripts/docker-wls.sh && sudo ./helpers/scripts/docker-wls.sh
    
  • 2.4.12   Proceed with the installation and initialization steps ( section #3 and #4 )


3. Customizing Airflow Settings

3.1 General settings

  • Add your Python dependencies to the docker/requirements-airflow.txt file.

  • Adapt and add DAGs into the dags folder (a minimal DAG sketch follows this list).

  • Adapt and add plugins into the plugins folder.

  • Add variables for Airflow to the variables/docker-airflow-vars.json file.

  • Add variables for the Docker containers' environment to the variables/docker-env-vars file.

  • Add variables that contain secrets and API keys to the variables/docker-env-secrets file; this file is excluded from version control via .gitignore.

  • If a custom Airflow configuration file is ready, uncomment the corresponding line in the Dockerfile to include it in the image: COPY airflow.cfg ${AIRFLOW_HOME}/airflow.cfg.

  • Optionally add the send_email.py DAG to the .airflowignore file, as this DAG is used only by the CI/CD pipeline (this avoids warnings and errors during unit tests).
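
  For reference, a minimal DAG sketch that reads one of the variables loaded from variables/docker-airflow-vars.json; the variable key my_var, the DAG id, and the file name are assumptions for illustration:

    # dags/example_variables_dag.py -- illustrative only; "my_var" is assumed
    # to be defined in variables/docker-airflow-vars.json.
    from datetime import datetime

    from airflow import DAG
    from airflow.models import Variable
    from airflow.operators.python import PythonOperator

    def print_var():
        # Variable.get reads from Airflow's Variables store
        print(Variable.get("my_var", default_var="not set"))

    with DAG(
        dag_id="example_variables_dag",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        PythonOperator(task_id="print_var", python_callable=print_var)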


3.2 GCP Project ID for GCP Connection

  • Set the project-id variable in the variables/docker-env-vars or variables/docker-env-secrets file:

    GCP_PROJECT_ID='<project-id here>'
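
  Because this file populates the containers' environment, DAG or test code can read the value directly. A minimal sketch, assuming GCP_PROJECT_ID is set as above:

    # Illustrative: read the project id exported into the container environment
    import os

    project_id = os.environ.get("GCP_PROJECT_ID")  # None if the variable is not set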


4. First-time initialization and service start:

  • 4.1 Open a terminal and run the following commands (you may need to prepend sudo in some cases, such as GCP Cloud Shell, Windows WSL, and cloud Linux VMs):

    ./helpers/scripts/init_airflow.sh
    

    Note:   for GCP Cloud Shell you must re-run this command every time a shell session expires or ends.

  • 4.2 Open a new terminal window and run the following command to make sure that all 3 containers (webserver, scheduler, postgres_db) are running and healthy:

    docker ps
    
  • 4.3 Authenticate to GCP services: run the following script and complete the GCP authentication flow:

    ./helpers/scripts/gcp-auth.sh
    

    Note:   NOT required if you are working via the GCP Cloud Shell option; you can skip this step.

  • Airflow 2 is UP and Running!


5. Commands for operations & maintenance:

  • To check if all 3 containers (webserver, scheduler, postgres_db) are running and healthy:

    docker ps
    
  • To stop all Airflow containers (via a new terminal session):

    docker-compose down
    
  • To start Airflow and all services:

    docker-compose up
    
  • To rebuild containers (if changes are applied on Dockerfile or Docker-Compose):

    docker-compose down
    
    docker-compose up --build
    
  • To clean up all containers and remove the database:

    docker-compose down --volumes --rmi all
    

6. Running commands inside a container:

  • To run unit tests, navigate to the tests directory and run the following command (a minimal example test is sketched at the end of this section):

    ./airflow "test command"

    example:

    cd tests && ./airflow "pytest tests/unit"
    
  • To run integration tests with GCP, navigate to the tests directory and run the following command:

    ./airflow "test command"

    example:

    ./airflow "pytest --tc-file tests/integration/config.ini -v tests/integration"
    
  • To spin up an Ops container with Bash session:

    ./tests/airflow bash
    
  • To run an Airflow command within the environment, spin up an Ops container with a bash session, then execute the command:

    example:

    airflow dags list
    
  • To launch a python session in Airflow:

    ./tests/airflow python
    
  • To access the Airflow Web UI:

    localhost:8080 or Web Preview (GCP Cloud Shell)
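
  As a reference for the unit-test step above, here is a minimal sketch of a test that could live under tests/unit; the dag_id checked here is an assumption, not necessarily a DAG shipped with this repo:

    # tests/unit/test_example_dag.py -- illustrative; the dag_id is assumed
    from airflow.models import DagBag

    def test_dag_has_tasks():
        dag_bag = DagBag(dag_folder="dags", include_examples=False)
        dag = dag_bag.get_dag("example_variables_dag")
        assert dag is not None
        assert len(dag.tasks) > 0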


7. Code linting and styling - Pre-commit

  • 7.1 Install the pre-commit tool:

    • For Linux/Windows

      pip3 install pre-commit

    • For macOS

      brew install pre-commit

  • 7.2 Run the pre-commit initialization command (inside the directory where the code was cloned):

    pre-commit install
    
  • 7.3 Run pre-commit tests:

    pre-commit run --all-files
    
