
Autonomous Explorer Drone

Exploring a learning-based approach to autonomous drone flight.
Getting started »

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

The design of a control system for an agile mobile robot in the continuous domain is a central question in robotics. This project specifically addresses the challenge of autonomous drone flight. Model-free reinforcement learning (RL) is used because it can directly optimize a task-level objective and leverage domain randomization to handle model uncertainty, enabling the discovery of more robust control responses. The task analyzed here is a single-agent stabilization task.

Drone Model and Simulation

The gym-pybullet-drones environment is based on the Crazyflie 2.x nanoquadcopter. It implements the OpenAI Gym API for single- and multi-agent reinforcement learning (MARL).

Fig. 1: The three types of gym-pybullet-drones models, as well as the forces and torques acting on each vehicle.

Training Result

The following shows a training result in which the agent has learned to control the four independent rotors to overcome the physical forces (e.g., gravity) simulated by the Bullet physics engine, stabilize itself, and settle into steady flight.

Fig. 2: Rendering of a gym-pybullet-drones stable flight with a Crazyflie 2.x during inference.

PPO Actor-Critic Architecture

In this project, a policy gradient method is used for training, namely a custom implementation of Proximal Policy Optimization (PPO).

Fig. 3: Overview of the actor-critic Proximal Policy Optimization (PPO) algorithm.

The architecture consists of two separate neural networks: the actor network and the critic network. The actor network is responsible for selecting actions given the current state of the environment, while the critic network is responsible for evaluating the value of the current state.

The actor network takes the current state $s_t$ as input and outputs a probability distribution over the possible actions $a_t$. The network is trained using the actor loss function, which encourages the network to select actions that have a high advantage while also penalizing actions that deviate too much from the old policy. The loss function is defined as follows:

$$ L^{actor}(\theta) = \mathbb{E}_{t} \left[ \min\left(r_t(\theta) \hat{A}_t, \text{clip}\left(r_t(\theta), 1-\epsilon, 1+\epsilon\right) \hat{A}_t \right) \right] $$

where $r_t(\theta) = \frac{\pi_{\theta}(a_t|s_t)}{\pi_{\theta_{old}}(a_t|s_t)}$ is the probability ratio of the new and old policies, $\hat{A}_t$ is the estimated advantage function, and $\epsilon$ is a hyperparameter that controls how much the new policy can deviate from the old policy.

The critic network takes the current state $s_t$ as input and outputs an estimate of the value of the state $V_{\theta}(s_t)$. The network is trained using the critic loss function, which encourages the network to accurately estimate the value of the current state, given the observed rewards and the estimated values of future states. The loss function is defined as follows:

$$ L^{critic}(\theta) = \mathbb{E}_{t} \left[ \left(V_{\theta}(s_t) - R_t\right)^2 \right] $$

where $R_t$ is the target value for the current state, given by the sum of the observed rewards and the estimated values of future states.

Action and Observation Space

The observation space is defined through the quadrotor state, which includes the position, linear velocity, angular velocity, and orientation of the drone. The action space is defined by the desired thrust in the z direction and the desired torque in the x, y, and z directions.

Reward Function

The reward function defines the problem specification as follows:

$$ \text{Reward} = \begin{cases} -5, & \text{height} < 0.02 \\ -\frac{1}{10 \cdot y_{pos}}, & \text{height} \geq 0.02 \end{cases} $$

where $y_{pos}$ is the current height of the drone. The reward function penalizes the drone heavily for dropping close to the ground (height below 0.02) and otherwise applies a smaller penalty that shrinks as the drone gains altitude, encouraging it to reach and maintain a stable height.

(back to top)

PyBullet Environment & Drone

Environment

The environment is a custom OpenAI Gym environment built using PyBullet for multi-agent reinforcement learning with quadrotors.

Fig. 4: 3D simulation of the drone's orientation in the x, y, and z axes.

PID Controller

  • Used to stabilize drone flight (see the sketch below)

(back to top)

Built With

The project was developed using Python and the PyTorch machine learning framework. To simulate the quadrotor's environment, the Bullet physics engine is leveraged. Further, to streamline the development process and avoid potential issues, the pre-built PyBullet drone implementation provided by the gym-pybullet-drones library is utilized.


(back to top)

Getting Started

To get a local copy up and running, follow the steps below.

Requirements and Installation

This repository was written using Python 3.10 and Anaconda, and tested on macOS 14.4.1.

Installation

The major dependencies are gym, pybullet, stable-baselines3, and ray[rllib].

  1. Create a virtual environment and install the major dependencies

     $ pip3 install --upgrade numpy matplotlib Pillow cycler 
     $ pip3 install --upgrade gym pybullet stable_baselines3 'ray[rllib]' 
    

    or install from the requirements file

    $ pip install -r requirements_pybullet.txt
    
  2. Video recording requires ffmpeg to be installed. On macOS

    $ brew install ffmpeg
    

    or on Ubuntu

    $ sudo apt install ffmpeg
    
  3. The gym-pybullet-drones repo is structured as a Gym environment and can be installed in editable mode

    $ cd gym-pybullet-drones/
    $ pip3 install -e .
    

(back to top)

Roadmap

  • Add Changelog
  • Add back-to-top links
  • Fix the sparse reward issue by adding proximity rewards
  • Adjust the reward function so the drone learns to approach a target
  • Implement in Unity with ML-Agents
  • Adjust README file

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Project Link: Autonomous-Explorer-Drone

(back to top)

Acknowledgments

(back to top)
