This repository explores Reinforcement Learning algorithms and applications. It currently contains a Jupyter Notebook on the Multi-armed Bandit problem, a classic Reinforcement Learning setting in which an agent must balance exploration and exploitation when choosing among multiple options (arms) with unknown reward distributions. The notebook demonstrates the problem and introduces the Upper Confidence Bound (UCB) algorithm as a solution.
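To make the idea concrete, here is a minimal sketch of the UCB1 variant on a Bernoulli bandit. It is illustrative only and not taken from the notebook; the function name, the arm means, and the exploration coefficient `c` are our own assumptions.

```python
import numpy as np

def ucb_bandit(true_means, n_steps=2000, c=2.0, seed=0):
    """Run UCB1 on a Bernoulli bandit (illustrative sketch).

    true_means: per-arm success probabilities (assumed for this example).
    c: exploration coefficient in the confidence bonus.
    Returns total reward and per-arm pull counts.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    counts = np.zeros(n_arms)   # times each arm was pulled
    values = np.zeros(n_arms)   # running mean reward per arm
    total_reward = 0.0

    for t in range(1, n_steps + 1):
        if t <= n_arms:
            arm = t - 1  # pull each arm once to initialise the estimates
        else:
            # mean estimate plus a confidence bonus that shrinks as an
            # arm accumulates pulls: exploit what looks good, but keep
            # revisiting under-sampled arms
            ucb = values + np.sqrt(c * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        reward = float(rng.random() < true_means[arm])
        counts[arm] += 1
        # incremental update of the running mean
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return total_reward, counts
```

Run over enough steps, the pull counts concentrate on the arm with the highest true mean while every arm keeps receiving occasional exploratory pulls.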
In the future, we plan to explore various strategies for solving the Multi-armed Bandit problem, including:
- Epsilon-Greedy algorithm
- Thompson Sampling
- Bayesian Bandits
- Gradient Bandit algorithms
Each strategy will be implemented and compared to the UCB algorithm in terms of performance and complexity.
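As a preview of the kind of comparison planned, here is a sketch of the simplest strategy on the list, Epsilon-Greedy. This is not code from the repository; the function name and parameter defaults are our own assumptions.

```python
import numpy as np

def epsilon_greedy(true_means, n_steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit (illustrative sketch).

    With probability epsilon, pick a random arm (explore); otherwise
    pick the arm with the best current estimate (exploit).
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    counts = np.zeros(n_arms)
    values = np.zeros(n_arms)
    total_reward = 0.0

    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))   # explore uniformly
        else:
            arm = int(np.argmax(values))      # exploit best estimate
        reward = float(rng.random() < true_means[arm])
        counts[arm] += 1
        # incremental update of the running mean
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return total_reward, counts
```

Unlike UCB, the exploration rate here is fixed, which is one of the performance/complexity trade-offs the planned comparison would surface.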
The notebook in this repository requires the following dependencies:
- Python 3.x
- Jupyter Notebook
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
To run the notebook and explore the Multi-armed Bandit problem, follow these steps:
- Clone this repository to your local machine.
- Install the required dependencies listed above.
- Open a terminal or command prompt and navigate to the directory containing the repository.
- Launch Jupyter Notebook by running `jupyter notebook` in the terminal.
- Open the `multiarmed_bandit.ipynb` notebook in your browser.
- Run the notebook and experiment with the problem and the UCB algorithm.
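The steps above can be condensed into a short command sequence (the repository URL is a placeholder, and installing via `pip` is one option among several):

```shell
git clone <repository-url>   # placeholder: substitute the actual URL
cd <repository-directory>
pip install jupyter numpy matplotlib seaborn scikit-learn
jupyter notebook             # then open multiarmed_bandit.ipynb
```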
Contributions are welcome. If you find a bug, have a feature request, or want to contribute code, please open an issue or submit a pull request.
This repository is licensed under the MIT License. See the LICENSE file for details.