Reinforcement Learning Resources


Articles and tutorials

...More Reinforcement Learning articles on DeepMind site

Partially Observable Markov Processes articles

Multi-Agent Reinforcement Learning (MARL) articles

Reinforcement Learning in Supply Chain Management

Introductory material on relevant algorithms by Wouter van Heeswijk

Why Hasn’t Reinforcement Learning Conquered The World (Yet)?, Wouter van Heeswijk, Medium

The Four Policy Classes of Reinforcement Learning, Wouter van Heeswijk, Medium

Policy Gradients In Reinforcement Learning Explained, Wouter van Heeswijk, Medium

Proximal Policy Optimization (PPO) Explained, Wouter van Heeswijk, Medium

Dynamic Pricing with Contextual Bandits: Learning by Doing, Massimiliano Costacurta, Medium

related repo:

related docs:

related video: PyData Tel Aviv Meetup: Contextual Bandit for Pricing - Daniel Hen & Uri Goren

A Unified Framework for Stochastic Optimization, Warren B. Powell, Princeton, 2017

Tutorial on Stochastic Optimization in Energy II: An energy storage illustration, Warren B. Powell, 2015

Challenges of Real World Reinforcement Learning, Gabriel Dulac-Arnold, 2019

From Reinforcement Learning to Optimal Control: A unified framework for sequential decisions, Warren B. Powell, Princeton, 2019

Sequential Decision Analytics for the Truckload Industry, Warren B. Powell, Optimal Dynamics, 2022

Stochastic Optimization, James C. Spall, Johns Hopkins U., 2012

How to Improve your Supply Chain with Deep Reinforcement Learning with Christian Hubbs, Medium

Deep reinforcement learning for supply chain and price optimization, Ilya Katsov, 2020, blog

A Deep Q-Network for the Beer Game: Deep Reinforcement Learning for Inventory Optimization, Oroojlooyjadid et al, 2020

A Deep Reinforcement Learning Approach to Supply Chain Inventory Management, Francesco Stranieri, 2022

Optimization of Apparel Supply Chain Using Deep Reinforcement Learning, JW. Chong et al, IEEE, 2022

Reinforcement learning for supply chain optimization, L. Kemmer et al, 2018

Deep Reinforcement Learning hands-on for Optimized Ad Placement with NandaKishore Joshi

related repo: ad placement example

link to the book Reinforcement Learning in Action

Online Algorithms and solving them with Reinforcement Learning

The k-server problem: Researchers Refute a Widespread Belief About Online Algorithms, Quanta Magazine, 2023

The Randomized k-Server Conjecture Is False!, S. Bubeck et al, 2023

The Online K-Server Problem, Aris Floratos, Ravi Boppana, Courant Institute, NYU

Decision Transformers - Reinforcement Learning via Sequence Modeling

Decision Transformer: Reinforcement Learning via Sequence Modeling, Lili Chen et al, UC Berkeley, 2021

Decision Transformer: Reinforcement Learning via Sequence Modeling (Research Paper Explained), Yannic Kilcher, 2022, youtube video

Stanford CS25: V1 I Decision Transformer: Reinforcement Learning via Sequence Modeling, 2021, youtube video

Decision transformer: Reinforcement Learning via Sequence Modeling, Youseff Fathi CS 885: Reinforcement Learning, U. of Waterloo, 2022

Online Decision Transformer, Q. Zheng et al, 2022

Offline Reinforcement Learning as one Big Sequence Modeling Problem, M. Janner et al, 2021

Reinforcement Learning Upside Down: Don't Predict Rewards - Just Map Them To Actions, Juergen Schmidhuber, Tech Report, 2020

Training Agents using Upside Down Reinforcement Learning, R. Srivastava et al, 2021

RvS: What is Essential for Offline RL via Supervised Learning, Scott Emmons et al, 2022

Reinforcement Learning in Large Language Models and related algorithms

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs, A. Ahmadian et al, 2024

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, Yevgen Chebotar et al, DeepMind, 2023

related repo:

Basics of Reinforcement Learning for LLMs with Cameron Wolfe, Medium

related paper: An Elementary Proof that Q Learning Converges Almost Surely, Matthew T. Regehr, Alex Ayoub, U of Alberta, 2021

related paper: Deep Reinforcement Learning for Autonomous Driving: A Survey, BR Kiran et al, 2021

related paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, J. Devlin et al, Google, 2018

related paper: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Y. Bai et al, Anthropic, 2022

related paper: Playing Atari with Deep Reinforcement Learning, V. Mnih et al, DeepMind, 2013

related paper: Distilling the Knowledge in a Neural Network, G. Hinton et al, Google, 2015

related paper: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results, Antti Tarvainen et al, The Curious AI Company, 2018

related paper: Llama 2: Open Foundation and Fine-Tuned Chat Models, H. Touvron et al, Meta AI, 2023

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, Evan Hubinger et al, Anthropic, 2024

Reinforcement Learning from Human Feedback (RLHF)

Deep Reinforcement Learning from Human Preferences, Paul Christiano et al, OpenAI, 2017

Algorithms for Inverse Reinforcement Learning, Andrew Ng, Stuart Russell, Stanford, 2000

Training Language Models to Follow Instructions With Human Feedback, L. Ouyang et al, OpenAI, 2022

Fine Tuning Language Models from Human Preferences, Daniel M. Ziegler et al, OpenAI, 2020

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Y. Bai et al, Anthropic, 2022

Learning to Summarize from Human Feedback, Nisan Stiennon et al, OpenAI, 2022

Illustrating Reinforcement Learning from Human Feedback (RLHF), Hugging Face article, 2022, Nathan Lambert, Louis Castricato, Leandro von Werra, Alex Havrilla

Learning from human preferences, Dario Amodei, OpenAI blog, 2017

Reinforcement Learning from Human Feedback, Wikipedia

A General Theoretical Paradigm to Understand Learning from Human Preferences, M. Azar et al, Google DeepMind, 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model, Rafael Rafailov et al, Stanford U., 2023

SLiC-HF: Sequence Likelihood Calibration with Human Feedback, Y. Zhao et al, Google DeepMind, 2023

KTO: Model Alignment as Prospect Theoretic Optimization, K. Ethayarajh et al, Stanford U., 2024

ORPO: Monolithic Preference Optimization without Reference Model, Hong, 2024

Human-like Reasoning via Reinforcement Learning and Representation Learning

Adaptive Reinforcement Learning, RL applied to Bayesian Networks

AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning, B. Huang et al, CMU, 2022

AdaRL repo:

Swarm-based Reinforcement Learning and its applications in Robotics

Deep Reinforcement Learning for Swarm Systems, Maximilian Hüttenrauch et al, U. of Lincoln, 2019

Using Reinforcement Learning to Herd a Robotic Swarm to a Target Distribution, Zahi Kakish et al, ASU, 2021

Reinforcement learning for swarm robotics: An overview of applications, algorithms and simulators, M.-A. Blais, Moulay A. Akhloufi, 2023

Maximum diffusion Reinforcement Learning

Maximum diffusion reinforcement learning, Thomas Berrueta et al, Northwestern U., 2023

Deep Reinforcement Learning for Physical Applications

Predicting disruptive instabilities in controlled fusion plasmas through deep learning, Julian Kates-Harbeck et al, Harvard U, 2019

Magnetic Control of Tokamak Plasmas through Deep Reinforcement Learning, Jonas Degrave, 2021

First Nuclear Plasma Control with Digital Twin, Sabine Hossenfelder, Feb 2024, youtube video

Reinforcement Learning for Physical Dynamical Systems: An Alternative Approach: Reintroducing genetic algorithms and comparing to neural networks, Robert Etter, 2024, Towards Data Science

Online tutorials and short readings

OpenAI resources:

DeepMind resources:

Computational Neuroscience Lab's resources:

Richard Sutton's online posts

Andrej Karpathy's blog


Python-based tools, techniques and design patterns for Reinforcement Learning projects

PyLessons online tutorials using OpenAI Gym environment resources:

StudyWolf's resources:

Jeff Bradberry's blog

Online lecture videos

Game Theory Resources



Online lecture videos

Online tutorials and short readings
