Open Bandit Pipeline: a Python library for bandit algorithms and off-policy evaluation
Updated Jun 3, 2024 - Python
Implementations and examples of common offline policy evaluation methods in Python.
SCOPE-RL: A Python library for offline reinforcement learning, off-policy evaluation, and selection
(WSDM2022 Best Paper Award Runner-Up) "Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model"
Representation Learning for OPE
(KDD2023) "Off-Policy Evaluation of Ranking Policies under Diverse User Behavior"
Off-Policy Interval Estimation with Confounded Markov Decision Process
(NeurIPS2023) "Future-Dependent Value-Based Off-Policy Evaluation in POMDPs"
Robust Offline Reinforcement Learning with Heavy-Tailed Rewards
Implementation of "Deeply-Debiased Off-Policy Interval Estimation" (ICML, 2021) in Python
Official implementation for "On the Reuse Bias in Off-Policy Reinforcement Learning" (IJCAI 2023)
Stateful implementations of OPE algorithms, designed for use in the development of offline RL models
Omitting-States-Irrelevant-to-Return Importance Sampling estimator for off-policy evaluation
Implementation of Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings (NeurIPS, 2021) in Python
Implementation of "A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes" (ICML)
Implementation of "Off-Policy Interval Estimation with Confounded Markov Decision Process" (JASA, 2022+)
HOPES: HVAC optimization with Off-Policy Evaluation and Selection