Code for "Reconciling Rewards with Predictive State Representations", A. Baisero and C. Amato, IJCAI 2021.
The following packages are required, and can be found on GitHub:
rl-parsers <https://github.com/abaisero/rl-parsers>
gym-pomdps <https://github.com/abaisero/gym-pomdps>
Preferably, create a dedicated environment, install the required packages first, then this one, and then run the experiments. For each of the above, in that same order:
- move to the package directory, and install using
    python -m pip install .
- run the tests to make sure installation was correct
    python -m unittest discover
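The installation steps above can be sketched as one small shell function. This is only an illustration under assumptions: the names `install_in_order` and `DRY_RUN`, and the idea that the required packages are checked out as sibling directories, are hypothetical and not part of the repository. Only the two `python -m` commands come from the instructions above.

```shell
#!/bin/sh
# Sketch: install each package directory in the given order, then run its tests.
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# One common way to create a dedicated environment first (an assumption,
# any environment manager should do):
#   python -m venv .venv && . .venv/bin/activate

install_in_order() {
    for dir in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "+ cd $dir"
            echo "+ python -m pip install ."
            echo "+ python -m unittest discover"
        else
            # Subshell so the working directory is restored afterwards;
            # stop at the first package that fails to install or test.
            (
                cd "$dir" &&
                python -m pip install . &&
                python -m unittest discover
            ) || return 1
        fi
    done
}

# Hypothetical checkout locations; "." stands for this repository.
install_in_order ../rl-parsers ../gym-pomdps .
```

Running it with `DRY_RUN=0` would execute the commands instead of printing them; adjust the directory arguments to wherever the packages were actually cloned.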
The code to replicate the experiments from the IJCAI 2021 paper is contained in
the experiments/ folder. Move there, then run the following commands. NOTE:
in these scripts, bsr stands for belief-state representation (i.e. POMDP),
psr stands for predictive-state representation, and rpsr stands for
reward-predictive state representation.
File pomdps.all.txt contains all the POMDPs tested to verify the significance
of the accuracy problem of PSRs, while pomdps.ijcai21.txt contains the 6
domains whose PSRs are non-accurate, for which we could run value iteration,
and which form the basis for the more thorough evaluation.
- Search for core sets of PSRs and R-PSRs:
    <pomdps.ijcai21.txt ./search.local
  This will compute core sets, print their ranks, and store them in cores/.
- Compute reward errors of PSRs and R-PSRs w.r.t. the POMDP rewards:
    <pomdps.ijcai21.txt ./info.local
  This will compute error measures, print them to standard output, and store them in infos/.
- Run POMDP-VI, PSR-VI and R-PSR-VI:
    <pomdps.ijcai21.txt ./vi.local
  This will run the value iteration algorithms for 150 iterations, and store the resulting value functions in vfs/. This is the slowest step; it will take many hours if a single machine is used.
- Plot a quasi-Bellman-error measure to check for convergence of the value functions:
    <pomdps.ijcai21.txt ./plot.local
  This will plot the convergence properties of the value functions, and store them in plots/.
- Evaluate the value functions' respective policies:
    <pomdps.ijcai21.txt ./eval.local
  This will run the Random, POMDP-VI, PSR-VI and R-PSR-VI policies for 100 episodes of 1000 steps each, calculate the true and estimated returns, and store them in evals/.
- Compile the evaluation results into tables:
    <pomdps.ijcai21.txt ./tables | tee tables.tex
  This will aggregate the results obtained by the evaluation step, print the results in a tex/table format, and save the results in tables.tex.
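The steps above can be chained in a small driver script. The following is a minimal sketch, not part of the repository: the names `run_pipeline`, `DRY_RUN` and `POMDPS` are hypothetical, and it assumes that each script exits with a non-zero status on failure. Note that `<file ./cmd` and `./cmd <file` are the same thing to the shell: the redirection feeds the list of POMDPs to the script on standard input, wherever it is written on the command line.

```shell
#!/bin/sh
# Sketch: run the experiment steps in the order listed above.
# Meant to be run from the experiments/ folder.
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"
POMDPS="${POMDPS:-pomdps.ijcai21.txt}"

run_pipeline() {
    list="$1"
    for step in search.local info.local vi.local plot.local eval.local; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "+ <$list ./$step"
        else
            # Same as "<$list ./$step"; stop at the first failing step
            # (assumes the scripts signal failure via their exit status).
            "./$step" <"$list" || return 1
        fi
    done
    # Final step: print the tables and also save them to tables.tex.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ <$list ./tables | tee tables.tex"
    else
        "./tables" <"$list" | tee tables.tex
    fi
}

run_pipeline "$POMDPS"
```

Since the value iteration step is by far the slowest, it may be preferable to run the steps by hand rather than through such a driver, so that each intermediate result can be inspected before moving on.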