Potential issue in value iteration algorithm #20

racheltrimble · 2022-06-16T10:43:19Z

Hi h2r,

I'd like to query whether the state used to calculate the value of the subtrees in the value iteration algorithm should be sp rather than s:

pomdp-py/pomdp_py/algorithms/value_iteration.pyx

Line 48 in df59f26

subtree_value = self.children[o].values[s] # corresponds to V_{oi(p)} in paper

So:
subtree_value = self.children[o].values[sp] # corresponds to V_{oi(p)} in paper

Rather than:
subtree_value = self.children[o].values[s] # corresponds to V_{oi(p)} in paper

The source paper uses s' when calculating V_{oi(p)} and this is also my understanding of the child nodes starting from "the state that this node brought you to".
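
To make the difference concrete, here is a minimal sketch of the policy-tree value recursion as I understand it from the paper. It is not the library's code: the function name `policy_tree_value`, the dictionary-based `T`, `O`, `R` tables and the toy numbers are all assumptions made up for illustration.

```python
def policy_tree_value(s, action, children_values, states, observations, T, O, R, gamma):
    """Value of a policy tree rooted at `action`, evaluated at state `s`.

    children_values[o][sp] is the value of the subtree followed after
    observing `o`, evaluated at the NEXT state `sp`.
    T[(s, a, sp)], O[(sp, a, o)] and R[(s, a)] are plain dict lookups
    (hypothetical layout, not pomdp-py's model classes).
    """
    expected_future = 0.0
    for sp in states:
        for o in observations:
            # Index the subtree by sp (the state the action led to), not by s.
            subtree_value = children_values[o][sp]
            expected_future += T[(s, action, sp)] * O[(sp, action, o)] * subtree_value
    return R[(s, action)] + gamma * expected_future


# Toy problem: action "go" always moves the system from "a" to "b".
states = ["a", "b"]
observations = ["z"]
T = {("a", "go", "a"): 0.0, ("a", "go", "b"): 1.0,
     ("b", "go", "a"): 0.0, ("b", "go", "b"): 1.0}
O = {("a", "go", "z"): 1.0, ("b", "go", "z"): 1.0}
R = {("a", "go"): 1.0, ("b", "go"): 0.0}
children_values = {"z": {"a": 0.0, "b": 10.0}}

v = policy_tree_value("a", "go", children_values, states, observations, T, O, R, gamma=0.5)
print(v)  # 1.0 + 0.5 * 10.0 = 6.0
```

With the subtree indexed by `s` instead of `sp`, the same toy problem would give 1.0, because the subtree would be evaluated at "a" even though the action always leads to "b".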

I'm new to the area, to the library and to the etiquette of issues on GitHub so apologies in advance if this is my misunderstanding.

Many thanks for your useful and clearly structured library.

Rachel

zkytony · 2022-06-16T11:58:37Z

Thanks for the report! You are correct. It should be the value at the next state.

Will fix this. In the meantime, if you need to run value iteration, you can:

Use the "vi_pruning" algorithm listed here. This is an interface to the pomdp-solve package, which is written in C++. You define the POMDP in pomdp-py, and this function converts it to the POMDP file format, feeds it to pomdp-solve, and then parses the output policy. The documentation contains an example. You can also find an example in this test case.
Use the qvalue function in value_function.py to compute Q(b,a) for all actions and select the action with the highest value.
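
The second option boils down to an argmax over actions. Below is a rough sketch of that selection step. It does not call pomdp-py: `greedy_action` and `one_step_q` are hypothetical helpers, `one_step_q` is only a horizon-1 stand-in (expected immediate reward under the belief) for whatever Q function you actually use, and the reward numbers are illustrative tiger-style values.

```python
def greedy_action(belief, actions, q_fn):
    """Evaluate q_fn(belief, a) for every action and return (best_action, best_q)."""
    q_values = {a: q_fn(belief, a) for a in actions}
    best = max(q_values, key=q_values.get)
    return best, q_values[best]


# Illustrative rewards R[(state, action)] for a tiger-like problem (assumed values).
R = {("tiger-left", "open-left"): -100.0, ("tiger-right", "open-left"): 10.0,
     ("tiger-left", "open-right"): 10.0, ("tiger-right", "open-right"): -100.0,
     ("tiger-left", "listen"): -1.0, ("tiger-right", "listen"): -1.0}


def one_step_q(belief, action):
    """Horizon-1 stand-in for Q(b, a): expected immediate reward under the belief."""
    return sum(prob * R[(state, action)] for state, prob in belief.items())


belief = {"tiger-left": 0.5, "tiger-right": 0.5}
actions = ["open-left", "open-right", "listen"]
best_action, best_q = greedy_action(belief, actions, one_step_q)
print(best_action, best_q)  # listen -1.0
```

To use a longer horizon, pass a different callable as `q_fn`; the selection logic stays the same.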

The ValueIteration planner that was implemented in pomdp-py is quite naive as there is no pruning. It only works for super small problems. I implemented it as an exercise when reading the '98 paper, but never really used it in practice.

Hope this helps!

racheltrimble · 2022-06-16T12:05:15Z

Great - thanks for the quick confirmation. I realise the unpruned algorithm isn't going to scale far but I was using it to hunt for mistakes in my transition matrix setup before looking at any of the more advanced algorithms. I'm looking at the link to the Cassandra stuff now so I can still progress.

Cheers!

Rachel

zkytony · 2022-06-16T12:08:31Z

Interesting. If this is about debugging the transition matrix, I think you can just verify that the transitions are correct (e.g., assert that the next state is what you expect given the state and action, for many samples) without having to run a solver?
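
A check along those lines could look like the sketch below. It assumes the transition matrix is stored as a plain dict `T[(s, a, sp)]`; the helper names `check_transition_table` and `sample_next_state` and the two-state example are made up for illustration and are not part of pomdp-py.

```python
import random


def check_transition_table(T, states, actions, tol=1e-9):
    """Assert that T[(s, a, sp)] is a probability distribution over sp for every (s, a)."""
    for s in states:
        for a in actions:
            probs = [T.get((s, a, sp), 0.0) for sp in states]
            assert all(p >= 0.0 for p in probs), f"negative probability for {(s, a)}"
            assert abs(sum(probs) - 1.0) <= tol, f"row {(s, a)} sums to {sum(probs)}"


def sample_next_state(T, s, a, states, rng):
    """Draw one next state from the distribution T[(s, a, .)]."""
    weights = [T.get((s, a, sp), 0.0) for sp in states]
    return rng.choices(states, weights=weights, k=1)[0]


# Hypothetical two-state example: "feed" always leads to "full".
states = ["hungry", "full"]
actions = ["feed"]
T = {("hungry", "feed", "full"): 1.0,
     ("full", "feed", "full"): 1.0}

check_transition_table(T, states, actions)

# Sample many transitions and assert each one is the expected next state.
rng = random.Random(0)
samples = [sample_next_state(T, "hungry", "feed", states, rng) for _ in range(1000)]
assert all(sp == "full" for sp in samples)
```

The row-sum check catches malformed tables, while the sampling check catches tables that are well-formed but encode the wrong dynamics.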

zkytony · 2022-06-16T12:12:16Z

Also, if you are defining the POMDP with explicit matrices, pomdp-py has some convenient classes like TabularTransitionModel.

This gist is an example of using these tabular classes for the crying-baby problem https://gist.github.com/zkytony/51d43ee6818375434eb3b84a77a47a5c

racheltrimble · 2022-06-16T12:16:37Z

I meant more wrt the integration into the framework. I basically just hacked your tiger example to match my problem to see if I could get something to run to get started with the tools.

Thanks for the links to the templates though - I hadn't spotted those.

* allow updating rollout policy
* NotImplemented->NotImplementedError in oopomdp.pyx
* s -> sp in ValueIteration (#20)
* add __init__ signature for Environment in comments to be visible in docs
* added float_precision argument to to_pomdp_file (#29)
* add readme instruction to run load_unload
* oops - correction
* Fix cpdef on variable
* version bump; sorting MANIFEST.in
* fix setup cython extension in pomdp_problems
* auto-generate manifest to include .pyx
* update changelog
* Fix wheel build
* docs html build
* change set to list in tiger to tame random.sample in python 3.11
* docs html build
* bump minimum python requirement
* update docs accordingly

---------

Co-authored-by: Juan Jesús Torre Tresols <juanjesustorre@gmail.com>
Co-authored-by: Jiuguang Wang <jiuguangw@gmail.com>

zkytony closed this as completed Jan 30, 2023
