Potential issue in value iteration algorithm #20
Comments
Thanks for the report! You are correct: it should be the value at the next state. I will fix this. In the meantime, if you need to run value iteration, you can:
The ValueIteration planner implemented in pomdp-py is quite naive, as it does no pruning, so it only works for very small problems. I implemented it as an exercise while reading the '98 paper, but never really used it in practice. Hope this helps!
Great, thanks for the quick confirmation. I realise the unpruned algorithm isn't going to scale far, but I was using it to hunt for mistakes in my transition matrix setup before looking at any of the more advanced algorithms. I'm looking at the link to the Cassandra stuff now, so I can still progress. Cheers! Rachel
Interesting. If the goal is to debug the transition matrix, I think you can verify the transitions directly (for example, assert that the next state is what you expect given a state and action, over many samples) without having to run a solver?
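A minimal sketch of that kind of check, in plain Python rather than pomdp-py. The table `T` and the helpers `check_rows_normalized`, `sample_next_state`, and `empirical_row` are hypothetical names made up for illustration, and the two-state table is an assumed toy example, not the model discussed in this issue.

```python
import random

# Hypothetical toy transition table: T[(state, action)] -> {next_state: probability}
T = {
    ("left", "listen"): {"left": 1.0},
    ("right", "listen"): {"right": 1.0},
    ("left", "open"): {"left": 0.5, "right": 0.5},
    ("right", "open"): {"left": 0.5, "right": 0.5},
}


def check_rows_normalized(T, tol=1e-9):
    """Assert that every (state, action) row is a valid probability distribution."""
    for (s, a), row in T.items():
        assert all(p >= 0.0 for p in row.values()), f"negative probability in row {(s, a)}"
        total = sum(row.values())
        assert abs(total - 1.0) <= tol, f"row {(s, a)} sums to {total}, expected 1.0"


def sample_next_state(T, s, a, rng):
    """Draw one next state s' from the distribution T[(s, a)]."""
    row = T[(s, a)]
    next_states = list(row)
    weights = [row[sp] for sp in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]


def empirical_row(T, s, a, n, rng):
    """Estimate the distribution over s' from n samples, to compare against expectations."""
    counts = {}
    for _ in range(n):
        sp = sample_next_state(T, s, a, rng)
        counts[sp] = counts.get(sp, 0) + 1
    return {sp: c / n for sp, c in counts.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    check_rows_normalized(T)
    # A deterministic transition should never produce any other next state.
    assert empirical_row(T, "left", "listen", 100, rng) == {"left": 1.0}
    # A stochastic transition should land near its stated probabilities.
    freqs = empirical_row(T, "left", "open", 10_000, rng)
    assert abs(freqs["left"] - 0.5) < 0.05
```

The same pattern should carry over to a model object that exposes a sampling method: draw many next states for a fixed state and action, then assert on the counts.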
Also, if it is an explicit matrix definition of the POMDP, pomdp-py has some convenient classes like TabularTransitionModel. This gist is an example of using these tabular classes for the crying-baby problem: https://gist.github.com/zkytony/51d43ee6818375434eb3b84a77a47a5c
I meant more wrt the integration into the framework. I basically just hacked your tiger example to match my problem to see if I could get something to run to get started with the tools. Thanks for the links to the templates though - I hadn't spotted those. |
* allow updating rollout policy
* NotImplemented->NotImplementedError in oopomdp.pyx
* s -> sp in ValueIteration (#20)
* add __init__ signature for Environment in comments to be visible in docs
* added float_precision argument to to_pomdp_file (#29)
* add readme instruction to run load_unload
* oops - correction
* Fix cpdef on variable
* version bump; sorting MANIFEST.in
* fix setup cython extension in pomdp_problems
* auto-generate manifest to include .pyx
* update changelog
* Fix wheel build
* docs html build
* change set to list in tiger to tame random.sample in python 3.11
* docs html build
* bump minimum python requirement
* update docs accordingly
---------
Co-authored-by: Juan Jesús Torre Tresols <juanjesustorre@gmail.com>
Co-authored-by: Jiuguang Wang <jiuguangw@gmail.com>
Hi h2r,
I'd like to query whether the state used to calculate the value of the subtrees in the value iteration algorithm should be sp rather than s:
pomdp-py/pomdp_py/algorithms/value_iteration.pyx
Line 48 in df59f26
So:
subtree_value = self.children[o].values[sp] # corresponds to V_{oi(p)} in paper
Rather than:
subtree_value = self.children[o].values[s] # corresponds to V_{oi(p)} in paper
The source paper uses s' when calculating V_{oi(p)} and this is also my understanding of the child nodes starting from "the state that this node brought you to".
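To make the difference concrete, here is a small self-contained sketch of the policy-tree value recursion as I understand it from the paper, V_p(s) = R(s, a(p)) + γ Σ_{s'} T(s, a(p), s') Σ_o O(s', a(p), o) V_{o(p)}(s'). It is not the pomdp-py implementation: `PolicyTree`, `tree_value`, and the two-state model below are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyTree:
    """Hypothetical policy tree: an action at the root, one subtree per observation."""
    action: str
    children: dict = field(default_factory=dict)  # observation -> PolicyTree


def tree_value(tree, s, states, observations, T, O, R, gamma):
    """Value of executing `tree` starting from state s."""
    a = tree.action
    value = R[(s, a)]
    if not tree.children:  # leaf: only the immediate reward counts
        return value
    expected_future = 0.0
    for sp in states:
        for o in observations:
            # The subtree is evaluated at the NEXT state sp, not at s.
            subtree_value = tree_value(tree.children[o], sp, states,
                                       observations, T, O, R, gamma)
            expected_future += T[(s, a, sp)] * O[(sp, a, o)] * subtree_value
    return value + gamma * expected_future


# Assumed toy model: "go" always flips the state, and reward is earned only in "b".
states = ["a", "b"]
observations = ["o"]
T = {("a", "go", "a"): 0.0, ("a", "go", "b"): 1.0,
     ("b", "go", "a"): 1.0, ("b", "go", "b"): 0.0}
O = {("a", "go", "o"): 1.0, ("b", "go", "o"): 1.0}
R = {("a", "go"): 0.0, ("b", "go"): 1.0}

leaf = PolicyTree("go")
root = PolicyTree("go", {"o": leaf})

print(tree_value(root, "a", states, observations, T, O, R, gamma=0.5))  # 0.5
print(tree_value(root, "b", states, observations, T, O, R, gamma=0.5))  # 1.0
```

In this toy model, evaluating the subtree at s instead of sp would give 0.0 from "a" and 1.5 from "b", because the flip in state would be ignored when looking up the subtree value.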
I'm new to the area, to the library and to the etiquette of issues on GitHub so apologies in advance if this is my misunderstanding.
Many thanks for your useful and clearly structured library.
Rachel