Potential issue in value iteration algorithm #20

Closed
racheltrimble opened this issue Jun 16, 2022 · 5 comments

@racheltrimble

Hi h2r,

I'd like to query whether the state used to calculate the value of the subtrees in the value iteration algorithm should be sp rather than s:

subtree_value = self.children[o].values[s] # corresponds to V_{oi(p)} in paper

So:
subtree_value = self.children[o].values[sp] # corresponds to V_{oi(p)} in paper

Rather than:
subtree_value = self.children[o].values[s] # corresponds to V_{oi(p)} in paper

The source paper uses s' when calculating V_{oi(p)} and this is also my understanding of the child nodes starting from "the state that this node brought you to".
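To make the difference concrete, here is a minimal sketch of evaluating a policy tree on a two-state toy problem. Everything in it (the `PolicyTree` class, the `T`, `Z`, `R` functions, the state and observation names) is hypothetical and written for illustration; it is not the pomdp-py implementation. The only point is that the subtree value is weighted by T(s, a, s') and O(s', a, o) and is therefore evaluated at s'.

```python
from dataclasses import dataclass, field

# Hypothetical toy model, not taken from pomdp-py.
STATES = ["A", "B"]
OBSERVATIONS = ["see-A", "see-B"]
GAMMA = 0.95


def T(s, a, sp):
    """Transition probability: 'flip' switches the state, 'stay' keeps it."""
    if a == "flip":
        return 1.0 if sp != s else 0.0
    return 1.0 if sp == s else 0.0


def Z(sp, a, o):
    """Observation probability: the observation matches the next state 80% of the time."""
    return 0.8 if o == "see-" + sp else 0.2


def R(s, a):
    """Reward of 1 for being in state B, regardless of the action."""
    return 1.0 if s == "B" else 0.0


@dataclass
class PolicyTree:
    action: str
    children: dict = field(default_factory=dict)  # observation -> PolicyTree


def tree_value(tree, s):
    """V_p(s) = R(s,a) + gamma * sum_{s'} T(s,a,s') * sum_o Z(s',a,o) * V_{o(p)}(s')."""
    value = R(s, tree.action)
    if not tree.children:
        return value
    future = 0.0
    for sp in STATES:
        for o in OBSERVATIONS:
            # The subtree is evaluated at the NEXT state sp, not at s.
            subtree_value = tree_value(tree.children[o], sp)
            future += T(s, tree.action, sp) * Z(sp, tree.action, o) * subtree_value
    return value + GAMMA * future


# Two-step tree: flip first, then stay whatever is observed.
root = PolicyTree("flip", {o: PolicyTree("stay") for o in OBSERVATIONS})
value_from_a = tree_value(root, "A")
value_from_b = tree_value(root, "B")
```

Starting from A, flipping leads to B, so the future reward is collected and the value is 0.95. Evaluating the subtree at s instead would give 0 here, because the subtree would be scored as if the state were still A.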

I'm new to the area, to the library and to the etiquette of issues on GitHub so apologies in advance if this is my misunderstanding.

Many thanks for your useful and clearly structured library.

Rachel

@zkytony
Collaborator

zkytony commented Jun 16, 2022

Thanks for the report! You are correct. It should be the value at the next state.

Will fix this. In the meantime, if you need to run value iteration, you can:

  • use the "vi_pruning" algorithm listed here. This is an interface for the pomdp-solve package in C++. You can define the POMDP in pomdp-py, and this function converts it to the POMDP file format, feeds it to pomdp-solve, and then parses the output policy. The documentation contains an example. You can also find an example in this test case.
  • use the qvalue function in value_function.py to compute all Q(b,a) and select the action with the highest value.
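The second option can be sketched as follows. This is a self-contained illustration on the classic Tiger problem, assuming a finite-horizon recursion over beliefs; the function names and signatures below are hypothetical and are not the ones in pomdp-py's value_function.py.

```python
# Hypothetical, self-contained sketch; not the pomdp-py API.
STATES = ["tiger-left", "tiger-right"]
ACTIONS = ["listen", "open-left", "open-right"]
OBSERVATIONS = ["hear-left", "hear-right"]
GAMMA = 0.95


def T(s, a, sp):
    """Listening keeps the state; opening a door resets the tiger uniformly."""
    if a == "listen":
        return 1.0 if s == sp else 0.0
    return 0.5


def Z(sp, a, o):
    """Listening is 85% accurate; other actions give an uninformative observation."""
    if a == "listen":
        correct = (sp == "tiger-left") == (o == "hear-left")
        return 0.85 if correct else 0.15
    return 0.5


def R(s, a):
    """-1 for listening, -100 for opening the tiger's door, +10 for the other door."""
    if a == "listen":
        return -1.0
    opened_tiger_door = (a == "open-left") == (s == "tiger-left")
    return -100.0 if opened_tiger_door else 10.0


def qvalue(b, a, horizon):
    """Q_h(b,a) = sum_s b(s) R(s,a) + gamma * sum_o P(o|b,a) * max_a' Q_{h-1}(b', a')."""
    immediate = sum(b[s] * R(s, a) for s in STATES)
    if horizon <= 1:
        return immediate
    future = 0.0
    for o in OBSERVATIONS:
        # Unnormalized next belief: Z(s',a,o) * sum_s T(s,a,s') b(s)
        unnorm = {
            sp: Z(sp, a, o) * sum(T(s, a, sp) * b[s] for s in STATES)
            for sp in STATES
        }
        prob_o = sum(unnorm.values())
        if prob_o == 0.0:
            continue  # this observation cannot occur under (b, a)
        next_b = {sp: p / prob_o for sp, p in unnorm.items()}
        future += prob_o * max(qvalue(next_b, ap, horizon - 1) for ap in ACTIONS)
    return immediate + GAMMA * future


def best_action(b, horizon):
    """Pick the action with the highest Q(b,a)."""
    return max(ACTIONS, key=lambda a: qvalue(b, a, horizon))


uniform = {"tiger-left": 0.5, "tiger-right": 0.5}
confident_right = {"tiger-left": 0.02, "tiger-right": 0.98}
```

Like the unpruned planner, this recursion enumerates every action and observation at each step, so it is only practical for very small problems and short horizons.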

The ValueIteration planner that was implemented in pomdp-py is quite naive as there is no pruning. It only works for super small problems. I implemented it as an exercise when reading the '98 paper, but never really used it in practice.

Hope this helps!

@racheltrimble
Author

Great - thanks for the quick confirmation. I realise the unpruned algorithm isn't going to scale far but I was using it to hunt for mistakes in my transition matrix setup before looking at any of the more advanced algorithms. I'm looking at the link to the Cassandra stuff now so I can still progress.

Cheers!

Rachel

@zkytony
Collaborator

zkytony commented Jun 16, 2022

Interesting. If this is about debugging the transition matrix, I think you can just verify that the transitions are correct (e.g., assert that the next state is what you expect given the state and action, over many samples) without having to run a solver?
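A minimal sketch of that kind of check, assuming the transition model is available as a plain dictionary keyed by (state, action). The table layout and helper names here are hypothetical and independent of pomdp-py.

```python
import random

# Hypothetical transition table: (state, action) -> {next_state: probability}
TRANSITIONS = {
    ("A", "stay"): {"A": 1.0},
    ("A", "flip"): {"B": 0.9, "A": 0.1},
    ("B", "stay"): {"B": 1.0},
    ("B", "flip"): {"A": 0.9, "B": 0.1},
}


def check_rows_sum_to_one(transitions, tol=1e-9):
    """Each (state, action) row must be a valid probability distribution."""
    for (s, a), row in transitions.items():
        assert all(p >= 0.0 for p in row.values()), f"negative probability in T({s}, {a}, .)"
        total = sum(row.values())
        assert abs(total - 1.0) <= tol, f"T({s}, {a}, .) sums to {total}"


def sample_next_state(transitions, s, a, rng):
    """Draw a next state from the row for (s, a)."""
    row = transitions[(s, a)]
    return rng.choices(list(row), weights=list(row.values()), k=1)[0]


def check_sampled_transitions(sample_fn, expected_next, n_samples=1000, seed=0):
    """Sample many next states and assert each one is in the expected set."""
    rng = random.Random(seed)
    for (s, a), allowed in expected_next.items():
        for _ in range(n_samples):
            sp = sample_fn(s, a, rng)
            assert sp in allowed, f"unexpected transition {s} --{a}--> {sp}"


# Expected next states, written down by hand from the problem description.
EXPECTED_NEXT = {
    ("A", "stay"): {"A"},
    ("A", "flip"): {"A", "B"},
    ("B", "stay"): {"B"},
    ("B", "flip"): {"A", "B"},
}

check_rows_sum_to_one(TRANSITIONS)
check_sampled_transitions(
    lambda s, a, rng: sample_next_state(TRANSITIONS, s, a, rng), EXPECTED_NEXT
)
```

The expected sets are written independently of the table so that a typo in the table shows up as a failed assertion rather than being silently reproduced.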

@zkytony
Collaborator

zkytony commented Jun 16, 2022

Also, if it is an explicit matrix definition of the POMDP, pomdp-py has some convenient classes like TabularTransitionModel.

This gist is an example of using these tabular classes for the crying-baby problem https://gist.github.com/zkytony/51d43ee6818375434eb3b84a77a47a5c

@racheltrimble
Author

I meant more wrt the integration into the framework. I basically just hacked your tiger example to match my problem to see if I could get something to run to get started with the tools.

Thanks for the links to the templates though - I hadn't spotted those.

@zkytony zkytony closed this as completed Jan 30, 2023
zkytony added a commit that referenced this issue Jul 25, 2023
* allow updating rollout policy

* NotImplemented->NotImplementedError in oopomdp.pyx

* s -> sp in ValueIteration (#20)

* add __init__ signature for Environment in comments to be visible in docs

* added float_precision argument to to_pomdp_file (#29)

* add readme instruction to run load_unload

* oops - correction

* Fix cpdef on variable

* version bump; sorting MANIFEST.in

* fix setup cython extension in pomdp_problems

* auto-generate manifest to include .pyx

* update changelog

* Fix wheel build

* docs html build

* change set to list in tiger to tame random.sample in python 3.11

* docs html build

* bump minimum python requirement

* update docs accordingly

---------

Co-authored-by: Juan Jesús Torre Tresols <juanjesustorre@gmail.com>
Co-authored-by: Jiuguang Wang <jiuguangw@gmail.com>