-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Questionable result in Gamblers Problem Solution #172
Comments
The book does mention that there is a family of ideal solutions, not just one. Perhaps this is another valid solution from the family? I'm not too sure but it sounds plausible. |
That makes sense. But, to be honest, I don't understand why both policies (the noisy one and the one from the book) are optimal.
Does anyone know the answer to this exercise? |
I think I do have one answer to that, you can check easily: Put the discount factor to ANYTHING lower than 1, and you get pretty much what you expected here : a pyramid with a maximum value at 50. On the other hand, if the p_h is <0.5 and gamma =1 , the algorithm doesn't really mind taking its time to win, and so it tries to maximize value at every step: In particular, at step 50, it doesn't matter if you lose 1 or lose 49, so you might as well go for as much as possible to maximize the potential value. on the other hand, on step 51 you could lose 1 then get back to step 50 and try again. As I see it, the cyclical values are a balance between betting on winning and betting on losing. |
@ArikVoronov That has confused me for a long time, thx for your explanation. |
Why the figures of 'value estimate' and 'final policy' become so weird when the 'ph' is bigger than 0.5? |
The graph you're getting with the erroneous spikes in the first post are probably about floating point arithmetic. I was able to output the graph that looked just like the textbook by comparing values that are rounded to ten digits when taking the max over the sum for updating the value function |
The book gets the solution it does because it chooses the least amount to bet rather than the most (using argmax) There's an alternative optimal solution that looks like an upside-down V (obtained by using argmax but choosing the highest of equal bets rather than the lowest). And, there are mixes of those strategies. When you have an amount of money less than $50, a winning strategy is always to bet it all, but you can also use the value from the figure in the book. They key is to minimize the total number of bets (since you don't have an edge). The original figure shown by bminixhofer is a combination of the two: as you can see it is a mixture of the book figure along with an upside-down V. As stated earlier, this is due to slightly different floating-point values for what should actually end up as equal values. Thus, sometimes that figure shows the lower of the two equally-valid bets and sometimes it shows the higher. |
The original figure shown by bminixhofer is actually correct, as there are a family of correct solutions. Since the gambler doesn't have an edge, the optimal policy is to minimize the number of bets. With $51, optimal policy is either:
The book shows a policy that always shows the minimum optimal bet. Here's a figure that shows the maximum optimal bet: Note that bminixhofer's figure is a mix between the two bets. That's likely caused, as sanakdag points out, because the two actions that should have identical value aren't exactly identical (off in one of the least significant digits). Thus, the figure is showing whichever action happens to be slightly higher. Rounding does address this issue. Huixxi asks why the figure looks weird when ph is > 0.5. That's because the winning strategy is to maximize the number of bets. Thus, no matter what the capital stake, bet $1. |
Wow! This was the issue I had. The plot in the book assumes that you choose the least amount when outcomes are equal, except that floating point issues sometimes make the larger bet amount better in the calculation. Thanks! |
Can confirm the floating point issue - I rounded to 7 digits and now my graph looks like the book.... |
A bit late to the discussion, but the reason is as posted before, with some more details:
It's important to note that the action value for a given action might be a very little bit smaller than another action, but extremely close. Choosing the exact optimal smallest action may not be a graph like the book (it gives the graph of the solution of this repository), but the value of the policy actions like shown in the book are so close to the optimal that they may be considered equal. You can add a value (some epsilon variable or the like) with a very small value to ignore such small differences in the floating point values. I made a python notebook based on this exercise that gives the original graph of this repository (exact best smallest stake), the piramid graph (approximate, almost exact, best highest stake) and the graph of the book (approximate, almost exact, best smallest stake). You can see it in the link (I used a parameter named https://github.com/lucasbasquerotto/ml-demos/blob/main/rl/others/DP/Gamblers%20Problem.ipynb |
I encountered an interesting issue in the Gamblers Problem notebook .
The graph of capital vs policy looks like this with
p_h = 0.4
:In the book (on page 66), we see how the graph is supposed to look. The general notion appears to be the same, but the strange spikes in the notebook do not appear:
If we reduce theta in
value_iteration_for_gamblers
from 0.0001 to 1e-30 the spikes appear to become less, but they are still significant:Does anyone know why this happens?
Thanks!
The text was updated successfully, but these errors were encountered: