
Gambler's Problem: 0 Stake Allowed? #223

Open
mparigi opened this issue Apr 9, 2020 · 1 comment

Comments

mparigi commented Apr 9, 2020

In the solution, it says "Your minimum bet is 1". However, the specification says "The actions are stakes, a ∈ {0, 1, . . . , min(s, 100 − s)}", implying a bet of 0 is fine. Which is correct?

@lucasbasquerotto

A bit late to the discussion, but in this problem it's actually not advisable to use 0 as a stake, because this is an undiscounted MDP.

A stake of 0 gives a reward of 0 and leaves you in the same state. That would be fine in a discounted MDP, where the return shrinks with every time step, but not in an undiscounted one (gamma = 1): looping costs nothing, so value iteration can end up treating 0 as a best action, since it keeps the same state and the same value with no penalty.

If there were a negative reward per action, or if the problem were discounted, this would not be an issue.

To put it in perspective, consider the following cases for a capital of 99 (only stakes of 0 and 1 are allowed there):

  • You stake 1 (you either win, or end with a capital of 98): this has some value, given by the return, which depends on the probabilities of each outcome (ph and 1 - ph), the rewards, and the values of the next states (the details are not relevant here).
  • You stake 0 and end up with a capital of 99 (the same as before), repeat the 0 stake one million times, still ending up with 99, and only then stake 1: this has exactly the same value as the previous case, because the MDP is undiscounted and each 0 stake gives a reward of 0 and leaves you in the same state, creating a loop.

That said, you might still be able to include 0 as an action if, when extracting the policy, you always pick the highest stake among the best actions (a stake of 0 may be a best action, but never the unique best one: at least one nonzero stake will be tied with it). I haven't tried doing this, though.
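To illustrate, here is a minimal value-iteration sketch that keeps 0 in the action set but breaks ties toward the largest stake when extracting the greedy policy. The parameters ph = 0.4 and a goal of 100 are assumptions (they match the textbook example, not anything stated in this thread):

```python
import numpy as np

# Value iteration for the Gambler's Problem with a stake of 0 allowed.
# Assumed parameters: p_h = 0.4, goal capital = 100.
GOAL = 100
P_H = 0.4

V = np.zeros(GOAL + 1)
V[GOAL] = 1.0  # the only reward: +1 on reaching the goal

while True:
    delta = 0.0
    for s in range(1, GOAL):
        old = V[s]
        # a = 0 is included; its return is V[s] itself, so it leaves
        # the state (and its value) unchanged.
        returns = [P_H * V[s + a] + (1 - P_H) * V[s - a]
                   for a in range(0, min(s, GOAL - s) + 1)]
        V[s] = max(returns)
        delta = max(delta, abs(V[s] - old))
    if delta < 1e-9:
        break

# Greedy policy, breaking ties toward the LARGEST stake. At convergence
# a = 0 always ties with the best nonzero stake (its return equals V[s]),
# so this tie-break yields a nonzero stake at every state.
policy = np.zeros(GOAL + 1, dtype=int)
for s in range(1, GOAL):
    actions = range(0, min(s, GOAL - s) + 1)
    returns = [P_H * V[s + a] + (1 - P_H) * V[s - a] for a in actions]
    best = max(returns)
    policy[s] = max(a for a in actions if abs(returns[a] - best) < 1e-6)
```

With the tie broken the other way (toward the smallest stake), the same values would produce the degenerate policy that always stakes 0.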

You can see how breaking ties toward the smallest or the largest stake yields different policies for the same values in the other issue about this exercise: #172
