Modify Policy Evaluation Solution.ipynb according to David Silver's slides. #166

Open
wants to merge 1 commit into
base: master


QikeLi commented Jul 5, 2018

The solution provided for Policy Evaluation does not agree with the equation on page 8 of David Silver's slides for lecture 3.
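
For reference, a sketch of the equation in question, assuming the slide referred to is the iterative policy evaluation backup (the notation below follows the usual convention for those lectures and is not copied from the page itself):

```latex
v_{k+1}(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s)
  \left( \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a \, v_k(s') \right)
```

Here $\mathcal{R}_s^a$ is a reward attached to the state-action pair $(s, a)$, so it sits outside the sum over next states $s'$.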


amobiny commented Nov 30, 2018

What you are saying is correct, but Denny is implementing a more general case.
David Silver's slides assume that taking action a in state s yields a reward R regardless of which state the environment transitions to. Denny's implementation allows the reward for an action to depend on the next state the environment puts you in. Since this environment is deterministic, both implementations give the same answer.
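
To make the comparison concrete, here is a minimal sketch of both backups on a small made-up MDP. Everything in it is an assumption for illustration: the three-state transition table `P`, the uniform `policy`, `GAMMA`, and the function names are hypothetical and are not taken from the notebook. The table uses the Gym-style layout `P[s][a] = [(prob, next_state, reward, done), ...]`. The "slides" version treats the reward for `(s, a)` as the expected immediate reward, which is the reading under which the two forms coincide; the deterministic case mentioned above is one instance of that.

```python
# Hypothetical 3-state, 2-action MDP (state 2 is absorbing with zero reward).
# P[s][a] is a list of (prob, next_state, reward, done) tuples.
P = {
    0: {0: [(0.5, 1, -1.0, False), (0.5, 2, -2.0, True)],
        1: [(1.0, 0, -1.0, False)]},
    1: {0: [(1.0, 2, -1.0, True)],
        1: [(1.0, 0, -1.0, False)]},
    2: {0: [(1.0, 2, 0.0, True)],
        1: [(1.0, 2, 0.0, True)]},
}
policy = {s: {0: 0.5, 1: 0.5} for s in P}  # uniform random policy (assumed)
GAMMA = 0.9


def backup_general(V, s):
    """Reward may depend on the next state: it stays inside the sum over s'."""
    return sum(pi_a * prob * (reward + GAMMA * V[s2])
               for a, pi_a in policy[s].items()
               for prob, s2, reward, _ in P[s][a])


def backup_slides(V, s):
    """Reward attached to (s, a): computed first, then added outside the sum."""
    total = 0.0
    for a, pi_a in policy[s].items():
        expected_reward = sum(prob * reward for prob, _, reward, _ in P[s][a])
        expected_next = sum(prob * V[s2] for prob, s2, _, _ in P[s][a])
        total += pi_a * (expected_reward + GAMMA * expected_next)
    return total


def evaluate(backup, sweeps=200):
    """Synchronous iterative policy evaluation for a fixed number of sweeps."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        V = {s: backup(V, s) for s in P}
    return V


V_general = evaluate(backup_general)
V_slides = evaluate(backup_slides)
max_gap = max(abs(V_general[s] - V_slides[s]) for s in P)
print(V_general, V_slides, max_gap)
```

The two value functions should match up to floating-point error. They would differ only if the per-action reward used in the second form were something other than the expectation over next states.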


2 participants