Experiments #1

Closed
javabean68 opened this issue Sep 2, 2018 · 2 comments
javabean68 commented Sep 2, 2018

Hello Maxim,

your book is awesome. I gave it 5 stars on O'Reilly Safari. I modified something in Chapter04/02_frozenlake_naive, and after adding the wrapper below, it seems to converge:

```python
import gym

class FrozenLakeRewardWrapper(gym.RewardWrapper):
    def __init__(self, env):
        super(FrozenLakeRewardWrapper, self).__init__(env)

    def reward(self, reward):
        # +1 for every ordinary step, +2 on the winning step
        if reward == 0:
            return 1
        else:
            return 2
```
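For reference, a sketch of how such a wrapper would be applied (assuming the standard FrozenLake-v0 env; a hypothetical usage, not the exact code from the chapter):

```python
# Hypothetical usage: wrap the base environment before training on it
env = FrozenLakeRewardWrapper(gym.make("FrozenLake-v0"))
```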

I don't actually know what happens :-)

How can I then visualize the images/videos which are created? I uncommented the line:
`env = gym.wrappers.Monitor(env, directory="mon", force=True)`
and I got some files in the mon folder (e.g. openaigym.episode_batch.0.8090.stats.json), but I have no idea how to play/see them...

Could you give me a tip? Thank you so much!
Regards
Fabio

Shmuma (Collaborator) commented Sep 5, 2018

Hi Fabio!

Thanks for your feedback!

Regarding your question: by changing the reward, you're effectively giving the agent more and more reward with every step, which motivates it to walk around the frozen lake rather than solve the actual problem.

To illustrate, let's consider two episodes (both reaching the winning goal) under the old reward scheme:
0 -> 0 -> 0 -> 0 -> 0 -> 1
0 -> 0 -> 1

If gamma < 1, the second episode gives the agent a higher discounted reward, which pushes it towards reaching the terminal state in fewer steps.

Under the new reward scheme, the same episodes look like this:
1 -> 1 -> 1 -> 1 -> 1 -> 2
1 -> 1 -> 2

In this case, the second episode can give the agent a smaller total discounted reward (it depends on the actual gamma setting), which pushes the agent towards a totally different objective: walking around and keeping episodes as long as possible to collect more and more reward.
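A quick way to verify this (a minimal sketch with an assumed gamma of 0.9; `discounted_return` is a hypothetical helper, not code from the book):

```python
# Compare total discounted returns G = sum_t gamma^t * r_t
# for the two episodes under both reward schemes.
def discounted_return(rewards, gamma=0.9):
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# Old scheme: the short episode wins
print(discounted_return([0, 0, 0, 0, 0, 1]))  # ~0.59 (long episode)
print(discounted_return([0, 0, 1]))           # ~0.81 (short episode)

# New scheme: the long episode wins, so the agent prefers to wander
print(discounted_return([1, 1, 1, 1, 1, 2]))  # ~5.28 (long episode)
print(discounted_return([1, 1, 2]))           # ~3.52 (short episode)
```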

Regarding getting the images for an environment state: you can call the env.render() function, which returns the current game position. What you need to do is output this position to the screen or into a file. Unfortunately, the Monitor class doesn't seem to support capturing such text-based environments at the moment.
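For example, something like this prints the text board after every step (a sketch against the gym API of that era; newer gym/gymnasium versions configure render modes differently):

```python
import gym

env = gym.make("FrozenLake-v0")
env.reset()
env.render()  # prints the current board to stdout

done = False
while not done:
    # take random actions just to demonstrate rendering
    _, _, done, _ = env.step(env.action_space.sample())
    env.render()
```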

Shmuma closed this as completed Sep 5, 2018
javabean68 (Author) commented
Hi Maxim,
I used env.render() after your advice and... the agent really does keep going without reaching the end! Your explanation is amazing!

I have ordered your book on Amazon as well, and I'll give it a great review there too!

Thank you very much: the first book in this area that tries to explain things and isn't a mere copy of articles from the Internet.

Ciao
Fabio
