Experiments #1

Closed
javabean68 opened this issue Sep 2, 2018 · 2 comments
javabean68 commented Sep 2, 2018

Hello Maxim,

your book is awesome. I gave it 5 stars on O'Reilly Safari. I modified something in Chapter04/02_frozenlake_naive, and after adding the wrapper below, it seems to converge:

```python
import gym

class FrozenLakeRewardWrapper(gym.RewardWrapper):
    def __init__(self, env):
        super(FrozenLakeRewardWrapper, self).__init__(env)

    def reward(self, reward):
        # +1 for every ordinary step, +2 on the winning step
        if reward == 0:
            return 1
        else:
            return 2
```
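For reference, a sketch of how such a wrapper would be applied (assuming the standard FrozenLake-v0 env; a hypothetical usage, not the exact code from the chapter):

```python
# Hypothetical usage: wrap the base environment before training on it
env = FrozenLakeRewardWrapper(gym.make("FrozenLake-v0"))
```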

I don't actually know what happens :-)

How can I then visualize the images/videos which are created? I uncommented the line:
`env = gym.wrappers.Monitor(env, directory="mon", force=True)`
and I got some files in the mon folder (e.g. openaigym.episode_batch.0.8090.stats.json), but I have no idea how to play/see them...

Could you give me a tip? Thank you so much!
Regards
Fabio

Shmuma (Collaborator) commented Sep 5, 2018

Hi Fabio!

Thanks for your feedback!

Regarding your question: by changing the reward, you're effectively giving the agent more and more reward with every step, which motivates it to walk around the frozen lake rather than solve the actual problem.

To illustrate, let's consider two episodes (both reaching the winning goal) under the old reward scheme:
0 -> 0 -> 0 -> 0 -> 0 -> 1
0 -> 0 -> 1

If gamma < 1, the second episode gives the agent a higher discounted reward, which pushes it towards reaching the terminal state in fewer steps.

Under the new reward scheme, the same episodes look like this:
1 -> 1 -> 1 -> 1 -> 1 -> 2
1 -> 1 -> 2

In this case, the second episode can give the agent a smaller total discounted reward (it depends on the actual gamma setting), which pushes the agent towards a totally different objective: walking around and keeping episodes as long as possible to collect more and more reward.
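A quick way to verify this (a minimal sketch with an assumed gamma of 0.9; `discounted_return` is a hypothetical helper, not code from the book):

```python
# Compare total discounted returns G = sum_t gamma^t * r_t
# for the two episodes under both reward schemes.
def discounted_return(rewards, gamma=0.9):
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# Old scheme: the short episode wins
print(discounted_return([0, 0, 0, 0, 0, 1]))  # ~0.59 (long episode)
print(discounted_return([0, 0, 1]))           # ~0.81 (short episode)

# New scheme: the long episode wins, so the agent prefers to wander
print(discounted_return([1, 1, 1, 1, 1, 2]))  # ~5.28 (long episode)
print(discounted_return([1, 1, 2]))           # ~3.52 (short episode)
```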

Regarding getting the images for an environment state: you can call the env.render() function, which returns the current game position. What you need to do is output this position to the screen or into a file. Unfortunately, the Monitor class doesn't seem to support capturing such text-based environments at the moment.
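For example, something like this prints the text board after every step (a sketch against the gym API of that era; newer gym/gymnasium versions configure render modes differently):

```python
import gym

env = gym.make("FrozenLake-v0")
env.reset()
env.render()  # prints the current board to stdout

done = False
while not done:
    # take random actions just to demonstrate rendering
    _, _, done, _ = env.step(env.action_space.sample())
    env.render()
```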

Shmuma closed this as completed Sep 5, 2018
javabean68 (Author) commented
Hi Maxim,
I used env.render() after your advice and... the agent really does keep going without reaching the end! Your explanation is amazing!

I have ordered your book on Amazon as well, and I'll give it a great review there too!

Thank you very much: the first book in this area that tries to explain things and isn't a mere copy of articles from the Internet.

Ciao
Fabio
