
Chapter 14 Deterministic policy gradients results are quite noisy. #86

Open
isu10503054a opened this issue Oct 27, 2020 · 3 comments

@isu10503054a

isu10503054a commented Oct 27, 2020

In the results for Chapter 14, Deterministic policy gradients, in the book,
why is the training not very stable, and why are the reward curves so noisy?

(Two screenshots of the noisy training reward plots)

I read the content repeatedly, but I still don’t understand why.

@Shmuma
Collaborator

Shmuma commented Oct 27, 2020 via email

Random weights initialization adds randomness to the initial starting point. The use of different parallel environments might also add stochasticity.

@isu10503054a
Author

isu10503054a commented Oct 28, 2020
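One source of run-to-run variation mentioned above is random weight initialization. If you want to compare two configurations fairly, a common first step is to pin every random seed before training. This is a minimal sketch, not code from the book; `seed_everything` is my own helper name:

```python
import random

import numpy as np
import torch

def seed_everything(seed: int) -> None:
    """Seed every RNG a typical PyTorch training loop touches."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy (used by Gym internals)
    torch.manual_seed(seed)           # PyTorch CPU weight init and sampling
    torch.cuda.manual_seed_all(seed)  # all CUDA devices, if any are present

# Re-seeding makes weight initialization reproducible across runs:
seed_everything(1)
w1 = torch.nn.Linear(4, 2).weight.detach().clone()
seed_everything(1)
w2 = torch.nn.Linear(4, 2).weight.detach().clone()
assert torch.equal(w1, w2)
```

Note that this removes only the initialization randomness; parallel environments and asynchronous sampling can still make individual runs diverge.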

Is there any hyperparameter in the source code that could be modified to improve this situation?
Thanks.
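On the hyperparameter question: in DDPG-style methods the target-network soft-update rate (often called tau) and the exploration-noise scale are the usual stability knobs; a smaller tau gives smoother but slower learning. A generic sketch of the soft (Polyak) update, not taken from the book's sources:

```python
import copy

import torch
import torch.nn as nn

def soft_update(target: nn.Module, source: nn.Module, tau: float) -> None:
    """Polyak averaging: target <- (1 - tau) * target + tau * source."""
    with torch.no_grad():
        for t_p, s_p in zip(target.parameters(), source.parameters()):
            t_p.mul_(1.0 - tau).add_(tau * s_p)

# Usage: nudge the target critic a small step toward the online critic
# after every optimizer update, instead of copying it wholesale.
critic = nn.Linear(4, 1)
target_critic = copy.deepcopy(critic)
soft_update(target_critic, critic, tau=0.005)
```

Lowering tau (or the action-noise sigma) typically reduces the visible noise in the reward curve at the cost of slower convergence.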

@Shmuma
Collaborator

Shmuma commented Oct 28, 2020 via email
