DDPG/TD3/SAC for robotics tasks #65

Answered by Toni-SM
sjywdxs asked this question in Q&A

Hi @sjywdxs

This is an interesting topic for discussion and research!

Off-policy algorithms are generally more sample-efficient than on-policy algorithms because they can learn from a broader set of experiences: they can reuse experiences generated by any policy, not just the policy currently being executed. The problem with running many environments in parallel is that the collected samples are strongly correlated in time, and at every time step a large batch of transitions (one per environment) is added all at once. As a result, off-policy algorithms are better suited to problems with a small number of environments, at least when standard replay memories are used.

However, for R…
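
To make the point about parallel environments and replay memories concrete, here is a minimal sketch in plain NumPy. It is not skrl's actual memory API; names such as ReplayBuffer, num_envs, and the buffer capacity are illustrative assumptions. It shows how, with vectorized environments, one step adds num_envs temporally correlated transitions at once, and how uniform random sampling over the whole buffer is what decorrelates the minibatches used for the off-policy update.

```python
# Illustrative only: a plain NumPy replay buffer, not skrl's memory classes.
import numpy as np

class ReplayBuffer:
    """Circular buffer storing transitions coming from several parallel envs."""
    def __init__(self, capacity, obs_dim, act_dim):
        self.capacity = capacity
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.act = np.zeros((capacity, act_dim), dtype=np.float32)
        self.rew = np.zeros((capacity, 1), dtype=np.float32)
        self.next_obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.done = np.zeros((capacity, 1), dtype=np.float32)
        self.ptr, self.size = 0, 0

    def add(self, obs, act, rew, next_obs, done):
        """Add one batch of transitions, one per environment (shape [num_envs, ...])."""
        n = obs.shape[0]  # num_envs transitions arrive at every single time step
        idx = (self.ptr + np.arange(n)) % self.capacity
        self.obs[idx], self.act[idx] = obs, act
        self.rew[idx], self.next_obs[idx], self.done[idx] = rew, next_obs, done
        self.ptr = (self.ptr + n) % self.capacity
        self.size = min(self.size + n, self.capacity)

    def sample(self, batch_size):
        """Uniform random sampling over the whole buffer breaks temporal correlation."""
        idx = np.random.randint(0, self.size, size=batch_size)
        return (self.obs[idx], self.act[idx], self.rew[idx],
                self.next_obs[idx], self.done[idx])

# With e.g. 1024 parallel environments, 1024 highly correlated transitions are
# written per step, so recent experience dominates the buffer unless its
# capacity is large relative to num_envs.
num_envs, obs_dim, act_dim = 1024, 8, 2
buffer = ReplayBuffer(capacity=100_000, obs_dim=obs_dim, act_dim=act_dim)
obs = np.random.randn(num_envs, obs_dim).astype(np.float32)
for _ in range(10):  # dummy rollout in place of a real vectorized env.step()
    act = np.random.randn(num_envs, act_dim).astype(np.float32)
    next_obs = np.random.randn(num_envs, obs_dim).astype(np.float32)
    rew = np.random.randn(num_envs, 1).astype(np.float32)
    done = np.zeros((num_envs, 1), dtype=np.float32)
    buffer.add(obs, act, rew, next_obs, done)
    obs = next_obs
batch = buffer.sample(256)  # minibatch for an off-policy update (DDPG/TD3/SAC)
```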

Answer selected by sjywdxs