Training multiple models in same environment #1094
Hello, if you need something more custom, then you will probably need to fork SB3.
Thank you for your reply. The reason is that I am effectively combining two different action types when my network outputs both actions. To give a concrete example, think of the output layer of a policy deciding on X-Y-Z coordinate displacements and, at the same time, deciding which robot to apply these displacements to, assuming multiple robots are present. So the first 3 nodes produce the displacement, and the remaining nodes form a binary encoding that selects the robot. I was thinking that two individual policies would better fit such a scenario. Would training a single policy for this purpose work anyway?
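For what it's worth, the combined head described above can be decoded from a single flat action vector. Here is a minimal pure-Python sketch of that idea; the `decode_action` name and the "3 displacement entries + one score per robot" layout are assumptions for illustration, not SB3 API:

```python
def decode_action(action, n_robots):
    """Split one flat policy output into (displacement, robot_index).

    Assumed layout: entries 0-2 are the X-Y-Z displacement; the next
    n_robots entries score each robot, and argmax over those scores
    picks which robot the displacement is applied to.
    """
    displacement = list(action[:3])
    scores = action[3:3 + n_robots]
    robot = max(range(n_robots), key=lambda i: scores[i])
    return displacement, robot

# Example: 2 robots; robot 0 scores higher, so it receives the move.
disp, robot = decode_action([0.1, -0.2, 0.3, 0.9, 0.1], n_robots=2)
```

With this layout a single policy with one Box output can drive both decisions, which is why one network can stand in for the two separate policies discussed above.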
Then it is a duplicate of #527. You can also discretize a continuous output if you want to try things quickly.
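As a quick illustration of the discretization suggestion (the function name and the 3-level grid are made up for the example), one can enumerate a small grid of X-Y-Z displacements so that a plain Discrete action space stands in for the continuous Box:

```python
import itertools

def make_discrete_displacements(levels=(-1.0, 0.0, 1.0)):
    """Enumerate every X-Y-Z displacement combination over a small
    per-axis grid, so a Discrete action index can be looked up in
    this table instead of emitting a continuous vector.
    Note: the table size grows as len(levels) ** 3.
    """
    return list(itertools.product(levels, repeat=3))

# Discrete action i maps to table[i]; 3 levels per axis gives 27 actions.
table = make_discrete_displacements()
```

The trade-off is resolution versus action-space size: a finer grid approximates the continuous Box better but inflates the number of discrete actions quickly.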
I would expect similar performance, as you're using the same data to train both.
Hello,
I want to train an agent that takes actions from two independent networks, so the final action of the agent would be the concatenation of the outputs of these two independent policies. The point is that the same reward function will be used to train both policies, and the reward can only be calculated once the actions from both networks are available; because of this, I cannot train the first model and then move on to the second.
Would there be any suggestion for this?