You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In MarkovVectorEnv.step() dones returned by the parallel env are transformed to numpy.uint8 with the following lines:
dns = np.array(
[dones.get(agent, False) for agent in self.par_env.possible_agents],
dtype=np.uint8,
)
Why is this the case?
This behavior might lead to unexpected consequences.
For example, if we want to do parameter sharing with StableBaselines3, we wrap the parallel env into a MarkovVectorEnv with pettingzoo_env_to_vec_env_v1() and then use concat_vec_envs_v1() as seen in the tutorial. However, if we want to use batch normalization with StableBaselines3.VecNormalize, we get a silent bug, as the line
self.returns[dones] = 0
in VecNormalize.step_wait() then always sets the returns of env_0 to 0 if any done is False.
The text was updated successfully, but these errors were encountered:
Oh, wow, did not realize that sb3 did stuff like that. The reason for this is that the memory layout of np.bool type wasn't super obvious, so to make things a bit less ambiguous, and in the meantime, I decided it would be easier to understand the cross-process data transfer if the data type was uint8.
However, you make a good point that before it is returned to the user, it should probably be converted to a bool.
In
MarkovVectorEnv.step()
dones returned by the parallel env are transformed to numpy.uint8 with the following lines:Why is this the case?
This behavior might lead to unexpected consequences.
For example, if we want to do parameter sharing with StableBaselines3, we wrap the parallel env into a MarkovVectorEnv with
pettingzoo_env_to_vec_env_v1()
and then useconcat_vec_envs_v1()
as seen in the tutorial. However, if we want to use batch normalization with StableBaselines3.VecNormalize, we get a silent bug, as the linein
VecNormalize.step_wait()
then always sets the returns of env_0 to 0 if any done is False.The text was updated successfully, but these errors were encountered: