fix: autoreset wrappers #223

sash-a · 2024-02-12T12:15:43Z

There was an issue with the autoreset wrappers: they never showed you the final timestep. This is a problem when the final timestep is a truncation (discount = 1, timestep.last() = true): we need that observation to compute the next value for our value target, which is currently not possible.
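To make the problem concrete, here is a minimal sketch of a one-step bootstrapped value target. The function name `td_target`, the value of `gamma`, and the critic outputs are all hypothetical and chosen for illustration only; they are not taken from Jumanji.

```python
import jax.numpy as jnp


def td_target(reward, discount, next_value, gamma=0.99):
    """One-step bootstrapped target: r + gamma * discount * V(next_obs)."""
    return reward + gamma * discount * next_value


# Two final steps: the first terminates (discount = 0),
# the second is truncated (discount = 1).
reward = jnp.array([1.0, 1.0])
discount = jnp.array([0.0, 1.0])

# Hypothetical critic outputs, for illustration only.
value_of_final_obs = jnp.array([5.0, 5.0])  # V(real final observation)
value_of_reset_obs = jnp.array([2.0, 2.0])  # V(first observation after the reset)

# Bootstrapping from the real final observation gives the intended target.
correct = td_target(reward, discount, value_of_final_obs)
# Bootstrapping from the reset observation changes the target on truncation only.
biased = td_target(reward, discount, value_of_reset_obs)
```

On termination the discount zeroes out the bootstrap term, so the choice of next observation does not matter. On truncation it does, which is why the final observation has to be available.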

Gym does it the same way I am proposing, as can be seen here: it places the final observation in infos. This PR always places either the current observation or the terminal observation in timestep.extras["real_next_obs"].
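The idea can be sketched with a toy counter environment. This is not the Jumanji wrapper code: `toy_reset`, `toy_step`, `autoreset_step`, and `MAX_STEPS` are hypothetical names used only to illustrate the behaviour, and only the `"real_next_obs"` key comes from the description above.

```python
import jax.numpy as jnp

MAX_STEPS = 3  # hypothetical episode horizon for the toy environment


def toy_reset():
    """Toy environment: the state and the observation are both a step counter."""
    state = jnp.array(0)
    return state, state  # (state, observation)


def toy_step(state, action):
    """Advance the counter; the episode ends once MAX_STEPS is reached."""
    next_state = state + 1
    done = next_state >= MAX_STEPS
    return next_state, next_state, done  # (state, observation, done)


def autoreset_step(state, action):
    """Step, reset on episode end, and keep the pre-reset observation in extras."""
    next_state, next_obs, done = toy_step(state, action)
    reset_state, reset_obs = toy_reset()
    # On the last step, hand the reset state/observation back to the caller...
    state_out = jnp.where(done, reset_state, next_state)
    obs_out = jnp.where(done, reset_obs, next_obs)
    # ...but always expose the observation that actually followed the action.
    extras = {"real_next_obs": next_obs}
    return state_out, obs_out, done, extras
```

On ordinary steps the returned observation and `extras["real_next_obs"]` are identical. They differ only on the step that ends an episode, where the returned observation comes from the reset and the extras entry holds the final observation.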

Here's how I'm currently using it for SAC (and it's working well there):

    def step(
        action: Array, obs: Observation, env_state: State, buffer_state: BufferState
    ) -> Tuple[Array, State, BufferState, Dict]:
        """Given an action, step the environment and add to the buffer."""
        env_state, timestep = jax.vmap(env.step)(env_state, action)
        next_obs = timestep.observation
        rewards = timestep.reward
        terms = ~(timestep.discount).astype(bool)
        infos = timestep.extras

        real_next_obs = infos["real_next_obs"]

        transition = Transition(obs, action, rewards, terms, real_next_obs)
        buffer_state = rb.add(buffer_state, transition)

        return next_obs, env_state, buffer_state, infos["episode_metrics"]

clement-bonnet · 2024-03-04T13:09:20Z

Hi Sasha,
This issue seems a bit related to #106.
Returning the reset state/observation instead of the terminal state/observation when auto-resetting has always been the intended behaviour. This is because none of the Jumanji environments use truncation, so one does not need the terminal state to train an actor-critic agent.
Now, if a user implements a new Jumanji environment that includes truncation, using the Environment abstraction and other tools from Jumanji, they may want to use the truncated state/observation in their own training loop, which seems to be your use case, right? Passing it through the extras seems legit to me. 🙌

sash-a · 2024-03-05T08:56:57Z

Yup, this is exactly the use case!

fix: autoreset wrappers

8bd1713

sash-a self-assigned this Feb 12, 2024

sash-a commented Feb 12, 2024

sash-a mentioned this pull request Feb 12, 2024

feat: auto reset wrapper instadeepai/Mava#1017

Merged

Merge branch 'main' into fix/autoreset

52c71e4

Merge branch 'main' into fix/autoreset

34a6a72

clement-bonnet requested changes Mar 5, 2024

clement-bonnet reviewed Mar 5, 2024

clement-bonnet and others added 2 commits March 5, 2024 15:56

Merge branch 'main' into fix/autoreset

f6db065

chore: OBS_IN_EXTRAS_KEY -> NEXT_OBS_KEY_IN_EXTRAS

06971cf

clement-bonnet approved these changes Mar 7, 2024

sash-a merged commit ce8b873 into instadeepai:main Mar 8, 2024
3 checks passed

clement-bonnet mentioned this pull request Mar 8, 2024

fix: default value for obs in extras #228

Merged
