Differences in performance using Jumanji's version of RWARE

The stateless, JAX-based implementation of RWARE found in Jumanji (called RobotWarehouse) handles collisions differently from the original RWARE environment.

As mentioned in the original repo, collisions are handled as follows:

The dynamics of the environment are also of particular interest. Like a real, 3-dimensional warehouse, the robots can move beneath the shelves. Of course, when the robots are loaded, they must use the corridors, avoiding any standing shelves.

Any collisions are resolved in a way that allows for maximum mobility. When two or more agents attempt to move to the same location, we prioritise the one that also blocks others. Otherwise, the selection is done arbitrarily. The visuals below demonstrate the resolution of various collisions.
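One possible reading of the prioritisation rule quoted above can be sketched as follows. This is a simplified illustration rather than the original repo's algorithm: the function `pick_winner`, its arguments, and the choice of "first contender" as the arbitrary fallback are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, col) grid coordinate


def pick_winner(
    contenders: List[int],
    current: Dict[int, Position],
    target: Dict[int, Position],
) -> int:
    """Choose which of the agents contending for one cell gets to move there.

    Hypothetical helper: an agent is treated as "blocking others" when some
    other agent wants to move onto the cell it currently stands on.
    """
    for agent in contenders:
        blocks_others = any(
            other != agent and target[other] == current[agent]
            for other in target
        )
        if blocks_others:
            return agent  # moving this agent frees a cell someone else needs
    return contenders[0]  # no blocker: arbitrary choice (here, the first)


# Agents 0 and 1 both want cell (1, 1); agent 2 wants agent 1's current cell.
current = {0: (0, 1), 1: (1, 0), 2: (2, 0)}
target = {0: (1, 1), 1: (1, 1), 2: (1, 0)}
winner = pick_winner([0, 1], current, target)
print(winner)  # 1
```

In this example agent 1 is preferred because letting it move also unblocks agent 2, which is the "maximum mobility" idea in the quoted description.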

In contrast to the collision resolution strategy above, the current version of the Jumanji implementation does not resolve collisions dynamically; instead, it terminates the episode as soon as agents collide. In our experience, this appears to make the task more challenging and makes it easier for agents to get trapped in local optima where episodes never run for the maximum length.
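The termination rule described above can be illustrated with a minimal sketch. It is written in plain Python for readability and is not Jumanji's actual code: the helper names `has_collision` and `step_done`, the `max_steps` parameter, and the assumption that a collision means two agents sharing a grid cell are all hypothetical.

```python
from typing import List, Tuple

Position = Tuple[int, int]  # (row, col) grid coordinate


def has_collision(positions: List[Position]) -> bool:
    """Return True if two or more agents occupy the same grid cell."""
    return len(set(positions)) < len(positions)


def step_done(positions: List[Position], t: int, max_steps: int) -> bool:
    """Episode ends at the time limit or on any agent collision (assumed rule)."""
    return t >= max_steps or has_collision(positions)


print(step_done([(0, 0), (0, 1)], t=10, max_steps=500))  # False
print(step_done([(0, 1), (0, 1)], t=10, max_steps=500))  # True
```

Under a rule like this, a single shared cell early in training is enough to cut the episode short, which is consistent with episodes rarely reaching the maximum length.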

To investigate this, we ran our algorithms on a version of Jumanji's RWARE where episodes do not terminate upon agent collision; instead, multiple agents are allowed to occupy the same grid position. This setup is not identical to the original environment, but it is closer to the original dynamics and lets agents easily reach the end of an episode.
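The effect of the two termination rules on episode length can be seen in a small sketch. It is illustrative only: `episode_length`, the `terminate_on_collision` flag, and the toy trajectory are assumptions for this example, not part of Jumanji or Mava.

```python
from typing import List, Tuple

Position = Tuple[int, int]  # (row, col) grid coordinate


def episode_length(
    trajectory: List[List[Position]], terminate_on_collision: bool
) -> int:
    """Count the steps executed before the episode ends.

    trajectory[t] holds every agent's position after step t + 1; the length
    of the trajectory plays the role of the time limit.
    """
    for t, positions in enumerate(trajectory, start=1):
        collided = len(set(positions)) < len(positions)
        if terminate_on_collision and collided:
            return t  # regular variant: stop at the first shared cell
    return len(trajectory)  # adapted variant, or no collision occurred


# Two agents that share cell (0, 1) after the second step.
trajectory = [
    [(0, 0), (0, 2)],
    [(0, 1), (0, 1)],
    [(0, 2), (0, 0)],
    [(1, 2), (1, 0)],
]
regular_length = episode_length(trajectory, terminate_on_collision=True)
adapted_length = episode_length(trajectory, terminate_on_collision=False)
print(regular_length, adapted_length)  # 2 4
```

With termination disabled, the same sequence of moves runs to the time limit, so agents keep collecting experience after the overlap.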

The figures below show the performance of Mava's recurrent and feedforward implementations of IPPO and MAPPO on the regular version of Jumanji's RWARE as well as the adapted version without termination upon agent collision.

[Figures: Mava feedforward MAPPO on tiny-2ag, tiny-4ag and small-4ag]

Mava feedforward MAPPO performance on the tiny-2ag, tiny-4ag and small-4ag RWARE tasks.

[Figures: Mava feedforward IPPO on tiny-2ag, tiny-4ag and small-4ag]

Mava feedforward IPPO performance on the tiny-2ag, tiny-4ag and small-4ag RWARE tasks.

[Figures: Mava recurrent IPPO on tiny-2ag, tiny-4ag and small-4ag]

Mava recurrent IPPO performance on the tiny-2ag, tiny-4ag and small-4ag RWARE tasks.

[Figures: Mava recurrent MAPPO on tiny-2ag, tiny-4ag and small-4ag]

Mava recurrent MAPPO performance on the tiny-2ag, tiny-4ag and small-4ag RWARE tasks.