Zero reward in Overcooked environment regardless of algorithm/length of training #238

Open
promiseve opened this issue May 31, 2024 · 0 comments


Hi,
I added the Overcooked env as described here: https://marllib.readthedocs.io/en/latest/handbook/env.html#id64.
I then ran the script described in https://marllib.readthedocs.io/en/latest/handbook/quick_start.html#id11, modified for the Overcooked environment.
Script:

```python
from marllib import marl

# prepare env
env = marl.make_env(environment_name="overcooked", map_name="asymmetric_advantages")

# initialize algorithm with appointed hyper-parameters
vdn = marl.algos.vdn(hyperparam_source="common")

# build agent model based on env + algorithms + user preference
model = marl.build_model(env, vdn, {"core_arch": "mlp", "encode_layer": "128-256"})

# start training
vdn.fit(env, model, stop={"timesteps_total": 1000000}, checkpoint_freq=100, share_policy="group", checkpoint_end=True)

# render
# mappo.render(env, model, local_mode=True)
```

I get a reward of 0.0 no matter how long I train or which algorithm I use. So far I have tried MAPPO and VDN; when I train on the MPE environment, the reward does change.
I would appreciate any ideas or suggestions.
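
To help narrow down whether the zero reward comes from the MARLlib side or from the Overcooked environment itself, my plan is to roll out random joint actions directly through overcooked_ai_py and check whether the sparse reward is ever non-zero. This is only a rough sketch; the class and method names below are from the overcooked_ai_py package and may differ between versions:

```python
# Sanity check: step the base Overcooked environment with random joint actions
# and see whether the sparse (soup delivery) reward is ever non-zero.
import random

from overcooked_ai_py.mdp.actions import Action
from overcooked_ai_py.mdp.overcooked_mdp import OvercookedGridworld
from overcooked_ai_py.mdp.overcooked_env import OvercookedEnv

mdp = OvercookedGridworld.from_layout_name("asymmetric_advantages")
env = OvercookedEnv.from_mdp(mdp, horizon=400)

env.reset()
total_sparse_reward = 0
done = False
while not done:
    # one random action per player
    joint_action = (random.choice(Action.ALL_ACTIONS),
                    random.choice(Action.ALL_ACTIONS))
    _state, reward, done, info = env.step(joint_action)
    total_sparse_reward += reward

print("sparse reward over one random episode:", total_sparse_reward)
```

With purely random actions the sparse delivery reward will usually be zero on this layout, so it may also be worth inspecting the shaped reward reported in the step `info` to see whether the agents are getting any learning signal at all.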
