Skip to content

Latest commit

 

History

History
20 lines (15 loc) · 2.24 KB

2312.09238.md

File metadata and controls

20 lines (15 loc) · 2.24 KB

Background

  • Background This paper discusses challenges in learning complex tasks in Minecraft due to the reliance on sparse rewards by traditional reinforcement learning agents.

  • Existing Work The existing dense reward signals proposed for efficient learning in specific sparse reward tasks do not adequately address the complexity and long-horizon tasks in Minecraft, highlighting the limitations of current methods.

Core Contributions

  • Introduced Auto MC-Reward, an advanced learning system
    • Challenge 1: Sparsity of reward signals and the complexity of the environment and tasks Auto MC-Reward leverages Large Language Models (LLMs) to automate the design of dense reward functions, uses Reward Designer, Reward Critic, and Trajectory Analyzer to encode Python functions from environment information and task descriptions, and iteratively refines the reward function through self-verification, error checking, and feedback.

    • Challenge 2: Limitations of existing methods By utilizing the task understanding and experience summarization capabilities of LLMs to provide detailed and immediate learning guidance rewards, Auto MC-Reward resolves the limitations of existing methods by analyzing collected trajectories and automatically assisting the Reward Designer in improving the reward functions.

Implementation and Deployment

Auto MC-Reward significantly improved the success rate and learning efficiency in complex Minecraft tasks such as obtaining diamonds efficiently, avoiding lava, and exploring sparse elements in plains biomes. Experiments demonstrate its superior performance over original sparse and existing dense reward methods, reflecting its capacity to enhance learning efficiency in sparse reward tasks. Additionally, Auto MC-Reward achieved a 36.5% success rate in obtaining diamonds using only raw information, proving its effectiveness in solving long-horizon tasks.

Summary

Auto MC-Reward is an advanced learning system that uses LLMs to automatically design dense rewards for Minecraft tasks. By leveraging LLMs' abilities to understand tasks and summarize experience, it effectively improves agents' learning of new behaviors and completion of long-term tasks in complex environments.