2405.14751.md

Background

  • Background Recent advances in LLMs have demonstrated impressive capabilities in instruction following, reasoning, and zero-shot learning, fuelling the development of autonomous agents powered by LLMs, also referred to as LLM agents. However, integrating the components of such agents, such as planning, reflection, tool use, and lifelong learning, and optimizing them end to end remain unresolved problems.

  • Existing Work Current works propose several key components or workflows to enhance LLM agents' abilities. However, the challenge lies in consolidating these components into a unified framework and optimizing it from an end-to-end perspective.

Core Contributions

  • Introduced AGILE, a new LLM agent framework that Interacts and Learns from Environments
    • Challenge 1: Unification and Optimization AGILE integrates four key modules, namely the LLM, memory, tools, and an executor, into a cohesive architecture. The framework uses a novel end-to-end training method based on reinforcement learning, which trains not only the policy for invoking the various modules but also the agent's abilities in reasoning, planning, reflection, and proactively seeking advice from human experts when it encounters problems it cannot solve (a minimal, hypothetical sketch of such a module-orchestrating loop follows this list).

    • Challenge 2: Benchmarking for Complex Question Answering Existing QA benchmarks typically test specific abilities in isolation and thus fail to assess an agent's integrated performance across all modules and capabilities simultaneously. To address this, the authors developed a new benchmark, ProductQA, for a comprehensive evaluation of the agent's abilities, including handling historical interactions, accumulating knowledge, using tools, interacting with humans, self-evaluation, and reflection.
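
To make the module interplay concrete, the following is a minimal, hypothetical sketch of how an executor might orchestrate the LLM, memory, and tools, including proactively seeking advice from a human expert. The class names, action format, and prompting scheme are illustrative assumptions rather than the paper's actual implementation.

```python
# Hypothetical sketch of an AGILE-style executor loop. Module names, the
# action format, and the prompting scheme are illustrative assumptions,
# not the paper's actual implementation.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    llm: Callable[[str], str]                # policy LLM: prompt -> action string
    tools: Dict[str, Callable[[str], str]]   # e.g. {"product_search": ...}
    memory: List[str] = field(default_factory=list)

    def answer(self, question: str, max_steps: int = 8) -> str:
        context = "\n".join(self.memory[-20:])  # recent interactions as context
        for _ in range(max_steps):
            action = self.llm(f"Context:\n{context}\nQuestion: {question}\nAction:")
            if action.startswith("[seek_advice]"):
                reply = input("Expert advice requested: ")  # ask a human expert
                self.memory.append(f"advice: {reply}")      # accumulate knowledge
                context += f"\nExpert: {reply}"
            elif action.startswith("[use_tool]"):
                _, name, arg = action.split("|", 2)  # e.g. "[use_tool]|product_search|B0123"
                result = self.tools[name](arg)
                context += f"\nTool {name}: {result}"
            elif action.startswith("[reflect]"):
                context += f"\nReflection: {action[len('[reflect]'):]}"
            else:                                    # anything else is the final answer
                self.memory.append(f"Q: {question} A: {action}")
                return action
        return "[seek_advice] step budget exhausted"
```

In the paper's framework the action choices themselves are produced by the trained LLM policy; here the LLM is abstracted as a plain callable for brevity.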

Implementation and Deployment

The AGILE agent framework was evaluated on two tasks, ProductQA and MedMCQA, where agents built on 7B and 13B LLMs and trained with PPO outperformed GPT-4 agents. Ablation studies showed that memory, tool use, consultation, reflection, and reinforcement learning all contribute substantially to the agent's performance: removing tools or memory significantly increased the frequency of advice-seeking and lowered the overall score, and PPO training further raised the total score relative to an imitation-learning baseline.
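
For readers less familiar with PPO, below is a generic sketch of the clipped surrogate objective that PPO optimizes during policy fine-tuning. The tensor shapes, toy rollout data, and advantage estimates are illustrative assumptions and do not reproduce the paper's training recipe.

```python
import torch

def ppo_clip_loss(logp_new: torch.Tensor,
                  logp_old: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Standard PPO clipped surrogate loss over sampled actions.

    logp_new:   log-probabilities under the current policy
    logp_old:   log-probabilities under the rollout policy (treated as constant)
    advantages: advantage estimates, e.g. reward minus a value baseline
    """
    ratio = torch.exp(logp_new - logp_old)                           # importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                     # maximize the surrogate

# Toy usage with random numbers standing in for data gathered from agent rollouts.
logp_old = torch.randn(16)
logp_new = (logp_old + 0.05 * torch.randn(16)).requires_grad_()
advantages = torch.randn(16)
loss = ppo_clip_loss(logp_new, logp_old, advantages)
loss.backward()  # gradients would update the LLM policy parameters
```

In an agent setting, the advantages would come from task rewards collected on QA trajectories, for instance rewarding correct answers while accounting for the cost of actions such as seeking advice.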

Summary

The paper proposes AGILE, a new framework for LLM agents that unifies the individual components and leverages reinforcement learning for end-to-end training. The framework shows superior performance on complex QA tasks, underscoring the efficacy of component integration and end-to-end optimization. The released dataset and code encourage further research in this area.