Background

Background Event reasoning is a fundamental skill required across multiple applications, involving event schema knowledge for global reasoning and the handling of diverse inter-event relations and paradigms. The extent to which LLMs can perform event reasoning across different relations and paradigms is not well understood.
Existing Work Despite a wide evaluation of LLMs' abilities in reasoning, such as commonsense reasoning, sentence relations, and mathematical reasoning, comprehensive studies on event reasoning within LLMs are rare. Current work mainly focuses on instance-level events and doesn't clearly define how LLMs understand and utilize event schema knowledge, overlooking the diversity of relations and paradigms.

Core Contributions

Introduced a new benchmark, EV2
- Challenge 1: Understanding and Utilizing Event Schema Knowledge The researchers pinpoint that while LLMs are capable of event reasoning, their performance is far from satisfying, with imbalances across different relations and reasoning paradigms. Moreover, even though LLMs possess event schema knowledge, their alignment with humans in utilizing this knowledge is off. To address this, the researchers introduce two methods to guide LLMs in leveraging event schema knowledge.
- Challenge 2: Measuring LLMs' Event Reasoning Capabilities EV2 evaluates LLMs' capabilities in event reasoning across two dimensions: schema and instance. At the schema level, EV2 assesses how LLMs handle event schema knowledge; at the instance level, EV2 verifies their event reasoning capabilities. To evaluate across various types of relations and reasoning paradigms, EV2 includes two tasks: Contextual Event Classification (CEC) and Contextual Relation Reasoning (CRR).

Implementation and Deployment

Extensive experiments on EV2 have revealed insights into event reasoning: LLMs have the capability for event reasoning but are lacking in different relations and reasoning paradigms; They possess event schema knowledge and can respond to schema-level questions with similar accuracy to instance-level ones but schema-level abilities lag behind instance-level capabilities. LLMs are not aligned with humans in leveraging event schema knowledge. Based on these findings, two methods were designed to guide LLMs to utilize event schema knowledge. It was found that LLMs perform better in event reasoning when explicit guidance in leveraging event schema knowledge is provided, especially with significant improvements under direct guidance.

Summary

This paper comprehensively evaluates the event reasoning capabilities of LLMs by introducing a new benchmark, EV2. The findings suggest that despite having capabilities for event reasoning, LLMs do not align with humans in using event schema knowledge, and with explicit guidance, they can better understand and execute event reasoning tasks.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2404.17513.md

2404.17513.md

Background

Core Contributions

Implementation and Deployment

Summary

Files

2404.17513.md

Latest commit

History

2404.17513.md

File metadata and controls

Background

Core Contributions

Implementation and Deployment

Summary