Skip to content

Latest commit

 

History

History
20 lines (15 loc) · 2.11 KB

2312.03987.md

File metadata and controls

20 lines (15 loc) · 2.11 KB

Background

  • Background The paper discusses entity resolution (ER), a key task in data cleaning and integration, with significant applications across domains like healthcare, finance, customer relationship management, law enforcement, and more.

  • Existing Work Existing solutions predominantly employ deep learning approaches using transformer-based models trained on extensive datasets with numerous labeled entity pairs. The main issue with these approaches is their monetary cost, as they require providing a task description and a set of demonstrations for each query, making it very expensive for large-scale ER tasks.

Core Contributions

  • Introduced a Batch Prompting Framework named BATCHER
    • Challenge 1: Developing a cost-effective batch prompting method The researchers introduce BATCHER, consisting of demonstration selection and question batching, and explore various design choices supporting batch prompting for ER. They specifically investigate how to select a batch of questions and a set of demonstrations that can cover all questions to optimally guide Large Language Models (LLMs) in providing answers.

    • Challenge 2: Achieving an effective balance between matching accuracy and monetary cost The team developed a covering-based demonstration selection strategy that selects the most representative demonstrations after determining a batch of questions, thus achieving an effective balance between matching accuracy and cost.

Implementation and Deployment

Through extensive experimentation, the researchers found that batch prompting is highly cost-effective for ER compared to PLM-based methods extensively fine-tuned with labeled data and LLM-based methods with manually designed prompting. The results also offer guidance on how to select the appropriate design options for batch prompting.

Summary

This paper delivers a comprehensive study aimed at exploring a cost-effective batch prompting approach to entity resolution. The main contributions include the introduction of the BATCHER framework and the proposal of a covering-based demonstration selection strategy.