Background

Background The paper discusses how Large Language Models (LLMs) employ In-context Learning (ICL) to learn to handle new tasks. ICL involves using a series of training instances as prompts to achieve this. However, existing methods for selecting ICL instances often require a lot of time and are costly, limiting their practical applicability.
Existing work Past studies have attempted to reduce annotation costs by selecting a small subset of unlabeled instances for annotation. The goal of selecting these instances is diversity and representativeness. Although these methods outperform random selection, they have apparent shortcomings in computational efficiency.

Core Contributions

Introduced a Fast Graph-based Annotation Selection (FastGAS) method
Challenge 1: How to improve the diversity and representativeness of instance selection FastGAS uses a graph partitioning algorithm to divide the data similarity graph into different segments, with each segment viewed as a set of instances, thus ensuring the diversity of the selected instances. It selects instances with the maximum node degree in each segment, ensuring the representativeness of the selected instances.
Challenge 2: Minimize the time required for instance selection as much as possible FastGAS employs a multi-level graph bisection algorithm to accelerate the graph partitioning process, followed by a simple yet effective greedy algorithm for selecting instances within each segment. By applying the greedy algorithm on each component, compared to baseline methods which iteratively select over the entire graph, FastGAS significantly reduces computation time.

Implementation and Deployment

The effectiveness of FastGAS was evaluated on various datasets covering different task categories, and it outperformed established baselines in identifying an optimal selective annotation subset. The experiments compared FastGAS with other selective annotation methods such as Vote-k and IDEAL, as well as other widely recognized methods such as Top-degree, PageRank, etc. The results showed that FastGAS performed better than these two existing baselines in most scenarios (13 out of 14). Specifically, with an annotation budget of 18, all annotated examples fit within the prompt limitations of LLMs, making the evaluation results a direct reflection of the quality of selected instances. When the annotation budget is 18, FastGAS performed better than the baseline on most datasets, indicating that FastGAS can select higher-quality data. Moreover, FastGAS has drastically reduced the time cost compared to existing methods.

Summary

The FastGAS approach proposed in the paper significantly improves the diversity and representativeness of selected instances for ICL while also considerably reducing the time and computational resources required. The experimental results verify its efficiency and efficacy on multiple datasets, demonstrating its potential as an effective instance selection method.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2406.0373.md

2406.0373.md

Background

Core Contributions

Implementation and Deployment

Summary

Files

2406.0373.md

Latest commit

History

2406.0373.md

File metadata and controls

Background

Core Contributions

Implementation and Deployment

Summary