Background

Background: The paper examines AI assistants built on large language models (LLMs), which struggle with knowledge-intensive tasks and consequently produce factual errors.
Existing Work: The paper notes that current AI assistants "hallucinate", producing factual errors that reduce their credibility and can harm society. Existing methods have not effectively addressed the challenge of getting an assistant to recognize and admit what it does not know.

Core Contributions

Created a model-specific "I don't know" (Idk) dataset
Challenge 1: Constructing the Idk dataset. The difficulty is enabling the AI assistant to recognize and admit the limits of its own knowledge. The paper decides whether the assistant knows the answer to a question by sampling multiple responses and evaluating their average accuracy.
Challenge 2: Aligning with the Idk dataset. The difficulty is ensuring that the AI assistant selectively declines to answer questions it does not know. The methods used are prompting, supervised fine-tuning (SFT), and preference-aware optimization.
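The labeling step in Challenge 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the substring-match correctness check, the `threshold` value of 0.5, the refusal template `IDK_RESPONSE`, and all function names are assumptions introduced here.

```python
from typing import Dict, List, Union

# Hypothetical refusal template; the actual wording used in the paper may differ.
IDK_RESPONSE = "I don't know."


def estimate_accuracy(sampled_answers: List[str], gold: str) -> float:
    """Return the fraction of sampled answers that contain the gold answer.

    Correctness is approximated by a case-insensitive substring match,
    which is an assumption made for this sketch.
    """
    if not sampled_answers:
        return 0.0
    hits = sum(gold.lower() in answer.lower() for answer in sampled_answers)
    return hits / len(sampled_answers)


def build_idk_example(
    question: str,
    gold: str,
    sampled_answers: List[str],
    threshold: float = 0.5,  # assumed cutoff for "the model knows this"
) -> Dict[str, Union[str, bool, float]]:
    """Label one question as known or unknown for a specific model.

    A question counts as known when the model's average accuracy over
    its sampled answers reaches the threshold; otherwise the training
    target becomes the refusal template.
    """
    accuracy = estimate_accuracy(sampled_answers, gold)
    known = accuracy >= threshold
    return {
        "question": question,
        "accuracy": accuracy,
        "known": known,
        "target": gold if known else IDK_RESPONSE,
    }
```

Because the labels depend on the sampled answers of one particular model, the resulting dataset is model-specific: the same question may be labeled known for one assistant and unknown for another.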

Implementation and Deployment

The paper used prompting, supervised fine-tuning (SFT), and preference-aware optimization to guide the AI assistant in identifying and refusing questions it does not know. According to the experimental results, after alignment with the Idk dataset the assistant refused to answer most of the questions it did not know, and its accuracy on the questions it did attempt improved significantly. The results are reported as changes in how questions are distributed across the knowledge quadrants before and after alignment, which indicates that the methods are effective at improving accuracy on knowledge-intensive questions.
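The two outcomes described above, refusal on unknown questions and accuracy on attempted answers, can be computed from per-question records. The sketch below is illustrative only: the record fields `known`, `refused`, and `correct` and the function name are assumptions, not the paper's evaluation code.

```python
from typing import Dict, List


def evaluate_refusal_behavior(records: List[Dict[str, bool]]) -> Dict[str, float]:
    """Summarize refusal behavior from per-question records.

    Each record is assumed to hold three booleans:
      known:   the question was labeled as known for this model
      refused: the assistant declined to answer
      correct: the assistant answered and the answer was right
    """
    unknown = [r for r in records if not r["known"]]
    attempted = [r for r in records if not r["refused"]]

    # Share of unknown questions that the assistant refused.
    refusal_rate_on_unknown = (
        sum(r["refused"] for r in unknown) / len(unknown) if unknown else 0.0
    )
    # Accuracy restricted to the questions the assistant chose to answer.
    attempted_accuracy = (
        sum(r["correct"] for r in attempted) / len(attempted) if attempted else 0.0
    )
    return {
        "refusal_rate_on_unknown": refusal_rate_on_unknown,
        "attempted_accuracy": attempted_accuracy,
    }
```

Reporting both numbers together matters: an assistant that refuses everything would score perfectly on the first metric while being useless, so the accuracy on attempted answers has to be read alongside how often it answers at all.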

Summary

This paper focuses on the ability of AI assistants to recognize the boundaries of their knowledge. By constructing an Idk dataset and aligning the assistant with it, the paper enables the assistant to recognize and admit what it doesn't know, which reduces factual errors in its responses.
