2401.02115.md

Background

  • Background The paper addresses the problem that text-to-SQL models generate lists of candidate SQL queries in which the best query is often not ranked first. An effective re-ranking method that selects the correct SQL query from these candidates can therefore improve model performance.

  • Existing Work Past studies in code generation automatically generate test cases and use them to re-rank candidate programs. However, automatic test case generation for text-to-SQL remains under-explored. Previous methods for generating test cases either require manual annotation of expected execution results or depend on test cases generated by models, both of which limit fully automatic test case generation for text-to-SQL.

Core Contributions

  • Introduced a method for automatically generating test cases for text-to-SQL
    • Challenge 1: Automatic Test Case Generation No prior method can automatically obtain expected execution results without already knowing the correct SQL queries. The paper proposes using LLMs to predict expected execution outcomes on newly generated databases, and investigates experimentally how to construct databases whose results are easy for LLMs to predict and how to design prompts that the models understand (a minimal sketch of this step appears after this list).

    • Challenge 2: Re-ranking Method The best query is often not at the top of the candidate list, so an effective method is needed to choose the correct query from it. The paper presents a three-step re-ranking method: it groups the candidates by their execution results on the given database, generates a test suite with multiple test cases, and re-ranks the candidates by the number of test cases passed and their generation probabilities, selecting the top SQL query (see the second sketch after this list).
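
To make Challenge 1 concrete, here is a minimal sketch of test case generation, assuming the LLM is exposed as a plain prompt-to-completion callable. The function names, prompt wording, and JSON answer format are illustrative assumptions, not the paper's actual prompts; the paper additionally studies which database contents and prompt designs make this prediction easy for the LLM.

```python
import json

def build_test_case_prompt(schema: str, rows: dict, question: str) -> str:
    """Render a small synthesized database into a prompt so the LLM can
    'execute' the question by hand and predict its result."""
    tables = "\n".join(
        f"Table {name}:\n{json.dumps(records, indent=2)}"
        for name, records in rows.items()
    )
    return (
        f"Database schema:\n{schema}\n\n"
        f"Database contents:\n{tables}\n\n"
        f"Question: {question}\n"
        "Predict the exact result of this question on the database above. "
        "Answer with a JSON list of rows and nothing else."
    )

def make_test_case(schema: str, rows: dict, question: str, llm) -> tuple:
    """A test case pairs the synthesized database with the LLM-predicted
    expected result. `llm` is any callable prompt -> completion, e.g. a
    GPT-4 wrapper (hypothetical; no specific client API is assumed)."""
    expected = json.loads(llm(build_test_case_prompt(schema, rows, question)))
    return rows, expected
```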

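The three-step re-ranking can then be sketched as below, assuming SQLite databases, per-candidate generation log-probabilities, and test cases produced as above. The canonicalization of results, the choice of cluster representative, and the tie-breaking rule are assumptions for illustration, not the paper's exact scoring.

```python
import sqlite3
from collections import defaultdict

def execute(sql: str, db_path: str):
    """Run a query and canonicalize its result; rows are sorted by repr
    so mixed-type values (e.g. NULLs) still compare, and unexecutable
    queries map to None and form their own group."""
    try:
        with sqlite3.connect(db_path) as conn:
            rows = (tuple(r) for r in conn.execute(sql).fetchall())
            return tuple(sorted(rows, key=repr))
    except sqlite3.Error:
        return None

def rerank(candidates, log_probs, db_path, test_dbs, expected):
    """Step 1: group candidates by execution result on the given database.
    Step 2: count the test cases each group's representative passes
            (expected[i] must use the same canonical form as execute()).
    Step 3: rank by (tests passed, generation probability); take the top."""
    groups = defaultdict(list)
    for sql, lp in zip(candidates, log_probs):
        groups[execute(sql, db_path)].append((sql, lp))

    def tests_passed(sql):
        return sum(execute(sql, db) == exp for db, exp in zip(test_dbs, expected))

    # Represent each group by its most probable member, then rank globally.
    reps = [max(group, key=lambda c: c[1]) for group in groups.values()]
    best_sql, _ = max(reps, key=lambda c: (tests_passed(c[0]), c[1]))
    return best_sql
```

Grouping candidates before scoring means each distinct execution behavior is tested only once, which keeps the number of query executions small relative to the size of the candidate list.
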
Implementation and Deployment

The experimental results on the Spider validation set indicate that applying the re-ranking method improves the performance of some state-of-the-art models by 3.6%. The paper uses GPT-4-turbo and GPT-4 to generate test cases, and follows two state-of-the-art models, DAIL-SQL and RESDSQL, to generate the candidate lists.

Summary

This research proposes a method for automatically generating test cases for text-to-SQL using LLMs and presents a three-step re-ranking process built on them. Experiments show that the method significantly improves the performance of existing text-to-SQL models.