Background

Background The paper discusses the challenges in summarizing source code using Large Language Models (LLMs), which include the necessity of specialized knowledge and an understanding of LLMs. Existing methods like manually designed prompts or task-oriented fine-tuning can improve performance but are costly and require substantial training resources.
Existing Work Existing work has been challenging for research teams due to the requirement to construct high-cost training environments and the need for specialized knowledge, and manually designed prompts may not always be effective.

Core Contributions

Introduced a new prompt learning framework called PromptCS
- Challenge 1: Requirement of specialized knowledge and complex manual prompt design PromptCS simplifies the process of generating source code summaries, eliminating the need for complex manual prompt design by lowering the demand for training resources in a non-invasive manner. It retains LLM parameters static during training, only updating the prompt agent's parameters.
- Challenge 2: High training costs PromptCS significantly reduces training costs compared to the task-oriented fine-tuning approach. For instance, when using the StarCoderBase-7B LLM, PromptCS takes about 67 hours to train the prompt agent, whereas task-oriented fine-tuning takes about 211 hours for one epoch of training.

Implementation and Deployment

Extensive experiments were conducted on PromptCS, and its performance was evaluated on a widely used benchmark dataset. The results demonstrated that across all evaluation metrics (i.e., BLEU, METEOR, ROUGE-L, and SentenceBERT), PromptCS significantly outperformed the instruction prompting schemes with zero-shot and few-shot learning and matched the task-oriented fine-tuning scheme. On certain LLMs such as StarCoderBase-1B and StarCoderBase-3B, PromptCS even surpassed the fine-tuning scheme. The code for PromptCS has been made open source to support further research and application.

Summary

This paper introduced a novel PromptCS framework for source code summarization, capable of generating high-quality summaries while reducing training costs and providing open-source code for further research.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2312.16066.md

2312.16066.md

Background

Core Contributions

Implementation and Deployment

Summary

Files

2312.16066.md

Latest commit

History

2312.16066.md

File metadata and controls

Background

Core Contributions

Implementation and Deployment

Summary