Background

Background The paper discusses issues with computational cost and energy consumption in large language models (LLM) and aims to reduce these expenses through "skimming acceleration," which dynamically drops unimportant tokens progressively across the LLM layers while preserving semantically important tokens. However, the authors reveal for the first time that this acceleration strategy might be vulnerable to Denial-of-Service (DoS) attacks.
Existing Work Existing works have proposed various skimming-based language models to decrease the number of unimportant tokens within the computation budget. However, the decision to drop tokens could be manipulated by inconspicuous, adversarial text input changes, thus potentially increasing computational complexity—posing a serious challenge for real-time service platforms and resource-limited edge device deployments.

Core Contributions

Introduced No-Skim Framework
- Challenge 1: Measuring the robustness of acceleration schemes The No-Skim framework aims to help owners of skimming-based LLM understand and measure the robustness of their acceleration scheme. Specifically, this framework generates adversarial inputs by searching for minimal and unnoticeable perturbations at the character level and token level to increase the remaining token ratio, thus increasing computation costs and energy consumption.
- Challenge 2: Broadly evaluating the vulnerability of skimming acceleration The framework systematically evaluates the vulnerability of skimming acceleration across various LLM architectures, including BERT and RoBERTa, using the GLUE benchmark. In the worst-case scenario, the perturbations found by No-Skim could increase the running cost of LLMs by over 145% on average. Additionally, No-Skim extends the evaluation framework to various scenarios, making the evaluation conductible at different knowledge levels.

Implementation and Deployment

The No-Skim framework designed a general adversarial input generation pipeline, dividing it into three steps: (1) Word Importance Ranking to identify the most relevant words to model efficiency performance; (2) Candidate Set Generation, representing the potential search space for perturbations on original texts; and (3) Best Candidate Searching to select the most effective perturbations to increase computational complexity. The evaluation framework uses different and extensible plug-in components to process text, which ensures its generality and practical evaluation at various levels of knowledge and access to the target skimming-based language model.

Summary

This paper is the first to systematically study the vulnerability of skimming-based language models from the perspective of efficiency and proposes No-Skim, an effective and general efficiency robustness evaluation framework that generates adversarial inputs to increase computational complexity. Additionally, the framework is modularized to accommodate different plug-in modules, enabling evaluations to be conducted across three different knowledge levels.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2312.09494.md

2312.09494.md

Background

Core Contributions

Implementation and Deployment

Summary

Files

2312.09494.md

Latest commit

History

2312.09494.md

File metadata and controls

Background

Core Contributions

Implementation and Deployment

Summary