Handle concurrent grammar requests #1610

drbh · 2024-02-28T19:58:36Z

This PR fixes parallel grammar requests. Currently, grammar states are not concatenated correctly when a new request is added to the batch, which results in incorrect generation. This PR updates the `concatenate` function to correctly include the previous states.

fixes: #1601
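To make the failure mode concrete, here is a minimal sketch of the idea, not the actual server code. `GrammarBatch`, `fsm_states`, and `concatenate_buggy` are hypothetical names chosen for illustration, and the buggy variant shows only one way such a bug could look: each request carries its own grammar state, and merging batches has to carry those states along rather than rebuild them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GrammarBatch:
    """Hypothetical stand-in for a batch: one grammar state per request."""
    request_ids: List[int]
    fsm_states: List[int]


def concatenate_buggy(batches):
    # Assumed illustration of the bug: every request is reset to the initial
    # state (0), so requests already mid-generation lose their progress.
    ids = [rid for b in batches for rid in b.request_ids]
    return GrammarBatch(ids, [0] * len(ids))


def concatenate(batches):
    # Illustration of the fix: carry each request's current state across.
    ids = [rid for b in batches for rid in b.request_ids]
    states = [s for b in batches for s in b.fsm_states]
    return GrammarBatch(ids, states)


running = GrammarBatch(request_ids=[1, 2], fsm_states=[7, 3])  # mid-generation
incoming = GrammarBatch(request_ids=[3], fsm_states=[0])       # new request

merged = concatenate([running, incoming])
print(merged.fsm_states)  # [7, 3, 0]
```

With the buggy variant, requests 1 and 2 would restart their grammar from the initial state while their generated text is already partway through the JSON object, which is consistent with the incorrect generation described above.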

drbh · 2024-02-28T20:00:42Z

Script used during development:

import json
import time
import random
import requests
import sys

from concurrent.futures import ThreadPoolExecutor

# read arguments (number of requests to fire)
n = 1
if len(sys.argv) > 1:
    n = int(sys.argv[1])

print("Firing", n, "requests...")

# build the request
json_schema = {
    "properties": {
        "choose_color": {
            "description": "Choose a color",
            "examples": ["red", "green", "blue"],
            "title": "Choose a color",
            "type": "string",
        }
    },
    "required": ["choose_color"],
}


def get_input():
    return random.choice(
        [
            "the sky",
            "the grass",
            "the sea",
            "the sun",
            "fire",
            "a tomato",
            "a strawberry",
            "a blueberry",
        ]
    )


# prepare function to fire requests
def fire_request():
    try:
        data = {
            "inputs": f"What color is {get_input()}?",
            "parameters": {
                "best_of": None,
                "temperature": 0.3,
                "repetition_penalty": 1.1,
                "frequency_penalty": None,
                "top_k": 30,
                "top_p": 0.95,
                "typical_p": 0.95,
                "do_sample": True,
                "max_new_tokens": 30,
                "return_full_text": False,
                "stop": [],
                "truncate": None,
                "watermark": False,
                "details": True,
                "decoder_input_details": False,
                "seed": None,
                "top_n_tokens": None,
                "grammar": {"type": "json", "value": json_schema},
            },
        }

        headers = {
            "Content-Type": "application/json",
        }

        response = requests.post(
            "http://127.0.0.1:3000/generate", headers=headers, json=data
        )
        # print(response.text)
        response_data = response.json()
        grammar_response = json.loads(response_data["generated_text"])
        print(json.dumps(grammar_response, indent=2))
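    except Exception as e:
        # expect that some requests will fail; `response` is unbound if the POST itself raised
        body = response.text if "response" in locals() else "<no response>"
        print(e, body)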


# fire N requests in parallel
start = time.time()
with ThreadPoolExecutor() as executor:
    for _ in range(n):
        executor.submit(fire_request)
end = time.time()
print("Time taken:", end - start)

Run the following to send 10 requests at the same time:

python concurrent.py 10
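One caveat about "at the same time": `ThreadPoolExecutor()` without arguments picks a pool size based on the machine's CPU count, so on a small machine fewer than N requests may actually be in flight together. A possible variant, sketched below, sizes the pool to N and uses a `threading.Barrier` so every worker starts its request at the same moment. `fire_concurrently` is a hypothetical helper name, and `task` stands in for `fire_request` from the script above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def fire_concurrently(n, task):
    """Run task() n times, releasing all n workers at the same moment."""
    barrier = threading.Barrier(n)

    def worker():
        barrier.wait()  # block until all n workers have reached this point
        return task()

    # max_workers=n gives one thread per request; the default pool size
    # may be smaller than n on machines with few cores.
    with ThreadPoolExecutor(max_workers=n) as executor:
        futures = [executor.submit(worker) for _ in range(n)]
        return [f.result() for f in futures]
```

In the script this would be called as `fire_concurrently(n, fire_request)`. The barrier only lines up the start of each call; how the server then groups the requests into batches is up to the server.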

OlivierDehaene · 2024-02-29T10:16:45Z

@drbh you can use k6 for this kind of thing. There are some template load-testing scripts in `load_tests`. If the script runs to completion, you can be confident that the model concatenates and filters batches correctly.
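The same pass/fail idea can be approximated in Python for the development script earlier in this thread. The sketch below is an assumption about what a useful check might look like, not part of the PR: `check_generation` and `summarize` are hypothetical helpers that treat a generation as valid only if it parses as a JSON object containing the schema's required key.

```python
import json


def check_generation(generated_text, required=("choose_color",)):
    """Return True if the text is a JSON object containing every required key."""
    try:
        parsed = json.loads(generated_text)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and all(key in parsed for key in required)


def summarize(generated_texts):
    """Count valid and invalid generations for one run."""
    valid = sum(check_generation(text) for text in generated_texts)
    return {"valid": valid, "invalid": len(generated_texts) - valid}
```

Collecting each `generated_text` from the concurrent run and passing the list to `summarize` gives a single number to watch: any non-zero `invalid` count under concurrency suggests the grammar state was not carried through the batch correctly.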

drbh added 2 commits February 28, 2024 19:44

fix: persist grammar state after batch concat

4ff9cb8

fix: simplify changes

0370b0f

OlivierDehaene approved these changes Feb 29, 2024

OlivierDehaene merged commit 343aa7a into main Feb 29, 2024
7 checks passed

OlivierDehaene deleted the handle-concurrent-grammar-requests branch February 29, 2024 10:17

maziyarpanahi mentioned this pull request Mar 26, 2024

Concurrent grammar and non-grammar requests #1661

Closed

4 tasks
