add sample_idx in InputRequest for debugging #32

morgandu · 2024-04-09T00:33:49Z

No description provided.

gangji · 2024-04-09T03:39:16Z

benchmarks/benchmark_serving.py


-  tokenized_dataset = tokenize_dataset(dataset, tokenizer)
+  sampled_dataset = []


Just curious: NumPy has numpy.take. Could that be used here?
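For reference, a minimal sketch of index-based sampling with numpy.take. The toy `dataset` array and the way the indices are drawn are assumptions for illustration, not the code in benchmark_serving.py.

```python
import numpy as np

# Hypothetical toy dataset; the real benchmark loads prompts from a file.
dataset = np.array(["prompt a", "prompt b", "prompt c", "prompt d", "prompt e"])

# Draw three distinct positions, then gather those rows in one call.
rng = np.random.default_rng(0)
indices = rng.choice(len(dataset), size=3, replace=False)
sampled = np.take(dataset, indices)

# Keeping `indices` alongside `sampled` preserves the link to the source rows.
for idx, prompt in zip(indices, sampled):
  print(int(idx), prompt)
```

Note that numpy.take works on arrays, so a list of (prompt, output) tuples would first need converting to an array or an index array.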

FanhaiLu1 · 2024-04-09T05:38:57Z

benchmarks/benchmark_serving.py

@@ -98,6 +98,7 @@ class InputRequest:
  prompt_len: int = 0
  output: str = ""
  output_len: int = 0
+  sample_idx: int = -1


The benchmark already maps each request to its output. What is the purpose of sample_idx? Will it help you find the related request more easily?

+1, your goal is to be able to print the input prompt for a given output when debugging, right?

My goal is to be able to locate the original order / index from the original dataset file.

Say there are 10k data samples in the original dataset file. After random sampling, they are passed to the server as input requests, and the requests are returned in order of decode completion. Currently we save the prompt, original result, and generated result in the request output file. If there is other metadata in the original dataset file that I want to check, how can I locate it?

Agreed, it's hard to locate the sample in the original dataset. Please also feel free to add any important metadata to the result.
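A minimal sketch of the idea discussed above, assuming the dataset is a list of dicts with a `prompt` key. The `prompt` field on `InputRequest`, the `sample_requests` helper, and the `source_id` metadata key are hypothetical names for illustration; only `prompt_len`, `output`, `output_len`, and `sample_idx` appear in the diff.

```python
import random
from dataclasses import dataclass


@dataclass
class InputRequest:
  prompt: str = ""  # assumed field, not shown in the diff hunk
  prompt_len: int = 0
  output: str = ""
  output_len: int = 0
  sample_idx: int = -1  # position of the sample in the original dataset


def sample_requests(dataset, num_requests, seed=0):
  """Hypothetical helper: sample requests and record each original index."""
  rng = random.Random(seed)
  indices = rng.sample(range(len(dataset)), num_requests)
  return [
      InputRequest(
          prompt=dataset[i]["prompt"],
          prompt_len=len(dataset[i]["prompt"]),
          sample_idx=i,
      )
      for i in indices
  ]


# Toy dataset with extra metadata that is not copied into the request.
dataset = [{"prompt": f"question {i}", "source_id": i * 10} for i in range(100)]
requests = sample_requests(dataset, num_requests=5)

# Responses may come back in any order; sample_idx still points at the source row.
for request in reversed(requests):
  original = dataset[request.sample_idx]
  print(request.sample_idx, original["source_id"], original["prompt"])
```

Because the index travels with the request, any field of the original record can be looked up later without storing it in the output file.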

morgandu requested a review from vipannalla as a code owner April 9, 2024 00:33

morgandu requested review from JoeZijunZhou and patemotter April 9, 2024 00:34

morgandu force-pushed the mor--add-request-id branch 6 times, most recently from f8ad178 to d283ee8 Compare April 9, 2024 02:04

add sample_idx for debugging

741d9e7

morgandu force-pushed the mor--add-request-id branch from d283ee8 to 741d9e7 Compare April 9, 2024 03:22

morgandu changed the title ~~add request_id in InputRequest for debugging~~ add sample_idx in InputRequest for debugging Apr 9, 2024

gangji reviewed Apr 9, 2024

FanhaiLu1 approved these changes Apr 9, 2024

morgandu assigned morgandu and vipannalla and unassigned morgandu Apr 9, 2024

morgandu merged commit f103216 into main Apr 9, 2024
3 checks passed

morgandu deleted the mor--add-request-id branch April 9, 2024 20:09

jwyang-google pushed a commit that referenced this pull request May 6, 2024

add sample_idx for debugging (#32)

ee387ad
