add sample_idx in InputRequest for debugging #32

Merged
merged 1 commit on Apr 9, 2024

Conversation

morgandu
Contributor

@morgandu morgandu commented Apr 9, 2024

No description provided.

@morgandu morgandu force-pushed the mor--add-request-id branch 6 times, most recently from f8ad178 to d283ee8 Compare April 9, 2024 02:04
@morgandu morgandu changed the title add request_id in InputRequest for debugging add sample_idx in InputRequest for debugging Apr 9, 2024

tokenized_dataset = tokenize_dataset(dataset, tokenizer)
sampled_dataset = []

Collaborator

Just curious: doesn't NumPy have numpy.take for this?
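
For reference, a minimal sketch of index-based sampling with numpy.take that also keeps each sample's position in the original dataset. The helper name sample_with_indices, the seed parameter, and the toy data are hypothetical and not taken from this PR; the benchmark's actual sampling code may look quite different.

```python
import numpy as np


def sample_with_indices(dataset, num_samples, seed=0):
    """Randomly pick rows and keep each row's position in the original dataset.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Sample positions without replacement so no row is picked twice.
    indices = rng.choice(len(dataset), size=num_samples, replace=False)
    # Fill an object array element by element so tuples or dicts are stored
    # as single items instead of being expanded into extra dimensions.
    items = np.empty(len(dataset), dtype=object)
    for pos, item in enumerate(dataset):
        items[pos] = item
    # np.take selects the sampled rows by position.
    sampled = np.take(items, indices)
    return [(int(idx), item) for idx, item in zip(indices, sampled)]


# Toy stand-in for a tokenized dataset: (prompt, expected output) pairs.
toy_dataset = [("p0", "o0"), ("p1", "o1"), ("p2", "o2"), ("p3", "o3")]
pairs = sample_with_indices(toy_dataset, num_samples=2, seed=1)
```

Each returned pair carries the original index next to the sampled row, so the index can be stored on the request (for example in sample_idx) before the request is sent.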

@@ -98,6 +98,7 @@ class InputRequest:
   prompt_len: int = 0
   output: str = ""
   output_len: int = 0
+  sample_idx: int = -1
Collaborator

There is already a correct mapping between request and output in the benchmark. What is the purpose of sample_idx? Will it help you find the related request more easily?

Collaborator

+1. Your goal is to be able to print the input prompt alongside its output for debugging, right?

Contributor Author

My goal is to be able to locate the original order / index from the original dataset file.

Say there are 10k data samples in the original dataset file. After random sampling, the samples are passed to the server as input requests, and the requests come back in order of decode completion. Currently we save the prompt, original result, and generated result in the request output file. If there is other metadata in the original dataset file that I want to check, how can I locate it?
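
A rough sketch of the lookup this enables, assuming requests finish out of order. Only prompt_len, output, output_len, and sample_idx come from the diff; the prompt field, the lookup_original helper, and the "category" metadata key are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class InputRequest:
    # Simplified stand-in; the real class may carry more fields.
    prompt: str = ""  # assumed field, not shown in the diff
    prompt_len: int = 0
    output: str = ""
    output_len: int = 0
    sample_idx: int = -1  # -1 means "not tied to a dataset row"


def lookup_original(request, original_dataset):
    """Return the original dataset row for a request, or None if unset."""
    if request.sample_idx < 0:
        return None
    return original_dataset[request.sample_idx]


# Toy original dataset with extra metadata that is not saved in the output file.
original_dataset = [
    {"prompt": "p0", "category": "math"},
    {"prompt": "p1", "category": "code"},
    {"prompt": "p2", "category": "chat"},
]

# Requests returned in completion order, not in dataset order.
completed = [
    InputRequest(prompt="p2", sample_idx=2),
    InputRequest(prompt="p0", sample_idx=0),
]
categories = [lookup_original(r, original_dataset)["category"] for r in completed]
```

Because the index travels with the request, the completion order no longer matters when looking up the original row.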

Collaborator

Agreed, it's hard to locate the entry in the original dataset. Please also feel free to add any important metadata to the result.

@morgandu morgandu assigned morgandu and vipannalla and unassigned morgandu Apr 9, 2024
@morgandu morgandu merged commit f103216 into main Apr 9, 2024
3 checks passed
@morgandu morgandu deleted the mor--add-request-id branch April 9, 2024 20:09
jwyang-google pushed a commit that referenced this pull request May 6, 2024
4 participants