
[DeepSparse Evaluation API] Perplexity #1555

Merged

Conversation

@dbogunowicz (Contributor) commented Jan 23, 2024

This PR adds general support for perplexity evaluation to the DeepSparse Evaluation API.
The supported and tested dataset is openai_humaneval; the module is modular and can be extended to additional datasets.
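
For reference, perplexity is the exponential of the mean negative log-likelihood over the evaluated tokens. A minimal sketch of that computation, assuming per-token log-probabilities are already available (illustrative only, not the actual DeepSparse implementation):

import numpy as np

def perplexity(token_log_probs: np.ndarray) -> float:
    # perplexity = exp(mean negative log-likelihood of the evaluated tokens)
    return float(np.exp(-np.mean(token_log_probs)))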

Example usage

Using the CLI:

(deepsparse_venv) damian@gpuserver6:/nm/drive0/damian/deepsparse$ deepsparse.eval hf:mgoin/TinyStories-1M-ds --dataset openai_humaneval --integration perplexity --limit 2 --batch_size 2

2024-02-05:14:08:00,089 INFO     [utils.py:148] Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-05:14:08:00,089 INFO     [utils.py:160] NumExpr defaulting to 8 threads.
2024-02-05 14:08:07 deepsparse.evaluation.cli INFO     Creating deepsparse pipeline to evaluate from model path: hf:mgoin/TinyStories-1M-ds
2024-02-05 14:08:07 deepsparse.evaluation.cli INFO     Datasets to evaluate on: ['openai_humaneval']
Batch size: 2
Splits to evaluate on: None
Metrics to evaluate on: None
Additional integration arguments supplied: {'limit': 2}
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240104 COMMUNITY | (86c38139) (release) (optimized) (system=avx2, binary=avx2)
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 25518.44it/s]
2024-02-05 14:08:10 deepsparse.evaluation.integrations.perplexity INFO     Argument `splits` is None. Defaulting to `test` split.
100%|██████████| 4/4 [02:28<00:00, 37.16s/it]
2024-02-05 14:10:39 deepsparse.evaluation.cli INFO     Evaluation done. Results:
[Evaluation(task='perplexity', dataset=Dataset(type=None, name='openai_humaneval', config=None, split='test'), metrics=[Metric(name='perplexities', value=[7047.759765625, 11520.462890625, 7300.88671875, 6835.68017578125]), Metric(name='mean_perplexity', value=8176.197265625)], samples=None)]
2024-02-05 14:10:39 deepsparse.evaluation.cli INFO     Saving the evaluation results to /nm/drive0/damian/deepsparse/result.json
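
The CLI also writes the results to result.json, as logged above. A minimal sketch of loading them back, assuming the file is plain JSON:

import json

with open("result.json") as f:
    results = json.load(f)

print(results)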

Using the evaluate function:

from deepsparse import evaluate

result = evaluate(
    model="hf:mgoin/TinyStories-1M-ds",
    datasets="openai_humaneval",
    integration="perplexity",
    limit=2,
)

print(result)
2024-02-05:14:00:38,576 INFO     [utils.py:148] Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-05:14:00:38,576 INFO     [utils.py:160] NumExpr defaulting to 8 threads.
2024-02-05:14:00:45,891 WARNING  [__init__.py:194] Some tasks could not be loaded due to missing dependencies. Run with `--verbosity DEBUG` for full details.
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240104 COMMUNITY | (86c38139) (release) (optimized) (system=avx2, binary=avx2)
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 52015.04it/s]
2024-02-05:14:00:48,083 INFO     [perplexity.py:68] Argument `splits` is None. Defaulting to `test` split.
100%|██████████| 2/2 [01:43<00:00, 51.59s/it]
formatted=[Evaluation(task='perplexity', dataset=Dataset(type=None, name='openai_humaneval', config=None, split='test'), metrics=[Metric(name='perplexities', value=[7047.759765625, 11520.462890625]), Metric(name='mean_perplexity', value=9284.111328125)], samples=None)] raw=defaultdict(<class 'str'>, {'openai_humaneval': defaultdict(None, {'results': {'perplexities': array([ 7047.76 , 11520.463], dtype=float32), 'mean_perplexity': 9284.111}, 'split': 'test'})})
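
Judging from the repr above, the returned object exposes `formatted` (a list of Evaluation objects) and `raw`. Assuming those attribute names, the mean perplexity can be pulled out like this (a sketch based on the printed output, not a verified API):

evaluation = result.formatted[0]
mean_ppl = next(m.value for m in evaluation.metrics if m.name == "mean_perplexity")
print(f"mean perplexity: {mean_ppl:.2f}")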

@dbogunowicz dbogunowicz changed the base branch from main to split_prep_operator January 26, 2024 11:25
Base automatically changed from split_prep_operator to main January 26, 2024 18:42
@dbogunowicz dbogunowicz changed the base branch from main to feature/damian/generate_until February 5, 2024 13:35
@dbogunowicz dbogunowicz changed the title [WIP][DeepSparse Evaluation API] [Feature Branch] Perplexity [DeepSparse Evaluation API] Perplexity Feb 5, 2024
Reviewed code excerpt (from the perplexity integration):

dataset, max_steps=None if limit is None else limit * batch_size
):
    # TODO: To remove when we have support for more datasets
    sample = sample["prompt"] + sample["canonical_solution"]
Contributor:
can you add a plan in the in-code comment on how this can be abstracted out?

dbogunowicz (Contributor, Author):
Just for bookkeeping, comment not relevant given the follow-up PR: #1586
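
A purely hypothetical sketch of one way the dataset-specific line could be abstracted (the real generalization landed in the follow-up PR above; names here are illustrative):

from typing import Callable, Dict

# Hypothetical registry mapping dataset names to sample-formatting functions;
# illustrative only, not the implementation from #1586.
SAMPLE_FORMATTERS: Dict[str, Callable[[dict], str]] = {
    # openai_humaneval text is the prompt concatenated with the solution
    "openai_humaneval": lambda s: s["prompt"] + s["canonical_solution"],
}

def sample_to_text(dataset_name: str, sample: dict) -> str:
    # supporting a new dataset only requires registering a formatter above
    return SAMPLE_FORMATTERS[dataset_name](sample)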

@dbogunowicz dbogunowicz merged commit b82b49b into feature/damian/generate_until Feb 9, 2024
@dbogunowicz dbogunowicz deleted the feature/damian/perplexity_eval branch February 9, 2024 15:50