
[DeepSparse Evaluation API] Perplexity #1555

Merged

Conversation

@dbogunowicz (Contributor) commented Jan 23, 2024

This PR adds general support for perplexity evaluation to the DeepSparse Evaluation API.
The supported and tested dataset is openai_humaneval; the module is modular and can be extended to additional datasets.
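
For reference, perplexity is the exponential of the mean negative log-likelihood over the evaluated tokens. A minimal sketch of that computation, assuming per-token log-probabilities are already available (illustrative only, not the actual DeepSparse implementation):

import numpy as np

def perplexity(token_log_probs: np.ndarray) -> float:
    # perplexity = exp(mean negative log-likelihood of the evaluated tokens)
    return float(np.exp(-np.mean(token_log_probs)))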

Example usage

Using the CLI:

(deepsparse_venv) damian@gpuserver6:/nm/drive0/damian/deepsparse$ deepsparse.eval hf:mgoin/TinyStories-1M-ds --dataset openai_humaneval --integration perplexity --limit 2 --batch_size 2

2024-02-05:14:08:00,089 INFO     [utils.py:148] Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-05:14:08:00,089 INFO     [utils.py:160] NumExpr defaulting to 8 threads.
2024-02-05 14:08:07 deepsparse.evaluation.cli INFO     Creating deepsparse pipeline to evaluate from model path: hf:mgoin/TinyStories-1M-ds
2024-02-05 14:08:07 deepsparse.evaluation.cli INFO     Datasets to evaluate on: ['openai_humaneval']
Batch size: 2
Splits to evaluate on: None
Metrics to evaluate on: None
Additional integration arguments supplied: {'limit': 2}
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240104 COMMUNITY | (86c38139) (release) (optimized) (system=avx2, binary=avx2)
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 25518.44it/s]
2024-02-05 14:08:10 deepsparse.evaluation.integrations.perplexity INFO     Argument `splits` is None. Defaulting to `test` split.
100%|██████████| 4/4 [02:28<00:00, 37.16s/it]
2024-02-05 14:10:39 deepsparse.evaluation.cli INFO     Evaluation done. Results:
[Evaluation(task='perplexity', dataset=Dataset(type=None, name='openai_humaneval', config=None, split='test'), metrics=[Metric(name='perplexities', value=[7047.759765625, 11520.462890625, 7300.88671875, 6835.68017578125]), Metric(name='mean_perplexity', value=8176.197265625)], samples=None)]
2024-02-05 14:10:39 deepsparse.evaluation.cli INFO     Saving the evaluation results to /nm/drive0/damian/deepsparse/result.json
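
The CLI also writes the results to result.json, as logged above. A minimal sketch of loading them back, assuming the file is plain JSON:

import json

with open("result.json") as f:
    results = json.load(f)

print(results)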

Using the evaluate function:

from deepsparse import evaluate

result = evaluate(
    model="hf:mgoin/TinyStories-1M-ds",
    datasets="openai_humaneval",
    integration="perplexity",
    limit=2,
)

print(result)
2024-02-05:14:00:38,576 INFO     [utils.py:148] Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-05:14:00:38,576 INFO     [utils.py:160] NumExpr defaulting to 8 threads.
2024-02-05:14:00:45,891 WARNING  [__init__.py:194] Some tasks could not be loaded due to missing dependencies. Run with `--verbosity DEBUG` for full details.
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240104 COMMUNITY | (86c38139) (release) (optimized) (system=avx2, binary=avx2)
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 52015.04it/s]
2024-02-05:14:00:48,083 INFO     [perplexity.py:68] Argument `splits` is None. Defaulting to `test` split.
100%|██████████| 2/2 [01:43<00:00, 51.59s/it]
formatted=[Evaluation(task='perplexity', dataset=Dataset(type=None, name='openai_humaneval', config=None, split='test'), metrics=[Metric(name='perplexities', value=[7047.759765625, 11520.462890625]), Metric(name='mean_perplexity', value=9284.111328125)], samples=None)] raw=defaultdict(<class 'str'>, {'openai_humaneval': defaultdict(None, {'results': {'perplexities': array([ 7047.76 , 11520.463], dtype=float32), 'mean_perplexity': 9284.111}, 'split': 'test'})})
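
Judging from the repr above, the returned object exposes `formatted` (a list of Evaluation objects) and `raw`. Assuming those attribute names, the mean perplexity can be pulled out like this (a sketch based on the printed output, not a verified API):

evaluation = result.formatted[0]
mean_ppl = next(m.value for m in evaluation.metrics if m.name == "mean_perplexity")
print(f"mean perplexity: {mean_ppl:.2f}")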

@dbogunowicz dbogunowicz changed the base branch from main to split_prep_operator January 26, 2024 11:25
Base automatically changed from split_prep_operator to main January 26, 2024 18:42
@dbogunowicz dbogunowicz changed the base branch from main to feature/damian/generate_until February 5, 2024 13:35
@dbogunowicz dbogunowicz changed the title [WIP][DeepSparse Evaluation API] [Feature Branch] Perplexity [DeepSparse Evaluation API] Perplexity Feb 5, 2024
Reviewed code excerpt (from the perplexity integration):

dataset, max_steps=None if limit is None else limit * batch_size
):
    # TODO: To remove when we have support for more datasets
    sample = sample["prompt"] + sample["canonical_solution"]
Contributor:
can you add a plan in the in-code comment on how this can be abstracted out?

dbogunowicz (Contributor, Author):
Just for bookkeeping, comment not relevant given the follow-up PR: #1586
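
A purely hypothetical sketch of one way the dataset-specific line could be abstracted (the real generalization landed in the follow-up PR above; names here are illustrative):

from typing import Callable, Dict

# Hypothetical registry mapping dataset names to sample-formatting functions;
# illustrative only, not the implementation from #1586.
SAMPLE_FORMATTERS: Dict[str, Callable[[dict], str]] = {
    # openai_humaneval text is the prompt concatenated with the solution
    "openai_humaneval": lambda s: s["prompt"] + s["canonical_solution"],
}

def sample_to_text(dataset_name: str, sample: dict) -> str:
    # supporting a new dataset only requires registering a formatter above
    return SAMPLE_FORMATTERS[dataset_name](sample)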

@dbogunowicz dbogunowicz merged commit b82b49b into feature/damian/generate_until Feb 9, 2024
@dbogunowicz dbogunowicz deleted the feature/damian/perplexity_eval branch February 9, 2024 15:50