
[Feature Branch][LLM Testing] LLM Testing Suite #1227

Merged

Conversation

Contributor

@dbogunowicz dbogunowicz commented Sep 6, 2023

Covers the testing outlined in the internal LLM testing doc. Note: not the full scope is covered here, only the scope required to reliably test the LLM pipelines.

Here is the outline of the introduced tests:

  1. test_freeze_first_position:

    • Tests that the first token is "frozen" (never evicted) once the KV cache is full.
    • Verifies that the pipeline correctly handles this freezing behavior.
  2. test_ort_model:

    • Asserts that the ONNX model with KV cache support runs in ONNX Runtime and produces correct results.
    • Compares the results of the ONNX model with those of the Torch model.
  3. test_ort_single_token_prefill:

    • Tests the scenario where prompt preprocessing is performed by a single-token engine.
    • The KV Cache is never filled up.
    • KV Cache can be managed externally or internally.
  4. test_ort_multi_token_prefill:

    • Tests the scenario where prompt preprocessing is performed by a multi-token engine.
    • The KV Cache is never filled up.
    • KV Cache can be managed externally or internally.
  5. test_ort_generation_after_kv_cache_has_been_filled:

    • Tests the scenario where prompt preprocessing is performed by a multi-token engine.
    • The KV Cache is filled up (old entries are removed).
    • KV Cache can be managed externally or internally.
  6. test_deepsparse_single_token_prefill:

    • Tests the pipeline that uses deepsparse engine with single-token prompt preprocessing.
    • The KV Cache is never filled up.
    • KV Cache can be managed externally or internally.
  7. test_deepsparse_multi_token_prefill:

    • Tests the pipeline that uses deepsparse engine with multi-token prompt preprocessing.
    • The KV Cache is never filled up.
    • KV Cache can be managed externally or internally.
  8. test_deepsparse_generation_after_kv_cache_has_been_filled:

    • Tests the pipeline that uses deepsparse engine with multi-token prompt preprocessing.
    • The KV Cache is filled up (old entries are removed).
    • KV Cache can be managed externally or internally.
  9. test_run_same_prompt_multiple_times:

    • Tests the scenario where the same prompt is run multiple times.
    • Ensures that each run produces the same output.
  10. _test_output (a helper function):

    • Compares the pipeline's output with target values (logits, generated text, and cache).
    • Checks if the generated logits match the target logits.
    • Verifies that the generated text is as expected.
    • Validates the state of the cache.
    • If we generate output from the pipeline after the KV cache buffer has been filled, we use the maximum absolute difference between logits as the comparison criterion.
  11. _test_kv_cache_state (a helper function):

    • Compares the pipeline's cache state with the target cache state.
    • Specifically focuses on prompt cache entries and their alignment.
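The looser comparison criterion used by the `_test_output` helper (item 10) for the filled-cache case can be sketched roughly as follows. This is a minimal sketch with illustrative names, not the actual helper's signature:

```python
import numpy as np

def logits_match(generated: np.ndarray, target: np.ndarray,
                 kv_cache_filled: bool, atol: float = 1e-4,
                 max_diff_threshold: float = 1e-1) -> bool:
    """Compare generated logits against target logits.

    Once the KV cache buffer has been filled, old entries are evicted and
    small numerical drift accumulates, so a looser criterion is used: the
    maximum absolute difference between the two sets of logits.
    """
    if kv_cache_filled:
        return float(np.max(np.abs(generated - target))) < max_diff_threshold
    return bool(np.allclose(generated, target, atol=atol))
```

The threshold values here are placeholders; the actual tolerances live in the test suite.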

Note: This PR assumes changes from #1198
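The "KV cache is filled up" scenarios above, together with the first-token freezing checked in item 1, rest on a simple invariant: once the cache buffer is full, the oldest entries are evicted, except the first position, which stays frozen. A toy sketch of that bookkeeping (illustrative only; the real pipelines manage K/V tensors, not position lists):

```python
class ToyKVCache:
    """Fixed-capacity cache of token positions. Position 0 is 'frozen'
    (never evicted); when full, the oldest non-frozen entry is dropped."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.positions: list[int] = []  # stand-ins for cached K/V entries

    def add(self, position: int) -> None:
        if len(self.positions) == self.capacity:
            # evict the oldest entry after the frozen first position
            del self.positions[1]
        self.positions.append(position)


cache = ToyKVCache(capacity=4)
for pos in range(6):
    cache.add(pos)
# the frozen first position survives; the oldest non-frozen ones were evicted
```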

@dbogunowicz changed the base branch from main to feature/damian/testing_sources_truth September 6, 2023 10:49
Member

@bfineran left a comment


LGTM - great job with the inline comments

@dbogunowicz merged commit 8985a9b into feature/damian/testing_sources_truth Sep 7, 2023
@dbogunowicz deleted the feature/damian/new_tests branch September 7, 2023 09:03
dbogunowicz added a commit that referenced this pull request Sep 7, 2023
* initial commit

* finish creation of helper objects

* Update tests/conftest.py

* small refactor

* [Feature Branch][LLM Testing] LLM Testing Suite (#1227)

* Update README.md

* Update src/deepsparse/yolov8/README.md

* Update text_generation.py

* quality

* readability

* all tests passing

* added some full kv cache tests

* initial commit

* ready for review

* Delete tests/deepsparse/transformers/pipelines/proposal_text_generation_tests.md
dbogunowicz added a commit that referenced this pull request Sep 13, 2023
* initial commit

* initial commit

* [Feature Branch][LLM Testing] Create GroundTruthSource objects (#1219)

* initial commit

* finish creation of helper objects

* Update tests/conftest.py

* small refactor

* [Feature Branch][LLM Testing] LLM Testing Suite (#1227)

* Update README.md

* Update src/deepsparse/yolov8/README.md

* Update text_generation.py

* quality

* readability

* all tests passing

* added some full kv cache tests

* initial commit

* ready for review

* Delete tests/deepsparse/transformers/pipelines/proposal_text_generation_tests.md

* fix tests

* Dipika's comments plus adjusting the script to renamed variables

* remove ORT ground truth

* add OPT tests

* rebase and disable tests in GHA

* quality