-
Notifications
You must be signed in to change notification settings - Fork 168
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[WiP] [KV Cache Interface] Text Generation & Decoder Engine Implement…
…ation (#1089) * initial commit * Update src/deepsparse/license.py * limit to 150mb * ready to review * initial commit * [Codegen][ORT][Static Seq Length] TextGenerationPipeline (#946) * initial commit * coreys simplifications * finishing the second model static * ready, time for beautification * ready for review * moved the code to examples * fix eos logic * add argument num_tokens_to_generate * [CodeGen][Documentation] (#956) * initial commit * coreys simplifications * finishing the second model static * ready, time for beautification * ready for review * moved the code to examples * fix eos logic * add argument num_tokens_to_generate * initial commit * change order * Update examples/codegen/README.md Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com> --------- Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com> * reimplementation for generative pipelines * restore text generation from examples * [CodeGen] ONNX model loading to support >2Gb models / two engines (#991) * refactor sucessfull * Pipeline fully refactored, time to test engine support. Note: Sliding window not yet implemented! * First iteration with Sage * Apply suggestions from code review * ORT agrees with the Engine. But they both give not entirely correct result. Hey, this is good news still * dynamic ORT vs static DS * pipeline handles OPT multitoken pass * fixes to get static pipeline a little further along * adjust shapes and slicing to enable static autoregressive pass - ISSUE: tokens past the base seq len are repeated * migrate from cache_length to positions input * got if working for multitoken + single token scenario * cleanup the pipeline * further cleanup post merge * Pipeline working for single-token inference only * do not load the onnx model with external files twice * pipeline never redundantly saves the external data + more robust tokenizer * Stop saving tmp files, otherwise the engine looks for external files in the wrong place * Left pad support * cleanup * cleanup2 * Add in pipeline timing * add in force tokens logic * remove input validation for text generation pipelines * remove multitoken support for now * remove kv cache engine and other fixes * nest input shape override * comment out input shape override * add non batch override for ORT * clean up generation pipeline * initial commit * Update src/deepsparse/license.py * limit to 150mb * ready to review * fix the erronous Makefile * perhaps fixed GHA * take into consideration that GHA creates four files * initial commit * tested with actual model * remove val_inp argument * Update README.md * Apply suggestions from code review * Update README.md * initial implementation * initial implementation * Revert "initial implementation" This reverts commit 765a5f7. * rebase * add tests * strip down complexity out of text generation pipeline * initial implementation * In a good state for the review on 22.06 * remove files to make review easier * Revert "remove files to make review easier" This reverts commit ea82e99. * Merge DecoderKVCache with KVCacheORT (KVCacheORT will not exist, it is just an abstraction) * rebase * add tests * Delete decoder_kv_cache.py * Delete test_decoder_kv_cache.py * DecoderKVCache that manipulates cache state and additionally passes info to the engine via KVCache object * fix formatting of the transformers/utils/__init__.py * improvements after the sync with Mark * All changes applied, time for testing * Scaffolding to also run multitoken * add delay_overwriting_inputs * multitoken is working (although in limited capacity) * fix no kv cache inference * Do not create engine if not needed * remove the prefill option * fix docstring * remove prefill * fix the computation of total cache capacity * merge * addressed PR comments * quality --------- Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com> Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com> Co-authored-by: Benjamin <ben@neuralmagic.com>
- Loading branch information
1 parent
0d6a423
commit 0809aea
Showing
12 changed files
with
914 additions
and
88 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# flake8: noqa | ||
from .nl_decoder_engine import * |
Oops, something went wrong.