Respect sequence_len in config for type: llama2_chat (#926)
* Respect sequence_len in config for `type: llama2_chat`

It was hardcoded to `4096`; I am not sure why. This change updates it to pull from the config instead (a short sketch of the mechanism follows the commit list below).

cc: @winglian

* Update llama2_chat.py

* apply black formatting

* fix tokenizer

* update test data

* lint fixtures
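
For context, a minimal sketch of the mechanism this fix relies on. The base-class signature and its default value below are assumptions modeled on axolotl's `PromptTokenizingStrategy`, not a verbatim copy of the repository's code:

# Hypothetical, simplified stand-in for axolotl's PromptTokenizingStrategy.
# With the hardcoded assignment removed from the subclass __init__,
# the sequence_len passed in from the YAML config is kept as-is.
class PromptTokenizingStrategy:
    def __init__(self, prompter, tokenizer, train_on_inputs=False, sequence_len=2048):
        self.prompter = prompter
        self.tokenizer = tokenizer
        self.train_on_inputs = train_on_inputs
        self.sequence_len = sequence_len  # value originating from cfg.sequence_len

class LLama2ChatTokenizingStrategy(PromptTokenizingStrategy):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # sequence_len is set here and no longer overwritten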
hamelsmu committed Dec 12, 2023
1 parent 7fabc4d commit f1de29d
Showing 2 changed files with 4 additions and 3 deletions.
5 changes: 3 additions & 2 deletions src/axolotl/prompt_strategies/llama2_chat.py
@@ -81,8 +81,9 @@ class LLama2ChatTokenizingStrategy(PromptTokenizingStrategy):

     def __init__(self, *args, **kwargs):
         super().__init__(*args, **kwargs)
-        self.sequence_len = 4096
-        self.tokenizer.add_special_tokens({"pad_token": "<pad>"})
+        self.tokenizer.add_special_tokens(
+            {"pad_token": getattr(self.tokenizer, "pad_token", "<pad>")}
+        )
         # https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/added_tokens.json

     def tokenize_prompt(self, prompt):
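
The `getattr` call above preserves any `pad_token` the tokenizer already defines, falling back to the literal `<pad>` only when the attribute is absent. For reference, a sketch of how this strategy is typically constructed from the config; the `load` signature and the `Llama2ChatPrompter` name are assumptions modeled on axolotl's other prompt strategies, not code from this commit:

# Hypothetical wiring, modeled on axolotl's prompt-strategy load() pattern.
# cfg.sequence_len from the YAML config now reaches the strategy intact.
def load(tokenizer, cfg):
    return LLama2ChatTokenizingStrategy(
        Llama2ChatPrompter(),  # assumed prompter class name
        tokenizer,
        cfg.train_on_inputs,
        cfg.sequence_len,  # respected now that the hardcoded 4096 is gone
    )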
2 changes: 1 addition & 1 deletion tests/fixtures/conversation.tokenized_llama2chat.json

Large diffs are not rendered by default.
