Skip to content

Commit

Permalink
Update data.py for signature generation (axolotl-ai-cloud#851)
Browse files Browse the repository at this point in the history
* Update data.py

Change of conversation formatting type should also trigger updating the preprocessed dataset, so it should be part of the signature.

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
  • Loading branch information
MilesQLi and winglian committed Nov 15, 2023
1 parent be4d494 commit cddb52b
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion src/axolotl/utils/data.py
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,12 @@ def load_tokenized_prepared_datasets(
str(cfg.sequence_len)
+ "@"
+ "|".join(
sorted([f"{d.path}:{d.type}:{d.shards}" for d in cfg.datasets])
sorted(
[
f"{d.path}:{d.type}:{d.shards}:{d.conversation}"
for d in cfg.datasets
]
)
)
+ "|"
+ tokenizer_name
Expand Down

0 comments on commit cddb52b

Please sign in to comment.