
Can't use chat_template: phi_3 with type: sharegpt #1683

Open
5 of 8 tasks
ccdv-ai opened this issue Jun 4, 2024 · 0 comments
Labels
bug Something isn't working

Comments


ccdv-ai commented Jun 4, 2024

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

It should be possible to use chat_template: phi_3 together with type: sharegpt for the dataset.
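For context, this is roughly the rendering the combination is expected to produce: ShareGPT-style turns mapped onto the Phi-3 chat format. A minimal sketch, assuming the "from"/"value" field names from the dataset config below and the <|user|>/<|assistant|>/<|end|> markers from the published Phi-3 chat template (neither taken from axolotl's own code):

```python
# Hedged sketch: render ShareGPT-style turns in the Phi-3 chat format.
# Role names ("human"/"gpt") and the <|user|>/<|assistant|>/<|end|>
# markers are assumptions based on the lightblue/tagengo-gpt4 dataset
# and the Phi-3 model card, not on axolotl internals.
PHI3_ROLES = {"system": "<|system|>", "human": "<|user|>", "gpt": "<|assistant|>"}

def sharegpt_to_phi3(conversations):
    """Render a list of {"from": ..., "value": ...} turns as Phi-3 chat text."""
    parts = []
    for turn in conversations:
        parts.append(f"{PHI3_ROLES[turn['from']]}\n{turn['value']}<|end|>\n")
    return "".join(parts)

example = [
    {"from": "human", "value": "Hello!"},
    {"from": "gpt", "value": "Hi, how can I help?"},
]
print(sharegpt_to_phi3(example))
```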

Current behaviour

  File "/home/user/codes/train/instruct/instruct-v1/axolotl/src/axolotl/utils/data/sft.py", line 403, in load_tokenized_prepared_datasets
    dataset_wrapper, dataset_prompter = get_dataset_wrapper(
                                        ^^^^^^^^^^^^^^^^^^^^
  File "/home/user/codes/train/instruct/instruct-v1/axolotl/src/axolotl/utils/data/sft.py", line 689, in get_dataset_wrapper
    raise ValueError(
ValueError: unhandled prompt tokenization strategy: sharegpt

Steps to reproduce

Similar to phi3-ft.yml

base_model: microsoft/Phi-3-mini-4k-instruct
trust_remote_code: true
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: phi_3

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: lightblue/tagengo-gpt4
    type: sharegpt
    conversation: 
    field_messages: conversations
    message_field_role: from
    message_field_content: value

dataset_prepared_path:
val_set_size: 0.01
output_dir: ./out

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 64
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_torch
adam_beta2: 0.95
adam_epsilon: 0.00001
max_grad_norm: 1.0
lr_scheduler: cosine
learning_rate: 5.0e-6

train_on_inputs: false
group_by_length: false
bf16: auto

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: True
early_stopping_patience: 3
logging_steps: 1
flash_attention: true

eval_steps: 1000
save_steps: 5000
eval_table_size: 2
eval_batch_size: 2
eval_sample_packing: false
eval_max_new_tokens: 32
eval_causal_lm_metrics: ["perplexity"]
do_causal_lm_eval: true

warmup_ratio: 0.2
debug: true
weight_decay: 0.1
resize_token_embeddings_to_32x: true
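A possible direction, not a confirmed fix: later axolotl versions expose a chat_template dataset type that consumes ShareGPT-style fields directly, which might avoid the unhandled sharegpt strategy error. A hedged variant of the datasets block above under that assumption:

```yaml
# Assumption: type: chat_template exists in the installed axolotl version.
chat_template: phi_3
datasets:
  - path: lightblue/tagengo-gpt4
    type: chat_template
    field_messages: conversations
    message_field_role: from
    message_field_content: value
```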

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.11

axolotl branch-commit

main/a82a711

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
@ccdv-ai ccdv-ai added the bug Something isn't working label Jun 4, 2024
@ccdv-ai ccdv-ai changed the title Can use chat_template: phi_3 with type: sharegpt Can't use chat_template: phi_3 with type: sharegpt Jun 23, 2024