cohere - command-r #1422

Open
darkacorn opened this issue Mar 19, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@darkacorn

⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

command-r has a new attention mechanism that is a bit different from llama2's

✔️ Solution

implementation of QLoRA/LoRA training for Cohere models

❓ Alternatives

No response

📝 Additional Context

ggerganov/llama.cpp#6033 - it's already merged in llama.cpp
it would be great if we could get a way to train this model, as it's amazing at writing / RAG / function calling

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
darkacorn added the enhancement label on Mar 19, 2024
@EwoutH

EwoutH commented Apr 4, 2024

They have now also released a larger, 104B-parameter model: C4AI Command R+

@NanoCode012
Collaborator

Hey, regular training should work as-is except for sample_packing.

@Undi95

Undi95 commented Apr 12, 2024

Hey, regular training should work as-is except for sample_packing.

It never loads the model on my side.
[two screenshots: the run hangs while loading the model]

It gets stuck at that step; I wanted to toy with the model, but I can't load it.
This was tested on 4xH100 with the following configuration:

base_model: ./CohereForAI_c4ai-command-r-plus
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: ./datasets/0.json
    type: sharegpt
    conversation: chatml
  - path: ./datasets/1.json
    type: sharegpt
    conversation: chatml
dataset_prepared_path: ./last_run_prepared
val_set_size: 0.05
output_dir: ./lora-out
chat_template: cohere

sequence_len: 8192
sample_packing: false
pad_to_sequence_len:

adapter: qlora
lora_model_dir:
lora_r: 128
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: Cohere-Noromaid
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
saves_per_epoch: 1
debug:
deepspeed: ./axolotl/deepspeed_configs/zero3.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:

I tried without deepspeed (the screenshots show that) and with it, but nothing lets me load the model, for whatever reason.
I also tried with trust_remote_code: true, but no luck.

Here is what I use in sharegpt.py, replacing the chatml template, to get the correct prompt format:

# In sharegpt.py; Conversation, SeparatorStyle and register_conv_template
# come from fastchat. This registers Cohere's turn tokens under the
# existing "chatml" template name.
from fastchat.conversation import (
    Conversation,
    SeparatorStyle,
    register_conv_template,
)


def register_chatml_template(system_message=None):
    system_message = system_message or "You are a helpful assistant."
    register_conv_template(
        Conversation(
            name="chatml",
            system_template="<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system_message}<|END_OF_TURN_TOKEN|>",
            system_message=system_message,
            roles=["<|START_OF_TURN_TOKEN|><|USER_TOKEN|>", "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"],
            sep_style=SeparatorStyle.NO_COLON_SINGLE,
            sep="<|END_OF_TURN_TOKEN|>",
        )
    )
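
As a quick sanity check of the prompt this template renders, here is a minimal sketch (illustrative only; it builds the same Conversation directly with fastchat's API rather than going through the registry):

# Illustrative sketch: construct the template above directly and print the
# prompt it renders for a single user turn.
from fastchat.conversation import Conversation, SeparatorStyle

conv = Conversation(
    name="chatml",
    system_template="<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system_message}<|END_OF_TURN_TOKEN|>",
    system_message="You are a helpful assistant.",
    roles=["<|START_OF_TURN_TOKEN|><|USER_TOKEN|>", "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"],
    messages=[],
    sep_style=SeparatorStyle.NO_COLON_SINGLE,
    sep="<|END_OF_TURN_TOKEN|>",
)
conv.append_message(conv.roles[0], "Hello")
conv.append_message(conv.roles[1], None)  # leave the assistant turn open
print(conv.get_prompt())
# Expected output (a single line):
# <|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>You are a helpful assistant.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>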

EDIT: After waiting some time, I got this error before the process was killed:

[rank1]:[E ProcessGroupNCCL.cpp:523] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=600000) ran for 600436 milliseconds before timing out.
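
For reference, these are the standard NCCL/PyTorch environment knobs that make this kind of collective hang easier to diagnose (a sketch using generic env vars, nothing axolotl-specific):

# Illustrative sketch: set these before launching the run to get more signal
# on where the ranks stall, instead of a silent 10-minute watchdog kill.
import os

os.environ["NCCL_DEBUG"] = "INFO"             # log NCCL activity per rank
os.environ["TORCH_NCCL_BLOCKING_WAIT"] = "1"  # raise a Python error on timeout
# (on PyTorch < 2.2 the second variable is named NCCL_BLOCKING_WAIT)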

@NanoCode012
Collaborator

@Undi95 , can you try preprocessing it separately first?

Also, make sure trust_remote_code is on.

@Undi95

Undi95 commented Apr 12, 2024

@Undi95 , can you try preprocessing it separately first?

Also, make sure trust_remote_code is on.

I always preprocess my dataset with python -m axolotl.cli.preprocess config.yml --debug before launching a training run.
trust_remote_code was on for some of the tries, but the result was the same.

@NanoCode012
Collaborator

I forgot to mention that there's an untested PR for sample packing with Cohere: #1547. If anyone else is following, do you also get the same issue as Undi?
