cohere - command-r #1422

Open
darkacorn opened this issue Mar 19, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@darkacorn

⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

command-r has a new attention mechanism that is a bit different from llama2's

✔️ Solution

implementation of QLoRA/LoRA training for Cohere models

❓ Alternatives

No response

📝 Additional Context

ggerganov/llama.cpp#6033 - it's already merged in llama.cpp
it would be great if we could get a way to train this model, as it's amazing at writing / RAG / function calling

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
darkacorn added the enhancement label on Mar 19, 2024
@EwoutH

EwoutH commented Apr 4, 2024

They have now also released a larger, 104B-parameter model: C4AI Command R+

@NanoCode012
Collaborator

Hey, regular training should work as-is except for sample_packing.

@Undi95

Undi95 commented Apr 12, 2024

Hey, regular training should work as-is except for sample_packing.

It never loads the model on my side.
[two screenshots: the run hangs while loading the model]

It gets stuck at that step; I wanted to toy with the model, but I can't load it.
This was tested on 4xH100 with the following configuration:

base_model: ./CohereForAI_c4ai-command-r-plus
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: ./datasets/0.json
    type: sharegpt
    conversation: chatml
  - path: ./datasets/1.json
    type: sharegpt
    conversation: chatml
dataset_prepared_path: ./last_run_prepared
val_set_size: 0.05
output_dir: ./lora-out
chat_template: cohere

sequence_len: 8192
sample_packing: false
pad_to_sequence_len:

adapter: qlora
lora_model_dir:
lora_r: 128
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: Cohere-Noromaid
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
saves_per_epoch: 1
debug:
deepspeed: ./axolotl/deepspeed_configs/zero3.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:

I tried without deepspeed (the screenshots show that) and with it, but nothing lets me load the model, for whatever reason.
I also tried with trust_remote_code: true, but no luck.

Here is what I use in sharegpt.py, replacing the chatml template, to get the correct prompt format:

# In sharegpt.py; Conversation, SeparatorStyle and register_conv_template
# come from fastchat. This registers Cohere's turn tokens under the
# existing "chatml" template name.
from fastchat.conversation import (
    Conversation,
    SeparatorStyle,
    register_conv_template,
)


def register_chatml_template(system_message=None):
    system_message = system_message or "You are a helpful assistant."
    register_conv_template(
        Conversation(
            name="chatml",
            system_template="<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system_message}<|END_OF_TURN_TOKEN|>",
            system_message=system_message,
            roles=["<|START_OF_TURN_TOKEN|><|USER_TOKEN|>", "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"],
            sep_style=SeparatorStyle.NO_COLON_SINGLE,
            sep="<|END_OF_TURN_TOKEN|>",
        )
    )
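
As a quick sanity check of the prompt this template renders, here is a minimal sketch (illustrative only; it builds the same Conversation directly with fastchat's API rather than going through the registry):

# Illustrative sketch: construct the template above directly and print the
# prompt it renders for a single user turn.
from fastchat.conversation import Conversation, SeparatorStyle

conv = Conversation(
    name="chatml",
    system_template="<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system_message}<|END_OF_TURN_TOKEN|>",
    system_message="You are a helpful assistant.",
    roles=["<|START_OF_TURN_TOKEN|><|USER_TOKEN|>", "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"],
    messages=[],
    sep_style=SeparatorStyle.NO_COLON_SINGLE,
    sep="<|END_OF_TURN_TOKEN|>",
)
conv.append_message(conv.roles[0], "Hello")
conv.append_message(conv.roles[1], None)  # leave the assistant turn open
print(conv.get_prompt())
# Expected output (a single line):
# <|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>You are a helpful assistant.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>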

EDIT: After waiting some time, I got this error before the process was killed:

[rank1]:[E ProcessGroupNCCL.cpp:523] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=600000) ran for 600436 milliseconds before timing out.
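
For reference, these are the standard NCCL/PyTorch environment knobs that make this kind of collective hang easier to diagnose (a sketch using generic env vars, nothing axolotl-specific):

# Illustrative sketch: set these before launching the run to get more signal
# on where the ranks stall, instead of a silent 10-minute watchdog kill.
import os

os.environ["NCCL_DEBUG"] = "INFO"             # log NCCL activity per rank
os.environ["TORCH_NCCL_BLOCKING_WAIT"] = "1"  # raise a Python error on timeout
# (on PyTorch < 2.2 the second variable is named NCCL_BLOCKING_WAIT)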

@NanoCode012
Collaborator

@Undi95 , can you try preprocessing it separately first?

Also, make sure trust_remote_code is on.

@Undi95

Undi95 commented Apr 12, 2024

@Undi95 , can you try preprocessing it separately first?

Also, make sure trust_remote_code is on.

I always preprocess my dataset with python -m axolotl.cli.preprocess config.yml --debug before launching a training run.
trust_remote_code was on for some of the tries, but the result was the same.

@NanoCode012
Collaborator

I forgot to mention that there's an untested PR for sample packing with Cohere: #1547. If anyone else is following, do you also get the same issue as Undi?
