
unable to train #131

Open
riyajatar37003 opened this issue Jun 18, 2024 · 2 comments

riyajatar37003 commented Jun 18, 2024

These are the steps I followed to set up:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
git clone https://github.com/texttron/tevatron.git
cd tevatron
git checkout tevatron-v1   # also tried: git checkout main
pip install transformers datasets peft
pip install deepspeed accelerate
pip install faiss-cpu
pip install -e .

Then I run the following command to train:

python -m torch.distributed.run --nproc_per_node=1 -m tevatron.driver.train \
  --output_dir retriever-mistral \
  --model_name_or_path "/Mixtral-7b-instruct" \
  --lora \
  --lora_target_modules q_proj,k_proj,v_proj,o_proj,down_proj,up_proj,gate_proj \
  --save_steps 50 \
  --dataset_name Tevatron/msmarco-passage-aug \
  --query_prefix "Query: " \
  --passage_prefix "Passage: " \
  --pooling eos \
  --append_eos_token \
  --normalize \
  --fp16 \
  --temperature 0.01 \
  --per_device_train_batch_size 4 \
  --gradient_checkpointing \
  --train_group_size 16 \
  --learning_rate 1e-4 \
  --query_max_len 32 \
  --passage_max_len 156 \
  --num_train_epochs 1 \
  --logging_steps 10 \
  --overwrite_output_dir \
  --gradient_accumulation_steps 4

I always get this error:

/opt/conda/bin/python: Error while finding module specification for 'tevatron.driver.train' (ModuleNotFoundError: No module named 'tevatron.driver')
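A quick way to see which entry points the installed package actually exposes (the train module path differs between the tevatron-v1 layout, tevatron.driver.train, and the current main branch, where it reportedly sits under tevatron.retriever; the exact path should be verified against your checkout) is a sketch like this, run from outside the source tree so the installed copy is imported:

# List the submodules of the installed tevatron package to find the right train entry point.
import pkgutil

import tevatron

print([m.name for m in pkgutil.iter_modules(tevatron.__path__)])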

riyajatar37003 (Author):

[rank0]: raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
[rank0]: ValueError: Some specified arguments are not used by the HfArgumentParser: ['--lora', '--lora_target_modules', 'q_proj,k_proj,v_proj,o_proj,down_proj,up_proj,gate_proj', '--query_prefix', 'Query: ', '--passage_prefix', 'Passage: ', '--pooling', 'eos', '--append_eos_token', '--temperature', '0.01', '--train_group_size', '16', '--query_max_len', '32', '--passage_max_len', '156']
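The rejected flags (--lora, --query_prefix, --pooling, ...) suggest a mismatch between the command and the installed version, since those options are part of the newer argument dataclasses. One way to check is to print the flags your installed version actually accepts; the import path and class names below are assumptions based on the current main-branch layout and may need adjusting (on tevatron-v1 the dataclasses live under tevatron.arguments instead):

# Print every flag the installed tevatron argument dataclasses accept, so the
# training command can be matched to the installed version.
from transformers import HfArgumentParser

# Assumed main-branch import path; adjust to `from tevatron.arguments import ...` on tevatron-v1.
from tevatron.retriever.arguments import DataArguments, ModelArguments, TevatronTrainingArguments

parser = HfArgumentParser((ModelArguments, DataArguments, TevatronTrainingArguments))
parser.print_help()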

riyajatar37003 (Author):

DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 447 with name encoder.base_model.model.layers.31.mlp.down_proj.lora_B.default.weight has been marked as ready twice.
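This "marked as ready twice" error commonly appears when reentrant gradient checkpointing is combined with LoRA under DistributedDataParallel. Besides the _set_static_graph() workaround the message itself suggests, a common mitigation is non-reentrant checkpointing. A minimal sketch at the transformers level, not a tevatron-specific fix (the model path is the placeholder from the command above, and this assumes a recent transformers release that supports gradient_checkpointing_kwargs):

# Enable non-reentrant gradient checkpointing so the backward pass is not re-entered
# under DDP; this often avoids the "marked as ready twice" error with LoRA adapters.
from transformers import AutoModel

model = AutoModel.from_pretrained("/Mixtral-7b-instruct")  # placeholder path from the issue
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})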
