Update README with some explanations #700

Merged
merged 9 commits into axolotl-ai-cloud:main on Oct 8, 2023

Conversation

seungduk-yanolja
Contributor

Description

Added some explanation and examples to the YAML config documentation to help future users.

Motivation and Context

Sharing lessons learned

How has this been tested?

MD file viewer

Screenshots (if appropriate)

N/A

Types of changes

Comments

README.md Outdated
gradient_accumulation_steps: 1
# The number of samples to accumulate gradients for, before performing a backward/update pass.
Collaborator


micro batch size is the per-GPU number of samples processed in each forward pass.
micro batch size * gradient accumulation steps * number of GPUs = total batch size
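For concreteness, a minimal sketch of that arithmetic in Python; the numbers below are illustrative placeholders, not values from this PR:

# Illustrative values only; adjust to your own config and hardware.
micro_batch_size = 2              # samples per GPU per forward pass
gradient_accumulation_steps = 4   # forward passes accumulated before each optimizer step
num_gpus = 2                      # data-parallel workers

# Effective (total) batch size per optimizer update, per the formula above:
total_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(total_batch_size)  # 2 * 4 * 2 = 16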

Contributor Author


added more explanation below

@flexchar

flexchar commented Oct 7, 2023

I absolutely appreciate the work on this! If I may, I'd really like to ask for a couple of short examples (a sentence or two) on each option for dummies - technical people who are not from a Machine Learning/AI background.

For example:
- lora_r - specifies how many layers should be trained. The more layers, the longer the training will take, but it can yield better results with a bigger dataset. It's recommended to use 32 or 16.
- num_epochs - how many times the whole training should be repeated. It's also known as training steps. The more epochs, the better the model can learn the data. A recommended starting point is 10.
- micro_batch_size - how many trainers run at the same time. Running more can speed up the progress but can also cause OOM... best to leave at 2.
- lr_scheduler - this means abc and xyz. Recommended to use cosine unless you know what you are doing. And so on and so forth.

NOTE: I have no idea if this is correct; I have a technical, non-ML background, and my goal is to learn the practical aspect of training, with some navigation in theory too. :)

@seungduk-yanolja
Contributor Author

I added a link in the doc for more details about the LoRA hyperparameters.
https://www.anyscale.com/blog/fine-tuning-llms-lora-or-full-parameter-an-in-depth-analysis-with-llama-2

PTAL

@winglian
Collaborator

winglian commented Oct 8, 2023

@seungduk-yanolja thanks for doing this! much needed. are you happy with the state of this? Should I go ahead and merge?

@seungduk-yanolja
Contributor Author

yes, please! thanks

@seungduk-yanolja
Contributor Author

I added one more explanation about `lora_modules_to_save` as follows.

# If you added new tokens to the tokenizer, you may need to save some LoRA modules because they need to know the new tokens.
# For LLaMA and Mistral, you need to save `embed_tokens` and `lm_head`. It may vary for other models.
# `embed_tokens` converts tokens to embeddings, and `lm_head` converts embeddings to token probabilities.
# https://github.com/huggingface/peft/issues/334#issuecomment-1561727994
lora_modules_to_save:
#  - embed_tokens
#  - lm_head
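For background, here is a hedged sketch of what saving these modules corresponds to at the PEFT level when new tokens are added; the model id, LoRA targets, and values below are placeholders, and this is not axolotl's actual code path:

# Illustrative sketch assuming a LLaMA-style causal LM; all names/values are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# New tokens add rows to the input embeddings and the output head.
tokenizer.add_tokens(["<new_tok_1>", "<new_tok_2>"])
model.resize_token_embeddings(len(tokenizer))

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # placeholder LoRA targets
    # Train and save the resized embedding/output layers alongside the adapter,
    # mirroring `lora_modules_to_save` in the YAML above.
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

Without saving these modules, the newly initialized rows for the added tokens would not be part of the saved adapter, which is roughly what the linked peft issue discusses.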

@winglian winglian merged commit 77c84e0 into axolotl-ai-cloud:main Oct 8, 2023
@flexchar

flexchar commented Oct 8, 2023

Many thanks for this! It means the world to me and to many others. I also found this resource written up for Stable Diffusion LoRAs; it seems to be quite relevant: https://github.com/bmaltais/kohya_ss/wiki/LoRA-training-parameters.

mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request on Dec 15, 2023
* Update README with some explanations

* revert commit-hook change

* add more explanation about batch size and gradient accum

* not use latex foromat

* decorate

* git hook again

* Attach a link that explains about LoRA hyperparameters

* update table of content

* Explanation about lora_modules_to_save