Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

demo of DPO with QLoRA (w Llama3 70B Instruct) #858

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

anchen1011
Copy link

No description provided.

Copy link
Contributor

@gzliyu gzliyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Andrew! Thank you for submitting this QLoRA&DPO implementation to LMFlow. I have reviewed and tested the changes and it looks good to me.

  1. The curve is approaching the full-param training as provided below
  1. Below are showcases of the trained model output
{
  "question": "I might be nitpicking, but is it better to do this:\n\n```\nif ($loggedin) {\n    // normal process\n}\nelse {\n    header('Location: login.php');\n}\n\n```\n\nOr this:\n\n```\nif (!$loggedin) {\n    header('Location: login.php');\n    exit();\n}\n// normal process\n\n```\n\nOr does it just not matter?",
  "full": " It doesn't matter which you use, as long as it properly separates the code that determines whether a user is logged in from the code that renders the login page if they aren't. So they're both equivalent.",
  "qlora": "\nBoth ways are valid, but the easier is more readable. In this example, the `if (!$loggedin)` expression checks if we are current logged in or not. It may not matter which we do, however, it is still a good practice to always check if the user is logged in, if that's the case I'd always recommend doing it like this. \n\nI hope this helps! Let me know if you have any more questions.",
  "lora": "\nBoth ways are valid, but the easier is more readable. In this case, it's better to use comments and indentation to make it easier to read and understand.\n\nRegarding another topic we covered earlier, normal structure. I often see web developers applying haxor structure more for aesthetic purposes, but in reality your code may not always strive for visual excellence. Don't restrict yourself to this point, always keep the code clean and organized. The code will be easier to read, maintain and will make your website both mobile and desktop-friendly. Keep it clean, straightforward, and structured, let your programming and design work to tell a story.",
  "answer_chosen": "I prefer the second style, since \"//normal process\" is likely a long piece of code, so the last } (from the else branch) might be a bit confusing.",
  "answer_rejected": "Definitely the second. It reads better and works better for the rest of your code."
}

@gzliyu
Copy link
Contributor

gzliyu commented Jun 21, 2024

Hi Andrew! Thank you for submitting this QLoRA&DPO implementation to LMFlow. I have reviewed and tested the changes and it looks good to me.

  1. The curve is approaching the full-param training as provided below
  1. Below are showcases of the trained model output
{
  "question": "I might be nitpicking, but is it better to do this:\n\n```\nif ($loggedin) {\n    // normal process\n}\nelse {\n    header('Location: login.php');\n}\n\n```\n\nOr this:\n\n```\nif (!$loggedin) {\n    header('Location: login.php');\n    exit();\n}\n// normal process\n\n```\n\nOr does it just not matter?",
  "full": " It doesn't matter which you use, as long as it properly separates the code that determines whether a user is logged in from the code that renders the login page if they aren't. So they're both equivalent.",
  "qlora": "\nBoth ways are valid, but the easier is more readable. In this example, the `if (!$loggedin)` expression checks if we are current logged in or not. It may not matter which we do, however, it is still a good practice to always check if the user is logged in, if that's the case I'd always recommend doing it like this. \n\nI hope this helps! Let me know if you have any more questions.",
  "lora": "\nBoth ways are valid, but the easier is more readable. In this case, it's better to use comments and indentation to make it easier to read and understand.\n\nRegarding another topic we covered earlier, normal structure. I often see web developers applying haxor structure more for aesthetic purposes, but in reality your code may not always strive for visual excellence. Don't restrict yourself to this point, always keep the code clean and organized. The code will be easier to read, maintain and will make your website both mobile and desktop-friendly. Keep it clean, straightforward, and structured, let your programming and design work to tell a story.",
  "answer_chosen": "I prefer the second style, since \"//normal process\" is likely a long piece of code, so the last } (from the else branch) might be a bit confusing.",
  "answer_rejected": "Definitely the second. It reads better and works better for the rest of your code."
}

test_full_dpo.json
test_qlora_dpo_merged.json
test_lora_dpo_merged.json
TinyLlama-1.1B-Chat-v1.0.json

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants