feat: Spellcheck #345

Merged
merged 44 commits into from
Aug 23, 2024


jeremyarancio
Collaborator

What

  • Development of LLMs fine-tuned with QLoRA.

Description

  • LLM fine-tuning for Spellcheck
  • Re-evaluation of foundation models on the benchmark
  • Normalization step in the evaluation algorithm so that some types of errors are not counted
  • Data processing pipeline

Part of

  • Spellcheck

Examples of differences the normalization ignores: "flavour" -> "flavor", "ï" -> "i", "â" -> "a", "oe"
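
To illustrate the kind of normalization the description mentions, here is a minimal sketch, not the code from this PR. It assumes the evaluation compares a predicted string with a reference string, and that both are normalized first so that accent-only and spelling-variant differences are not counted as errors. The names `SPELLING_VARIANTS`, `strip_diacritics`, `normalize`, and `is_equivalent` are hypothetical, and the variant table contains only the one pair quoted above.

```python
import unicodedata

# Hypothetical table of spelling variants to treat as equal.
# Only the pair quoted in the PR description is listed; the real mapping is not shown.
SPELLING_VARIANTS = {"flavour": "flavor"}


def strip_diacritics(text: str) -> str:
    """Remove combining marks, so that "ï" becomes "i" and "â" becomes "a"."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(text: str) -> str:
    """Apply diacritic stripping, then map known spelling variants word by word."""
    words = strip_diacritics(text).split(" ")
    return " ".join(SPELLING_VARIANTS.get(word.lower(), word) for word in words)


def is_equivalent(prediction: str, reference: str) -> bool:
    """Compare two strings after normalization (assumed evaluation rule)."""
    return normalize(prediction) == normalize(reference)
```

With this sketch, `is_equivalent("natural flavour", "natural flavor")` holds, while a genuine misspelling such as "suger" versus "sugar" still counts as a difference.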
Scripts are customized to run training in the cloud using SageMaker Training Jobs.
The prompt was intentionally overfitted on the benchmark so that it can later be used to create the synthetic training dataset. Examples from the benchmark are removed from the prompt.
@teolemon teolemon changed the title Spellcheck feat: Spellcheck Aug 6, 2024
@raphael0202 raphael0202 merged commit 24adbb2 into develop Aug 23, 2024
1 of 2 checks passed
@raphael0202 raphael0202 deleted the spellcheck branch August 23, 2024 09:52
2 participants