Added more details to the discussion of optimizers/teleprompters. #951
Conversation
Add class diagrams for the teleprompters, to be included in the documentation. Expanded the discussion to try to clarify what the various optimizers do.
There should be no change here to the function of the code, just documentation to make it easier for users and maintainers to follow. I added this in the course of working out how the teleprompters behave well enough to figure out how to use them.
This pull request needs a bit of screening. In particular,
Finally, I don't know how Docusaurus works, so the method I used to put an image into the documentation may need checking.
Hi @rpgoldman, thanks for the contributions to the documentation. These are much needed! The diagram is great. Could you remove the other image formats and add just the `.png`? I made a pass over the documentation and made some corrections. I also removed the in-line comments/questions and moved them here. Feel free to follow up on any if needed.
Optuna is similar to the BootstrapFewShotWithRandomSearch optimization, simply replacing the random search with Optuna's objective optimization: it treats the candidate score as the variable to optimize over for each program candidate and runs it over a set of trials. The resulting compiled program mirrors the automatic selection of few-shot examples in the prompt.
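As a rough illustration of how this optimizer is invoked — a minimal sketch, not authoritative: `exact_match`, `MyProgram`, and the tiny datasets are placeholders I made up, and the exact constructor/compile arguments should be checked against the current `dspy.teleprompt` API:

```python
import dspy
from dspy.teleprompt import BootstrapFewShotWithOptuna

# Placeholder metric: compares the predicted answer to the gold answer.
def exact_match(example, pred, trace=None):
    return example.answer == pred.answer

# Toy stand-in for whatever dspy.Module is being compiled.
class MyProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.Predict("question -> answer")

    def forward(self, question):
        return self.generate(question=question)

# Tiny placeholder datasets; assumes an LM was already configured via dspy.settings.
trainset = [dspy.Example(question="2+2?", answer="4").with_inputs("question")]
devset = [dspy.Example(question="3+3?", answer="6").with_inputs("question")]

optimizer = BootstrapFewShotWithOptuna(
    metric=exact_match,
    max_bootstrapped_demos=4,   # cap on bootstrapped demos per candidate
    num_candidate_programs=8,   # candidate programs scored across the trials
)
compiled = optimizer.compile(MyProgram(), trainset=trainset, valset=devset)
```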
The KNNFewShot optimizer essentially clusters the provided set of training examples and applies the few-shot optimization given this example clustering. Feel free to check out this reference notebook for how to use it!
This may lead to some overfitting, and BootstrapFewShot in fact covers this with
Fixed the typo. The defaults are just configurations used during experiments in the paper and may not be appropriate for all use cases, hence they are left configurable and act as maximums.
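To make the "configurable maximums" point concrete, a sketch of tightening those caps on the basic optimizer, reusing the placeholder metric and program from the sketch above (`max_bootstrapped_demos` and `max_labeled_demos` are BootstrapFewShot parameters, treated as upper bounds rather than targets):

```python
from dspy.teleprompt import BootstrapFewShot

# The defaults (4 bootstrapped, 16 labeled) are ceilings, not targets: the
# optimizer may attach fewer demos if fewer bootstrapped traces pass the metric.
optimizer = BootstrapFewShot(
    metric=exact_match,        # placeholder metric from the earlier sketch
    max_bootstrapped_demos=2,  # lower the bootstrapped-demo ceiling
    max_labeled_demos=8,       # lower the raw labeled-demo ceiling
)
compiled = optimizer.compile(MyProgram(), trainset=trainset)
```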
Would it be OK to retain the `.dot` file?
I marked two comments that weren't questions, just explanations, that I thought might be worth keeping.
See my comment on the file -- even though Optuna is a hyperparameter optimizer, it doesn't look like dspy uses it for that purpose here. It looks like it's just optimizing the choice of examples, which isn't a hyperparameter.
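To make that distinction concrete, a hypothetical stripped-down sketch of what it means to use Optuna to optimize the choice of examples rather than a hyperparameter: the search variable is a discrete index over candidate demo sets. (`candidates` and `score_program` are made-up placeholders here, not dspy names.)

```python
import optuna

candidates = [...]  # hypothetical: each entry is one candidate set of few-shot demos

def score_program(demo_set):
    ...  # hypothetical: compile a program with these demos, return its devset score

def objective(trial):
    # Optuna is asked to pick WHICH candidate demo set to use -- a discrete
    # choice, not a numeric hyperparameter like a learning rate.
    idx = trial.suggest_categorical("candidate_idx", list(range(len(candidates))))
    return score_program(candidates[idx])

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)  # index of the best-scoring demo set
```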
It wasn't clear to me what the purpose of the clustering was; that's what I was trying to explain. Does dspy use the clusters as I suggested, to make sure the space is covered by choosing elements from different clusters rather than picking a bunch of examples from a single cluster?
One thing I still don't understand is why the term "teleprompter" is used.
P.S. I don't know what the Ruff fix is, I'm afraid. If there's a pointer somewhere that explains it, please let me know.
This notebook refers to "kNN Few-Shot":
I figure it would help to add a reference. Do you know if this article is what that refers to? If so, I could add that link to the notebook in this PR.
Is it important for the documentation to keep the .dot file? I think it would be best to only include final-product images in the repo, as is done with other documentation.
Running `ruff check --fix` should take care of it.
Yes, DSPy uses the KNN technique to pick a diverse set of examples from different clusters and then optimizes using FewShot with examples pre-optimized via KNN (making the bootstrapping process stronger). This is most useful when there's a lot of data over random spaces, where KNN helps optimize the trainset used for BootstrapFewShot (related to #77). The notebook details this with an example of DSPy KNN few-shot.
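As a rough usage sketch of that, reusing the placeholder program and trainset from the earlier sketch — the `k`/`trainset` constructor and the compile signature here are my assumptions; the linked notebook is the authoritative example:

```python
from dspy.teleprompt import KNNFewShot

# KNNFewShot embeds the trainset and, per input, retrieves the k nearest
# examples, so the demo pool handed to BootstrapFewShot spans the data
# rather than coming from one corner of it.
optimizer = KNNFewShot(k=3, trainset=trainset)
compiled = optimizer.compile(MyProgram(), trainset=trainset)
```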
I think this is also a bit semantics-related and can remain unchanged for now, unless there is a strong reason to change it (and it will likely need refactoring across the rest of the repo if so).
Done!
Done! I see now that it's a linter.
Add a comment to the markdown to explain how the class diagram was generated, so that it can be updated as more teleprompters are added.
I added a comment to the markdown to explain the process of generating the class hierarchy figure, so that it can be updated later.
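For anyone updating the figure later, here is a sketch of one possible way to regenerate such a class-hierarchy diagram with the `graphviz` Python package. This is purely an illustrative assumption on my part, not the process actually recorded in the markdown comment:

```python
import inspect
import graphviz  # also needs the Graphviz binaries on PATH for render()
import dspy.teleprompt as tp

dot = graphviz.Digraph("teleprompters")
for name, cls in inspect.getmembers(tp, inspect.isclass):
    dot.node(name)
    for base in cls.__bases__:
        # Only draw inheritance edges for bases defined inside dspy itself.
        if base.__module__.startswith("dspy"):
            dot.edge(base.__name__, name)
dot.render("teleprompter_classes", format="png")
```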
- Add key ideas from Arnav's KNN explanation (in the issue).
- Clarify that only MIPRO optimizes the demonstration set.
I think it would be best to simply note this deviation from the otherwise standard use of "devset" somewhere in the documentation. If one wanted to do more, I'd say just introduce
If you are happy with what's there now, I think it's ok to merge.
@arnavsinghvi11 Your explanation of KNN was very helpful; I pulled a couple of sentences into the Markdown.
P.S. The pointer to the KNN notebook probably should go somewhere else, but I suggest keeping it here until there's a page for the KNN optimizer added to the Teleprompters/Optimizers section of the "Deep Dive."
Sorry, forgot to clear the "Draft" flag.
## What DSPy Optimizers are currently available?

<!-- The following diagram was generated by: -->
Please give yourself credit! :)
Thanks @rpgoldman for this amazing PR on documentation. Left a small comment for you to give yourself credit for the PNG, and it should be ready to merge. (I left the comments you had for generating the PNG since it makes sense for that process, but lmk if you wanted to remove that before merging.)
Thanks @rpgoldman!
Added more details to the discussion of optimizers/teleprompters.
Add class diagrams for the teleprompters, to be included in the documentation.
Expanded the discussion to try to clarify what the various optimizers do.