
Added more details to the discussion of optimizers/teleprompters. #951

Merged 13 commits into stanfordnlp:main on May 11, 2024

Conversation

rpgoldman (Contributor)

Add class diagrams for the teleprompters, to be included in the
documentation.

Expanded the discussion to try to clarify what the various optimizers do.

There should be no change here to the function of the code, just some
documentation to make it easier for the user/maintainer to follow.
Added this in the course of figuring out the teleprompters well enough
to figure out how to use them.
@rpgoldman (Contributor, Author)

This pull request needs a bit of screening. In particular,

  1. there are comments inline in 6-optimizers.md which indicate places where I was not sure that I was explaining correctly. Those should be corrected if necessary, and then the comments removed.
  2. There's a minor FIXME in optuna that could be taken care of (or, if it's wrong, could just be removed).
  3. There's another minor FIXME in bootstrap.py
  4. There's a "QUESTION:" in bootstrap.py that shows a place where I got puzzled. It might be that the variables could be given better names, or it might be that I'm just missing something.

Finally, I don't know how docusaurus works so the method I used to put an image into 6-bootstrap.md might have been wrong.

@arnavsinghvi11 (Collaborator) commented May 5, 2024

Hi @rpgoldman , thanks for the contributions to the documentation. These are much needed!

The diagram is great. Could you remove the other image formats and add just the png to the relevant documentation as is done for images like here:

I made a pass over the documentation and made some corrections. Also removed the in-line comments/questions and moved them here. Feel free to follow up on any if needed.

TBQH, I don't understand how Optuna does this. As far as I can tell it simply chooses best based on multiple evaluations, rather than a single one, and mention of "hyperparameters" seems to be a red herring.

Optuna is similar to BootstrapFewShotWithRandomSearch, but replaces the random search with Optuna's objective-driven optimization: it treats each candidate program's score as the objective to optimize and runs a set of trials. The compiled program it outputs mirrors the automatic selection of few-shot examples in the prompt.
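As a rough sketch of that trial loop (illustrative stdlib Python only, not DSPy's or Optuna's actual code; `score_candidate` is a hypothetical stand-in for evaluating a candidate program against the metric):

```python
import random

def optimize_over_trials(candidates, score_candidate, n_trials=20, seed=0):
    """Toy Optuna-style loop: each trial samples a candidate program
    (here uniformly; Optuna would use its sampler), scores it with the
    metric, and the best-scoring candidate seen wins."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        candidate = rng.choice(candidates)
        score = score_candidate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Toy "program candidates": each is just a named few-shot demo subset here.
scores = {"demos_a": 0.4, "demos_b": 0.9, "demos_c": 0.6}
best, score = optimize_over_trials(list(scores), scores.get, n_trials=20)
```

The point being that the score of each candidate is the quantity optimized over trials, rather than any model hyperparameter.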

I'm not at all sure that this is right. I couldn't follow the KNN code in the repo, so I just assumed that dspy was trying to cover the space of possible examples by picking centers of different clusters.

The KNNFewShot optimizer essentially clusters the provided set of training examples and applies the fewshot optimization given this example clustering. Feel free to check out this reference notebook for how to use it!

"compiled_knn = knn_teleprompter.compile(BasicQABot(), trainset=trainset)"

Wouldn't it make sense to simply use LabeledFewShot with k set to use all of the demos?

This may lead to some overfitting, and BootstrapFewShot in fact covers this with max_labeled_demos, but it also provides bootstrapped examples from the model to offer more model-representative behavior in the compiled prompt. In the case of fewer examples, it may make more sense to use a larger model at compile time to get more accurate bootstrapped examples, and then use a smaller model at inference time with this learned behavior.

The following example says that "we want to "bootstrap" (i.e., self-generate) 8-shot examples of your program's steps." But won't it actually give 6 demonstrations, 3 taken from the examples (max_labeled_demos=3) and 3 self-generated (max_bootstrapped_demos=3)? Also, aren't the defaults of 16 labeled + 4 bootstrapped, for a total of 20-shot prompting, awfully high?

Fixed the typo. The defaults are just configurations used during experiments in the paper and may not be appropriate for all use cases, hence left configurable and as maximums.
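To make that arithmetic concrete under the additive reading discussed above (a hypothetical helper for illustration, not the actual bootstrap.py code): each demo source is capped independently, so the prompt carries at most the sum of the two caps.

```python
def mix_demos(bootstrapped, labeled, max_bootstrapped_demos, max_labeled_demos):
    """Hypothetical sketch: cap each demo source independently and
    concatenate, so the compiled prompt carries at most
    max_bootstrapped_demos + max_labeled_demos demonstrations."""
    return bootstrapped[:max_bootstrapped_demos] + labeled[:max_labeled_demos]

# 3 self-generated + 3 taken from the trainset -> 6 demonstrations total.
demos = mix_demos(["b1", "b2", "b3", "b4"], ["l1", "l2", "l3", "l4"], 3, 3)
```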

QUESTION: What is the meaning of self.validation and self.valset? Why is it that valset
overrides validation if it is supplied? What is the relationship between the valset
parameter to compile and the trainset parameter? I note that none of the examples in the
docs seem to use this parameter.

valset is for when you have a validation split from your trainset that you would like to optimize the program on; it is particularly useful in BootstrapFewShotWithRandomSearch for determining scores over a set of candidates. This is not to be confused with an "evalset"! It is left as an optional parameter: if the user doesn't supply a validation split, the optimization handles it with a randomized selection of train examples to bootstrap and validate on.
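A rough sketch of that fallback (illustrative only, with invented names; not the actual bootstrap.py logic): when no valset is supplied, randomly partition the trainset into a bootstrap split and a validation split.

```python
import random

def split_train_val(trainset, valset=None, val_fraction=0.5, seed=0):
    """If an explicit valset is given, use it as-is; otherwise shuffle
    the trainset and carve the validation split out of it."""
    if valset is not None:
        return list(trainset), list(valset)
    shuffled = list(trainset)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - val_fraction))
    return shuffled[:cut], shuffled[cut:]

examples = list(range(10))
bootstrap_split, validation_split = split_train_val(examples)
```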

Three inline review threads on docs/docs/building-blocks/6-optimizers.md (outdated, resolved).
@rpgoldman (Contributor, Author)

Hi @rpgoldman , thanks for the contributions to the documentation. These are much needed!

The diagram is great. Could you remove the other image formats and add just the png to the relevant documentation as is done for images like here:

Would it be OK to retain the .dot file, since that is the source of all the other formats? Would it help to put a comment into the Markdown file to explain how I generated the dot file? I've removed the pdf file for now.

I made a pass over the documentation and made some corrections. Also removed the in-line comments/questions and moved them here. Feel free to follow up on any if needed.

I marked two comments that weren't questions, just explanations, which I thought might be worth keeping.

TBQH, I don't understand how Optuna does this. As far as I can tell it simply chooses best based on multiple evaluations, rather than a single one, and mention of "hyperparameters" seems to be a red herring.

Optuna is similar to BootstrapFewShotWithRandomSearch, but replaces the random search with Optuna's objective-driven optimization: it treats each candidate program's score as the objective to optimize and runs a set of trials. The compiled program it outputs mirrors the automatic selection of few-shot examples in the prompt.

See my comment on the file -- even though Optuna is a hyperparameter optimizer, it doesn't look like dspy uses it for that purpose here. It looks like it's just optimizing the choice of examples, which isn't a hyperparameter.

I'm not at all sure that this is right. I couldn't follow the KNN code in the repo, so I just assumed that dspy was trying to cover the space of possible examples by picking centers of different clusters.

The KNNFewShot optimizer essentially clusters the provided set of training examples and applies the fewshot optimization given this example clustering. Feel free to check out this reference notebook for how to use it!

It wasn't clear to me what the purpose of the clustering was. That's what I was trying to explain -- does dspy use the clusters as I suggested, to make sure that the space is covered by choosing elements from different clusters, instead of choosing a bunch of examples from a single cluster?

QUESTION: What is the meaning of self.validation and self.valset? Why is it that valset
overrides validation if it is supplied? What is the relationship between the valset
parameter to compile and the trainset parameter? I note that none of the examples in the
docs seem to use this parameter.

valset is for when you have a validation split from your trainset that you would like to optimize the program on; it is particularly useful in BootstrapFewShotWithRandomSearch for determining scores over a set of candidates. This is not to be confused with an "evalset"! It is left as an optional parameter: if the user doesn't supply a validation split, the optimization handles it with a randomized selection of train examples to bootstrap and validate on.

One thing I still don't understand is why the term valset is used for the argument instead of devset. I will see about tweaking the docstring to clarify according to your explanation, but it might be helpful to say why this new term is introduced.

@rpgoldman (Contributor, Author)

P.S. I don't know what the Ruff fix is, I'm afraid. If there's a pointer somewhere that explains it, please let me know.

@rpgoldman (Contributor, Author)

The KNNFewShot optimizer essentially clusters the provided set of training examples and applies the fewshot optimization given this example clustering. Feel free to check out this reference notebook for how to use it!

"compiled_knn = knn_teleprompter.compile(BasicQABot(), trainset=trainset)"

This notebook refers to "kNN Few-Shot":

This notebook shows how KNN few-shot can be implemented...

I figure it would help to add a reference. Do you know if this article is what that refers to? If so, I could add that link to the notebook in this PR.

@arnavsinghvi11 (Collaborator)

Would it be OK to retain the .dot file, since that is the source of all the other formats? Would it help to put a comment into the Markdown file to explain how I generated the dot file? I've removed the pdf file for now.

Is it important for the documentation to keep the .dot file? I think it would be best to only include final product images on the repo, as done with other documentation, e.g. `![Dataset Loading Process in HotPotQA Class](./img/data-loading.png)`. (We'd like to avoid adding too many non-code-related files besides the hosted dspy-docs subtree.)

P.S. I don't know what the Ruff fix is

Running `ruff check . --fix-only` and pushing will fix it!

It wasn't clear to me what the purpose of the clustering was. That's what I was trying to explain -- does dspy use the clusters as I suggested, to make sure that the space is covered by choosing elements from different clusters, instead of choosing a bunch of examples from a single cluster?

Yes, DSPy uses the KNN technique to pick a diverse set of examples from different clusters and then optimizes using FewShot, with examples pre-optimized using KNN (making the bootstrapping process stronger). This is most useful when there's a lot of data over random spaces: KNN helps optimize the trainset used for BootstrapFewShot (related to #77). The notebook details this with an example of DSPy KNN few-shot.
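As a toy illustration of that selection step (stdlib only; not the repo's KNN code, which embeds examples with a vectorizer): given vector representations of the examples, pick the k training examples nearest to a query to serve as its few-shot demos.

```python
import math

def knn_select(query_vec, train, k=2):
    """train: list of (vector, example) pairs. Return the k examples
    whose vectors are closest to the query (Euclidean distance)."""
    ranked = sorted(train, key=lambda pair: math.dist(pair[0], query_vec))
    return [example for _, example in ranked[:k]]

# Two nearby training points and one far-away one.
train = [((0.0, 0.0), "a"), ((0.1, 0.0), "b"), ((5.0, 5.0), "c")]
picked = knn_select((0.0, 0.1), train, k=2)
```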

One thing I still don't understand is why the term valset is used for the argument instead of devset. I will see about tweaking the docstring to clarify according to your explanation, but it might be helpful to say why this new term is introduced.

I think this is also a bit semantics-related and can remain unchanged for now, unless there is a strong reason to change otherwise (and will likely need refactoring across the rest of the repo if so).

@rpgoldman (Contributor, Author)

Would it be OK to retain the .dot file, since that is the source of all the other formats? Would it help to put a comment into the Markdown file to explain how I generated the dot file? I've removed the pdf file for now.

Is it important for the documentation to keep the .dot file? I think it would be best to only include final product images on the repo, as done with other documentation, e.g. `![Dataset Loading Process in HotPotQA Class](./img/data-loading.png)`. (We'd like to avoid adding too many non-code-related files besides the hosted dspy-docs subtree.)

Done!

P.S. I don't know what the Ruff fix is

Running `ruff check . --fix-only` and pushing will fix it!

Done! I see now that it's a linter.

Add a comment to the markdown to explain how the class diagram was
generated, so that it can be updated as more teleprompters are added.
@rpgoldman (Contributor, Author)

I added a comment to the markdown to explain the process of generating the class hierarchy figure, so that it can be updated later.

- Add key ideas from Arnav's KNN explanation (in the issue) and
- Clarify that only MIPRO optimizes the demonstration set.
@rpgoldman (Contributor, Author)

One thing I still don't understand is why the term valset is used for the argument instead of devset. I will see about tweaking the docstring to clarify according to your explanation, but it might be helpful to say why this new term is introduced.

I think this is also a bit semantics-related and can remain unchanged for now, unless there is a strong reason to change otherwise (and will likely need refactoring across the rest of the repo if so).

I think it would be best to simply note this deviation from the otherwise-standard use of "devset" somewhere in the documentation. If one wanted to do more, I'd suggest introducing devset as an alternative parameter name and binding valset to the value of the devset parameter if supplied. In the best of all possible worlds, I'd try to make the usage consistent across the library, but this is only a minor point.
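Sketched out (a hypothetical signature for illustration, not the current compile code), that aliasing would look something like:

```python
def compile(student, *, trainset, valset=None, devset=None):
    """Hypothetical sketch: accept devset as an alias for valset,
    with an explicit valset taking precedence if both are given."""
    if valset is None:
        valset = devset
    # ... the rest of compile would proceed unchanged ...
    return {"trainset": trainset, "valset": valset}

result = compile(None, trainset=[1, 2], devset=[3])
```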

@rpgoldman (Contributor, Author)

If you are happy with what's there now, I think it's ok to merge.

@rpgoldman (Contributor, Author)

@arnavsinghvi11 Your explanation of KNN was very helpful; I pulled a couple of sentences into the Markdown.

@rpgoldman (Contributor, Author)

P.S. The pointer to the KNN notebook probably should go somewhere else, but I suggest keeping it here until there's a page for the KNN optimizer added to the Teleprompters/Optimizers section of the "Deep Dive."

@rpgoldman rpgoldman marked this pull request as ready for review May 7, 2024 03:10
@rpgoldman (Contributor, Author)

Sorry, forgot to clear the "Draft" flag.



## What DSPy Optimizers are currently available?

<!-- The following diagram was generated by: -->
Collaborator (inline review comment):

Please give yourself credit! :)

@arnavsinghvi11 (Collaborator)

Thanks @rpgoldman for this amazing PR on documentation. Left a small comment for you to give yourself credit for the PNG and should be ready to merge. (I left the comments you had for generating the PNG since it makes sense for that process, but lmk if you wanted to remove that before merging).

@arnavsinghvi11 arnavsinghvi11 merged commit bcf47c8 into stanfordnlp:main May 11, 2024
4 checks passed
@arnavsinghvi11 (Collaborator)

Thanks @rpgoldman !

arnavsinghvi11 added a commit that referenced this pull request Jul 12, 2024
Added more details to the discussion of optimizers/teleprompters.