feat(dspy): Experiment with adding image data with GPT-4o and Gemini #1099
+108
−29
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Add support for Vision data for various LLM vendor (Gemini, GPT, Azure OpenAI GPT).
This implements feature requested in #624
This adds the
is_image
property toInputField
. We expect this image data to be encoded in Base64 JPG - this is the format expected the vendors listed above, and also will be easy to serialize / transform to other format.This should be compatible with existing APIs and should not cause breakage.
To manually test this, run:
Design notes
is_audio
to handle other modal).MIPro
optimizer is updated to allow theexample_stringify_fn
since otherwise those images is going to take up large amount of input context for themipro_optimizer.DatasetDescriptor
signature and run out of context. See below.For
MIRPO
: Usingexample_stringify_fn
signature to handle large contextMost LLM only understand images which is passed in a separate content chunk, which makes referencing the example image within prompt difficult. Hypothesising one way to help is to try to provide some alt text and prompting models to describe the alt text during the chain of thought (such alt-text can be generated potentially in previous automated steps).
Also that with an image example data (which is large and frequently exceed context window limit) - it can cause MIPRO prompt to go over the context window size.
Therefore we provide the
example_stringify_fn
function which allow custom expressing the example.Now you can call MIPRO with: