fix(llm): check multi modal input with provider and fix cost calculation #13445

Merged
merged 6 commits into from
Aug 16, 2024

Contributor

@fffonion fffonion commented Aug 5, 2024

Summary

For multi-modal inputs (openai or bedrock), the input format changes from:

messages=[
    {
      "role": "user",
      "content": "What's an apple?"
    }
]

to

messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
]
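
To make the difference between the two shapes concrete, here is a minimal Python sketch of how a consumer might treat a plain-string `content` and a list-of-parts `content` uniformly. The helper names `normalize_content`, `extract_text`, and `is_multimodal` are hypothetical and are not taken from the Kong codebase; the sketch only mirrors the "string or list" distinction shown above.

```python
from typing import Any, Dict, List, Union

Content = Union[str, List[Dict[str, Any]]]


def normalize_content(content: Content) -> List[Dict[str, Any]]:
    """Return content as a list of parts; a plain string becomes one text part."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    if isinstance(content, list):
        return content
    raise TypeError(f"unsupported content type: {type(content).__name__}")


def is_multimodal(message: Dict[str, Any]) -> bool:
    """A message counts as multi-modal here when its content is a list of parts."""
    return isinstance(message.get("content"), list)


def extract_text(message: Dict[str, Any]) -> str:
    """Join the text parts of a message, skipping non-text parts such as image_url."""
    parts = normalize_content(message["content"])
    return " ".join(p["text"] for p in parts if p.get("type") == "text")


old_style = {"role": "user", "content": "What's an apple?"}
new_style = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
    ],
}
```

With these definitions, `extract_text` returns the user's text for both `old_style` and `new_style`, while `is_multimodal` is true only for `new_style`. The image URL in the sketch is a placeholder.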

Checklist

  • The Pull Request has tests
  • A changelog file has been created under changelog/unreleased/kong or skip-changelog label added on PR if changelog is unnecessary. README.md
  • There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Issue reference

AG-61

@fffonion fffonion changed the title from "Multi model inpuit" to "fix(llm): check multi modal input with provider and fix cost calculation" Aug 5, 2024
@fffonion
Contributor Author

fffonion commented Aug 5, 2024

needs tests

@fffonion fffonion added this to the 3.8.0 milestone Aug 8, 2024
@team-eng-enablement team-eng-enablement added author/community PRs from the open-source community (not Kong Inc) and removed author/community PRs from the open-source community (not Kong Inc) labels Aug 13, 2024
local content
if type(v.content) == "table" then
  content = v.content
else
Contributor

Not a blocker. Are we going to stay compatible with the old format?

Contributor Author

@fffonion fffonion Aug 15, 2024

Neither format is deprecated, but right now only openai and bedrock support the multi-modal format in addition to the non-multi-modal format ("old format").
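
As a rough illustration of the provider check described in this reply, the following Python sketch rejects list-style `content` for any provider outside a supported set. It is an assumption-laden sketch, not the plugin's implementation: the function name `check_multimodal_input`, the constant `MULTIMODAL_PROVIDERS`, and the error wording are all hypothetical, and the provider set is taken only from the comment above.

```python
from typing import Any, Dict, List, Optional

# Assumption: based solely on the review comment; the authoritative list
# lives in the plugin code, not here.
MULTIMODAL_PROVIDERS = frozenset({"openai", "bedrock"})


def check_multimodal_input(
    provider: str, messages: List[Dict[str, Any]]
) -> Optional[str]:
    """Return an error string if list-style content is sent to a provider
    outside MULTIMODAL_PROVIDERS, otherwise None."""
    if provider in MULTIMODAL_PROVIDERS:
        return None
    for index, message in enumerate(messages):
        # List-style content marks the multi-modal format; plain strings
        # (the "old format") are accepted for every provider.
        if isinstance(message.get("content"), list):
            return (
                f"message {index}: multi-modal input is not supported "
                f"by provider '{provider}'"
            )
    return None
```

Plain-string messages pass for every provider in this sketch, which matches the statement that the old format remains valid.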

@fffonion fffonion merged commit acbbdf3 into master Aug 16, 2024
26 checks passed
@fffonion fffonion deleted the multi-model-inpuit branch August 16, 2024 13:12
@team-gateway-bot
Collaborator

Successfully created cherry-pick PR for master:


5 participants