
genai: fix pydantic structured_output with array #469

Merged
merged 3 commits on Oct 7, 2024

Conversation

nobu007
Contributor

@nobu007 nobu007 commented Aug 28, 2024

PR Description

Fix pydantic structured_output with array

Relevant issues

langchain-ai/langchain#24225

Type

🐛 Bug Fix

Testing (optional)

With this PR, the test below passes.

from typing import List

from pydantic import BaseModel, Field


class TestClass(BaseModel):
    """Test model with a scalar field and a list field."""

    test_val: str = Field(default="", description="value for test (works).")
    test_val_list: List[str] = Field(
        default_factory=list,
        description="values for test (does not work without this fix).",
    )

Test result

jinno@jinno-desktop:~/git/drill/gamebook/codeinterpreter_api_agent/langchain-google/libs/genai$ poetry run pytest --extended tests/integration_tests/test_chat_models.py::test_chat_vertexai_gemini_function_calling
============================================================== test session starts ===============================================================
platform linux -- Python 3.10.10, pytest-7.4.4, pluggy-1.5.0
rootdir: /home/jinno/git/drill/gamebook/codeinterpreter_api_agent/langchain-google/libs/genai
configfile: pyproject.toml
plugins: anyio-4.4.0
collected 1 item                                                                                                                                 

tests/integration_tests/test_chat_models.py .                                                                                              [100%]

================================================================ warnings summary ================================================================
../../../../../../../.cache/pypoetry/virtualenvs/langchain-google-genai-ov3b6nnP-py3.10/lib/python3.10/site-packages/_pytest/config/__init__.py:1373
  /home/jinno/.cache/pypoetry/virtualenvs/langchain-google-genai-ov3b6nnP-py3.10/lib/python3.10/site-packages/_pytest/config/__init__.py:1373: PytestConfigWarning: Unknown config option: asyncio_mode
  
    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

Note1

This code should be updated later: _set_schema_items() can't create a correct function-calling schema.
https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/function-calling?hl#schema
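For reference, a sketch of the schema shape the linked docs describe for an array field. This is a hypothetical hand-written dict following the OpenAPI-style format in Google's function-calling docs, not the converter's actual output:

```python
# Hypothetical hand-written schema for TestClass above, using the uppercase
# type names from Google's function-calling schema reference.
schema = {
    "type": "OBJECT",
    "properties": {
        "test_val": {"type": "STRING", "description": "value for test"},
        "test_val_list": {
            "type": "ARRAY",
            # the bug was that the items sub-schema was lost for list fields
            "items": {"type": "STRING"},
        },
    },
}

print(schema["properties"]["test_val_list"]["items"])
```

The key point is that an ARRAY field must carry a nested `items` schema; dropping it is what broke list-typed fields.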

Note2

This PR may also fix #492.
The integration test failed because of that issue.

@nobu007 nobu007 force-pushed the fix_with_structured_output branch 4 times, most recently from 8408eb7 to e0d26db Compare August 28, 2024 19:25
@lkuligin
Collaborator

could you add a unit test too, please?

@nobu007 nobu007 force-pushed the fix_with_structured_output branch 2 times, most recently from 2f7e53d to 2ecdfa3 Compare August 30, 2024 14:25
@nobu007
Contributor Author

nobu007 commented Aug 30, 2024

@lkuligin
I added conditions for the unit test and the integration test.
Please confirm.

poetry run pytest --extended tests/integration_tests/test_chat_models.py::test_chat_vertexai_gemini_function_calling

@nobu007 nobu007 force-pushed the fix_with_structured_output branch 2 times, most recently from 259c9d3 to d25395f Compare September 1, 2024 05:23
    return TYPE_ENUM[stype]
else:
    pass
return _get_type_from_str(stype)
Collaborator

can we maybe just have a one-liner here instead of creating a separate function?
TYPE_ENUM.get(stype, "str")
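The suggested one-liner relies on `dict.get` returning a default for unknown keys. A toy sketch with a hypothetical stand-in mapping (not the module's real TYPE_ENUM):

```python
# hypothetical stand-in for the module's TYPE_ENUM mapping
TYPE_ENUM = {"string": "STRING", "integer": "INTEGER", "array": "ARRAY"}


def lookup_type(stype: str) -> str:
    # equivalent to the if/else branches above: fall back to "str" when unknown
    return TYPE_ENUM.get(stype, "str")


print(lookup_type("array"))    # ARRAY
print(lookup_type("unknown"))  # str
```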

Contributor Author

@lkuligin
I fixed as one-liner.

@TannerW

TannerW commented Sep 16, 2024

Any progress on this?

@nobu007 nobu007 force-pushed the fix_with_structured_output branch 2 times, most recently from 8ec641c to 056cfdf Compare September 20, 2024 16:09
@nobu007
Contributor Author

nobu007 commented Sep 20, 2024

@lkuligin
#506 will fix integration_test.
So I think #469 can be merged.
The failure looks flaky: I retried, but it failed again.
If passing integration_test is needed, please merge #506 first.

@nobu007 nobu007 force-pushed the fix_with_structured_output branch 3 times, most recently from 8d08df9 to 9212f29 Compare September 22, 2024 12:21
@nobu007
Contributor Author

nobu007 commented Sep 22, 2024

@lkuligin
The integration_test failure was caused by #469, but it is fixed now.

@tudoanh

tudoanh commented Sep 30, 2024

Encountered this issue too

model = ChatGoogleGenerativeAI(model=_MODEL, safety_settings=safety).bind_tools(
    [MyModel]
)
response = model.invoke([message])
print("response=", response)
Collaborator

nits: we probably don't need it anymore, do we?

@lkuligin lkuligin merged commit 5594bc1 into langchain-ai:main Oct 7, 2024
15 checks passed
@RafaelMCarvalho

Thanks for the contribution.

I've noticed we still have issues when there's an array like this:

class Person(BaseModel):
    name: str = Field(..., description="The name of the person")
    height_in_meters: float = Field(
        ..., description="The height of the person expressed in meters."
    )

class People(BaseModel):
    people: List[Person]
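For context, the JSON Schema that Pydantic v2 emits for this model references the nested Person under `$defs`, which the genai schema converter has to resolve to build the array's `items`. A side illustration, not part of the PR:

```python
from typing import List

from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str = Field(..., description="The name of the person")
    height_in_meters: float = Field(
        ..., description="The height of the person expressed in meters."
    )


class People(BaseModel):
    people: List[Person]


schema = People.model_json_schema()
# the array's items point at the nested model definition,
# typically {'$ref': '#/$defs/Person'} with Pydantic v2
print(schema["properties"]["people"]["items"])
```

Converters that drop the `$ref` (instead of inlining the Person object schema) end up with a bare items type, which matches the warnings below.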

So the following code raises spec-ignoring warnings, and the result is invalid Pydantic input:

from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model='gemini-1.5-pro', api_key=settings.google_api_key, temperature=0.2)
query = "Anna is 23 years old and she is 6 feet tall; John is 34 and about 171cm"
prompt = ChatPromptTemplate.from_messages([("system", "Answer the user query."), ("human", "{query}")])
chain = prompt | llm.with_structured_output(People, include_raw=True)
# WARNING OUTPUT:
# Value 'Information about a person.' is not supported in schema, ignoring v=Information about a person.
# Value '['name', 'height_in_meters']' is not supported in schema, ignoring v=['name', 'height_in_meters']
# Value 'Person' is not supported in schema, ignoring v=Person
# Value 'object' is not supported in schema, ignoring v=object
# Value 'Information about a person.' is not supported in schema, ignoring v=Information about a person.
# Value '['name', 'height_in_meters']' is not supported in schema, ignoring v=['name', 'height_in_meters']
# Value 'Person' is not supported in schema, ignoring v=Person
# Value 'object' is not supported in schema, ignoring v=object

chain.invoke({"query": query})
# OUTPUT:
# {'name': 'People', 'parameters': {'type_': 6, 'properties': {'people': {'type_': 5, 'items': {'type_': 1, 'format_': '', 'description': '', 'nullable': False, 'enum': [], 'max_items': '0', 'min_items': '0', 'properties': {}, 'required': []}, 'format_': '', 'description': '', 'nullable': False, 'enum': [], 'max_items': '0', 'min_items': '0', 'properties': {}, 'required': []}}, 'required': ['people'], 'format_': '', 'description': '', 'nullable': False, 'enum': [], 'max_items': '0', 'min_items': '0'}, 'description': ''}
# {'raw': AIMessage(content='', additional_kwargs={'function_call': {'name': 'People', 'arguments': '{"people": ["Anna", "John"]}'}}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': [{'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}]}, id='run-d4f260cb-1ec2-49a9-befc-e99b8ba235de-0', tool_calls=[{'name': 'People', 'args': {'people': ['Anna', 'John']}, 'id': 'df592d4a-a4a7-48b3-a212-983a97e100d1', 'type': 'tool_call'}], usage_metadata={'input_tokens': 65, 'output_tokens': 16, 'total_tokens': 81}), 'parsing_error': 2 validation errors for People
# people.0
#   Input should be a valid dictionary or instance of Person [type=model_type, input_value='Anna', input_type=str]
#     For further information visit https://errors.pydantic.dev/2.8/v/model_type
# people.1
#   Input should be a valid dictionary or instance of Person [type=model_type, input_value='John', input_type=str]
#     For further information visit https://errors.pydantic.dev/2.8/v/model_type, 'parsed': None}

Is it a Gemini limitation that I'm missing?
