[BUG]: The Language Hindi is not fully supported as garbled text displayed in the output #1778

Closed
hk-rajeev opened this issue Jun 27, 2024 · 1 comment
Labels
needs info / can't replicate Issues that require additional information and/or cannot currently be replicated, but possible bug

Comments

@hk-rajeev

How are you running AnythingLLM?

Docker (local)

What happened?

Bug Report: Incorrect Display of Hindi Text in LLM Responses

Summary:
AnythingLLM is not displaying Hindi text correctly in LLM responses. The text contains incorrect characters and formatting issues that make it unreadable and inaccurate.

Details:
When asking for a joke in Hindi, the response given by the LLM contains garbled text and incorrect characters. Here is an example of the issue:

Expected Response in Hindi:
"तीन चिकने होकर एक चिकनी के पास आते हैं और उसने उनसे पूछा - आप क्या चाहते हो? चिकने बोले - हम एक चिकनी को पकाना चाहते हैं। चिकनी ने बोली - अच्छा, फिर आप लोग एक दूसरे को पका लो और मैं बाकी रह जाऊंगी।"

Actual Response received:
"तीन चिकने होकर एक चिकनी के पास आते हैं और उसने उनसे पooNT पूछा - आप क्या चाहते हो? चिकने bolE - हम एक चिकनी को पकाना चाहते हैं। चिकनी ने बोली - अच्छा, फिर आप लोग एक दूसरे को पका लो और मैं बाकी रह जाऊंगी।"

Issues Observed:

  1. Incorrect characters: "पooNT" should be "पूछा", "bolE" should be "बोले".
  2. Random insertion of English characters.
  3. Incorrect formatting that disrupts the readability of the text.
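
The symptoms listed above can be checked mechanically. As a minimal sketch (not part of AnythingLLM; the names `script_of` and `latin_tokens` are hypothetical), the following flags whitespace-separated tokens that contain Latin letters, using an excerpt of the actual response quoted above:

```python
def script_of(ch: str) -> str:
    """Return a coarse script label for a single character."""
    if "\u0900" <= ch <= "\u097F":  # Devanagari Unicode block
        return "devanagari"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"


def latin_tokens(text: str) -> list[str]:
    """Return the tokens of `text` that contain at least one Latin letter."""
    return [
        tok for tok in text.split()
        if any(script_of(c) == "latin" for c in tok)
    ]


# Excerpt of the actual response from this report.
excerpt = "उनसे पooNT पूछा - आप क्या चाहते हो? चिकने bolE - हम"
print(latin_tokens(excerpt))  # ['पooNT', 'bolE']
```

Running the same check on the raw text returned by the model, before it reaches the UI, would show whether the Latin characters are already present at that point.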

Steps to Reproduce:

  1. Query the LLM with a request for a joke in Hindi.
  2. Observe the response for any garbled or incorrect characters.

Expected Outcome:
The LLM should provide correctly formatted Hindi text without any garbled characters or random English insertions. The response should match the expected Hindi response quoted above.

Please prioritize this bug as it affects the usability of the application for Hindi-speaking users.

Are there known steps to reproduce?

Steps to Reproduce:

  1. Query the LLM with a request for a joke in Hindi.
  2. Observe the response for any garbled or incorrect characters.
@hk-rajeev hk-rajeev added the possible bug Bug was reported but is not confirmed or is unable to be replicated. label Jun 27, 2024
@hk-rajeev hk-rajeev changed the title [BUG]: The Language hind is not fully supported or say displayed in the output [BUG]: The Language hind is not fully supported as garbled text displayed in the output Jun 27, 2024
@hk-rajeev hk-rajeev changed the title [BUG]: The Language hind is not fully supported as garbled text displayed in the output [BUG]: The Language Hindi is not fully supported as garbled text displayed in the output Jun 27, 2024
@timothycarambat
Member

Are you sure that this is even AnythingLLM and not the actual output from your LLM? We don't apply a charset transformer on top of outputs; we just store what is given back from the LLM buffers.
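
One way to reason about this question: UTF-8 encodes every Devanagari character using only bytes with the high bit set, so a byte-level decoding mistake would typically surface as U+FFFD replacement characters (or mojibake), not as plain ASCII letters such as "ooNT". The sketch below illustrates that property under stated assumptions. It is not AnythingLLM's code, and `decode_chunks` is a hypothetical helper standing in for any streaming decoder.

```python
import codecs


def decode_chunks(chunks):
    """Decode UTF-8 byte chunks without breaking characters split across chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = "".join(decoder.decode(chunk) for chunk in chunks)
    return text + decoder.decode(b"", final=True)


word = "पूछा"
raw = word.encode("utf-8")

# No byte of UTF-8 encoded Devanagari falls in the ASCII range.
assert all(b >= 0x80 for b in raw)

# Simulate a stream that splits the bytes in the middle of a character.
chunks = [raw[:4], raw[4:]]

# Naive per-chunk decoding damages the split character...
naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)
# ...while incremental decoding reassembles it correctly.
careful = decode_chunks(chunks)

print("\ufffd" in naive)   # True: damage shows up as replacement characters
print(careful == word)     # True
```

Latin letters mixed into otherwise valid Hindi text are therefore more consistent with the model having generated them than with a charset problem in storage or display, although this sketch alone does not prove where the characters originated.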

@timothycarambat timothycarambat added needs info / can't replicate Issues that require additional information and/or cannot currently be replicated, but possible bug and removed possible bug Bug was reported but is not confirmed or is unable to be replicated. labels Jun 28, 2024
@timothycarambat timothycarambat closed this as not planned Jul 30, 2024

2 participants