
[BUG] opensearch-with-long-numerals blocks and times out from Discover page #6377

Closed
lyradc opened this issue Apr 9, 2024 · 9 comments · Fixed by #6915
Labels: bug (Something isn't working), needs research

lyradc commented Apr 9, 2024

Describe the bug

When attempting to view logs on the Discover page with long numerals, the Kibana instance fails to respond to requests, causing health-check failures as well as 500 responses in the Kibana logs.

When the same query is run from Dev Tools or outside Kibana (with curl), no error is returned and results are supplied.
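For comparison, a direct query against OpenSearch (the path that does work) can be issued with a short script like the sketch below; the host, index pattern, and query are placeholders rather than the exact request used here.

```ts
// Sketch of a direct search against OpenSearch, bypassing Kibana/OSD.
// Host, index pattern, and the query body are placeholders.
const OPENSEARCH_URL = "http://localhost:9200";

async function directSearch(): Promise<void> {
  const res = await fetch(`${OPENSEARCH_URL}/my-logs-*/_search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      size: 10,
      query: { match_all: {} },
    }),
  });
  // The raw response comes back without issue here; the failure only shows up
  // once Dashboards parses the same payload on its long-numerals search path.
  const text = await res.text();
  console.log(res.status, `${text.length} bytes`);
}

directSearch().catch(console.error);
```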

To Reproduce
Steps to reproduce the behavior:

  1. Go to Discover page
  2. Create search to include logs with long-numerals
  3. Kibana will stop responding for ~5 min.

Kibana logs after a ~5 min gap with no new log output:

{"type":"log","@timestamp":"2024-04-08T23:14:18Z","tags":["error","opensearch","data"],"pid":1,"message":"[DeserializationError]: Maximum call stack size exceeded"}
{"type":"response","@timestamp":"2024-04-08T23:09:18Z","tags":[],"pid":1,"method":"post","statusCode":500,"req":{"url":"/internal/search/opensearch-with-long-numerals","method":"post","headers":{"x-forwarded-for":"x.x.x.x","x-forwarded-proto":"https","x-forwarded-port":"443","host":"kibana.url.com","x-amzn-trace-id":"Root=1-6614791e-105ca136348f5fd27136bf8f","content-length":"2220","sec-ch-ua":"\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\"","content-type":"application/json","osd-xsrf":"osd-fetch","sec-ch-ua-mobile":"?0","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36","osd-version":"2.13.0","sec-ch-ua-platform":"\"Windows\"","accept":"*/*","origin":"https://kibana.url.com","sec-fetch-site":"same-origin","sec-fetch-mode":"cors","sec-fetch-dest":"empty","referer":"https://kibana.url.com/app/data-explorer/discover","accept-encoding":"gzip, deflate, br, zstd","accept-language":"en-US,en;q=0.9","securitytenant":"tenant"},"remoteAddress":"x.x.x.x","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36","referer":"https://kibana.url.com/app/data-explorer/discover"},"res":{"statusCode":500,"responseTime":300915,"contentLength":9},"message":"POST /internal/search/opensearch-with-long-numerals 500 300915ms - 9.0B"}

Expected behavior
Kibana is expected to continue responding to other requests while processing.
Kibana is expected to not throw a 500 error.

OpenSearch Version
2.13.0

Dashboards Version
2.13.0

Host/Environment (please complete the following information):

  • OS: CentOS 7 (from Docker image)
  • Chrome 123.0.6312.106
lyradc added the bug (Something isn't working) and untriaged labels on Apr 9, 2024
ananzh self-assigned this on Apr 9, 2024
ananzh removed the untriaged label on Apr 9, 2024
ananzh assigned AMoo-Miki and unassigned ananzh on Apr 9, 2024
lyradc (Author) commented Apr 10, 2024

After more testing I was able to find a workaround, along with some details that might be helpful.

I was able to duplicate the issue by pushing 200 documents into an index. Each had a single field, initially a string containing JSON that was around 15k characters long.
When viewing these in Discover I watched the Kibana logs for POST /internal/search/opensearch-with-long-numerals to get the time taken before Discover rendered or errored.
Initially this took 37541 ms.

I repeated this with the single large field containing only "A" repeated 15k times. This completed in 96 ms.

I then removed characters from the original JSON string and captured the time taken in opensearch-with-long-numerals.

Each iteration built on the last: for example, in the second pass both {} and []'"/\ had been removed, and in the last pass all characters listed in the table below were gone from the original JSON string.

chars removed    duration (ms)    saved (ms)
(none)           37541            -
{}               36983            558
[]'"/\           27350            9633
|                27350            0
:                25191            2159
,                86               25105

With this in mind, I replaced only the commas (,) with :: in the original JSON strings. This then ran for only 665 ms; compared to the initial delay of 37541 ms, this was a marked improvement.

I then replaced commas with pipes in 800 documents, and opensearch-with-long-numerals began returning a 500 error in 789 ms.

opensearch-with-long-numerals still blocks while in progress, which is not ideal. But with this character replacement it can fail in under 1 second instead of 37 seconds or longer, which gives Kibana a larger margin to respond to health checks.
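For anyone trying to reproduce these timings, a rough seeding script in the spirit of the setup above might look like the sketch below: 200 documents, each with a single ~15k-character JSON-like string field containing long numerals, bulk-indexed into a throwaway index. The endpoint, index name, field name, and separator-swapping helper are assumptions for a local test cluster, not the original harness.

```ts
// Sketch only: seed a test index with documents similar to those described
// above. URL, index name, and field name are placeholders.
const OPENSEARCH_URL = "http://localhost:9200";
const INDEX = "long-numeral-repro";

// Build a ~15k-character JSON-like string containing long numerals.
// Swapping `separator` (',' vs '|' vs '::') lets you repeat the
// character-replacement experiment from the table above.
function makePayload(separator: string): string {
  const pairs = Array.from(
    { length: 600 },
    (_, i) => `"key_${i}": ${9007199254740993n + BigInt(i)}`
  );
  return `{${pairs.join(separator)}}`;
}

async function seed(count = 200): Promise<void> {
  const lines: string[] = [];
  for (let i = 0; i < count; i++) {
    lines.push(JSON.stringify({ index: { _index: INDEX } }));
    lines.push(
      JSON.stringify({
        "@timestamp": new Date().toISOString(),
        message: makePayload(","),
      })
    );
  }
  const res = await fetch(`${OPENSEARCH_URL}/_bulk`, {
    method: "POST",
    headers: { "Content-Type": "application/x-ndjson" },
    body: lines.join("\n") + "\n",
  });
  console.log("bulk status:", res.status);
}

seed().catch(console.error);
```

Once the documents are indexed, opening the index in Discover and watching the POST /internal/search/opensearch-with-long-numerals timings in the OSD logs reproduces the comparison above; swapping the separator passed to makePayload mirrors the character-replacement experiments.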

AMoo-Miki (Collaborator) commented
These findings are great. I will use this to figure out the bottleneck.

AMoo-Miki (Collaborator) commented
@lyradc the source of the exception is the opensearch-js client, which uses secure-json-parse, and that adds some overhead. However, long-numerals handling also adds some overhead. To speed up my investigation, would you be able to share one of the documents you use for testing?
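As a rough illustration of the overhead in question (not the client's actual code path), a micro-benchmark comparing the native parser with secure-json-parse on a payload with one large string field could look like this:

```ts
// Micro-benchmark sketch: compare JSON.parse with secure-json-parse on a
// payload with a single ~14k-character string field.
// Assumes `npm i secure-json-parse` and esModuleInterop.
import sjson from "secure-json-parse";

const payload = JSON.stringify({
  message: '{"a":1,'.repeat(2000) + '"z":9007199254740993}',
});

console.time("JSON.parse");
for (let i = 0; i < 5000; i++) JSON.parse(payload);
console.timeEnd("JSON.parse");

console.time("secure-json-parse");
for (let i = 0; i < 5000; i++) sjson.parse(payload);
console.timeEnd("secure-json-parse");
```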

AMoo-Miki (Collaborator) commented
@lyradc I received the payload; thanks a lot for sending it over. I will dig more and get back to you in a few days.

rlueckl commented Apr 15, 2024

As I wrote in #6134 (comment), try it with a very long log line that contains this type of message.

Here's an example from our Cassandra: cassandra_paxos_example.json

If you have a message like this 2 or 3 times in your time range, you'll definitely see that Dashboards hangs for a very long time.

AMoo-Miki removed their assignment on May 1, 2024
rlueckl commented May 21, 2024

Any update here? 2.14.0 was released last week, but it is still broken. I've just tested it with the example from my previous post.

lsoumille commented
We are facing exactly the same issue on our OpenSearch setup. It would be great to have a solution for this.

bbfoto commented Jun 2, 2024

It seems that in 2.14 it simply doesn't return a hit and fails silently.
Querying in Dashboards/Discover for the _id of a document known to have a long integer in its message field returns:

       "hits": {
            "total": 0,
            "max_score": null,
            "hits": []
        },

The same query from Dev Tools returns:

  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "###########-000163",
        "_id": "5rY_yo8BUoZvL84giDht",
        ...

AMoo-Miki (Collaborator) commented
I have made a new package named JSON11 for handling long numerals. opensearch-project/opensearch-js#784 will add that to the JS client and then OSD will adopt it with the appropriate code changes.
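For context, a standalone illustration (not JSON11's API) of why a long-numeral-aware parser is needed at all: JSON.parse maps every number onto a 64-bit float, so values past Number.MAX_SAFE_INTEGER are silently rounded, while BigInt keeps them exact.

```ts
// JSON.parse silently rounds integers beyond Number.MAX_SAFE_INTEGER (2^53 - 1),
// which is exactly the data loss that long-numeral handling is meant to avoid.
const raw = '{"id": 9223372036854775807}';
const parsed: number = JSON.parse(raw).id;

console.log(Number.isSafeInteger(parsed));              // false — outside the exact integer range
console.log(parsed === 9223372036854775808);            // true  — snapped to the nearest double, 2^63
console.log(BigInt("9223372036854775807").toString());  // "9223372036854775807" — exact
```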
