
High load and internal server error on garbage in logs #3275

Closed
marvin-sinister opened this issue Jan 30, 2021 · 3 comments · Fixed by #3347

Comments


marvin-sinister commented Jan 30, 2021

Describe the bug
I'm using the "Web Analytics Dashboard for NGINX" in Grafana, with Loki as the data source. Requests are displayed in the recent requests panel. When a garbage request with unicode characters is present in the logs, Loki reports an error and consumes multiple CPU cores.

The query is:

{filename="/var/log/nginx/json_access.log", host="$host"} | json | line_format "request for {{.request_uri}} with HTTP status: {{.status}} "

When the request contains garbage, I get an error.

{
  "response": {
    "error": "Internal Server Error",
    "response": "err while creating labelset for {args=\"\", body_bytes_sent=\"157\", bytes_sent=\"309\", connection=\"389\", connection_requests=\"1\", filename=\"/var/log/nginx/json_access.log\", geoip_country_code=\"GB\", gzip_ratio=\"\", host=\"zarss.sinister.io\", http_cf_ray=\"\", http_host=\"\", http_referer=\"\", http_user_agent=\"\", http_x_forwarded_for=\"\", job=\"nginx\", msec=\"1612042321.497\", pid=\"23157\", pipe=\".\", remote_addr=\"89.248.172.90\", remote_port=\"51636\", remote_user=\"\", request=\"\\x04\\x01\\x00P3OU\\xbe0\\x00\", request_id=\"0c29c0dc5e6a163639277b73ab96cf26\", request_length=\"0\", request_method=\"\", request_time=\"3.253\", request_uri=\"\", scheme=\"http\", server_name=\"server.tld\", server_protocol=\"\", ssl_cipher=\"\", ssl_protocol=\"\", status=\"400\", time_iso8601=\"2021-01-30T22:32:01+01:00\", time_local=\"30/Jan/2021:22:32:01 +0100\", upstream=\"\", upstream_cache_status=\"\", upstream_connect_time=\"\", upstream_header_time=\"\", upstream_response_length=\"\", upstream_response_time=\"\"}: 1:429: parse error: error unquoting string \"\\\"\\\\x04\\\\x01\\\\x00P3OU\\\\xbe0\\\\x00\\\", request_id=\\\"\": invalid syntax\n",
    "message": "Internal Server Error"
  }
}

The log line in this instance is:

{"msec": "1612042321.497", "connection": "389", "connection_requests": "1", "pid": "23157", "request_id": "0c29c0dc5e6a163639277b73ab96cf26", "request_length": "0", "remote_addr": "89.248.172.90", "remote_user": "", "remote_port": "51636", "time_local": "30/Jan/2021:22:32:01 +0100", "time_iso8601": "2021-01-30T22:32:01+01:00", "request": "\u0004\u0001\u0000P3OU<BE>0\u0000", "request_uri": "", "args": "", "status": "400", "body_bytes_sent": "157", "bytes_sent": "309", "http_referer": "", "http_user_agent": "", "http_x_forwarded_for": "", "http_host": "", "server_name": "server.tld", "request_time": "3.253", "upstream": "", "upstream_connect_time": "", "upstream_header_time": "", "upstream_response_time": "", "upstream_response_length": "", "upstream_cache_status": "", "ssl_protocol": "", "ssl_cipher": "", "scheme": "http", "request_method": "", "server_protocol": "", "pipe": ".", "gzip_ratio": "", "http_cf_ray": "","geoip_country_code": "GB"}

The regular log is:

89.248.172.90 - - [30/Jan/2021:22:32:01 +0100] "\x04\x01\x00P3OU\xBE0\x00" 400 157 "-" "-" "3.253" "-" "GB" "United Kingdom" "-"

To Reproduce
Steps to reproduce the behavior:

  1. Started Loki (version=master-c3d1937, branch=master, revision=c3d1937e)
  2. Started Promtail (version 2.1.0, branch: HEAD, revision: 1b79df3)
  3. Query: {filename="/var/log/nginx/json_access.log", host="$host"} | json | line_format "request for {{.request_uri}} with HTTP status: {{.status}} "
  4. Error in grafana and high CPU load

Expected behavior
Some kind of fallback: either a properly displayed unicode string in the browser, or just a blank string. No unusual CPU load.

Environment:

  • Infrastructure: Grafana/Loki running in a container on a laptop, Promtail on a remote VPS
  • Deployment tool: podman

Note:
This only happens if log_format is defined with escape=json in nginx; if I use escape=default then I get JSONParserErr in the __error__ label, and most fields don't get parsed.
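For reference, a minimal sketch of a log_format along these lines (field list abbreviated and hypothetical; the real format in use clearly has many more fields). With escape=json, nginx escapes `"` and `\` and writes control bytes (values below 32) as \u00XX escapes that the json parser stage can decode, while bytes >= 0x80 are still written as-is:

```nginx
# Abbreviated, hypothetical version of the json_access format above.
log_format json_access escape=json
    '{"time_iso8601": "$time_iso8601", '
    '"remote_addr": "$remote_addr", '
    '"request": "$request", '
    '"status": "$status"}';

access_log /var/log/nginx/json_access.log json_access;
```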


xal3xhx commented Feb 15, 2021

Can confirm I'm having this issue as well. When it last occurred I did not think it would have been the escaped characters; the dashboard just stopped working until I removed that bad entry.


xal3xhx commented Feb 16, 2021

Just watched this error happen first hand: CPU usage appears low at the first occurrence, but increases over about 20 seconds until the CPU is pinned.

@cyriltovena cyriltovena self-assigned this Feb 17, 2021
@cyriltovena cyriltovena added this to the 2.2 milestone Feb 17, 2021
@cyriltovena
Contributor

We found the issue. A fix is on its way.

cyriltovena added a commit to cyriltovena/loki that referenced this issue Feb 17, 2021
I've also added a test to prove the issue was happening and now is fixed.

Fixes grafana#3275
Fixes grafana#3264
Fixes grafana#3020

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
owen-d pushed a commit that referenced this issue Feb 18, 2021
…3347)
