
Redis database grew too big, causing general Redis troubles #271

Closed
mfn opened this issue Jan 4, 2018 · 8 comments

mfn commented Jan 4, 2018

Today we had the following issue:

  • our Redis clients sporadically couldn't connect to Redis
  • upon inspection we found that the Redis database holding Horizon had > 1 million keys
  • its size accounted for ~960 MB (i.e. 99%) of our whole Redis instance

Errors we received were:

  • Connection closed
  • Redis::pconnect(): connect() failed: Connection timed out
  • RedisException: read error on connection

(Note: these errors were recorded from non-Laravel, PHP-based applications; i.e. as explained below, the Horizon database size seemed to affect Redis as a whole.)

What we did:

  • stopped Horizon via supervisor
  • ran flushdb in the Horizon database (warning: be sure you've selected the right database; see the sketch after this list)
    Running that command took 40 seconds
  • started horizon via supervisor
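For reference, here is a minimal phpredis sketch of the flush step, assuming database 2 as in our setup; host, port and the database number are placeholders, and Horizon should already be stopped:

```php
<?php
// Hypothetical recovery sketch (phpredis). Host/port and database number are
// placeholders from our setup; stop Horizon before running this.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$redis->select(2);            // the database Horizon writes to in our setup
var_dump($redis->dbSize());   // sanity check: does the key count match the runaway database?

// Destructive: removes every key in the currently selected database only.
$redis->flushDB();
```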

In our case the Horizon Redis database was number 2; here's the relevant line from the output of Redis' internal INFO command:

db2:keys=1192438,expires=22,avg_ttl=89314947

The avg_ttl looks suspiciously high.

Usually when inspecting such problems I use the KEYS * command, but I didn't dare run it here, as it blocks Redis completely while it runs and we couldn't do that in production.

As such, at this time we don't have any information about what keys were in there.
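In hindsight, SCAN would have been a non-blocking way to at least sample what was in there. A rough phpredis sketch, again assuming database 2; the prefix grouping is just a guess at a useful breakdown, not anything Horizon-specific:

```php
<?php
// Non-blocking inspection sketch using SCAN instead of KEYS * (phpredis).
// The database number and the prefix grouping are assumptions from our setup.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->select(2);
$redis->setOption(Redis::OPT_SCAN, Redis::SCAN_RETRY);

$counts = [];
$it = null;
while ($keys = $redis->scan($it, '*', 1000)) {            // iterate the keyspace in batches
    foreach ($keys as $key) {
        // Group by the first two colon-separated segments to see which prefixes dominate.
        $prefix = implode(':', array_slice(explode(':', $key), 0, 2));
        $counts[$prefix] = ($counts[$prefix] ?? 0) + 1;
    }
}
arsort($counts);
print_r(array_slice($counts, 0, 20, true));               // top 20 prefixes by key count
```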

We can definitely rule out other applications writing to the same database; it is used exclusively by Horizon.

Our configuration:

  • 13 queues
  • 38 workers
  • ~60 jobs per Minute
  • we had enabled monitoring for each job (each job has a default queue)
  • horizon:snapshot running every 5 minutes (see the scheduler sketch after this list)
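For context, the snapshot command is scheduled the standard Laravel way; a sketch assuming the usual console Kernel setup from the Horizon docs:

```php
<?php
// app/Console/Kernel.php (excerpt): how the 5-minute snapshot is typically wired up.
// Standard Laravel scheduler usage; class names and paths are the framework defaults.

namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    protected function schedule(Schedule $schedule)
    {
        // Take metric snapshots for the Horizon dashboard every five minutes.
        $schedule->command('horizon:snapshot')->everyFiveMinutes();
    }
}
```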

Does anyone have a clue what could cause this?
We have been running this in production for approximately two months now.


mfn commented Jan 5, 2018

After this "reset" yesterday, today's INFO output for that database looks much saner:

db2:keys=4312,expires=4265,avg_ttl=1648948

When I look at the output from yesterday again, something feels very off: keys=1192438,expires=22 => only 22 keys set to expire, out of over 1 million (if I read that output correctly).
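One way to verify that reading next time, without blocking Redis, would be to sample keys via SCAN and count how many have no TTL. A rough sketch, assuming phpredis and database 2 again:

```php
<?php
// Sketch: count keys without an expiry (TTL == -1) to cross-check the expires=22 figure.
// phpredis and database 2 are assumptions from our setup.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->select(2);
$redis->setOption(Redis::OPT_SCAN, Redis::SCAN_RETRY);

$total = 0;
$noTtl = 0;
$it = null;
while ($keys = $redis->scan($it, '*', 1000)) {
    foreach ($keys as $key) {
        $total++;
        if ($redis->ttl($key) === -1) {   // -1 means the key exists but has no expiry set
            $noTtl++;
        }
    }
}
printf("%d of %d keys have no TTL\n", $noTtl, $total);
```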


mfn commented Jan 6, 2018

Today's INFO output:

db2:keys=889,expires=848,avg_ttl=1530403

I'm going to watch this a few days.

One thing I remember we did before the problem: we enabled lots of "Monitor Tags", basically one for every job type we have.

We haven't done this again since we purged the database.

Can this be connected?


fgilio commented Jul 29, 2018

I arrived here after having all our "Monitor Tags" disappear, and I think it might be related to this.

We're processing hundreds of jobs per minute, and the Monitoring tab would show thousands of entries (some around 100k) in the Jobs column; now there's no tag being monitored at all.

EDIT: Maybe Horizon could add a counter-type monitor, so it would only increment a counter instead of keeping a record of every job. At least that's what I needed in this case.
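To illustrate the idea (purely hypothetical, not Horizon's API; the key name and helper are made up): a single counter per tag keeps memory roughly constant no matter how many jobs run.

```php
<?php
// Illustration of the counter-monitor idea only; this is not how Horizon works today.
// The key name and the helper function are made up for this sketch (phpredis assumed).
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

function recordJobForTag(Redis $redis, string $tag): void
{
    // One integer per tag instead of one record per job, so memory stays flat.
    $redis->incr('monitor:counter:' . $tag);
}

recordJobForTag($redis, 'App\Jobs\SendInvoice');   // example tag
```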


mfn commented Jul 30, 2018

We never re-enabled "tagging" for each job, and the problem never appeared again. I didn't bother to investigate further, as we really didn't need the detailed metrics (it was just "nice to have").


ndberg commented Sep 13, 2018

Had the same problem occur three times already. The Horizon Redis database keeps growing. Does anybody have a solution for this?


mfn commented Sep 13, 2018

@ndberg after we disabled tagging, we never had this problem again. But on the other hand, I don't recall creating that many jobs at once again either (> 1 million).


ndberg commented Sep 13, 2018

So I should test disabling tagging. I have used tags for all jobs, and I have a similar environment to yours, with fewer queues and workers:

  • 3 queues
  • 5 workers
  • ~1,000 jobs / day
  • enabled monitoring for each job (each job has a default queue)
  • horizon:snapshot running every 5 minutes

driesvints (Member) commented:

This could be solved by #333. I'll keep this open for now so we don't lose track of it.

This issue was closed.