Out of memory error on deployment with millions of files in PGDATA #738
Comments
Hi! Can you please rerun the upload with WALG_LOG_LEVEL=DEBUG?
wal-g/internal/backup_push_handler.go, line 282 in 7c2aa35: "probably there are better ways to format and send large JSON"
If this marshaling is the only thing preventing your backup, we should use a streaming Encoder: https://golang.org/pkg/encoding/json/#Encoder
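For illustration, a minimal sketch of that suggestion, assuming a stand-in sentinel type and writer (not wal-g's actual code): stream the encoded JSON to the upload writer with json.NewEncoder instead of building the whole document with json.Marshal.

```go
package main

import (
	"encoding/json"
	"io"
	"os"
)

// sentinelDto is a stand-in for wal-g's backup sentinel, which carries one
// entry per file in PGDATA and is what gets marshaled after backup-push.
type sentinelDto struct {
	Files map[string]int64 `json:"Files"`
}

// uploadSentinel streams the encoded JSON into the writer instead of first
// building the whole document in memory with json.Marshal.
func uploadSentinel(w io.Writer, s *sentinelDto) error {
	return json.NewEncoder(w).Encode(s)
}

func main() {
	s := &sentinelDto{Files: map[string]int64{"base/1/1234": 8192}}
	_ = uploadSentinel(os.Stdout, s) // os.Stdout stands in for the S3 upload stream
}
```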
It would be best to run #740 if the test results here are green: https://travis-ci.org/github/wal-g/wal-g/builds/720630361 I'm heading to the bar right now; not sure if I'll take my laptop with me.
#740 still seems to be broken...
@x4m I think I managed to build your branch. Running backup now.
@x4m It failed once more, unfortunately; it actually seemed to fail quicker and use more memory now. Log attached. Re-running with
Can you plz try backup-push with WALG_UPLOAD_CONCURRENCY set to 1?
Also plz use WALG_COMPRESSION_METHOD=brotli
I've checked the logs; it seems the backup upload was 2x faster than the previous time.
@x4m Will do, running backup now with
It got to the end this time (calling
Memory graph; Stack trace;
Meh.... Encoder buffers everything anyway (Encode marshals the whole value into its internal buffer before writing it out)... I'm looking for other workarounds...
We could probably switch to https://github.com/francoispqt/gojay ...
plz test if it's green: https://travis-ci.org/github/wal-g/wal-g/builds/720967236
@x4m Awesome! That's some quick work. Running backup now, compiled with your latest changes.
@MannerMan sorry, #740 is broken again
@x4m Alright, no problem. I've stopped the running backup in the meantime.
So far I've considered easyjson and that strange library; neither is robust enough to fit the sentinel of every DB supported in wal-g....
@MannerMan can you plz try #741?
@x4m Testing now, running backup.
@x4m Did not help, unfortunately.
I'm spamming the json lib maintainers a bit. Actually, I've unsuccessfully tried all 3 libs, but maybe I'm missing something...
Dunno, maybe it would be useful to bug the golang json maintainers too so they could finally merge the changelist about streaming?..
Can we manually loop over the data? I've written some data export tools that can successfully export at least 5 GB of data without using extra memory. It uses the approach sketched below:
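The approach itself was not preserved in this thread, but a hedged sketch of the general idea looks like this: iterate the entries and emit the JSON piecewise, so only one entry is encoded at a time (fileDesc and the map layout are illustrative, not wal-g's actual sentinel types).

```go
package main

import (
	"encoding/json"
	"io"
	"os"
)

// fileDesc is an illustrative per-file record, not wal-g's real type.
type fileDesc struct {
	IsSkipped bool  `json:"IsSkipped"`
	MTime     int64 `json:"MTime"`
}

// writeFilesJSON writes {"path": {...}, ...} one entry at a time, so peak
// memory is a single encoded entry rather than the whole map. The newlines
// that Encoder.Encode appends are legal JSON whitespace.
func writeFilesJSON(w io.Writer, files map[string]fileDesc) error {
	enc := json.NewEncoder(w)
	if _, err := io.WriteString(w, "{"); err != nil {
		return err
	}
	first := true
	for path, desc := range files {
		if !first {
			if _, err := io.WriteString(w, ","); err != nil {
				return err
			}
		}
		first = false
		if err := enc.Encode(path); err != nil { // the key as a JSON string
			return err
		}
		if _, err := io.WriteString(w, ":"); err != nil {
			return err
		}
		if err := enc.Encode(desc); err != nil { // one small value at a time
			return err
		}
	}
	_, err := io.WriteString(w, "}")
	return err
}

func main() {
	files := map[string]fileDesc{"base/1/1234": {MTime: 1597000000}}
	_ = writeFilesJSON(os.Stdout, files)
}
```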
@jonaz We can, but this routine is reused by many other DBs with different sentinel types.
We'd have to create a wal-g/json repo :) We could even take Google's changelist, merge it, and keep it in our repo.
Does https://github.com/json-iterator/go work? I saw some encoder memory-related buffer issues among their fixed issues.
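For reference, json-iterator advertises a drop-in API compatible with encoding/json; a minimal sketch of how such a swap could be wired in (an assumption for illustration, not the commit that was actually pushed to #740):

```go
package main

import (
	"os"

	jsoniter "github.com/json-iterator/go"
)

// Shadowing the package name with a jsoniter API value is the usual trick:
// existing json.Marshal / json.NewEncoder call sites keep compiling unchanged.
var json = jsoniter.ConfigCompatibleWithStandardLibrary

func main() {
	sentinel := map[string]int64{"base/1/1234": 8192} // placeholder payload
	if err := json.NewEncoder(os.Stdout).Encode(sentinel); err != nil {
		panic(err)
	}
}
```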
@jonaz Wow, cool, thanks! Looks promising. @MannerMan I've pushed a new commit to #740, plz test if CI is green.
@x4m Great! Running a backup with a new binary compiled from your branch.
Nope, still crashed with 'out of memory' 😕
Hi guys. System specs: WALG_FILE_PREFIX is set to a local disk (not SSD). We do not have that many files in the PGDATA folder; in our case it is roughly 82,000 files, 52 databases, and the instance size is roughly 330 GB.
Also, there are no OOM-killer logs in syslog. Br,
@x4m I noticed your activity in json-iterator/go#488 - is there a branch with the patched library applied that I can compile and test with?
@nh-nmurk I'm not sure your problem originates from WAL-G... do you have atop logs or something similar to identify the process consuming a lot of memory? The stack trace you provided does not show the JSON problems that @MannerMan has. @MannerMan, to test the patch proposed there you need to compile the patched WAL-G from #740 with the jsoniter change proposed in that discussion. Meanwhile, the author of that patch is not a maintainer of jsoniter...
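For anyone trying this: one way to build WAL-G against a patched jsoniter is a replace directive in wal-g's go.mod, assuming the build uses Go modules and you have a local checkout of json-iterator/go with the proposed patch applied (the path below is illustrative):

```
// in wal-g's go.mod: point the jsoniter dependency at the patched local
// checkout, then rebuild wal-g as usual.
replace github.com/json-iterator/go => ../json-iterator-go
```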
I have no idea why this patch is not even submitted as a pull request. The patch itself looks OK.
@x4m We can't find any additional information in PostgreSQL or syslog that would indicate the system is running out of memory. We can see a list of archived WAL segments and then ...out of memory. Since we switched back to using test and cp in archive_command, we do not have any memory problems. On another production system we also use wal-g and do not have such problems. More from the PostgreSQL logs...
or
This part of the log is, presumably, just a consequence of the out-of-memory condition, not the cause...
Thank you for your time
@x4m I think I was able to apply the json-iterator patch to your branch.
@x4m Hm, not sure what's going on, but the backup still failed. Different error this time though;
I have run the backup twice and got this error both times. Not sure if it actually makes it as far as before, or if it now fails at an earlier stage.
@MannerMan can you paste the whole output? The error message seems to be missing; that looks like only part of the stack trace.
@jonaz Indeed, I was running a screen-within-a-screen which gobbled the output a bit. Updated the comment above with the full stack trace. Still an out-of-memory error, as it turns out.
JFYI we are working on this problem in #1103 too
Fixed in #1101.
WAL-G version: 0.2.15 release binary
PostgreSQL version: 9.5
Operating system/version: CentOS 7.8
Hi,
I'm evaluating WAL-G as a replacement for my company's current backup system, WAL-E. We have a schema-level style of sharding our tenants, where each of our database servers hosts 4000 customer schemas spread over 16 databases. This has worked great for scalability, but presents a challenge for many backup systems, since this layout results in a lot of files in the PostgreSQL data directory.
Output of ls -1RL /var/lib/pgsql/9.5/data/ | wc -l:
6602805
So above 6 million files, total size 77 GB. Around 5 years ago we deployed WAL-E, since it was one of the few backup systems that could handle so many files without a problem. However, since WAL-E is no longer maintained, we're looking for alternatives. When testing WAL-G, I'm running out of memory when performing a full backup. It seems to be some kind of Go-internal memory error, since the memory of the server is not fully utilized. See graphs;
Server specs:
Chunk of the error log:
Attached the full log output when performing the backup, including the error as well:
walg_log.txt
Target datastore is a local MinIO S3 instance. Tried WALG_UPLOAD_DISK_CONCURRENCY set to 1 and 4, same result. I see no kernel-level OOM-killer logs in syslog; it appears to fail internally.