
Gigabytes of RAM usage with Cassandra or Scylla with lz4/snappy enabled. #1617

ashtonian opened this issue Apr 11, 2022 · 4 comments

@ashtonian

Please answer these questions before submitting your issue. Thanks!

What version of Cassandra are you using?

Cassandra: 4.0.3
Scylla: 4.6.1

What version of Gocql are you using?

v1.0.0

What version of Go are you using?

1.18

What did you do?

Enable snappy or lz4 compression.

What did you expect to see?

A similar RAM profile to running with compression disabled.

What did you see instead?

A massive uptick in RAM usage.


I have an API call that opens ~20k concurrent requests to the Scylla/Cassandra cluster and fetches a total of ~50MB of data from the cluster.

Here is a recording of the usage stats via statsviz for each. I attempted to use pprof, but it only lists ~200MB of usage, all of it from the JSON serializer, which is expected. The other interesting thing is that it seems to take the GC a while to actually collect; I'm not sure if that's just Go doing its thing or if something is holding references.

go tool pprof ./mem_profiles/heap.30.out
Type: inuse_space
Time: Apr 11, 2022 at 5:42pm (CDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 223.81MB, 96.94% of 230.88MB total
Dropped 71 nodes (cum <= 1.15MB)
Showing top 10 nodes out of 53
      flat  flat%   sum%        cum   cum%
  119.95MB 51.96% 51.96%   119.95MB 51.96%  github.com/goccy/go-json/internal/decoder.(*mapDecoder).mapassign
   87.94MB 38.09% 90.04%   210.89MB 91.34%  github.com/goccy/go-json.unmarshal
    5.51MB  2.38% 92.43%     5.51MB  2.38%  runtime.allocm
    4.47MB  1.94% 94.37%     4.47MB  1.94%  reflect.unsafe_NewArray
    2.50MB  1.08% 95.45%     2.50MB  1.08%  reflect.makemap
       2MB  0.87% 96.31%        2MB  0.87%  github.com/jackc/pgtype.scanPlanString.Scan
    1.44MB  0.62% 96.94%     1.44MB  0.62%  github.com/goccy/go-json/internal/decoder.init.0
         0     0% 96.94%     2.51MB  1.09%  database/sql.(*Rows).Next
         0     0% 96.94%     2.51MB  1.09%  database/sql.(*Rows).Next.func1
         0     0% 96.94%   210.89MB 91.34%  database/sql.(*Rows).Scan
@ashtonian
Author

After investigating it some more and getting some more advice, I've come to the conclusion that this seems to be a byproduct of the unbounded goroutines, the buffers being allocated for each payload, and GC lag. I'm not sure whether that can be trimmed in the implementation or whether the buffers can be reused somehow.
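For reference, a minimal sketch of bounding the fan-out on the client side (the maxInFlight value and fetch helper here are hypothetical, not gocql settings), which caps how many per-payload buffers are live at once:

package mitigation

import "sync"

// maxInFlight is a hypothetical tuning knob: it caps concurrent
// queries, and with them the number of live per-payload
// compression buffers waiting on the GC.
const maxInFlight = 512

func fetchAll(keys []string, fetch func(key string)) {
	sem := make(chan struct{}, maxInFlight)
	var wg sync.WaitGroup
	for _, key := range keys {
		sem <- struct{}{} // blocks once maxInFlight goroutines are running
		wg.Add(1)
		go func(k string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			fetch(k)                 // hypothetical per-key query against the cluster
		}(key)
	}
	wg.Wait()
}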

@martin-sucha
Contributor

It seems that it is not straightforward to introduce sync.Pool for reusing the buffers. We'd need to change the Compressor interface. Do snappy and lz4 compression libraries allow reusing the buffers if we changed the interface?

@ashtonian
Author

Sorry, could you elaborate a bit more? I'm just now learning about sync.Pool. Is the nuance/difficulty here something like: the code would be recycling slices, which are dynamically sized, so it has to deal with differing backing-array sizes while the pool is size-unaware? Or is it more that a recycled slice might still be referenced somewhere and get modified in a weird way?

I found some interesting resources around this topic (a sketch of the bucketed approach they describe follows the list):

golang/go #23199 sync: Pool example suggests incorrect usage
GothamGo 2019 – "Slice Recycling Performance and Pitfalls" by David Golden
net/http bucketed []byte pool example
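
To illustrate the size-unaware pitfall those resources describe: a single sync.Pool hands back whatever buffer happens to be there, so one large payload can keep a huge backing array alive in the pool indefinitely. Bucketing by capacity avoids that; the bucket sizes below are arbitrary illustration values, not tuned for gocql frames:

package bufpool

import "sync"

// Bucket capacities, smallest to largest. Arbitrary values for
// illustration; a real implementation would tune these to
// observed frame sizes.
var sizes = []int{1 << 10, 1 << 14, 1 << 18, 1 << 22}

var pools = func() []*sync.Pool {
	ps := make([]*sync.Pool, len(sizes))
	for i, n := range sizes {
		n := n // capture the loop variable for the closure
		ps[i] = &sync.Pool{New: func() interface{} {
			b := make([]byte, 0, n)
			return &b // store *[]byte to avoid an allocation on Put
		}}
	}
	return ps
}()

// Get returns a zero-length buffer with at least n bytes of capacity.
func Get(n int) *[]byte {
	for i, size := range sizes {
		if n <= size {
			return pools[i].Get().(*[]byte)
		}
	}
	b := make([]byte, 0, n) // larger than any bucket: allocate directly
	return &b
}

// Put recycles buf into the largest bucket its capacity satisfies,
// so each bucket only ever holds buffers at least as big as its size.
func Put(buf *[]byte) {
	c := cap(*buf)
	if c > sizes[len(sizes)-1] {
		return // oversized: let the GC take it rather than pin it in the pool
	}
	for i := len(sizes) - 1; i >= 0; i-- {
		if c >= sizes[i] {
			*buf = (*buf)[:0]
			pools[i].Put(buf)
			return
		}
	}
	// smaller than the smallest bucket: drop it
}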

@martin-sucha
Contributor

What I meant is that gocql's Compressor interface

https://github.com/gocql/gocql/blob/0eacd31836251d409774cd626b5c7e8d61b323db/compressor.go#L7-L11

has methods that return a new buffer instead of writing to a provided buffer. The Compressor implementation creates the buffer, but it is gocql that knows when the buffer is no longer used. If we wanted to use sync.Pool (or some other kind of free list) to recycle the buffers, either the Compressor interface would need to be told when a buffer is no longer used, or the buffer would have to be allocated by gocql, so that the whole lifecycle of the buffer is handled in one place.

So we'd maybe need something like

type Compressor interface {
	Name() string
	// EncodeAppend compresses input, appends the result to buf,
	// and returns the extended slice (append-style semantics).
	EncodeAppend(buf, input []byte) ([]byte, error)
	// DecodeAppend decompresses input and appends the result to buf.
	DecodeAppend(buf, input []byte) ([]byte, error)
}

or some interface resembling io.Reader or io.Writer.
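
For what it's worth, the snappy block API does allow buffer reuse: snappy.Encode(dst, src) and snappy.Decode(dst, src) write into dst when it is large enough and only allocate otherwise. So an implementation of the proposed interface could look roughly like the sketch below (the append semantics of EncodeAppend/DecodeAppend are my reading of the proposal, not a settled API):

package compress

import "github.com/golang/snappy"

// snappyAppend is a sketch of the proposed interface on top of
// golang/snappy, which accepts a caller-provided destination buffer.
type snappyAppend struct{}

func (snappyAppend) Name() string { return "snappy" }

// grow ensures buf has at least need bytes of spare capacity,
// copying the existing contents if a larger backing array is needed.
func grow(buf []byte, need int) []byte {
	if cap(buf)-len(buf) < need {
		grown := make([]byte, len(buf), len(buf)+need)
		copy(grown, buf)
		return grown
	}
	return buf
}

func (snappyAppend) EncodeAppend(buf, input []byte) ([]byte, error) {
	need := snappy.MaxEncodedLen(len(input))
	buf = grow(buf, need)
	n := len(buf)
	out := snappy.Encode(buf[n:n+need], input) // writes into buf's spare capacity
	return buf[:n+len(out)], nil
}

func (snappyAppend) DecodeAppend(buf, input []byte) ([]byte, error) {
	need, err := snappy.DecodedLen(input)
	if err != nil {
		return nil, err
	}
	buf = grow(buf, need)
	n := len(buf)
	out, err := snappy.Decode(buf[n:n+need], input)
	if err != nil {
		return nil, err
	}
	return buf[:n+len(out)], nil
}

pierrec/lz4's block functions (e.g. CompressBlock) also take a destination slice, so I'd expect lz4 to work similarly, though I haven't verified the details.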
