Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mvcc: push down RangeOptions.limit argv into index tree to reduce memory overhead #11990

Merged
merged 2 commits into from
Jun 9, 2020

Conversation

tangcong
Copy link
Contributor

@tangcong tangcong commented Jun 8, 2020

if there are too many keys(key,rangeEnd), even if we set RangeOptions.limit to 1,range request is very expensive and it will result in oom. this pr will push down RangeOptions.limit argv into index tree to reduce memory overhead. @gyuho
image

@codecov-commenter
Copy link

Codecov Report

Merging #11990 into master will decrease coverage by 0.42%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11990      +/-   ##
==========================================
- Coverage   65.96%   65.53%   -0.43%     
==========================================
  Files         403      403              
  Lines       37306    37310       +4     
==========================================
- Hits        24608    24452     -156     
- Misses      11182    11329     +147     
- Partials     1516     1529      +13     
Impacted Files Coverage Δ
mvcc/index.go 93.91% <100.00%> (+0.16%) ⬆️
mvcc/kvstore_txn.go 75.00% <100.00%> (ø)
client/client.go 43.13% <0.00%> (-30.40%) ⬇️
auth/jwt.go 51.68% <0.00%> (-20.23%) ⬇️
pkg/transport/timeout_conn.go 60.00% <0.00%> (-20.00%) ⬇️
proxy/httpproxy/director.go 52.43% <0.00%> (-15.86%) ⬇️
proxy/grpcproxy/register.go 72.50% <0.00%> (-10.00%) ⬇️
clientv3/leasing/util.go 91.66% <0.00%> (-6.67%) ⬇️
clientv3/concurrency/mutex.go 61.64% <0.00%> (-5.48%) ⬇️
etcdserver/api/v3rpc/lease.go 77.21% <0.00%> (-5.07%) ⬇️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 49f91d6...e939961. Read the comment docs.

@tangcong tangcong marked this pull request as draft June 8, 2020 11:45
@tangcong tangcong marked this pull request as ready for review June 8, 2020 17:12
@tangcong
Copy link
Contributor Author

tangcong commented Jun 9, 2020

before optimization:

time etcdctl get /config --prefix --limit 2
/config
/config/test1

real	0m1.036s
user	0m0.008s
sys	0m0.004s

after optimization:

time etcdctl get /config  --prefix --limit 2
/config
/config/test1

real	0m0.009s
user	0m0.002s
sys	0m0.009s

Copy link
Contributor

@gyuho gyuho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, nice!

@gyuho gyuho merged commit 1c1029e into etcd-io:master Jun 9, 2020
@xrmzju
Copy link

xrmzju commented Sep 17, 2020

limit ==0 means fetch everything, but this pr merged seems will change the behavior

@tangcong
Copy link
Contributor Author

limit ==0 means fetch everything, but this pr merged seems will change the behavior

can you give me a failed case? if limit == 0, it seems will also fetch everything. thanks.

@xrmzju
Copy link

xrmzju commented Sep 17, 2020

limit ==0 means fetch everything, but this pr merged seems will change the behavior

can you give me a failed case? if limit == 0, it seems will also fetch everything. thanks.

sry i miss some logic here..

@serathius
Copy link
Member

@tangcong This PR broke Kubernetes api chunking feature. Kubernetes uses Range request Count returned by CountRevisions function for pagination. This PR changed CountRevisions to return after reaching limit.

I don't this behavior is correct as it removes value provided by having a Count field as it will always equal number of keys returned. Count is only beneficial if it returns number of objects with Limit.

@ptabor
Copy link
Contributor

ptabor commented May 31, 2021

Thank you for nailing this @serathius.

Definitely its a wire-format breaking change and in scope of 3.x branch we cannot perform it. I think we shouldn't push down the limit if its count-only query (and I hope that count-only queries are not accumulating the memory).

I see value of counting with limit - e.g.(for some use-cases like checking whether there is more than N objects,
but I don't think they are critical and that they cannot wait till 4.x.

Potentially etcd/clientv3 could print warning if both: count & limit are set if we imagine we could change the behavior in v4.

@gyuho
Copy link
Contributor

gyuho commented May 31, 2021

PR broke Kubernetes api chunking feature

I think we shouldn't push down the limit if its count-only query

Agree with @ptabor. CountRevisions is only used for range count query, we should revert such semantic changes.

@tangcong Can you share more about your use case for count revision with limit?

@tangcong
Copy link
Contributor Author

There are two main scenarios:

  1. gRPC Proxy will query with limit, we find that the overhead is relatively large when there are more keys
  2. some non-K8S businesses will also specify limit queries through etcdctl and API when there are more keys.

unfortunately, it broke Kubernetes api chunking feature. i agree with @ptabor, etcd 3.5 coming soon,it is not critical, i hope we can optimize in a future version. thanks @serathius @ptabor @gyuho

@wilsonwang371
Copy link
Contributor

Without this change, if we list all the pods with a limit specified, it will be a nightmare when we are going through millions of pod objects.

From what I observed in issue #13068, the write performance can get reduced to only 25% of the normal case.

@wilsonwang371
Copy link
Contributor

Can we just add code to deal with the limit == 0 case? Will this work? @tangcong @ptabor @gyuho

@wilsonwang371
Copy link
Contributor

Can we just add code to deal with the limit == 0 case? Will this work? @tangcong @ptabor @gyuho

I saw the original issue: #13056 (comment)

It seems to be very tricky...

@wilsonwang371
Copy link
Contributor

cc @Jeffwan

@serathius
Copy link
Member

Yep, K8s directly depends on this behavior to provide chunking in API.
Original proposal kubernetes/community#3710

@tangcong
Copy link
Contributor Author

tangcong commented Jun 3, 2021

image

It seems that this PR only affects the RemainingItemCount feature,it has always been an Alpha feature.

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go#L771-L775

@JohnRusk
Copy link

JohnRusk commented Jan 10, 2023

@tangcong what was the final outcome about the effect on the RemainingItemCount feature in Kubernetes? I'm asking because this page shows that it is currently in Beta (not Alpha as you said above) and that it is enabled by default. https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

8 participants