watch stream return "permission denied" if token expired #12385

Closed
Tracked by #14750
cfz opened this issue Oct 10, 2020 · 19 comments · Fixed by #12549 or #14322
Comments

@cfz
Contributor

cfz commented Oct 10, 2020

I ran into a similar problem as #8914.

I'm using a v3.4.13 server and a v3.4.12 client.
I can reproduce the error with the following steps:

In one terminal, start the etcd server, enable auth, and shorten the default token TTL:

etcd --log-level warn --logger zap --auth-token-ttl 10 & 
etcdctl user add root:root
etcdctl user grant-role root root
etcdctl auth enable

In the other terminal, run the demo program:

cfz@cfz-desktop:~/ws/etcd-watch-demo$ cat main.go
package main

import (
        "context"
        "fmt"
        "sync"
        "time"

        "go.etcd.io/etcd/clientv3"
)

var wg sync.WaitGroup

func main() {
        cfg := clientv3.Config{
                Endpoints: []string{"http://127.0.0.1:2379"},
                Username:  "root",
                Password:  "root",
                LogConfig: nil,
        }

        cli, err := clientv3.New(cfg)
        if err != nil {
                panic(err)
        }

        watch(cli, 1)
        time.Sleep(1 * time.Second)
        watch(cli, 2)
        time.Sleep(11 * time.Second)
        watch(cli, 3)
        time.Sleep(1 * time.Second)
        watch(cli, 4)

        wg.Wait()
}

func watch(cli *clientv3.Client, num int) {
        wg.Add(1)

        go func() {
                defer wg.Done()
                for wr := range cli.Watch(context.Background(), "") {
                        if wr.Err() != nil {
                                fmt.Printf("%#v\n", wr.Err())
                        }
                }

                fmt.Printf("#%v watch existed\n", num)
        }()
}
cfz@cfz-desktop:~/ws/etcd-watch-demo$ go run main.go
&errors.errorString{s:"rpc error: code = PermissionDenied desc = etcdserver: permission denied"}
#3 watch exited
&errors.errorString{s:"rpc error: code = PermissionDenied desc = etcdserver: permission denied"}
#4 watch exited

As shown in the output, the 1st and 2nd watches are OK, while the 3rd and 4th report "permission denied".

Meanwhile, the etcd server logs the following:

{"level":"warn","ts":"2020-10-10T20:34:45.010+0800","caller":"auth/store.go:1301","msg":"invalid auth token","token":"JWourXlCoFPTxMHj.8"}
{"level":"warn","ts":"2020-10-10T20:34:46.010+0800","caller":"auth/store.go:1301","msg":"invalid auth token","token":"JWourXlCoFPTxMHj.8"}

I think the causes are:

  • the gRPC stream's authentication token is provided at the time the stream is initialized
  • the client tries to reuse the same gRPC stream (if available) to set up new watch substreams
  • the token attached to the stream expires if there are no other requests to extend its TTL
@cfc4n
Contributor

cfc4n commented Oct 10, 2020

I think the causes are:

  • the gRPC stream's authentication token is provided at the time the stream is initialized
  • the client tries to reuse the same gRPC stream (if available) to set up new watch substreams
  • the token attached to the stream expires if there are no other requests to extend its TTL

Yes, you are right. You can get more information about the token being deleted with the command $ bin/etcd --log-level debug --logger zap --auth-token-ttl 10:

{"level":"debug","ts":"2020-10-11T00:22:06.647+0800","caller":"auth/store.go:335","msg":"authenticated a user","user-name":"root","token":"xNgkSwPjKojuppwM.11"}
{"level":"debug","ts":"2020-10-11T00:22:06.647+0800","caller":"v3rpc/interceptor.go:181","msg":"request stats","start time":"2020-10-11T00:22:06.569+0800","time spent":"77.981547ms","remote":"127.0.0.1:63080","response type":"/etcdserverpb.Auth/Authenticate","request count":-1,"request size":-1,"response count":-1,"response size":-1,"request content":""}
{"level":"info","ts":"2020-10-11T00:22:06.648+0800","caller":"grpclog/grpclog.go:45","msg":"transport: loopyWriter.run returning. connection error: desc = \"transport is closing\""}
{"level":"info","ts":"2020-10-11T00:22:17.950+0800","caller":"auth/simple_token.go:169","msg":"deleted a simple token","user-name":"root","token":"xNgkSwPjKojuppwM.11"}
{"level":"warn","ts":"2020-10-11T00:22:18.655+0800","caller":"auth/store.go:1301","msg":"invalid auth token","token":"xNgkSwPjKojuppwM.11"}
{"level":"warn","ts":"2020-10-11T00:22:19.658+0800","caller":"auth/store.go:1301","msg":"invalid auth token","token":"xNgkSwPjKojuppwM.11"}

So, I think that is correct. What do you think?

@cfz
Contributor Author

cfz commented Oct 10, 2020

Actually, I enabled the debug log during my investigation :). I saw that log at the very beginning, but it took me a while to understand why the stale token was still being referenced.

@cfz
Contributor Author

cfz commented Oct 26, 2020

I noticed a related PR #12264, but I think this issue still exists on the master branch.
Wondering if it's possible to detect the "permission denied" message from serveWatchClient() and force the client to re-create a watch stream?
@cfc4n

@cfz
Contributor Author

cfz commented Oct 26, 2020

And for anyone who has the same problem:
JWT tokens have the same issue. Using the CommonName in the client certificate as the username is a workaround.
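For reference, here is a minimal sketch of that workaround. It assumes the server runs with --client-cert-auth and --trusted-ca-file, that an etcd user named after the certificate's CommonName already exists, and that the certificate paths are placeholders; the import path is the v3.4-style one from the demo above (use go.etcd.io/etcd/client/v3 on 3.5+).

package demo

import (
        "crypto/tls"
        "crypto/x509"
        "os"

        "go.etcd.io/etcd/clientv3"
)

// newCertAuthClient builds a client that authenticates with a TLS client
// certificate instead of a username/password token, so no auth token can
// expire underneath a long-lived watch stream.
func newCertAuthClient() (*clientv3.Client, error) {
        cert, err := tls.LoadX509KeyPair("client.crt", "client.key")
        if err != nil {
                return nil, err
        }
        caData, err := os.ReadFile("ca.crt")
        if err != nil {
                return nil, err
        }
        pool := x509.NewCertPool()
        pool.AppendCertsFromPEM(caData)

        return clientv3.New(clientv3.Config{
                Endpoints: []string{"https://127.0.0.1:2379"},
                TLS: &tls.Config{
                        Certificates: []tls.Certificate{cert}, // the cert's CommonName becomes the etcd user
                        RootCAs:      pool,
                },
        })
}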

@mitake
Contributor

mitake commented Oct 26, 2020

I noticed a related PR #12264, but I think this issue still exists on the master branch.

@cfz thanks for checking with the master branch, let me check again.

bbiao added a commit to bbiao/etcd that referenced this issue Dec 14, 2020
Old etcd servers which have not applied PR etcd-io#12165 will check the auth token
even if the request is an Authenticate request.

If the client has an invalid auth token, it will not be able to update its
token, since the Authenticate request carries the invalid auth token.

This fix clears the auth token when encountering an ErrInvalidAuthToken, in
order to talk with old-version etcd servers.

Fix etcd-io#12385 with etcd-io#12165 and etcd-io#12264
bbiao added a commit to bbiao/etcd that referenced this issue Dec 27, 2020
agargi pushed a commit to agargi/etcd that referenced this issue Jan 23, 2021
sayap added a commit to sayap/etcd that referenced this issue Dec 31, 2021
This attempts to fix a special case of the problem described in etcd-io#12385,
where trying to do `clientv3.Watch` with an expired token would result
in `ErrGRPCPermissionDenied`, due to the failing authorization check in
`isWatchPermitted`. Furthermore, the client can't auto recover, since
`shouldRefreshToken` rightly returns false for the permission denied
error.

In this case, we would like to have a runbook to dynamically disable
auth, without causing any disruption. Doing so would immediately expire
all existing tokens, which would then cause the behavior described
above. This means existing watchers would still work for a period of
time after disabling auth, until they have to reconnect, e.g. due to a
rolling restart of server nodes.

This commit adds a client-side fix and a server-side fix, either of
which is sufficient to get the added test case to pass. Note that it is
an e2e test case instead of an integration one, as the reconnect only
happens if the server node is stopped via SIGINT or SIGTERM.

A generic fix for the problem described in etcd-io#12385 would be better, as
that shall also fix this special case. However, the fix would likely be
a lot more involved, as some untangling of authn/authz is required.
@jwebb

jwebb commented Jun 14, 2022

This ticket is closed, but I can still reproduce this exactly as described by the OP (except that the import path is /v3 now), using both client and server at v3.5.4. Are the assumptions in the original report no longer valid?

I just want to watch a key indefinitely, but this currently does not seem to work in conjunction with authentication tokens.

@sayap
Contributor

sayap commented Jun 15, 2022

This bug should still be reproducible, due to the problem I described in #13577:

... trying to do clientv3.Watch with an expired token would result
in ErrGRPCPermissionDenied, due to the failing authorization check in
isWatchPermitted. Furthermore, the client can't auto recover, since
shouldRefreshToken rightly returns false for the permission denied
error.

The commit from that PR only fixed a special case of the problem, where all the tokens expire at once after disabling authentication. It doesn't fix the problem where a token expires naturally.

Last I checked, the root cause is that isWatchPermitted is wearing both authorization and authentication hats. The code has to be modified such that isWatchPermitted is only called with a legitimate token and thus only has to worry about authorization.

@mitake
Contributor

mitake commented Jun 15, 2022

Hmm I see... let me reproduce the issue on my side.

@mitake
Contributor

mitake commented Jul 3, 2022

Sorry for the delayed update; I'll be able to check the issue sometime this week.

@mitake
Contributor

mitake commented Jul 9, 2022

I could reproduce the bug on the latest main branch... Let me reopen the issue and handle it.

@mitake mitake reopened this Jul 9, 2022
@mitake
Contributor

mitake commented Jul 31, 2022

This is the branch for the fix: https://github.com/mitake/etcd/tree/watch-auth-err (I'll open a PR after finalizing it).

@kafuu-chino
Contributor

The context on the server side never changes after the stream is created, so the token won't refresh, and subsequent watches will fail after the token expires.

PerRPCCredentials is only called once, when the stream is created (see the sketch after the list below):
https://stackoverflow.com/questions/69551307/go-grpc-refresh-token-for-a-bidirectional-stream

  1. We only get the token once, when we create the stream.
  2. The client tries to reuse the same gRPC stream (if available) to set up new watch substreams.
  3. The token cannot be refreshed on the server, so when the token expires, subsequent watches fail.
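For illustration only, here is a hypothetical PerRPCCredentials-style type (not etcd's actual implementation) that makes point 1 concrete: gRPC attaches the metadata returned by GetRequestMetadata when a stream is opened, so a token refreshed later never reaches an already-open watch stream.

package demo

import (
        "context"
        "sync"

        "google.golang.org/grpc/credentials"
)

// tokenCreds carries an auth token as per-RPC metadata. For a streaming RPC,
// GetRequestMetadata runs once, at stream creation, and the resulting headers
// stay fixed for the lifetime of that stream.
type tokenCreds struct {
        mu    sync.RWMutex
        token string
}

var _ credentials.PerRPCCredentials = (*tokenCreds)(nil)

func (c *tokenCreds) GetRequestMetadata(ctx context.Context, uri ...string) (map[string]string, error) {
        c.mu.RLock()
        defer c.mu.RUnlock()
        return map[string]string{"token": c.token}, nil
}

func (c *tokenCreds) RequireTransportSecurity() bool { return false }

// Refresh updates the stored token, but only streams created afterwards will
// carry it; existing watch streams keep the (possibly expired) old token.
func (c *tokenCreds) Refresh(token string) {
        c.mu.Lock()
        defer c.mu.Unlock()
        c.token = token
}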

I think the solutions for the future are:

  1. Create a new watch stream when the token expires.
  2. Refresh the token in user logic instead of relying on gRPC.

Temporary solutions:

  1. Create a new stream with a new ctxKey (expanded in the sketch after this list), like this:
    ctx = metadata.AppendToOutgoingContext(context.Background(), "key", time.Now().String())

  2. Set a very long TTL.
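A slightly fuller sketch of temporary solution 1, assuming the v3.4-style import path from the demo above and an arbitrary metadata key: clientv3 keys its internal watch gRPC streams by the context's outgoing metadata, so unique metadata forces a brand-new stream created with the client's current token instead of reusing an old one whose token may have expired.

package demo

import (
        "context"
        "time"

        "go.etcd.io/etcd/clientv3"
        "google.golang.org/grpc/metadata"
)

// watchOnFreshStream (hypothetical helper) attaches unique outgoing metadata
// to the watch context so the client opens a new watch stream for this call.
// The metadata key "fresh-stream" is arbitrary.
func watchOnFreshStream(cli *clientv3.Client, key string) clientv3.WatchChan {
        ctx := metadata.AppendToOutgoingContext(
                context.Background(), "fresh-stream", time.Now().String())
        return cli.Watch(ctx, key)
}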

@mitake
Contributor

mitake commented Aug 7, 2022

Thanks for commenting @kafuu-chino. Actually, I think "Create a new watch stream when the token expires" is the better way.

@mitake
Contributor

mitake commented Aug 7, 2022

@jwebb @sayap @kafuu-chino I opened this PR: #14322. I checked that the above test case can pass now. Could you cross-check if you have time?

@mitake
Contributor

mitake commented Aug 7, 2022

@kafuu-chino I'll also check your PR #14296 later, thanks for opening this!

@kafuu-chino
Contributor

@mitake OK, I will check PR #14322 later.

@serathius serathius mentioned this issue Nov 14, 2022
ahrtr added a commit to ahrtr/etcd that referenced this issue Dec 14, 2022
In order to fix etcd-io#12385,
PR etcd-io#14322 introduced a change
in which the client side may retry based on the error message
returned from the server side.

This is not good, as it's too fragile and it also changes the
protocol between client and server. Please see the discussion
in kubernetes/kubernetes#114403.

Note: The issue etcd-io#12385 only
happens when auth is enabled and the client side reuses the same client
to watch.

So we decided to roll back the change on 3.5, for these reasons:
1. K8s doesn't enable auth at all, so there is no impact on K8s.
2. It's very easy for a client application to work around the issue:
   the client just needs to create a new client each time before watching.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
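A minimal sketch of the workaround mentioned in that commit message, reusing the endpoint and credentials from the reproduction program above (v3.4-style import path; use go.etcd.io/etcd/client/v3 on 3.5+): give each long-lived watch its own freshly authenticated client, and therefore its own freshly authenticated stream.

package main

import (
        "context"
        "fmt"

        "go.etcd.io/etcd/clientv3"
)

// watchWithFreshClient creates a brand-new client for each long-lived watch,
// so the watch stream is opened with a freshly issued token instead of
// reusing a shared stream whose token may already have expired.
func watchWithFreshClient(key string) error {
        cli, err := clientv3.New(clientv3.Config{
                Endpoints: []string{"http://127.0.0.1:2379"},
                Username:  "root",
                Password:  "root",
        })
        if err != nil {
                return err
        }
        defer cli.Close()

        for wr := range cli.Watch(context.Background(), key) {
                if err := wr.Err(); err != nil {
                        return err // e.g. PermissionDenied once the stream's token expires
                }
                for _, ev := range wr.Events {
                        fmt.Printf("%s %q -> %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
                }
        }
        return nil
}

func main() {
        if err := watchWithFreshClient("foo"); err != nil {
                fmt.Println("watch ended with error:", err)
        }
}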
@saddit

saddit commented Feb 5, 2023

After I updated etcd to v3.5.7, the error becomes this:

rpc error: code = Unauthenticated desc = etcdserver: invalid auth token

It can be reproduced 100% of the time after the client has not been used for a few minutes.

@lavacat

lavacat commented Feb 5, 2023

@838239178 This is expected because the retry behavior was reverted. I'll push an alternative way to handle this, as discussed in #15058.

tjungblu pushed a commit to tjungblu/etcd that referenced this issue Jul 26, 2023
@ahrtr
Member

ahrtr commented Mar 1, 2024

Refer to #17384 (comment)
