Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGBUS on startup in etcd-3.5.0 after filesystem rollback #13406

Closed
bencord0 opened this issue Oct 8, 2021 · 13 comments · Fixed by etcd-io/bbolt#294
Closed

SIGBUS on startup in etcd-3.5.0 after filesystem rollback #13406

bencord0 opened this issue Oct 8, 2021 · 13 comments · Fixed by etcd-io/bbolt#294
Labels

Comments

@bencord0
Copy link

bencord0 commented Oct 8, 2021

Hi, I seem to have found a reproducible bug in etcd after restarting from a zfs snapshot rollback.

This occurred while I was testing a k0s based kubernetes cluster, and since there isn't much in the database, I've provided a zipfile with the data-dir if you need to debug.

etcd-crash-on-startup.zip

Here's a stack trace of the crash.

unexpected fault address 0x7f31a61db000                                                                              
fatal error: fault                                                                                                   
[signal SIGBUS: bus error code=0x2 addr=0x7f31a61db000 pc=0x9dd562]                                                                                                                                                                         
                                                          
goroutine 157 [running]:                                                                                             
runtime.throw(0x1202783, 0x5)                                                                                        
        /usr/local/go/src/runtime/panic.go:1117 +0x72 fp=0xc0000c1ea0 sp=0xc0000c1e70 pc=0x4385d2                                                                                                                                           
runtime.sigpanic()                                                                                                   
        /usr/local/go/src/runtime/signal_unix.go:731 +0x2c8 fp=0xc0000c1ed8 sp=0xc0000c1ea0 pc=0x44fe48                                                                                                                                     
go.etcd.io/bbolt.(*Tx).checkBucket.func1(0x7f31a61db000, 0x2)                                                        
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:465 +0x62 fp=0xc0000c1fa0 sp=0xc0000c1ed8 pc=0x9dd562                                                                                                                                     
go.etcd.io/bbolt.(*Tx).forEachPage(0xc0000b01c0, 0x674, 0x2, 0xc0000c20c8)                                                                                                                                                                  
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:625 +0x89 fp=0xc0000c1fe8 sp=0xc0000c1fa0 pc=0x9dbde9                                                                                                                                     
go.etcd.io/bbolt.(*Tx).forEachPage(0xc0000b01c0, 0xa5, 0x1, 0xc0000c20c8)                                                                                                                                                                   
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:631 +0xe5 fp=0xc0000c2030 sp=0xc0000c1fe8 pc=0x9dbe45                                                                                                                                     
go.etcd.io/bbolt.(*Tx).forEachPage(0xc0000b01c0, 0x1cd, 0x0, 0xc0000c20c8)                                                                                                                                                                  
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:631 +0xe5 fp=0xc0000c2078 sp=0xc0000c2030 pc=0x9dbe45                                                                                                                                     
go.etcd.io/bbolt.(*Tx).checkBucket(0xc0000b01c0, 0xc000663180, 0xc0000c2370, 0xc0000c2340, 0xc0000424e0)                                                                                                                                    
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:464 +0xd4 fp=0xc0000c2108 sp=0xc0000c2078 pc=0x9db514                                                                                                                                     
go.etcd.io/bbolt.(*Tx).checkBucket.func2(0x7f31a5be81b9, 0x3, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0)                                                                                                                                                 
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:489 +0xc5 fp=0xc0000c2160 sp=0xc0000c2108 pc=0x9ddaa5                                                                                                                                     
go.etcd.io/bbolt.(*Bucket).ForEach(0xc0000b01d8, 0xc0000c21f0, 0x0, 0xc0000c2220)                                                                                                                                                           
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/bucket.go:390 +0x103 fp=0xc0000c21d0 sp=0xc0000c2160 pc=0x9c86e3                                                                                                                                
go.etcd.io/bbolt.(*Tx).checkBucket(0xc0000b01c0, 0xc0000b01d8, 0xc0000c2370, 0xc0000c2340, 0xc0000424e0)                                                                                                                                    
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:487 +0x146 fp=0xc0000c2260 sp=0xc0000c21d0 pc=0x9db586                                                                                                                                    
go.etcd.io/bbolt.(*DB).freepages(0xc0000f8480, 0x0, 0x0, 0x0)                                                        
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:1059 +0x210 fp=0xc0000c2458 sp=0xc0000c2260 pc=0x9d01f0                                                                                                                                   
go.etcd.io/bbolt.(*DB).loadFreelist.func1()                                                                           
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:320 +0x114 fp=0xc0000c2490 sp=0xc0000c2458 pc=0x9dcc74                                                                                                                                    
sync.(*Once).doSlow(0xc0000f85f0, 0xc0000c24f0)                                                                      
        /usr/local/go/src/sync/once.go:68 +0xec fp=0xc0000c24e0 sp=0xc0000c2490 pc=0x47d94c                                                                                                                                                 
sync.(*Once).Do(...)                                                                                                 
        /usr/local/go/src/sync/once.go:59                                                                            
go.etcd.io/bbolt.(*DB).loadFreelist(0xc0000f8480)                                                                    
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:316 +0x6a fp=0xc0000c2510 sp=0xc0000c24e0 pc=0x9cce0a                                                                                                                                     
go.etcd.io/bbolt.Open(0xc0004b11a0, 0x13, 0x180, 0xc0000c37c0, 0x40dbbb, 0xc0004b11a0, 0x18)                                                                                                                                                
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:293 +0x3af fp=0xc0000c35d0 sp=0xc0000c2510 pc=0x9cc8af                                                                                                                                    
go.etcd.io/etcd/server/v3/mvcc/backend.newBackend(0xc0004b11a0, 0x13, 0x5f5e100, 0x2710, 0x1204c4f, 0x7, 0x280000000, 0xc000592640, 0x0, 0x137fe00, ...)
        /etcd/server/mvcc/backend/backend.go:180 +0x145 fp=0xc0000c3828 sp=0xc0000c35d0 pc=0xb76b85                                                                                                                                         
go.etcd.io/etcd/server/v3/mvcc/backend.New(...)                                                                      
        /etcd/server/mvcc/backend/backend.go:156                                                                     
go.etcd.io/etcd/server/v3/etcdserver.newBackend(0x7fff6ea0b794, 0x3, 0x0, 0x0, 0x0, 0x0, 0xc0000ab7a0, 0x1, 0x1, 0xc0000ab4d0, ...)
        /etcd/server/etcdserver/backend.go:55 +0x1f8 fp=0xc0000c39c0 sp=0xc0000c3828 pc=0xcfe9b8                                                                                                                                            
go.etcd.io/etcd/server/v3/etcdserver.openBackend.func1(0xc000042480, 0xc000150600, 0x137fe00, 0xc0000abd40)                                                                                                                                 
        /etcd/server/etcdserver/backend.go:76 +0x98 fp=0xc0000c3fc0 sp=0xc0000c39c0 pc=0xd41d78                                                                                                                                             
runtime.goexit()                                                                                                     
        /usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc0000c3fc8 sp=0xc0000c3fc0 pc=0x4722a1                                                                                                                                         
created by go.etcd.io/etcd/server/v3/etcdserver.openBackend                                                           
        /etcd/server/etcdserver/backend.go:75 +0x12b                                                                  

And you should be able to reproduce this on any linux-amd64 host by running the start.sh in the attached zip file.


Expected Result

etcd's storage is durable and is able to startup and recover the database after a crash event (or in my case, a zfs rollback).

Actual Result

Runtime panic in bbolt when opening the backend.
I've also tried testing the snapshot with etcdutl snapshot status ./etcd/member/snap/db, and the same crash occurs.

Steps to reproduce

You can reproduce the crash by extracting the attached zip file and running etcd --data-dir=./etcd in that directory.
This is the contents of /var/lib/k0s/ from my test host, with the etcd binary (which I think matches the released v3.5.0 distribution) and the etcd data dir.

This was created from a running a single-node k0s cluster controller. On my system, I take periodic snapshots of the zfs filesystem with the zfs-auto-snapshot tool. This filesystem snapshot was taken when the cluster was idle, and I would have expected all durable writes to have been flushed to disk.

Given that etcd's storage design includes a WAL, I would have expected that the service would be able to recover from a perceived crash (in this case, a point-in-time filesystem rollback, and not a bug in etcd), with an acceptable bit of data loss for any writes after the snapshot was taken.

This was a single-node cluster, and was not participating in distributed consensus.

@ahrtr
Copy link
Member

ahrtr commented Oct 10, 2021

It seems that the data file etcd/member/snap/db is corrupted.

@ptabor
Copy link
Contributor

ptabor commented Oct 10, 2021

Low level bolt cmd also does not manages to open that file:

$ gobin -run go.etcd.io/bbolt/cmd/bbolt buckets ~/Downloads/etcd-crash-on-startup/etcd/member/snap/db
unexpected fault address 0x291b7000
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x291b7000 pc=0x10f54a2]

goroutine 1 [running]:
runtime.throw(0x1142180, 0x5)
        /usr/local/go/src/runtime/panic.go:1117 +0x72 fp=0xc0000e2498 sp=0xc0000e2468 pc=0x1035712
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:741 +0x276 fp=0xc0000e24d0 sp=0xc0000e2498 pc=0x104b376
go.etcd.io/bbolt.(*Tx).checkBucket.func1(0x291b7000, 0x2)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:465 +0x62 fp=0xc0000e2598 sp=0xc0000e24d0 pc=0x10f54a2
go.etcd.io/bbolt.(*Tx).forEachPage(0xc0000d8000, 0x674, 0x2, 0xc0000e26c0)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:625 +0x89 fp=0xc0000e25e0 sp=0xc0000e2598 pc=0x10f3a69
go.etcd.io/bbolt.(*Tx).forEachPage(0xc0000d8000, 0xa5, 0x1, 0xc0000e26c0)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:631 +0xe5 fp=0xc0000e2628 sp=0xc0000e25e0 pc=0x10f3ac5
go.etcd.io/bbolt.(*Tx).forEachPage(0xc0000d8000, 0x1cd, 0x0, 0xc0000e26c0)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:631 +0xe5 fp=0xc0000e2670 sp=0xc0000e2628 pc=0x10f3ac5
go.etcd.io/bbolt.(*Tx).checkBucket(0xc0000d8000, 0xc0000be1c0, 0xc0000e2968, 0xc0000e2938, 0xc00008c0c0)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:464 +0xd4 fp=0xc0000e2700 sp=0xc0000e2670 pc=0x10f3174
go.etcd.io/bbolt.(*Tx).checkBucket.func2(0x28bc41b9, 0x3, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:489 +0xc5 fp=0xc0000e2758 sp=0xc0000e2700 pc=0x10f59e5
go.etcd.io/bbolt.(*Bucket).ForEach(0xc0000d8018, 0xc0000e27e8, 0x0, 0xc0000b0818)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/bucket.go:390 +0x103 fp=0xc0000e27c8 sp=0xc0000e2758 pc=0x10e22c3
go.etcd.io/bbolt.(*Tx).checkBucket(0xc0000d8000, 0xc0000d8018, 0xc0000b0968, 0xc0000b0938, 0xc00008c0c0)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:487 +0x146 fp=0xc0000e2858 sp=0xc0000e27c8 pc=0x10f31e6
go.etcd.io/bbolt.(*DB).freepages(0xc0000aa240, 0x0, 0x0, 0x0)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:1059 +0x210 fp=0xc0000e2a50 sp=0xc0000e2858 pc=0x10e8cd0
go.etcd.io/bbolt.(*DB).loadFreelist.func1()
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:320 +0x114 fp=0xc0000e2a88 sp=0xc0000e2a50 pc=0x10f4df4
sync.(*Once).doSlow(0xc0000aa3b0, 0xc0000b0ae8)
        /usr/local/go/src/sync/once.go:68 +0xec fp=0xc0000e2ad8 sp=0xc0000e2a88 pc=0x1071b8c
sync.(*Once).Do(...)
        /usr/local/go/src/sync/once.go:59
go.etcd.io/bbolt.(*DB).loadFreelist(0xc0000aa240)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:316 +0x6a fp=0xc0000e2b08 sp=0xc0000e2ad8 pc=0x10e642a
go.etcd.io/bbolt.Open(0x7ffeefbffa6e, 0x3f, 0x1b6, 0x1219580, 0x0, 0x0, 0xc0000b6018)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:293 +0x3af fp=0xc0000e3bc8 sp=0xc0000e2b08 pc=0x10e5ecf
main.(*BucketsCommand).Run(0xc0000e3e78, 0xc0000981a0, 0x1, 0x1, 0x0, 0x0)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/cmd/bbolt/main.go:1104 +0x1cf fp=0xc0000e3c70 sp=0xc0000e3bc8 pc=0x10fd3ef
main.(*Main).Run(0xc0000b1f48, 0xc000098190, 0x2, 0x2, 0x60, 0xc00008c000)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/cmd/bbolt/main.go:110 +0x87f fp=0xc0000e3ee8 sp=0xc0000e3c70 pc=0x10f781f
main.main()
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/cmd/bbolt/main.go:70 +0xe5 fp=0xc0000e3f88 sp=0xc0000e3ee8 pc=0x10f6e65
runtime.main()
        /usr/local/go/src/runtime/proc.go:225 +0x256 fp=0xc0000e3fe0 sp=0xc0000e3f88 pc=0x1037f36
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc0000e3fe8 sp=0xc0000e3fe0 pc=0x1068e21

goroutine 18 [chan receive]:
go.etcd.io/bbolt.(*DB).freepages.func2(0xc00008c0c0)
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:1055 +0x45
created by go.etcd.io/bbolt.(*DB).freepages
        /Users/ptab/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:1054 +0x1cd

In general bbolt would be a better component for this type of issue.

@ahrtr
Copy link
Member

ahrtr commented Oct 12, 2021

I am pretty sure the db file etcd/member/snap/db is corrupted. The file size is 2527232 bytes, and the pageSize in meta is 4096, so there are 617 (2527232/4096) pages in total. But the pgid value in meta page is 1706, which is out of range.

There are also some invalid entries in the B-tree internal nodes as well.

I provided a solution to fix the db file, please note that there may be some data loss, and I am not responsible for the data loss!

@ahrtr
Copy link
Member

ahrtr commented Oct 12, 2021

After applying the fix, the etcd is running well, please see the screen shot,
Screen Shot 2021-10-12 at 4 28 58 PM

@ahrtr
Copy link
Member

ahrtr commented Oct 15, 2021

@bencord0 Please let me know whether the solution works for you.

Can I upload the original zip file etcd-crash-on-startup.zip into my personal public repo etcd-issues, so that others can learn from the issue?

@bencord0
Copy link
Author

Yes!

I've managed to test this this morning, and it does fix the issue for me.
The dataloss in this case is fine, by the looks of things, this otherwise appears as a rollback.

Can I upload the original zip file etcd-crash-on-startup.zip

Yes, this is fine. As I outlined in the original report, this is from a test cluster and I managed to isolate the problem just to etcd, and not the entire k0s setup.


Out of curiosity, how did you find the broken pages, and how did you calculate the checksums?

If I see a similar issue in the future, is it possible to do the recovery myself? could this solution be made into a generic fsck-like tool?

Given a choice, I think that it could be acceptable (in some cases, and definitely an administrator's choice) to be able to recover etcd with dataloss rather than to crash on startup.

@ahrtr
Copy link
Member

ahrtr commented Oct 16, 2021

Yes, this is fine.

Thanks. I only uploaded the file db to my personal public repo etcd-issues.

Out of curiosity, how did you find the broken pages

The db file is actually updated using bbolt, and the data are organized as a B tree. So you just need to analyze the data using the same logic as what bbolt does.

and how did you calculate the checksums?

Please see db.go#L1221

If I see a similar issue in the future, is it possible to do the recovery myself? could this solution be made into a generic fsck-like tool?

Yes, it's a good idea. I may submit a PR to bbolt. What do you think? @ptabor

@ptabor
Copy link
Contributor

ptabor commented Oct 17, 2021

fsck sound like great idea. But I think we should start from making:

gobin -run go.etcd.io/bbolt/cmd/bbolt check ~/Downloads/etcd-crash-on-startup/etcd/member/snap/db 

not crashing in such situation, and providing more data what's broken/inconsistent.
Some approach to auto-fixing should be a followup step... likely driven by the 'check' results.

@ahrtr
Copy link
Member

ahrtr commented Oct 17, 2021

not crashing in such situation, and providing more data what's broken/inconsistent.
Some approach to auto-fixing should be a followup step... likely driven by the 'check' results.

@ptabor Thanks for the comment. I will submit a PR later.

@bencord0 I wonder how was the corrupted db file generated? Could you provide more detailed steps or info?

@ahrtr
Copy link
Member

ahrtr commented Oct 18, 2021

@ptabor I just submitted a PR bbolt/pull/294 for bolt. It's just the very first step to fix this issue. Once this one gets merged, there will be a couple of followup PRs.

@bencord0
Copy link
Author

@ahrtr How familiar are you with zfs snapshots?

I created this from a live snapshot of a running system, etcd was installed as part of the default k0s configuration (single host controller) in a systemd-nspawn container running Gentoo (although, all packages here were not installed via the package manager, and from what I can tell, the etcd binary matches the checksums of the tagged 3.5 release).

  • Create a new empty ZFS dataset, and mount it as the rootfs for systemd containers.
zfs create -o mountpoint=/var/lib/machines/k0s zpool/machines/k0s
  • Provision an operating system (I used gentoo, but I don't think it matters too much)
root=/var/lib/machines/k0s
emerge --root=$root --sysroot=$root @system

Someone else asked about my process for doing systemd containers recently, so I wrote it up if you're interested.

In this case, I installed the controller node in an nspawn container, and the worker node in a VM, since the kubelet currently doesn't run when nested inside an existing container.

  • Add simple test workloads, and wait for the pods to settle.
kubectl create deployment web --image=nginx
  • Take a zfs snapshot (anything after here doesn't really matter). The system was live at this point, although idle, etcd was still able to accept writes from kube-apiserver.
zfs snap zpool/machines/k0s@snapshot1
  • Did some other experiments, deployed more containers.
  • Shutdown the cluster.
  • Rollback the entire container filesystem
zfs rollback zpool/machines/k0s@snapshot1

At this point, etcd is not running, the filesystem has been cleanly rolled back to a consistent state (i.e. all writes before zfs snap have been flushed). The attached zip file contains the contents of the filesystem as-is in the snapshot.

  • Starting etcd, either via k0s or directly (using start.sh) crashes with the originally posted stacktrace.

@ahrtr
Copy link
Member

ahrtr commented Oct 19, 2021

@bencord0 Thanks for the detailed info.

I am not familiar with zfs.

I suspect it has something to do with the live snapshotting, which means you create the zfs snapshot when the etcd is still serving requests.

But on the other hand, snapshot is a generic technology. CSI spec has clear definition on snapshot API, and some CSI Drivers have already supported the snapshot functionality. I suppose it (snapshot) isn't the root cause, nor the zfs snapshot, although zfs snapshot may be different from CSI snapshot.

How often do you run into this issue? Can this issue be reproduced easily when creating the zfs snapshot when the etcd is running?

I would suggest to backup the etcd using etcdctl, such as etcdctl snapshot save backup.db.

I double checked the meta pages, both of them are not corrupted. But the pgid in the meta page is obvious out of range.

@ptabor Any thoughts on the possible reasons on the corrupted db file?

@stale
Copy link

stale bot commented Jan 17, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

3 participants