
warning: fileutil: file already locked #15360

Closed
junhwong opened this issue Feb 26, 2023 · 8 comments · Fixed by #15408

Comments

@junhwong

What happened?

Got a warning-level log:

fileutil/purge.go:72 failed to lock file 
{
"path":"/wal/000000000000000c-0000000000018568.wal",
"error":"fileutil: file already locked"
}

What did you expect to happen?

I don't know what the effect of this warning is.

How can we reproduce it (as minimally and precisely as possible)?

I don't know how it came about.

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
v3.5.7

Etcd configuration (command line flags or environment variables)

Minimum configuration

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

No response

Relevant log output

No response

@Elbehery
Member

I have faced this issue as well. May I ask what the root cause is?

cc @tjungblu @hasbro17 @dusk125

@jmhbnz
Member

jmhbnz commented Feb 26, 2023

Here is the link to the file responsible for that error: https://github.com/etcd-io/etcd/blob/main/client/pkg/fileutil/purge.go#L72

There was a discussion on this error last month: #15100, including some parameters that might help with preventing this error.

Can you please confirm which filesystem you are using?

@tjungblu
Contributor

I think in our case it was a customer running etcd over NFS, so it's not entirely surprising they would get an error like that.

@Elbehery
Member

Hello ✋🏽

I have a customer who faced this issue as well.

The error is raised from here, and it wraps the error from here

However, no process other than etcd had opened the path /var/lib/etcd:

[root@master-1 lib]# fuser -v /var/lib/etcd
[root@master-1 lib]# fuser -v /var/lib/etcd/member/wal/
                     USER        PID ACCESS COMMAND
/var/lib/etcd/member/wal:
                     root       3827 f.... etcd
[root@master-2 /]# fuser -v /var/lib/etcd
[root@master-2 /]# fuser -v /var/lib/etcd/member/wal/
                     USER        PID ACCESS COMMAND
/var/lib/etcd/member/wal:
                     root       4538 f.... etcd
[root@master-3 /]# fuser -v /var/lib/etcd
[root@master-3 /]# fuser -v /var/lib/etcd/member/wal
                     USER        PID ACCESS COMMAND
/var/lib/etcd/member/wal:
                     root       3494 f.... etcd

Any ideas about the cause of this ?

cc @ahrtr @spzala

@cenkalti
Member

cenkalti commented Mar 3, 2023

I can reproduce this with a fresh db when I run put workload on it. No NFS involved.

Steps:
Run etcd.

etcd Version: 3.5.7
Git SHA: 215b53cf3
Go Version: go1.19
Go OS/Arch: linux/amd64

Run benchmark:

benchmark --target-leader --conns=10 --clients=10 \
    put --key-size=8 --key-space-size=1000000000 --total=100000 --val-size=20000

Warning message observed in log:

{"level":"info","ts":"2023-03-03T18:08:57.548-0500","caller":"wal/wal.go:785","msg":"created a new WAL segment","path":"default.etcd/member/wal/0000000000000009-0000000000007061.wal"}
{"level":"info","ts":"2023-03-03T18:08:59.393-0500","caller":"wal/wal.go:785","msg":"created a new WAL segment","path":"default.etcd/member/wal/000000000000000a-0000000000007cdd.wal"}
{"level":"warn","ts":"2023-03-03T18:08:59.911-0500","caller":"fileutil/purge.go:72","msg":"failed to lock file","path":"default.etcd/member/wal/0000000000000000-0000000000000000.wal","error":"fileutil: file already locked"}
{"level":"info","ts":"2023-03-03T18:09:01.397-0500","caller":"wal/wal.go:785","msg":"created a new WAL segment","path":"default.etcd/member/wal/000000000000000b-0000000000008958.wal"}
{"level":"info","ts":"2023-03-03T18:09:03.661-0500","caller":"wal/wal.go:785","msg":"created a new WAL segment","path":"default.etcd/member/wal/000000000000000c-00000000000095d0.wal"}

@jmhbnz
Member

jmhbnz commented Mar 3, 2023

Thanks for the repro @cenkalti - confirming I can also reproduce following those steps, on a standard NVMe SSD with a standard ext4 filesystem, so this isn't just NFS related.

@ahrtr
Member

ahrtr commented Mar 6, 2023

It's expected behavior. WAL files stay locked until a snapshot (with a higher index) is created. Only 5 WAL files are retained by default, and snapshot-count is 100K by default. When the sixth WAL file is created, etcd tries to remove the oldest WAL file, but that file hasn't been released yet because the snapshot hasn't been created yet.
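To illustrate the mechanics behind the warning (my own sketch, not etcd's actual code; it assumes Linux, where etcd's fileutil uses flock-style locks): the WAL holds an exclusive lock on segments it may still need, and the purge loop only deletes a segment after taking the lock itself. A second non-blocking exclusive lock on an already-locked file is refused, which is exactly the "file already locked" condition:

```go
// Sketch: why purging a WAL segment the WAL still holds fails with
// "file already locked". Assumes Linux flock semantics; not etcd code.
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	f1, err := os.CreateTemp("", "demo-*.wal")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f1.Name())

	// The WAL keeps an exclusive lock on segments it may still replay from.
	if err := syscall.Flock(int(f1.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		panic(err)
	}

	// The purge loop opens the same file and tries a non-blocking lock
	// before deleting it.
	f2, err := os.Open(f1.Name())
	if err != nil {
		panic(err)
	}
	defer f2.Close()

	err = syscall.Flock(int(f2.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
	if err != syscall.EWOULDBLOCK {
		panic(fmt.Sprintf("expected EWOULDBLOCK, got %v", err))
	}
	fmt.Println("second lock attempt refused: file already locked")
}
```

Once a snapshot advances past the segment's index, the WAL releases its lock and the next purge pass succeeds, which is why the warning is transient.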

But we can lower the default value of snapshot-count (e.g. set it to 10K by default). FYI:

DefaultSnapshotCount = 100000
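In the meantime, the threshold can already be lowered per member with the existing flag; the value below is the proposed 10K, shown for illustration:

```shell
# Snapshot every 10,000 applied entries instead of the v3.5 default of
# 100,000, so old WAL segments are released (and become purgeable) sooner.
# 10000 is the proposed lower default, used here for illustration.
etcd --snapshot-count=10000
```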

I think the majority of maintainers have already reached agreement on this, so please feel free to deliver a PR for it.

The related discussion:

@jmhbnz
Member

jmhbnz commented Mar 6, 2023

Thanks for the behavior clarification @ahrtr, the previous discussions were good context to read through. I've raised a tentative pull request to see if there is now consensus to reduce back to 10k 🙏🏻
