recover log from data corrupt #436

Merged 2 commits on Feb 21, 2024


lhsoft commented Feb 20, 2024

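When a physical machine crashes, the page cache is lost, which leaves part of the in-progress log corrupted, and the node can no longer start.
We need a way to truncate the corrupted data so that the node can start.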
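
The idea can be sketched as follows: scan the segment record by record, stop at the first record whose header or checksum does not verify, and treat everything before that point as the valid prefix. This is only an illustration under stated assumptions, not the code from this PR. The record layout, the FNV-1a checksum, and the names `append_record` and `valid_prefix_length` are all hypothetical and do not reflect braft's real on-disk format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record layout (NOT braft's real on-disk format):
//   [uint32 payload_len][uint32 checksum][payload bytes]
constexpr std::size_t kHeaderSize = 8;

// FNV-1a, used here as a stand-in for whatever checksum the real log uses.
inline std::uint32_t fnv1a(const std::uint8_t* data, std::size_t len) {
    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

// Append one record to an in-memory image of a log segment.
inline void append_record(std::vector<std::uint8_t>& seg,
                          const std::vector<std::uint8_t>& payload) {
    std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    std::uint32_t sum = fnv1a(payload.data(), payload.size());
    std::uint8_t header[kHeaderSize];
    std::memcpy(header, &len, 4);
    std::memcpy(header + 4, &sum, 4);
    seg.insert(seg.end(), header, header + kHeaderSize);
    seg.insert(seg.end(), payload.begin(), payload.end());
}

// Return the byte offset just past the last record that verifies.
// Scanning stops at the first short header, zero or oversized length,
// or checksum mismatch. Empty payloads are rejected on purpose, because
// a zero length is what a zero-filled tail looks like.
inline std::size_t valid_prefix_length(const std::vector<std::uint8_t>& seg) {
    std::size_t off = 0;
    while (seg.size() - off >= kHeaderSize) {
        std::uint32_t len = 0;
        std::uint32_t sum = 0;
        std::memcpy(&len, seg.data() + off, 4);
        std::memcpy(&sum, seg.data() + off + 4, 4);
        if (len == 0 || len > seg.size() - off - kHeaderSize) break;
        if (fnv1a(seg.data() + off + kHeaderSize, len) != sum) break;
        off += kHeaderSize + len;
    }
    return off;
}
```

On a real file, the returned offset is what one would pass to POSIX `ftruncate` before reopening the segment. Truncating only the in-progress segment keeps the risk contained, since a bad record in an already closed segment would point to a different kind of failure.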

src/braft/log.cpp (review thread, outdated and resolved)
PFZheng merged commit b37c610 into baidu:master on Feb 21, 2024
ehds (Contributor) commented Apr 9, 2024

We ran into the same problem: with fsync disabled, data was corrupted after the machine lost power and rebooted.

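Specifically, the tail of the file is entirely zeros, so checksum verification fails. In other words, the file length and the file contents are inconsistent.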
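
A quick way to confirm this symptom on a damaged segment is to count how many bytes at the end of the file are zero. The helper below is a minimal diagnostic sketch that works on a buffer already read into memory; the name `trailing_zero_bytes` is hypothetical and not part of braft.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count how many bytes at the end of the buffer are zero.
// A large count on a log segment matches the symptom described above:
// the file length was extended, but the data blocks were never written.
inline std::size_t trailing_zero_bytes(const std::vector<std::uint8_t>& data) {
    std::size_t n = 0;
    while (n < data.size() && data[data.size() - 1 - n] == 0) {
        ++n;
    }
    return n;
}
```

A few trailing zeros can also be legitimate payload, so the count is only a hint and should be read together with the checksum failure.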

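Our file system is ext4 (which defaults to ordered mode, see https://man7.org/linux/man-pages/man5/ext4.5.html). The program runs in a container, and the storage is mounted into the container as a file mount.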

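In ordered mode, according to the documentation, the file contents and the file length should be consistent. That is, even if the page cache is lost, the file should only lose data at its tail; the length should not have been updated while the data is missing.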
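
For reference, whatever the journaling mode, an application can only assume an append has reached stable storage once `fsync` has returned successfully. The sketch below shows that pattern with plain POSIX calls. It is an illustration only: `append_durably` is a hypothetical helper, not how braft writes its log.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Append `data` to `path` and call fsync before returning.
// Returns true only if every write, the fsync, and the close succeeded.
inline bool append_durably(const std::string& path, const std::string& data) {
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return false;
    std::size_t done = 0;
    while (done < data.size()) {
        ssize_t n = ::write(fd, data.data() + done, data.size() - done);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, retry the write
            ::close(fd);
            return false;
        }
        done += static_cast<std::size_t>(n);
    }
    // Without this call the data may exist only in the page cache.
    bool synced = (::fsync(fd) == 0);
    return (::close(fd) == 0) && synced;
}
```

Syncing on every append costs latency, which is why logs often batch several records per `fsync`. With syncing disabled, anything written since the last sync can be lost or left partially written after a power failure.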

Could you tell us in which specific scenario you hit the file corruption?
