POSIX: Implement FlushFileBuffers #1057

Open
wilbaker opened this issue Apr 19, 2019 · 1 comment

Comments

@wilbaker
Member

Copied from feature 1280283 in old Azure DevOps board.

@wilbaker
Member Author

Details from earlier notes on the topic:

Reliable Writes in GVFS for Windows

GVFS for Windows uses the following approach to ensure reliable writes:

  1. Write new file contents to a temporary file

  2. Flush the file buffer(s) open for the temporary file
    a. FileStream.Flush(flushToDisk: true) -or-
    b. FlushFileBuffers

  3. Rename the temporary file to its final filename (and overwrite any existing file with that name if present)
    a. MoveFileEx(MOVEFILE_REPLACE_EXISTING)

Flushing File Buffers on Mac

There are two options for flushing: fsync and F_FULLFSYNC. F_FULLFSYNC appears to provide the same functionality that we have today on Windows.

fsync(2)

int fsync(int fildes)

Fsync() causes all modified data and attributes of fildes to be moved to a permanent storage device. This normally results in all in-core modified copies of buffers for the associated file to be written to a disk.

Note that while fsync() will flush all data from the host to the drive (i.e. the "permanent storage device"), the drive itself may not physically write the data to the platters for quite some time and it may be written in an out-of-order sequence.

Specifically, if the drive loses power or the OS crashes, the application may find that only some or none of their data was written. The disk drive may also re-order the data so that later writes may be present, while earlier writes are not.

This is not a theoretical edge case. This scenario is easily reproduced with real world workloads and drive power failures.

For applications that require tighter guarantees about the integrity of their data, Mac OS X provides the F_FULLFSYNC fcntl. The F_FULLFSYNC fcntl asks the drive to flush all buffered data to permanent storage. Applications, such as databases, that require a strict ordering of writes should use F_FULLFSYNC to ensure that their data is written in the order they expect. Please see fcntl(2) for more detail.

From https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man2/fsync.2.html

F_FULLFSYNC

int fcntl(int fildes, int cmd, ...)

F_FULLFSYNC Does the same thing as fsync(2) then asks the drive to flush all buffered data to the permanent storage device (arg is ignored). This is currently implemented on HFS, MS-DOS (FAT), and Universal Disk Format (UDF) file systems. The operation may take quite a while to complete. Certain FireWire drives have also been known to ignore the request to flush their buffered data.

From https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man2/fcntl.2.html
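The difference between the two options can be sketched as a small helper. This is only an illustration, not the GVFS implementation: the name flush_to_disk is hypothetical, and falling back to fsync when F_FULLFSYNC fails (for example on a file system that does not implement it) is an assumption about the desired behavior.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: push fd's data as far toward stable storage as the
 * platform allows. Returns 0 on success, -1 on failure (errno is set). */
int flush_to_disk(int fd)
{
    /* A negative descriptor is treated as a caller bug in this sketch. */
    assert(fd >= 0);

#ifdef F_FULLFSYNC
    /* macOS: fsync, then ask the drive to flush its own buffers too. */
    if (fcntl(fd, F_FULLFSYNC) != -1)
        return 0;
    /* Assumption: if the full sync is rejected, a plain fsync is still
     * better than nothing. */
#endif
    /* Other platforms (or fallback): host-to-drive flush only. */
    return fsync(fd);
}
```

On platforms where F_FULLFSYNC is not defined, the helper compiles down to a plain fsync call.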

Rename + Overwrite

rename(2)

int rename(const char *old, const char *new);

The rename() system call causes the link named old to be renamed as new. If new exists, it is first removed. Both old and new must be of the same type (that is, both must be either directories or non-directories) and must reside on the same file system.

The rename() system call guarantees that an instance of new will always exist, even if the system should crash in the middle of the operation.

If the final component of old is a symbolic link, the symbolic link is renamed, not the file or directory to which it points.

From https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man2/rename.2.html
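Putting the pieces together, the Windows sequence (write to a temporary file, flush, rename over the destination) might map onto POSIX calls roughly as follows. This is a minimal sketch under stated assumptions: write_file_reliably and its parameters are hypothetical names, the temporary path is assumed to be on the same file system as the destination, and the fsync fallback after a failed F_FULLFSYNC is an assumption rather than a decided policy.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: replace final_path with len bytes of data by going
 * through tmp_path. Returns 0 on success, -1 on failure. */
int write_file_reliably(const char *tmp_path, const char *final_path,
                        const void *data, size_t len)
{
    assert(tmp_path != NULL && final_path != NULL);

    /* Step 1: write the new contents to a temporary file. */
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the write */
            goto fail;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Step 2: flush the temporary file before it becomes visible. */
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) == -1 && fsync(fd) == -1)
        goto fail;
#else
    if (fsync(fd) == -1)
        goto fail;
#endif
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Step 3: rename over the final name, replacing any existing file. */
    if (rename(tmp_path, final_path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp_path);
    return -1;
}
```

The sketch does not sync the containing directory after the rename; whether that is needed on Mac is one of the open questions.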

Open Questions

Directory Entries

On the Linux man page it mentions:

Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed.

From https://linux.die.net/man/2/fsync

Is that required on Mac as well?
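For reference, the directory sync that the Linux man page describes might look like the sketch below. The helper name sync_directory is hypothetical, and whether this step is required (or sufficient) on Mac remains the open question; the code only shows the mechanics.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: fsync the directory at dir_path so that a newly
 * created or renamed entry inside it is also pushed toward disk.
 * Returns 0 on success, -1 on failure (errno is set). */
int sync_directory(const char *dir_path)
{
    assert(dir_path != NULL);

    /* A directory can be opened read-only just to obtain a descriptor. */
    int dfd = open(dir_path, O_RDONLY);
    if (dfd < 0)
        return -1;

    int rc = fsync(dfd);

    /* Preserve fsync's errno across close(). */
    int saved_errno = errno;
    close(dfd);
    errno = saved_errno;
    return rc;
}
```

It would be called on the parent directory of the destination file after the rename completes.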

Performance

What is the cost of using fsync and/or F_FULLFSYNC?

.NET Core Support

What approach does .NET Core use when calling FileStream.Flush(flushToDisk: true) on Mac? Will we need to use F_FULLFSYNC instead?

Hydrating Files

Do we need to perform a full sync when hydrating files? Can this be done asynchronously (only marking the file as hydrated once the sync completes)?

Useful Links

The issue of fsync not flushing to the device has been discussed in other projects that run on Mac:

Estimates

There are up to three pieces of work to complete:

  1. Determine the performance cost of F_FULLFSYNC [Cost: Small]
  2. Investigate options for file hydration: is there an asynchronous way to flush to disk? [Cost: Small to Medium]
  3. If the cost of F_FULLFSYNC is high, find a way to mitigate it [Cost: Medium to Large]

@jrbriggs jrbriggs modified the milestones: M153, M152 Apr 22, 2019
@jrbriggs jrbriggs modified the milestones: M152, M153 May 22, 2019
@jrbriggs jrbriggs added the pri3 label Jun 3, 2019
@jrbriggs jrbriggs removed this from the M153 milestone Jun 5, 2019
@wilbaker wilbaker changed the title from "Mac: Implement FlushFileBuffers" to "POSIX: Implement FlushFileBuffers" Jul 25, 2019