Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create output file atomically #221

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Conversation

sol
Copy link
Member

@sol sol commented Feb 6, 2023

This change is useful if you both (a) run alex on modifications to .x-files and (b) :reload GHCi on modifications to .hs-files. It prevents that your GHCi session will see partial source files.

As things are you will initially get an error of the form:

src/Lexer.hs:1:1: error:
    File name does not match module name:
    Saw     : ‘Main’
    Expected: ‘Lexer’
Failed, no modules loaded.

(as initially the output file is empty)

Considerations:

  • If we go with this then we will need to drop support for GHC < 6.10.1 (as we don't have compat code for bracket and I'm not eager to burn cycles on that).
  • We could do the same thing for .info-files (personally however, .hs-files is what I care about).
  • We could guard this behind a flag (e.g. --atomic-output or something).

Thoughts?

@andreasabel
Copy link
Member

@sol, thanks for the PR!

  • If we go with this then we will need to drop support for GHC < 6.10.1

I don't see a problem with dropping GHC versions < 7 because we anyway do not test them. CI goes back to 7.0 only.

  • We could do the same thing for .info-files (personally however, .hs-files is what I care about).

This could be good for code uniformity. At least if there is similar code in other places, it should be changed in sync.

  • We could guard this behind a flag (e.g. --atomic-output or something).

I don't think that would be necessary.

One question: Can the memory footprint of alex increase significantly by doing the write atomically? (Not sure how relevant this question is, but it came to my mind and you may have an answer to this. It could increase if e.g. the writing was triggering a lazy computation of the output, and this now doesn't work like this anymore.)

@sol
Copy link
Member Author

sol commented Feb 6, 2023

One question: Can the memory footprint of alex increase significantly by doing the write atomically?

No, it doesn't. This patch creates a temporary file and later renames it. For that reason, the memory characteristics are unchanged.

The downside of this approach is that we create that temporary file on the file system and trigger corresponding inotify events, which I don't really like. The pros are that this is a common and well understood approach to guarantee atomicity when creating files + it's easy to implement without the risk of unintentional side effects.

An other approach would be to construct a ByteString (or rather a builder I guess) in memory first, and finally write that to a file. While this is not atomic it's likely fast enough so that you can get away with it + you don't get the additional inotify events for the temporary file, which I would prefer. In this case you will need ~0.5mb for the builder (for a parser of the size of what GHC uses; possibly more due to the builder overhead not sure) which I think in itself is not a big issue, but you also run the risk of changing the memory characteristics unintentionally and may need to seq things away.

A third and somewhat half measured approach would be to try do a little bit more work upfront (before opening the output file), so that the output file stays empty for a shorter period of time. This may or may not be enough to get away with it. I haven't looked into this at all.

Copy link
Member

@andreasabel andreasabel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, if you rebase on master now, you get CI up to GHC 9.6.1 alpha2.

src/Main.hs Outdated
@@ -169,10 +173,12 @@ alex cli file basename script = do

scheme <- getScheme directives

let o_file_tmp = o_file ++ ".tmp"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this maybe more robust by using a proper temp file?
See e.g. https://hackage.haskell.org/package/base-4.3.0.0/docs/System-IO.html#v:openTempFile

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made two changes:

  1. Use openTempFile
  2. Append a ~ to the name of the temporary file

The rationale for (2) is so that the file is treated the same way as backup files by file watching utilities and the like.

@sol sol force-pushed the atomic-output branch 6 times, most recently from c134931 to 66211b0 Compare February 9, 2023 00:56
@sol
Copy link
Member Author

sol commented Feb 9, 2023

@andreasabel before moving forward with this, I will dogfeed it for a little bit longer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants