Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): protect MFS root in node with lock #8447

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

schomatis
Copy link
Contributor

@schomatis schomatis commented Sep 20, 2021

Depends on #8574. Please review and land that first to simplify this diff.

Replace the readily accessible MFS root in the node (FilesRoot) with a lock-protected version (LockedFilesRoot).

Testing: We currently only do a basic test manually (through the temporarily added script test-mfs-lock.bash), but we should look for a robust way to test the lock (preferably not through sharness/bash but through the internal Go API).

Note that this entire logic applies only when having a daemon running where the different GC and files commands contend for the same MFS root. If the commands are run in standalone mode then each one will contend for the entire repo's lock (and this mechanism isn't relevant there).

TODO:

  • Update this description according to the new changes.
  • Land chore(cmds): encapsulate ipfs rm logic in another function #8574 on which this depends.
  • Figure out how to reliably test this (that is, how to test that the race condition doesn't happen anymore). I haven't seen any sharness test that could be used as models.
  • Resolve the blocking FIXMEs.
  • Review the unlocks and be more fine-grained where possible instead of using defer always.

Closes #6113.

@BigLep
Copy link
Contributor

BigLep commented Sep 24, 2021

Assigned to @petar for review as this is a newer area for all the current go-ipfs stewards.

@@ -792,20 +795,25 @@ stat' on the file or any of its ancestors.
if err != nil {
return err
}
filesRoot := nd.LockedFilesRoot.LockRoot()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aschmahmann This design takes a high-level lock on all of MFS while running cli commands or the GC. On the one hand, this approach is a little error-prone because we need to make sure that this lock is used in all app-level code that touches MFS. On the other hand, after doing some thinking, I don't think it is easy to implement per-node locking in MFS. So, for now, it seems that the approach in this PR is fine for fixing the issue in the short term.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a high level lock here is unfortunate, but isn't really any different than locking around Pin operations. It also seems fairly straightforward to implement.

In theory we could leverage the existence of the flush command here to allow accumulating data that's protected from GC, but given other work we'd like to do around MFS and GC (independently and together) it doesn't seem worth the effort at the moment.

@schomatis
Copy link
Contributor Author

@aschmahmann From your previous comment I take it you approve the direction of this PR but could you make it explicit here just to make sure I'm on the right track before proceeding, please?

@aschmahmann
Copy link
Contributor

@schomatis yes, having a global write lock on MFS seems like the best thing to do in this PR.

For reads we might want to be more careful than a global lock though. For example, not locking ipfs files ls or ipfs files stat while GC is ongoing, but preventing corruption between MFS write and read operations (out of date data, is fine with concurrent operations, but corruption would be bad). It might be that MFS already handles some of this, but we should verify.

If we find ourselves stuck with locking on some of the read commands I'd really like ipfs files stat /ipfs/.... to not utilize the same locking mechanism since the data is immutable and shouldn't need to be locked. It's also important in order to make sure nobody has a reason to continue using ipfs object stat.

@schomatis
Copy link
Contributor Author

Continuing with this.

@schomatis schomatis force-pushed the schomatis/review/gc-mfs-not-safe branch from 86c53a0 to 3939682 Compare November 27, 2021 00:48
@schomatis
Copy link
Contributor Author

I'd really like ipfs files stat /ipfs/.... to not utilize the same locking mechanism since the data is immutable and shouldn't need to be locked.

Confirming this is already the case since the /ipfs/ prefix is handled through a different API and doesn't use the MFS root:

https://github.com/ipfs/go-ipfs/blob/c00065cc8412ec660037f3fed4df7d3fed467b17/core/commands/files.go#L424-L429

@schomatis schomatis force-pushed the schomatis/review/gc-mfs-not-safe branch from 005f234 to f6e51ac Compare December 1, 2021 14:55
@schomatis schomatis changed the title Lock MFS root in node feat(core): protect MFS root in node with lock Dec 1, 2021
@schomatis schomatis force-pushed the schomatis/review/gc-mfs-not-safe branch from f6e51ac to d0e0595 Compare December 1, 2021 15:20
@schomatis schomatis requested review from aschmahmann and petar and removed request for aschmahmann December 2, 2021 13:33
// given we would be exposing an unlocked version of the MFS root)
// * remove it completely (breaking interface) in favor of LockedFilesRoot;
// this would seem the cleaner option
// * some in-between like renaming LockedFilesRoot to the original FilesRoot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go for this option, because this will help someone using the new version of the code figure out what is going on. It does not look like there is any danger of old incorrect code compiling quietly.

@@ -168,3 +169,54 @@ func Files(mctx helpers.MetricsCtx, lc fx.Lifecycle, repo repo.Repo, dag format.

return root, err
}

// FIXME(BLOCKING): This should probably be merged into the Files provider above
// as no one should have access to the unlocked version of the mfs.Root.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

// distinction between reading and writing.
// FIXME(BLOCKING): Consider providing our own API functions like GetNode (and similar) that
// already take charge of lock-handling here instead of leaving that burden to the consumer.
// That way we could also make sure the lock is being enforced correctly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave things exactly as you have them right now. There are bigger issues with MFS locking that cannot be resolved by any sort of locking on the root alone. This PR is trying to just fix races between GC and competing command-line MFS operations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the interface of LockedFilesRoot, because having only RLock and Lock methods forces whoever is touching this code to think/ask what is going on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, this is actually a non-blocking fixme, just a note of a possible direction down the road, will clarify that.

// doing a Lookup, (a) allows the Flush operation and (b) its File/Directory structures
// implementing it have their own locking mechanism to keep track of open file
// descriptors.
func (lfr *LockedFilesRoot) RLock() *mfs.Root {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the best we can do for now without a total redesign of MFS, I think.

@@ -50,7 +50,11 @@ type ipfsPinMFSNode struct {
}

func (x *ipfsPinMFSNode) RootNode() (ipld.Node, error) {
return x.node.FilesRoot.GetDirectory().GetNode()
// FIXME(BLOCKING): We're deploying this lock taking mechanism as simply as
// possible for now deferring the unlock but we should be more precise.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See/Similar to comments below. That's the best we can do now. Even though this code has no effect at all (because the root cannot change inside LockedFilesRoot), it still seems useful to make someone staring at the code to be puzzled and investigate what this lock is all about.

ipfs --version
ipfs files mkdir /test-lock/
ipfs files rm /test-lock/ -r
((echo "content" | ./cmd/ipfs/ipfs files write --create --parents --truncate --lock-time 3 /test-lock/file) && echo "ipfs write lock released" &)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way you've implemented the timeout the command ipfs files write does not exit until the timeout expires, consequently the lock is never present for the next command ipfs repo gc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the command ipfs files write does not exit until the timeout expires,

Correct. My bash is very poor but the idea was to have the above command running in the background while ipfs repo gc attempts to take the lock and has to wait until ipfs write releases it, where then I see both command's outputs roughly simultaneously.

But anyway, this is just a temporary ad-hoc test while in the draft stage; I still don't know how to actually test this in production to get this code ready to merge.

@schomatis
Copy link
Contributor Author

@aschmahmann Petar confirmed the current proposal but I'm still stuck on how to test it other than some manual confirmation than the lock is being taken. How would you like to test this to wrap it up?

@BigLep BigLep added this to the Best Effort Track milestone Mar 5, 2022
@BigLep BigLep removed the request for review from Stebalien March 10, 2022 03:32
@BigLep BigLep modified the milestones: kubo 0.14, kubo 0.15 Jul 22, 2022
@BigLep BigLep removed the request for review from aschmahmann September 1, 2022 01:54
@BigLep BigLep modified the milestones: kubo 0.15, Best Effort Track Sep 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: 🥞 Todo
Development

Successfully merging this pull request may close these issues.

GC and MFS are not safe
4 participants