Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unification constraints to NodeProperties. #221

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jmillikin
Copy link
Contributor

A unification constraint specifies whether servers are allowed to unify specific filesystem nodes, for example using symlinks or hardlinks.

Compared to proposals #216 (cc @EricBurnett) and #217 (cc @EdSchouten), this design would let the client specify for each node whether it can be de-duplicated (1) with a symlink, (2) with a hardlink, or (3) not at all.

Use cases:

  1. Forbidding symlinks is useful for executing tools in the Node.js ecosystem (Allow artifacts (or maybe runfiles?) to be marked non-symlinkable bazel#10299)
  2. Forbidding hardlinks helps with Clang, which uses inodes as part of its #pragma once implementation (https://lists.llvm.org/pipermail/cfe-dev/2021-June/068431.html)

(hopefully I summarized the second use case correctly from the monthly meeting's discussion)

A unification constraint specifies whether servers are allowed
to unify specific filesystem nodes, for example using symlinks
or hardlinks.
ALLOW_REDIRECTS = 0;

// The server MAY use a filesystem alias, such as a hard link or bind
// mount, to substitute for this node.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard links and bind mounts are absolutely not the same thing. As far as I know, bind mounts each have their own st_dev, meaning that files essentially don't alias. They each get their own identity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Linux, files within bind mounts have the same st_dev and st_ino as the original.

$ uname -sr
Linux 5.14.0-1004-oem
$ mkdir srcdir dstdir
$ sudo mount -o bind srcdir dstdir
$ echo 'testing' > srcdir/test.txt
$ stat srcdir/test.txt 
  File: srcdir/test.txt
  Size: 8         	Blocks: 8          IO Block: 4096   regular file
Device: fd01h/64769d	Inode: 24259448    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/    john)   Gid: ( 1000/    john)
[...]
$ stat dstdir/test.txt 
  File: dstdir/test.txt
  Size: 8         	Blocks: 8          IO Block: 4096   regular file
Device: fd01h/64769d	Inode: 24259448    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/    john)   Gid: ( 1000/    john)
[...]
$ python3 -c 'import os; print(os.stat("srcdir/test.txt"))'
os.stat_result(st_mode=33204, st_ino=24259448, st_dev=64769, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8, st_atime=1650338444, st_mtime=1650338444, st_ctime=1650338444)
$ python3 -c 'import os; print(os.stat("dstdir/test.txt"))'
os.stat_result(st_mode=33204, st_ino=24259448, st_dev=64769, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8, st_atime=1650338444, st_mtime=1650338444, st_ctime=1650338444)

Copy link
Collaborator

@EricBurnett EricBurnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a specific use-case for wanting this to be node-level, rather than a hint or constraint specified at the level of the entire action? On one side, I'm trying to think of what build tools are going to have the information and desire necessary to provide this file-by-file, and on the other side, I'm unsure how often you'd actually need to disable an optimization only for a specific subset of files. My gut feeling here is that if we started by doing action-level constraints instead of file-level, that'd be both easier to adopt and cover the majority of use-cases. Though we could follow on with file-level still later, if there's a specific argument for it.

The examples I'm aware of are python interpreters and clang compilers. For clang, every input to the action is presumably to be read by clang (or is part of clang), so I'd expect the "no hardlink" constraint on every input file; for python, I guess only the python files themselves need to have the "no symlink" constraint applied and e.g. data files to be read at runtime could theoretically be aliased to each other, but if we imagine an action with enough data files that it's actively worth aliasing them to each other, maybe it'd be better for the build tool to specify that intentionally? (And if we're not forbidding hardlinks, servers can still do that under the hood if they have the capability.)

@@ -842,6 +842,58 @@ message NodeProperties {

// The UNIX file mode, e.g., 0755.
google.protobuf.UInt32Value unix_mode = 3;

// Constrains server-side unification of identical filesystem nodes.
NodeUnificationConstraint.Value unification_constraint = 4;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Food for thought: repeated OptimizationConstraint? I'm not sure the field needs to be super specific that it's about NodeUnification, or could be expanded to cover other things later.

NodeUnificationConstraint.Value unification_constraint = 4;
}

// A `NodeUnificationConstraint` specifies whether servers are allowed to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you feel about flipping this? Instead of mixing positive ("ALLOW_X") and negatives ("FORBID_X"), only stick to negatives? If we have the default be that the server is generally free to optimize, then this field would be for adding one or more restrictions against what the server is allowed to do.

My first worry is that having clients opt-in to optimizations seems like it'll lead to clients missing viable optimizations commonly, and a pile of "allow_X" optimizations on every single node from the rest. That also seems like it may be more future-proof: if we add a new optimization field, does every existing client not explicitly allowing it effectively stop being eligible for that optimization? Whereas we'd only start adding new ways to forbid optimizations if/when they were found to matter for a few cases. (If they only ever apply in a few cases, I guess I could see this, or just specifying the recommendation to do something in a slightly different way.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants