Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better strict version hash (SVH) computation #14145

Closed

Conversation

pnkfelix
Copy link
Member

Teach SVH computation to ignore more implementation artifacts.

In particular, this version of strict version hash (SVH) works much
like the deriving(Hash)-based implementation did, except that it
deliberately:

  1. skips over content known not affect the generated crates, and,
  2. uses a content-based hash for names instead of using the value of
    the Name index itself, which can differ depending on the order
    in which strings are interned (which in turn is affected by
    e.g. the presence of --cfg options on the command line).

Fix #14132.

@pnkfelix
Copy link
Member Author

r? @alexcrichton

if i.vis == ast::Public {
SawItem.hash(self.st); walk_item(self, i, e);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use the results of the reachability pass instead of ast::Public due to things like reexports and generic functions and such.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i might just do the hash unconditionally in visit_item; e.g. a non-pub function may still get inlined in a downstream crate, and thus we should include it in the SVH, right? (I suppose the answer to that question depends on what I can learn via the reachability pass; I still need to look at that.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reachability pass will return all possibly user-facing functions, which should be what you need here. For example, we use the results of reachability to determine what symbols should be visible in the object file (others can link to), which is, in essence, the public ABI.

@alexcrichton
Copy link
Member

I'm a little worried about this approach because it seems like it's very easy to sweep ABI-breaking changes under the rug by accident. For example, if a new method is added to the Visitor, then there's no guarantee that an implementation will be present for this Visitor.

I do think that it's the best approach, however, as hashing the AST just doesn't cut it. I think we'll just have time tell how this approach works.

Could you add some tests along the same lines of changing-crates.rs? The portions that are explicitly ignored in terms of hashing would be nice to keep track of.

}

fn visit_ty(&mut self, t: &Ty, e: E) {
SawTy.hash(self.st); walk_ty(self, t, e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that this doesn't actually hash the type itself, what if the type changes from u16 to u8?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, true. Let me experiment and see if I can fix that.

@alexcrichton
Copy link
Member

I've pointed out a few places where I think the hashing isn't necessarily sufficient. I'm worried about missing cases (I'm certainly no exhaustive check). Maybe we want to be conservative on this visitor approach...

@pnkfelix
Copy link
Member Author

I just realized, some of your comments seem to indicate that you are worried about certain substructure being skipped (e.g. local definition, the patterns on match arms, etc). But I think all of those cases are covered by the recursive call to visit::walk_foo in each of those cases, which should descend down the relevant substructure.

In any case, I will make sure to add tests for those examples.

@pnkfelix
Copy link
Member Author

(the previous version was pretty buggy in various nasty ways and probably collapsed too many distinct texts to the same hash, largely due to the mistaken way I incorporated visibility that alex pointed out. I will have a new PR up tomorrow that does a better job.)

@pnkfelix
Copy link
Member Author

(added plus: writing tests forced me to discover that the PR as written here was still hashing on spans in the Lit case, and perhaps others. The effect of this, AFAICT, was that this PR as posted only fixed the --cfg stage1 issue; as soon as you made changes to the text, including just whitespace ones, the svh differed. yay testing!)

@pnkfelix
Copy link
Member Author

I will reopen this when it is more fully baked.

@pnkfelix pnkfelix closed this May 13, 2014
@pnkfelix
Copy link
Member Author

more fully baked now. :)

@pnkfelix pnkfelix reopened this May 14, 2014
@pnkfelix
Copy link
Member Author

r? @alexcrichton or @huonw (do note if you grab it so that we don't have needless duplicate review effort)

@huonw
Copy link
Member

huonw commented May 14, 2014

I'll glance over it quickly now, but I've only got 15 minutes or so.

fn got_path(&mut self, path: &Path, id: ast::NodeId, e: &E) -> After;
}

pub struct ForcedVisitor<HookState /*:ForcedVisitorHooks*/>{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have a fixme? or is it just for documentation purposes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented out bound? Its for documentation for now.

I guess with planned future changes to the language then we could uncomment it. I guess that's your point, that I should add a FIXME comment with a pointer to the RFC and/or tracking issue for the change to add trait-bounds to struct's type params?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it just seems a little weird to have a commented-out piece of code with no explanation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see a tracking issue in the Rust repo for RFC 0011. I would consider adding a FIXME pointing to such a tracking issue, but without the tracking issue I'm not sure if I'd bother.

@huonw
Copy link
Member

huonw commented May 14, 2014

From my quick glance it seems fine to me, but I won't be able to read over it properly for 18+ hours (or more).

@@ -61,7 +61,37 @@ pub fn generics_of_fn(fk: &FnKind) -> Generics {
}
}

/// Each method of the Visitor trait is a hook to be potentially
/// overriden. Each method's default implementation recursvely visits
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recursively

@alexcrichton
Copy link
Member

I'm curious what you think about the introduction of a whole new visitor, but other than that this looks fantastic to me!

@pnkfelix
Copy link
Member Author

@alexcrichton Yeah I have gone back and forth about whether the new visitor is worth it.

(The third option, assuming the first is Visitor and the second is ForcedVisitor, would be to not use a visitor at all, and instead write the code as a set of recursive functions that match the various enums directly. I briefly started on a rewrite to that pattern this morning, and quickly abandoned it: avoiding the boilerplate for the recursive traversals really is worth it.)

Anyway, as for this PR:

I am pretty happy with how the SVH client code looks in the end with the new ForcedVisitorHooks. And the other thing is that the ForcedVisitor does duplicate some code in "visit.rs", but it does manage to reuse the all of the existing visit::walk machinery, so its not quite as bad as if I had had to cut-and-paste all of "visit.rs"

I guess the right thing for me to do now would be to survey the existing Visitor implementations and check if any of them would look better as ForcedVisitorHooks instead. If I can find something else that would benefit, then I'll keep ForcedVisitor; otherwise, I'll drop it (and rewrite this code to use Visitor again). Sound good?

@pnkfelix
Copy link
Member Author

@nikomatsakis You may have an opinion on the discussion regarding the Visitor API; we discussed it briefly during our meeting yesterday.

@alexcrichton
Copy link
Member

That sounds good to me.

@alexcrichton
Copy link
Member

r=me when you've completed your survey

@pnkfelix
Copy link
Member Author

The summary is: I don't think I can easily apply ForcedVisitor to any of our existing visitors and have it actually pay off. So I'll take it out.

According to my survey:

  • There are 60 Visitor impls that override fewer than 6 of the default methods. Obviously ForcedVisitor, with its 26 required methods, would not be good there.
  • Then there are 7 Visitor impls that override 6--9 of the default methods. I think the same logic applies here (but I at least skimmed the code for these cases to double-check that assumption.
  • There are two Visitors left:
    • The rustc::middle::lint Visitor for Context<'a> overrides 12 methods. I could believe the linter could one day grow to override more of the methods, to the point where ForcedVisitor would be worth considering. However, the architecture of the ForcedVisitorHooks is that the hook runs, and then returns a flag saying whether to recur or not. The problem then is that the lint visitor establishes state that it wants to clean up after the recursion returns, and the ForcedVisitor does not support that pattern of usage.
    • The syntax::ast_util Visitor for IdVisitor<'a, O> overrides 16 methods. Again, skimming over this I started to think that maybe it could make use of the ForcedVisitor -- I was not able to convince myself that the existing code is not overlooking cases. However, the same problem as the previous bullet arises: there is local state that it wants to set up during the recursion and clean up afterwards, a pattern that is not supported by the ForcedVisitor API I posted.

So, I will revise the PR to take out ForcedVisitor, make sure the tests still pass, and then mark as reviewed.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request May 15, 2014
Closes rust-lang#14210 (Make Vec.truncate() resilient against failure in Drop)
Closes rust-lang#14206 (Register new snapshots)
Closes rust-lang#14205 (use sched_yield on linux and freebsd)
Closes rust-lang#14204 (Add a crate for missing stubs from libcore)
Closes rust-lang#14201 (Render not_found with an absolute path to the rust stylesheet)
Closes rust-lang#14198 (update valgrind headers)
Closes rust-lang#14174 (Optimize common path of Once::doit)
Closes rust-lang#14162 (Print 'rustc' and 'rustdoc' as the command name for --version)
Closes rust-lang#14145 (Better strict version hash (SVH) computation)
@pnkfelix
Copy link
Member Author

(rebasing)

In particular, this version of strict version hash (SVH) works much
like the deriving(Hash)-based implementation did, except that uses a
content-based hash that filters rustc implementation artifacts and
surface syntax artifacts.

Fix rust-lang#14132.
Namely: non-pub `use` declarations *are* significant to the SVH
computation, since they can change which traits are part of the method
resolution step, and thus affect which methods get called from the
(potentially inlined) code.
(Only after adding the tests did I realize that this is not really a
special case at the AST level; as far as the visitor is concerned,
`int` and `i32` and `i64` are just idents.)
bors added a commit that referenced this pull request May 15, 2014
…excrichton

Teach SVH computation to ignore more implementation artifacts.

In particular, this version of strict version hash (SVH) works much
like the deriving(Hash)-based implementation did, except that it
deliberately:

  1. skips over content known not affect the generated crates, and,

  2. uses a content-based hash for names instead of using the value of
     the `Name` index itself, which can differ depending on the order
     in which strings are interned (which in turn is affected by
     e.g. the presence of `--cfg` options on the command line).

Fix #14132.
@bors bors closed this May 15, 2014
@petrochenkov
Copy link
Contributor

@pnkfelix @alexcrichton
I've noticed some parts of HIR are not hashed by SVH, some parts are hashed twice and some Idents (leaked through ExprInlineAsm) are still hashed as unstable numbers. Are these things worth fixing given the probabilistic nature of SVH, or it's good enough without extra polishing?

In particular, non-crate attributes seem to be intentionally not hashed at all, but they can easily affect ABI (repr, packed etc.).

@alexcrichton
Copy link
Member

It's probably "good enough" today to get the job done, but I also wouldn't be confident in saying that the SVH reflects the "true ABI" of a crate either, so in that sense I'm sure there's a few bugs here and there :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

strict version hash (SVH) incorporates hidden implementation artifacts
5 participants