Use `HasSpan` trait more extensively for more ergonomic API #278

Veetaha · 2023-10-07T15:05:03Z

This PR contains several breaking changes documented in the changelog.

I added HasSpan trait to all types that had a custom span() method not included in that trait. I also replace &Span with impl HasSpan in all public methods in marker_api to allow passing nodes to them directly.

I even added a new lint to marker_lints that should catch the cases when impl HasSpan isn't used in a public method, but only after I implemented it I noticed that item visibility information isn't availble in marker_api yet. That's a bummer, because this custom lint doesn't make sense for non-pub methods and emits a lot of false positivies due to this fact. Anyway, since #26 has to be eventually implemented I left the new custom lint in code, and added ignores at crate-level for it.

When adding a new custom lint to marker_lints I also moved it under a module not_using_has_span_trait and named the lint variable just LINT. I did the same with the existing lint for uppercase letter in diagnostic message, and found out that it generated a name conflict. The error I saw due to this came from the internal panic in rustc_lint or some other internal rusrc crate. It was so verbose, it included a backtrace, and looked so scary that I improved the error handling due the lint crates initialization.

Now if there are name conflicts in lint names the error will look like this:

      Marker linting
    Checking marker_api v0.4.0-dev (/home/veetaha/dev/marker/marker_api)
   Compiling marker_rustc_driver v0.4.0-dev (/home/veetaha/dev/marker/marker_rustc_driver)
    Checking marker_error v0.4.0-dev (/home/veetaha/dev/marker/marker_error)
    Checking marker_uitest v0.4.0-dev (/home/veetaha/dev/marker/marker_uitest)
  × The lint `marker::lint` is defined multiple times:
  × The lint `marker::lint` is defined multiple times:
  │ - marker_lints::diag_msg_uppercase_start::LINT
  │ - marker_lints::not_using_has_span_trait::LINT
  │ - marker_lints::diag_msg_uppercase_start::LINT
  │ - marker_lints::not_using_has_span_trait::LINT
  × The lint `marker::lint` is defined multiple times:
  │ - marker_lints::diag_msg_uppercase_start::LINT
  │ - marker_lints::not_using_has_span_trait::LINT
  × The lint `marker::lint` is defined multiple times:
  │ - marker_lints::diag_msg_uppercase_start::LINT
  │ - marker_lints::not_using_has_span_trait::LINT
error: could not compile `marker_api` (lib)
warning: build failed, waiting for other jobs to finish...
error: could not compile `marker_rustc_driver` (build script)
error: could not compile `marker_error` (lib)
error: could not compile `marker_uitest` (lib)
  × linting finished with an error

I had to use process::exit to implement this because it's not possible to return a result from the context where the lints are registered. I saw a comment that mentions a caveat that that code runs on a different thread, and thus we have no other choice, and I didn't try to fight with that. Yeah, it is a bit ugly, but it works for now.

You can also see that the error is repeated multiple times. It's because cargo called the rustic driver multiple times in parallel and all of these calls returned the same error. Unfortunately cargo didn't duplicate the error messages. Not sure how cargo does this, bit may be researched later.

Veetaha · 2023-10-07T15:28:09Z

marker_lints/src/not_using_has_span_trait.rs

+    Warn,
+}
+
+pub(crate) fn check_item<'ast>(cx: &'ast MarkerContext<'ast>, item: ItemKind<'ast>) {


I'm struggling with code organization here. Since there is a single lint pass for the entire crate, then if I want to define unrelated lints in separate modules I can't write their code inside of a trait impl. Writing code inside of a trait impl is cool because I can leverage rust-analyzer to paste the stub trait impl and let me fill it in. With the free function I had to copy the signature from the trait's declaration manually. I wish we could improve this situation.

Yeah, I'm currently experimenting a bit in https://github.com/rust-marker/marker-example-lints. Maybe having something like the inventory crate, that you talked about previously could be a good idea 🤔

Yeah, I already forgot about inventory. I'm thinking what the right API should be though, it's hard to to come up with something both succinct but flexible.

I see that clippy uses a single lint pass per lint.
For example MultiAssignments is defined in a separate file with its own lint pass (code).

And then that lint pass is registered manually (code).

That second part of manual registration can be done by inventory though.

But let's not forget that lints should be configurable, like this one.

Clippy usually has one lint pass per lint crate, but we also combine them sometimes. This can become messy (See clippy_lints::methods as an extreme example), but allows lints to share common logic.

We could do something similar, where we have a lint pass trait implemented on several types. The main LintPass of the crate would hold Vec<Box<&dyn LintPass>>, to distribute the control flow inside the crate. The dyn LintPass wouldn't have access to the 'ast lifetime, but I think we want to discourage users from storing any 'ast objects in the structs anyways

Aren't clippy lints all in one crate? I've shown the example where they register one lint pass per lint module.

That extreme example is really extreme. It's hard to tell which logic they share in common, and wether it's even worth it. I suppose it's an attempt to save some performance on looking up method expressions in code. I.e. the difference between:

if let Expr::Method(method) = expr { method_lint_1(method); method_lint_2(method); }

if let Expr::Method(method) = expr { method_lint_1(method); } if let Expr::Method(method) = expr { method_lint_2(method); }

and the first one wins because it doesn't have a repeated check?
That makes sense, but I'm not sure such optimization is worth it, given the messy code organization, but I suppose we shouldn't have an API that disallows doing such context sharing.

Maybe there could be several variations of declare_lint! macro. One for the simple case where the lint with the lint pass implementation is directly registered in inventory, and the second one more advanced that registers only the lint in the inventory, but the implementation of the lint pass is expected to be used manually inside of other lint pass implementation and thus not registered.

Veetaha · 2023-10-07T15:28:57Z

marker_lints/src/not_using_has_span_trait.rs

+fn strip_references(ty: ast::TyKind<'_>) -> ast::TyKind<'_> {
+    match ty {
+        ast::TyKind::Ref(ref_ty) => strip_references(ref_ty.inner_ty()),
+        _ => ty,
+    }


I wonder if this function could be moved to the public API of TyKind.

it would definitely be a good function to have. peel_refs from rustc might be a good example as well

Cool 🤙, I added the same peel_refs method to ast::TyKind and sem::TyKind.

I've noticed that there is no information about the parentheses around the type in TyKinds. For example both u32, (u32), and ((u32)) are represented the same in the AST. This is in contrast to syn, where it has a Paren type kind that wraps the inner type in parens explicitly.

I added two test cases to print_ty.rs to show that there is no info about that. Is this by design? I don't think this is critical, and in fact the Paren type kind in syn is often annoying and I have to write a similar strip_ty_group function that removes the parentheses and an invisible Group. It's really hard to understand what Group is, but I suppose that's related to invisible groupping that is added by the compiler when a macro is invoked in type position. For example &macro_call!() is implicitly grouped when expanded like &(expanded_ty_here) and the parentheses I used here aren't visible.

So I'm not really sure marker needs to expose any parens info at all. Stripping them automatically simplifies life, but maybe there could be lints that want that info? I'm not sure.

Is this by design?

Yes, such information is usually not represented in ASTs, as they are semantically the same. You'll see the same thing for binary operators. (1 + 2) * 3 the brackets are not included in the final AST. The tree structure preserves the execution order.

This is part of what the abstract from "Abstract Syntax Tree" is about. It tries to unify these syntactical differences into one uniform view, at the loss of information. If a user wants, they can still use the span, to see if there are parentheses, but that's not necessarily recommended.

Clippy has a few lints, which check for the syntax. But they're always a bit weird, as we use Clippy as a formatter with semantic information. clippy::semicolon_if_nothing_returned is an example, where we need to check the written syntax as well for some heuristics, if the semicolon should be suggested.

So I'm not really sure marker needs to expose any parens info at all. Stripping them automatically simplifies life, but maybe there could be lints that want that info? I'm not sure.

In Clippy we have lived happy without this information, and hacked around with spans, where needed. If we wanted to provide a way to completely reconstruct the original source code, we should include parenthesis. Rustfmt and syn, for instance, require this information. For Marker's case, I don't believe that this should be one of our goals. It would be nice to have a function to print the AST as it would be written (Similar how rustc has this for hir, with the "Show HIR" option on the playground). But adding parenthesis nodes, would add a lot of edge cases to linting, with very little benefit :)

Veetaha · 2023-10-07T15:30:19Z

marker_lints/src/not_using_has_span_trait.rs

+        if ident != "Span" {
+            continue;
+        }


I didn't find a way to resolve the semantic information of an ast::TyKind. That's why I had to use a heuristic that compares the last segment of the path with Span which is good enough, but definitely not ideal in cases when there are multiple crates with multiple types named Span.

Jep, you're right. There should be an option to check this using resolve_ty_ids, but there seems to be no way of getting the TyDefId, woops 😅

The problem with resolve_ty_ids is that it takes a stringified path as a parameter. It also returns a list of matching types.
In this case we should have a method that takes an ast::TyKind as a parameter and returns just a single sem::TyKind. I suppose it shouldn't be ambiguous, and only a single type can be resolved given the context of the crate and module where the ast::TyKind resides it should always resolve to a single type, right?

Rustc has a util to convert AST types to semantic types, it can crash under some circumstances, which is why I'm cautious with it. I think the better option would be to have a resolve_item_ids equivalent. The AST path of the ast::TyKind should already provide the ItemId it refers to. Right now, there is just no way to check if that's the one from our Span struct

ast::TyKind resides it should always resolve to a single type, right?

That's correct 👍 Everything in the AST can only have a single type, but a string path can have multiple IDs.

marker_lints/src/not_using_has_span_trait.rs

xFrednet

Looks good to me, a few nits and comments, but that's it. It's nice to see another lint implementation. It's a good way to find issues and to prioritize them 👍

When adding a new custom lint to marker_lints I also moved it under a module not_using_has_span_trait and named the lint variable just LINT. I did the same with the existing lint for uppercase letter in diagnostic message, and found out that it generated a name conflict. The error I saw due to this came from the internal panic in rustc_lint or some other internal rusrc crate. It was so verbose, it included a backtrace, and looked so scary that I improved the error handling due the lint crates initialization.

This is something which I've been thinking about. Ideally, I would like to have the lint crate in front, instead of marker. allow(marker_lints::XYZ) makes more sense, and prevents collisions by default. But the chances of this happening, with the previous state of the discussion about tool name spaces is basically zero. We could add something after the marker prefix, marker::<lint_crate>::<lint_name> or so. Here, I fear that this is too verbose. 🤔 I'm also not sure if rustc would accept this correctly, but if this is something we want, we could probably convince the compiler team, to accept these changes :)

Edit: It seems like rustc accepts lint path with multiple segments, so that could be an option (Playground) 👍 This doesn't solve the issue, if multiple lints have the same name in one lint crate, but at least prevents collisions between different ones

For now, I like your solution a lot :D

xFrednet · 2023-10-08T10:08:56Z

marker_rustc_driver/src/main.rs

+    feature(register_tool),
+    register_tool(marker),
+    // This lint applies only to public `marker_api`
+    allow(marker::not_using_has_span_trait)


It might be better, to have the lint allow-by-default and then add the warn(marker::not_using_has_span_trait) to marker_api instead. I also thought about adding a lint crate for internal marker lints. 🤔

Good idea 💡, I forgot about that

marker_lints/src/not_using_has_span_trait.rs

xFrednet · 2023-10-08T10:15:26Z

marker_lints/src/not_using_has_span_trait.rs

+fn strip_references(ty: ast::TyKind<'_>) -> ast::TyKind<'_> {
+    match ty {
+        ast::TyKind::Ref(ref_ty) => strip_references(ref_ty.inner_ty()),
+        _ => ty,
+    }


it would definitely be a good function to have. peel_refs from rustc might be a good example as well

xFrednet · 2023-10-08T10:18:15Z

marker_lints/src/not_using_has_span_trait.rs

+        if ident != "Span" {
+            continue;
+        }


Jep, you're right. There should be an option to check this using resolve_ty_ids, but there seems to be no way of getting the TyDefId, woops 😅

xFrednet · 2023-10-08T10:20:15Z

marker_lints/src/not_using_has_span_trait.rs

+    Warn,
+}
+
+pub(crate) fn check_item<'ast>(cx: &'ast MarkerContext<'ast>, item: ItemKind<'ast>) {


Yeah, I'm currently experimenting a bit in https://github.com/rust-marker/marker-example-lints. Maybe having something like the inventory crate, that you talked about previously could be a good idea 🤔

marker_api/src/lib.rs

marker_api/src/ast/ty.rs

xFrednet

Looks good to me, thank you for the nice update and new lint ❤️

I added `HasSpan` trait to all types that had a custom `span()` method not included in that trait. I also replace `&Span` with `impl HasSpan` in all public methods in `marker_api` to allow passing nodes to them directly. I even added a new lint to `marker_lints` that should catch the cases when `impl HasSpan` isn't used in a public method, but only after I implemented it I noticed that item visibility information isn't availble in `marker_api` yet. That's a bummer, because this custom lint doesn't make sense for non-pub methods and emits a lot of false positivies due to this fact. Anyway, since rust-marker#26 has to be eventually implemented I left the new custom lint in code, and added ignores at crate-level for it.

Veetaha force-pushed the feat/more-has-span branch 2 times, most recently from 0f8276b to 95d660b Compare October 7, 2023 15:16

Veetaha commented Oct 7, 2023

View reviewed changes

marker_lints/src/not_using_has_span_trait.rs Show resolved Hide resolved

Veetaha requested a review from xFrednet October 7, 2023 15:31

xFrednet approved these changes Oct 8, 2023

View reviewed changes

Veetaha requested a review from xFrednet October 8, 2023 13:54

xFrednet approved these changes Oct 8, 2023

View reviewed changes

Veetaha force-pushed the feat/more-has-span branch from 4f5b787 to 87fbdf5 Compare October 8, 2023 16:01

Veetaha enabled auto-merge October 8, 2023 16:01

Veetaha force-pushed the feat/more-has-span branch from 87fbdf5 to e47d5b8 Compare October 8, 2023 16:28

Veetaha added this pull request to the merge queue Oct 8, 2023

Merged via the queue into rust-marker:master with commit 36ecc5b Oct 8, 2023
22 of 23 checks passed

Veetaha deleted the feat/more-has-span branch October 8, 2023 16:36

xFrednet mentioned this pull request Oct 11, 2023

API: Include the lint crate in the lint identifier #288

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `HasSpan` trait more extensively for more ergonomic API #278

Use `HasSpan` trait more extensively for more ergonomic API #278

Veetaha commented Oct 7, 2023 •

edited

Loading

Veetaha Oct 7, 2023

xFrednet Oct 8, 2023

Veetaha Oct 8, 2023

Veetaha Oct 8, 2023

xFrednet Oct 8, 2023

Veetaha Oct 8, 2023

Veetaha Oct 7, 2023

xFrednet Oct 8, 2023

Veetaha Oct 8, 2023 •

edited

Loading

Veetaha Oct 8, 2023 •

edited

Loading

xFrednet Oct 8, 2023

Veetaha Oct 7, 2023

xFrednet Oct 8, 2023

Veetaha Oct 8, 2023

xFrednet Oct 8, 2023 •

edited

Loading

xFrednet left a comment •

edited

Loading

xFrednet Oct 8, 2023

Veetaha Oct 8, 2023

xFrednet Oct 8, 2023

xFrednet Oct 8, 2023

xFrednet Oct 8, 2023

xFrednet left a comment

Use HasSpan trait more extensively for more ergonomic API #278

Use HasSpan trait more extensively for more ergonomic API #278

Conversation

Veetaha commented Oct 7, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Veetaha Oct 8, 2023 • edited Loading

Choose a reason for hiding this comment

Veetaha Oct 8, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xFrednet Oct 8, 2023 • edited Loading

Choose a reason for hiding this comment

xFrednet left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xFrednet left a comment

Choose a reason for hiding this comment

Use `HasSpan` trait more extensively for more ergonomic API #278

Use `HasSpan` trait more extensively for more ergonomic API #278

Veetaha commented Oct 7, 2023 •

edited

Loading

Veetaha Oct 8, 2023 •

edited

Loading

Veetaha Oct 8, 2023 •

edited

Loading

xFrednet Oct 8, 2023 •

edited

Loading

xFrednet left a comment •

edited

Loading