Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changelog: new lint: [split_with_space] #13059

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

DavidHusicka
Copy link

changelog: new lint: [split_with_space]

Adds a new lint for checking whether the split method uses ASCII space as an delimiter and if it does so, recommends use of split_whitespace.

Adds a new lint for checking whether the `split` method uses ASCII space as an delimeter and if it does so, recommends use of `split_whitespace`.
@rustbot
Copy link
Collaborator

rustbot commented Jul 6, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Centri3 (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Jul 6, 2024

impl<'tcx> LateLintPass<'tcx> for SplitWithSpace {
fn check_expr(&mut self, cx: &LateContext<'tcx>, expr: &Expr<'_>) {
if let ExprKind::MethodCall(path, _split_recv, [arg], split_ws_span) = expr.kind
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should check that the receiver is really a &str and not some other type with a split method that may not have split_whitespace (see this)

split_ws_span.with_hi(split_ws_span.lo()),
"found call to `str::split` that uses the ASCII space character as an argument".to_string(),
"replace the use of `str::split` with `str::split_whitespace`".to_string(),
String::new(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current span & suggestion seems to replace nothing?

I think it would be good to use multipart_suggestion and replace arg.span with "", and split_ws_span with "split_whitespace", which would generate a diagnostic like

help: replace the use of `str::split` with `str::split_whitespace`
   |
LL - some_space_delimtetered_string.split(" ")
LL + some_space_delimtetered_string.split_whitespace()

"found call to `str::split` that uses the ASCII space character as an argument".to_string(),
"replace the use of `str::split` with `str::split_whitespace`".to_string(),
String::new(),
Applicability::MachineApplicable,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...It should probably not be MachineApplicable though.

Although in the majority of cases I imagine this is strictly more correct, there might be weird edge cases where people do want to split by exactly that, and rely on "a b".split(' ').count() == 3, which fails with split_whitespace, and MachineApplicable should not change behavior that the user needs

/// Checks for calling `str::split` with the ASCII space character instead of `str::split_whitespace`.
///
/// ### Why is this bad?
/// `split` using the ASCII space character does not handle cases where there is a different space character used. `split_whitespace` handles all whitespace characters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to mention that split_whitespace also handles multiple consecutive spaces

//@no-rustfix

#[deny(clippy::split_with_space)]
#[allow(unused)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think allow(unused) is already implicitly passed, so this probably isn't needed

@@ -0,0 +1,16 @@
//@aux-build:proc_macros.rs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test links proc_macros.rs but never actually uses it.

It probably should, though, so this is a good idea. Can you add a test with proc macros? You can look at e.g. this file to see how to write a test that uses this.
You'll also likely need to add a !is_from_proc_macro() check to the lint code.


for substr in some_space_delimtetered_string.split(" ") {
println!("{substr}");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be some negative tests, e.g. splitting by something that isn't a space.

&& path.ident.name == sym!(split)
&& let ExprKind::Lit(lit) = arg.kind
&& (matches!(lit.node, LitKind::Char(' '))
|| matches!(lit.node, LitKind::Str(str, _) if str.as_str() == " "))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a check here that this isn't from a macro, (or at least, not from an external macro since the user can't change that, can use in_external_macro for this)

@Alexendoo
Copy link
Member

Are there some examples where split_whitespace is preferred? There's a good number of parsing cases where you do want to split on spaces, there's also a good number where just splitting on spaces would be slightly incorrect (e.g. IRC's message format)

However I can't think of any where split_whitespace is the correct choice - usually it's something slightly more subtle, for example rust whitespace is a different unicode property, Pattern_White_Space

@bors
Copy link
Collaborator

bors commented Jul 17, 2024

☔ The latest upstream changes (presumably #13088) made this pull request unmergeable. Please resolve the merge conflicts.

@xFrednet
Copy link
Member

xFrednet commented Aug 3, 2024

Hey @DavidHusicka , this is a ping from triage, since there hasn't been any activity in some time. Are you still planning to continue this implementation?

If you have any questions, you're always welcome to ask them in this PR or on Zulip.

@rustbot author


Also it seems like @Centri3 is currently busy. @y21 would you mind taking over this review, since you already started?

r? @y21

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties labels Aug 3, 2024
@rustbot rustbot assigned y21 and unassigned Centri3 Aug 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants