Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad regex sort when multiple paths match #484

Closed
BrknRobot opened this issue Apr 6, 2021 · 1 comment · Fixed by #485
Closed

Bad regex sort when multiple paths match #484

BrknRobot opened this issue Apr 6, 2021 · 1 comment · Fixed by #485

Comments

@BrknRobot
Copy link
Contributor

BrknRobot commented Apr 6, 2021

Regex comparisons where both paths/basenames match incorrectly order the files, rather than allowing subsequent criteria to break the tie.

example where regex ordering breaks
1/a
1/a2
1/b
2/a
2/a2
2/b
pattern:
x<1>r<a>l

@SeeSpotRun
Copy link
Collaborator

Firstly -S x relates to basename while -S r relates to full path. So you probably want pattern r<1>x<a>l

But still there is a bug:

$ rmlint -S 'r<1>x<a>l'
# Duplicate(s):
    ls '/pwd/1/b'
    rm '/pwd/1/a'
    rm '/pwd/1/a2'
    rm '/pwd/2/a2'
    rm '/pwd/2/a'
    rm '/pwd/2/b'

Expected:

# Duplicate(s):
    ls '/pwd/1/a'
    rm '/pwd/1/a2'
    rm '/pwd/1/b'
    rm '/pwd/2/a'
    rm '/pwd/2/a2'
    rm '/pwd/2/b'

It looks like there is a problem with the regex match sorting implementation here:

rmlint/lib/preprocess.c

Lines 356 to 389 in 094fbd5

static int rm_pp_cmp_by_regex(GRegex *regex, int idx, RmPatternBitmask *mask_a,
const char *path_a, RmPatternBitmask *mask_b,
const char *path_b) {
int result = 0;
if(RM_PATTERN_IS_CACHED(mask_a, idx)) {
/* Get the previous match result */
result = RM_PATTERN_GET_CACHED(mask_a, idx);
} else {
/* Match for the first time */
result = g_regex_match(regex, path_a, 0, NULL);
RM_PATTERN_SET_CACHED(mask_a, idx, result);
}
if(result) {
return -1;
}
if(RM_PATTERN_IS_CACHED(mask_b, idx)) {
/* Get the previous match result */
result = RM_PATTERN_GET_CACHED(mask_b, idx);
} else {
/* Match for the first time */
result = g_regex_match(regex, path_b, 0, NULL);
RM_PATTERN_SET_CACHED(mask_b, idx, result);
}
if(result) {
return +1;
}
/* Both match or none of the both match */
return 0;
}

Will look into it....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants