From 98d3012ed9fa2a812968fe1dc034670cfb571680 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 21 Aug 2023 00:37:06 -0700 Subject: [PATCH 1/6] Use version-sorting for all sorting Add a description of a version-sorting algorithm. (This algorithm does not precisely match `strverscmp`; it's intentionally simpler in its handling of leading zeroes, and produces a result easier for humans to easily understand and do by hand.) Change all references to sorting to use version-sorting. Change all references to "ASCIIbetically" to instead say "sort non-lowercase before lowercase". --- src/doc/style-guide/src/README.md | 36 +++++++++++++++++++++++++++++ src/doc/style-guide/src/cargo.md | 6 ++--- src/doc/style-guide/src/editions.md | 2 ++ src/doc/style-guide/src/items.md | 23 +++++++++++------- 4 files changed, 56 insertions(+), 11 deletions(-) diff --git a/src/doc/style-guide/src/README.md b/src/doc/style-guide/src/README.md index f4d759673700d..dd80966d15c1d 100644 --- a/src/doc/style-guide/src/README.md +++ b/src/doc/style-guide/src/README.md @@ -99,6 +99,42 @@ fn bar() {} fn baz() {} ``` +### Sorting + +In various cases, the default Rust style specifies to sort things. If not +otherwise specified, such sorting should be "version sorting", which ensures +that (for instance) `x8` comes before `x16` even though the character `1` comes +before the character `8`. (If not otherwise specified, version-sorting is +lexicographical.) + +For the purposes of the Rust style, to compare two strings for version-sorting: + +- Compare the strings by (Unicode) character as normal, finding the index of + the first differing character. (If the two strings do not have the same + length, this may be the end of the shorter string.) +- For both strings, determine the sequence of ASCII digits containing either + that character or the character before. (If either string doesn't have such a + sequence of ASCII digits, fall back to comparing the strings as normal.) 
+- Compare the numeric values of the number specified by the sequence of digits. + (Note that an implementation of this algorithm can easily check this without + accumulating copies of the digits or converting to a number: longer sequences + of digits are larger numbers, equal-length sequences can be sorted + lexicographically.) +- If the numbers have the same numeric value, the one with more leading zeroes + comes first. + +Note that there exist various algorithms called "version sorting", which differ +most commonly in their handling of numbers with leading zeroes. This algorithm +does not purport to precisely match the behavior of any particular other +algorithm, only to produce a simple and satisfying result for Rust formatting. +(In particular, this algorithm aims to produce a satisfying result for a set of +symbols that have the same number of leading zeroes, and an acceptable and +easily understandable result for a set of symbols that has varying numbers of +leading zeroes.) + +As an example, version-sorting will sort the following symbols in the order +given: `x000`, `x00`, `x0`, `x01`, `x1`, `x09`, `x9`, `x010`, `x10`. + ### [Module-level items](items.md) ### [Statements](statements.md) diff --git a/src/doc/style-guide/src/cargo.md b/src/doc/style-guide/src/cargo.md index d3b67ae45825d..d47d04642280f 100644 --- a/src/doc/style-guide/src/cargo.md +++ b/src/doc/style-guide/src/cargo.md @@ -8,11 +8,11 @@ Put a blank line between the last key-value pair in a section and the header of the next section. Do not place a blank line between section headers and the key-value pairs in that section, or between key-value pairs in a section. -Sort key names alphabetically within each section, with the exception of the +Version-sort key names within each section, with the exception of the `[package]` section. 
Put the `[package]` section at the top of the file; put the `name` and `version` keys in that order at the top of that section, -followed by the remaining keys other than `description` in alphabetical order, -followed by the `description` at the end of that section. +followed by the remaining keys other than `description` in order, followed by +the `description` at the end of that section. Don't use quotes around any standard key names; use bare keys. Only use quoted keys for non-standard keys whose names require them, and avoid introducing such diff --git a/src/doc/style-guide/src/editions.md b/src/doc/style-guide/src/editions.md index 5c67a185b8ffa..19e62c4867c99 100644 --- a/src/doc/style-guide/src/editions.md +++ b/src/doc/style-guide/src/editions.md @@ -37,6 +37,8 @@ history of the style guide. Notable changes in the Rust 2024 style edition include: - Miscellaneous `rustfmt` bugfixes. +- Use version-sort (sort `x8`, `x16`, `x32`, `x64`, `x128` in that order). +- Change "ASCIIbetical" sort to Unicode-aware "non-lowercase before lowercase". ## Rust 2015/2018/2021 style edition diff --git a/src/doc/style-guide/src/items.md b/src/doc/style-guide/src/items.md index a6d941f6d0454..e00e5a9903811 100644 --- a/src/doc/style-guide/src/items.md +++ b/src/doc/style-guide/src/items.md @@ -9,8 +9,8 @@ an item appears at module level or within another item. alphabetically. `use` statements, and module *declarations* (`mod foo;`, not `mod { ... }`) -must come before other items. Put imports before module declarations. Sort each -alphabetically, except that `self` and `super` must come before any other +must come before other items. Put imports before module declarations. +Version-sort each, except that `self` and `super` must come before any other names. Don't automatically move module declarations annotated with `#[macro_use]`, @@ -441,8 +441,10 @@ foo::{ A *group* of imports is a set of imports on the same or sequential lines. 
One or more blank lines or other items (e.g., a function) separate groups of imports. -Within a group of imports, imports must be sorted ASCIIbetically (uppercase -before lowercase). Groups of imports must not be merged or re-ordered. +Within a group of imports, imports must be version-sorted, except that +non-lowercase characters (characters that can start an `UpperCamelCase` +identifier) must be sorted before lowercase characters. Groups of imports must +not be merged or re-ordered. E.g., input: @@ -469,10 +471,15 @@ re-ordering. ### Ordering list import -Names in a list import must be sorted ASCIIbetically, but with `self` and -`super` first, and groups and glob imports last. This applies recursively. For -example, `a::*` comes before `b::a` but `a::b` comes before `a::*`. E.g., -`use foo::bar::{a, b::c, b::d, b::d::{x, y, z}, b::{self, r, s}};`. +Names in a list import must be version-sorted, except that: +- `self` and `super` always come first if present, +- non-lowercase characters (characters that can start an `UpperCamelCase` + identifier) must be sorted before lowercase characters, and +- groups and glob imports always come last if present. + +This applies recursively. For example, `a::*` comes before `b::a` but `a::b` +comes before `a::*`. E.g., `use foo::bar::{a, b::c, b::d, b::d::{x, y, z}, +b::{self, r, s}};`. 
### Normalisation From 127e052a5a697854c3c1cd0502bfb23cac063094 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 21 Aug 2023 05:03:17 -0700 Subject: [PATCH 2/6] Make an implementation note on version-sorting accurate --- src/doc/style-guide/src/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/doc/style-guide/src/README.md b/src/doc/style-guide/src/README.md index dd80966d15c1d..7a6ebc8f5652a 100644 --- a/src/doc/style-guide/src/README.md +++ b/src/doc/style-guide/src/README.md @@ -117,9 +117,9 @@ For the purposes of the Rust style, to compare two strings for version-sorting: sequence of ASCII digits, fall back to comparing the strings as normal.) - Compare the numeric values of the number specified by the sequence of digits. (Note that an implementation of this algorithm can easily check this without - accumulating copies of the digits or converting to a number: longer sequences - of digits are larger numbers, equal-length sequences can be sorted - lexicographically.) + accumulating copies of the digits or converting to a number: after skipping + leading zeroes, longer sequences of digits are larger numbers, and + equal-length sequences of digits can be sorted lexicographically.) - If the numbers have the same numeric value, the one with more leading zeroes comes first. 
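Illustration of the implementation note reworded in this patch: numeric value can be compared without parsing the digits or copying them. This is a minimal sketch, not part of the style guide or of rustfmt. The helper name `cmp_digit_runs` is hypothetical, and it assumes both arguments consist solely of ASCII digits. It deliberately leaves out the "more leading zeroes comes first" tiebreak described in the following bullet.

```rust
use std::cmp::Ordering;

/// Compare two runs of ASCII digits by numeric value without converting
/// them to integers. (Hypothetical helper; name and signature are
/// assumptions made for this illustration.)
fn cmp_digit_runs(a: &str, b: &str) -> Ordering {
    // Skip leading zeroes; a run of only zeroes becomes the empty string.
    let a_sig = a.trim_start_matches('0');
    let b_sig = b.trim_start_matches('0');
    // A longer run of significant digits is the larger number; runs of
    // equal length can be compared lexicographically.
    a_sig
        .len()
        .cmp(&b_sig.len())
        .then_with(|| a_sig.cmp(b_sig))
}
```

With this helper, `cmp_digit_runs("8", "16")` is `Ordering::Less` and `cmp_digit_runs("09", "9")` is `Ordering::Equal`, so the leading-zero tiebreak would have to be applied separately by the caller.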
From 47bb0761e60f51bcea39aac5792b2e0e8dd03f71 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 21 Aug 2023 05:05:09 -0700 Subject: [PATCH 3/6] Clarify that version-sorting looks for the *longest* sequence of digits --- src/doc/style-guide/src/README.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/doc/style-guide/src/README.md b/src/doc/style-guide/src/README.md index 7a6ebc8f5652a..9c81a25e627e4 100644 --- a/src/doc/style-guide/src/README.md +++ b/src/doc/style-guide/src/README.md @@ -112,9 +112,10 @@ For the purposes of the Rust style, to compare two strings for version-sorting: - Compare the strings by (Unicode) character as normal, finding the index of the first differing character. (If the two strings do not have the same length, this may be the end of the shorter string.) -- For both strings, determine the sequence of ASCII digits containing either - that character or the character before. (If either string doesn't have such a - sequence of ASCII digits, fall back to comparing the strings as normal.) +- For both strings, determine the longest sequence of ASCII digits containing + either that character or the character before. (If either string doesn't have + such a sequence of ASCII digits, fall back to comparing the strings as + normal.) - Compare the numeric values of the number specified by the sequence of digits. 
(Note that an implementation of this algorithm can easily check this without accumulating copies of the digits or converting to a number: after skipping From 95eb1e206ca562a27762bd8c4cbd88bb26c1e115 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 21 Aug 2023 18:42:44 -0700 Subject: [PATCH 4/6] Streamline description of versionsort (incorporate suggestion from Ralf) --- src/doc/style-guide/src/README.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/src/doc/style-guide/src/README.md b/src/doc/style-guide/src/README.md index 9c81a25e627e4..62f463ee2f0d4 100644 --- a/src/doc/style-guide/src/README.md +++ b/src/doc/style-guide/src/README.md @@ -112,10 +112,9 @@ For the purposes of the Rust style, to compare two strings for version-sorting: - Compare the strings by (Unicode) character as normal, finding the index of the first differing character. (If the two strings do not have the same length, this may be the end of the shorter string.) -- For both strings, determine the longest sequence of ASCII digits containing - either that character or the character before. (If either string doesn't have - such a sequence of ASCII digits, fall back to comparing the strings as - normal.) +- For both strings, determine the longest sequence of ASCII digits that either + contains or ends at that index. (If either string doesn't have such a + sequence of ASCII digits, fall back to comparing the strings as normal.) - Compare the numeric values of the number specified by the sequence of digits. 
(Note that an implementation of this algorithm can easily check this without accumulating copies of the digits or converting to a number: after skipping From f06df2207ed4a7adc34cab93fe82d7d0e22c2cc8 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Sun, 27 Aug 2023 17:02:33 -0700 Subject: [PATCH 5/6] Clarify "as normal" -> "lexicographically" --- src/doc/style-guide/src/README.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/doc/style-guide/src/README.md b/src/doc/style-guide/src/README.md index 62f463ee2f0d4..d00e6a5d882b8 100644 --- a/src/doc/style-guide/src/README.md +++ b/src/doc/style-guide/src/README.md @@ -109,12 +109,13 @@ lexicographical.) For the purposes of the Rust style, to compare two strings for version-sorting: -- Compare the strings by (Unicode) character as normal, finding the index of - the first differing character. (If the two strings do not have the same - length, this may be the end of the shorter string.) +- Compare the strings by (Unicode) character lexicographically, finding the + index of the first differing character. (If the two strings do not have the + same length, this may be the end of the shorter string.) - For both strings, determine the longest sequence of ASCII digits that either contains or ends at that index. (If either string doesn't have such a - sequence of ASCII digits, fall back to comparing the strings as normal.) + sequence of ASCII digits, fall back to comparing the strings + lexicographically.) - Compare the numeric values of the number specified by the sequence of digits. 
(Note that an implementation of this algorithm can easily check this without accumulating copies of the digits or converting to a number: after skipping From 2e931b541787bbdf444f6edab89cbd8efc3b7eaf Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Fri, 22 Dec 2023 16:22:45 -0800 Subject: [PATCH 6/6] style-guide: Rework version-sorting algorithm Treat numeric chunks with equal value but differing numbers of leading zeroes as equal, unless we get to the end of the entire string in which case we use "more leading zeroes in the earliest differing chunk" as a tiebreaker. Treat `_` as a word separator, sorting it before anything other than space. Give more examples. --- src/doc/style-guide/src/README.md | 91 +++++++++++++++++++++++-------- 1 file changed, 69 insertions(+), 22 deletions(-) diff --git a/src/doc/style-guide/src/README.md b/src/doc/style-guide/src/README.md index d00e6a5d882b8..b8193891b13b1 100644 --- a/src/doc/style-guide/src/README.md +++ b/src/doc/style-guide/src/README.md @@ -109,32 +109,79 @@ lexicographical.) For the purposes of the Rust style, to compare two strings for version-sorting: -- Compare the strings by (Unicode) character lexicographically, finding the - index of the first differing character. (If the two strings do not have the - same length, this may be the end of the shorter string.) -- For both strings, determine the longest sequence of ASCII digits that either - contains or ends at that index. (If either string doesn't have such a - sequence of ASCII digits, fall back to comparing the strings - lexicographically.) -- Compare the numeric values of the number specified by the sequence of digits. - (Note that an implementation of this algorithm can easily check this without - accumulating copies of the digits or converting to a number: after skipping - leading zeroes, longer sequences of digits are larger numbers, and - equal-length sequences of digits can be sorted lexicographically.) 
-- If the numbers have the same numeric value, the one with more leading zeroes - comes first. - -Note that there exist various algorithms called "version sorting", which differ -most commonly in their handling of numbers with leading zeroes. This algorithm +- Process both strings from beginning to end as two sequences of maximal-length + chunks, where each chunk consists either of a sequence of characters other + than ASCII digits, or a sequence of ASCII digits (a numeric chunk), and + compare corresponding chunks from the strings. +- To compare two numeric chunks, compare them by numeric value, ignoring + leading zeroes. If the two chunks have equal numeric value, but different + numbers of leading zeroes, and this is the first time this has happened for + these strings, treat the chunks as equal (moving on to the next chunk) but + remember which string had more leading zeroes. +- To compare two chunks if both are not numeric, compare them by Unicode + character lexicographically, except that `_` (underscore) sorts immediately + after ` ` (space) but before any other character. (This treats underscore as + a word separator, as commonly used in identifiers.) + - If the use of version sorting specifies further modifiers, such as sorting + non-lowercase before lowercase, apply those modifiers to the lexicographic + sort in this step. +- If the comparison reaches the end of the string and considers each pair of + chunks equal: + - If one of the numeric comparisons noted the earliest point at which one + string had more leading zeroes than the other, sort the string with more + leading zeroes first. + - Otherwise, the strings are equal. + +Note that there exist various algorithms called "version sorting", which +generally try to solve the same problem, but which differ in various ways (such +as in their handling of numbers with leading zeroes). 
This algorithm does not purport to precisely match the behavior of any particular other algorithm, only to produce a simple and satisfying result for Rust formatting. -(In particular, this algorithm aims to produce a satisfying result for a set of +In particular, this algorithm aims to produce a satisfying result for a set of symbols that have the same number of leading zeroes, and an acceptable and easily understandable result for a set of symbols that has varying numbers of -leading zeroes.) - -As an example, version-sorting will sort the following symbols in the order -given: `x000`, `x00`, `x0`, `x01`, `x1`, `x09`, `x9`, `x010`, `x10`. +leading zeroes. + +As an example, version-sorting will sort the following strings in the order +given: +- `_ZYWX` +- `Z_YWX` +- `ZY_WX` +- `ZYW_X` +- `ZYWX` +- `ZYWX_` +- `u_zzz` +- `u8` +- `u16` +- `u32` +- `u64` +- `u128` +- `u256` +- `ua` +- `usize` +- `uz` +- `v000` +- `v00` +- `v0` +- `v0s` +- `v00t` +- `v0u` +- `v001` +- `v01` +- `v1` +- `v009` +- `v09` +- `v9` +- `v010` +- `v10` +- `w005s09t` +- `w5s009t` +- `x64` +- `x86` +- `x86_32` +- `x86_64` +- `x86_128` +- `x87` ### [Module-level items](items.md)
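For illustration, the reworked comparison from PATCH 6/6 could be sketched in Rust roughly as follows. This is one reading of the prose, not the rustfmt implementation, and the names `version_cmp`, `char_key` and `split_digits` are hypothetical. Two points are assumptions, because the text does not spell them out: a digit that lines up against a non-digit character is compared with the same character order used for non-numeric chunks (which yields `u_zzz` before `u8`), and no extra modifier such as "non-lowercase before lowercase" is applied.

```rust
use std::cmp::Ordering;

/// Character order for non-numeric comparison: space first, then `_`,
/// then every other character in code point order. (Hypothetical helper.)
fn char_key(c: char) -> (u8, char) {
    match c {
        ' ' => (0, c),
        '_' => (1, c),
        _ => (2, c),
    }
}

/// Split `s` into its leading maximal run of ASCII digits and the rest.
fn split_digits(s: &str) -> (&str, &str) {
    let end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    s.split_at(end)
}

/// Sketch of the version-sort comparison described in the patch.
fn version_cmp(mut a: &str, mut b: &str) -> Ordering {
    // Result of the earliest numeric chunk pair that had equal value but
    // different numbers of leading zeroes; `Equal` means "none seen yet".
    let mut zero_tiebreak = Ordering::Equal;
    loop {
        let (ca, cb) = (a.chars().next(), b.chars().next());
        match (ca, cb) {
            // Both strings exhausted with all chunks equal.
            (None, None) => return zero_tiebreak,
            // The shorter string sorts first.
            (None, Some(_)) => return Ordering::Less,
            (Some(_), None) => return Ordering::Greater,
            // Two numeric chunks: compare by value, ignoring leading zeroes.
            (Some(x), Some(y)) if x.is_ascii_digit() && y.is_ascii_digit() => {
                let (da, rest_a) = split_digits(a);
                let (db, rest_b) = split_digits(b);
                let sa = da.trim_start_matches('0');
                let sb = db.trim_start_matches('0');
                let ord = sa.len().cmp(&sb.len()).then_with(|| sa.cmp(sb));
                if ord != Ordering::Equal {
                    return ord;
                }
                if zero_tiebreak == Ordering::Equal {
                    // Equal values: the longer run has more leading zeroes
                    // and should sort first.
                    zero_tiebreak = db.len().cmp(&da.len());
                }
                a = rest_a;
                b = rest_b;
            }
            // Otherwise compare one character at a time (assumption: this
            // also covers a digit meeting a non-digit).
            (Some(x), Some(y)) => {
                let ord = char_key(x).cmp(&char_key(y));
                if ord != Ordering::Equal {
                    return ord;
                }
                a = &a[x.len_utf8()..];
                b = &b[y.len_utf8()..];
            }
        }
    }
}
```

A caller would pass it to a sort, for example `names.sort_by(|a, b| version_cmp(a, b))` on a `Vec<&str>`. Under the assumptions above, this places `u_zzz`, `u8`, `u16`, `u128`, `ua` in that order, and `w005s09t` before `w5s009t`.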