Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

printf rewrite (with a lot of seq changes) #5128

Merged
merged 33 commits into from
Nov 28, 2023

Conversation

tertsdiepraam
Copy link
Member

@tertsdiepraam tertsdiepraam commented Aug 2, 2023

The current printf implementation is -- with all due respect to the original author -- a mess. So I'm rewriting it completely to be smaller and more readable.

Here's what changed:

  • Well, everything, but let me be more precise.
  • It started with a complete rewrite of the parsing of the % directives, which is now much more concise. I think it's also more efficient, but I haven't tested that.
  • It is no longer necessary to give all arguments as strings, which are then formatted in some weird way. The way it worked was honestly very impressive, but ultimately not sustainable. We can now give typed arguments with the FormatArgument enum. This enum includes Unparsed variant where we can still give strings to parse first and then format. This is how the printf util works.
  • There is a Spec type which represents a % directive with all options included.
  • To fit all our needs, we can parse:
    • Only escape codes
    • Only % directives
    • Both escape codes and % directives
  • The internals of all formatting are opened up, this means that dd does not need to use printf("%g", ...) but can call the Float format directly.
  • Similarly, seq can use the Format type, which accepts only a single directive. With that type, we can parse the whole format string before we print, which simplifies the error handling around it.
  • Naming of cargo features and modules is (in my opinion) much more logical.
  • You might ask why seq had to change so much. The reason is that GNU seq only accepts float % directives and that there is no special path for integers. So, I had to remove all the integer handling from that util. This allowed me to give the correct type (i.e. f64) to the format. As a nice side-effect, seq got much simpler!
  • I also included a bunch of tests for formatting floating point numbers while I was working on that.

I'm marking this as ready because I'm passing all the tests now. I can open issues for things that still need to change. Current limitations:

  • Parsing of numbers can be made more precise to match GNU a bit better. For example, handling parsing the start of a string as number and ignoring the rest.
  • Parsing of \uHHHH and \UHHHHHHHH needs some more work for handling invalid codes.
  • Some error messages can be improved.
  • seq needs to only accept 1 length parameter, while printf can accept multiple, so this needs to be configurable.
  • We might need to implement our own alignment logic.
  • I did not feel like implementing parsing for hexadecimal floats yet, so that is the only test I've ignored for now. This is a (small) regression

@sylvestre
Copy link
Sponsor Contributor

lot of conflicts, sorry!

@tertsdiepraam
Copy link
Member Author

I think most are easy to solve because this is a full rewrite of the functionality. I'll look into it soon.

@tertsdiepraam
Copy link
Member Author

So changing printf required me to change seq quite drastically... This is taking longer than expected.

@uutils uutils deleted a comment from github-actions bot Nov 21, 2023
@tertsdiepraam tertsdiepraam changed the title printf rewrite printf rewrite (with a lot of seq changes) Nov 21, 2023
@uutils uutils deleted a comment from github-actions bot Nov 21, 2023
@uutils uutils deleted a comment from github-actions bot Nov 21, 2023
@uutils uutils deleted a comment from github-actions bot Nov 21, 2023
@uutils uutils deleted a comment from github-actions bot Nov 21, 2023
@@ -293,7 +298,7 @@ fn sub_num_float_e_no_round() {
#[test]
fn sub_num_float_round() {
new_ucmd!()
.args(&["two is %f", "1.9999995"])
.args(&["two is %f", "1.9999996"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's cheating ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it wasn't cheating because my printf said:

❯ printf "two is %f" 1.9999995
two is 1,999999

but I just tried with

❯ env printf "two is %f" 1.9999995
two is 2,000000

So yeah I should change that back 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've ignored this test for now. I did add a test for 0.9999995 as well so that we at least have a test which checks for rounding (and which does work).

Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe check for the two potential values ?

This is a limitation of the current implementation, which should ultimately use "long double" precision instead of f64.
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/tail/inotify-dir-recreate is no longer failing!
GNU test failed: tests/tail/symlink. tests/tail/symlink is passing on 'main'. Maybe you have to rebase?
GNU test error: tests/cp/link-heap. tests/cp/link-heap is passing on 'main'. Maybe you have to rebase?

@sylvestre
Copy link
Sponsor Contributor

you fixed an issue with seq that a fuzzer found
current:

$  ./target/debug/coreutils seq 4 -9 4.8
4

yours:

$ seq 4 -9 4.8
[empty]

gnu:

$ seq 4 -9 4.8
[empty]

@tertsdiepraam tertsdiepraam force-pushed the printf-rewrite branch 2 times, most recently from 011f1c6 to 715857c Compare November 22, 2023 13:04
}

#[derive(Clone, Copy, Debug)]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@@ -3,79 +3,9 @@
// For the full copyright and license information, please view the LICENSE
// file that was distributed with this source code.
// spell-checker:ignore extendedbigdecimal extendedbigint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// spell-checker:ignore extendedbigdecimal extendedbigint
// spell-checker:ignore extendedbigdecimal

@@ -4,26 +4,20 @@
// file that was distributed with this source code.
// spell-checker:ignore (ToDO) istr chiter argptr ilen extendedbigdecimal extendedbigint numberparse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// spell-checker:ignore (ToDO) istr chiter argptr ilen extendedbigdecimal extendedbigint numberparse
// spell-checker:ignore (ToDO) extendedbigdecimal numberparse

@uutils uutils deleted a comment from github-actions bot Nov 28, 2023
@sylvestre sylvestre merged commit 14a8e8a into uutils:main Nov 28, 2023
48 of 53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants