`printf` rewrite (with a lot of `seq` changes) #5128

tertsdiepraam · 2023-08-02T22:08:42Z

The current printf implementation is -- with all due respect to the original author -- a mess. So I'm rewriting it completely to be smaller and more readable.

Here's what changed:

Well, everything, but let me be more precise.
- It started with a complete rewrite of the parsing of the `%` directives, which is now much more concise. I think it's also more efficient, but I haven't tested that.
- It is no longer necessary to give all arguments as strings, which are then formatted in some weird way. The way it worked was honestly very impressive, but ultimately not sustainable. We can now give typed arguments with the `FormatArgument` enum. This enum includes an `Unparsed` variant, where we can still give strings to parse first and then format. This is how the `printf` util works.
- There is a `Spec` type, which represents a `%` directive with all options included.
- To fit all our needs, we can parse:
  - Only escape codes
  - Only `%` directives
  - Both escape codes and `%` directives
- The internals of all formatting are opened up. This means that `dd` does not need to use `printf("%g", ...)` but can call the `Float` format directly.
- Similarly, `seq` can use the `Format` type, which accepts only a single directive. With that type, we can parse the whole format string before we print, which simplifies the error handling around it.
- Naming of cargo features and modules is (in my opinion) much more logical.
- You might ask why `seq` had to change so much. The reason is that GNU `seq` only accepts float `%` directives and that there is no special path for integers. So, I had to remove all the integer handling from that util. This allowed me to give the correct type (i.e. `f64`) to the format. As a nice side-effect, `seq` got much simpler!
- I also included a bunch of tests for formatting floating point numbers while I was working on that.
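
To make the typed-argument idea concrete, here is a minimal sketch. The `Arg` enum and the `as_f64` and `format_fixed` helpers are hypothetical stand-ins written for illustration, not the actual `FormatArgument` API in uucore. The coercion of invalid text to 0 follows the behavior named in one of this PR's commit titles ("coerce missing and invalid arguments to 0"), but the details are assumed.

```rust
/// Hypothetical, simplified stand-in for a typed printf argument.
/// The real `FormatArgument` enum in uucore may have different variants.
#[derive(Debug, Clone, PartialEq)]
enum Arg {
    /// Already-typed float, e.g. what a caller like `seq` could pass directly.
    Float(f64),
    /// Already-typed signed integer.
    SignedInt(i64),
    /// Raw command-line text that still has to be parsed before formatting.
    Unparsed(String),
}

/// Resolve an argument to `f64` for a float directive such as `%f`.
/// Assumption: text that does not parse as a float is coerced to 0.
fn as_f64(arg: &Arg) -> f64 {
    match arg {
        Arg::Float(f) => *f,
        Arg::SignedInt(i) => *i as f64,
        Arg::Unparsed(s) => s.parse::<f64>().unwrap_or(0.0),
    }
}

/// Format an argument with a fixed number of fractional digits,
/// roughly what a `%.Nf` directive asks for.
fn format_fixed(arg: &Arg, precision: usize) -> String {
    format!("{:.*}", precision, as_f64(arg))
}
```

A utility that already holds an `f64` would pass `Arg::Float` and skip string parsing entirely, while a `printf`-like front end would wrap each command-line string in `Arg::Unparsed`.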

I'm marking this as ready because I'm passing all the tests now. I can open issues for things that still need to change. Current limitations:

- Parsing of numbers can be made more precise to match GNU a bit better, for example by parsing the start of a string as a number and ignoring the rest.
- Parsing of `\uHHHH` and `\UHHHHHHHH` needs some more work for handling invalid codes.
- Some error messages can be improved.
- `seq` needs to only accept 1 length parameter, while `printf` can accept multiple, so this needs to be configurable.
- We might need to implement our own alignment logic.
- I did not feel like implementing parsing for hexadecimal floats yet, so that is the only test I've ignored for now. This is a (small) regression.
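
As an illustration of the first limitation (parsing the start of a string as a number and ignoring the rest), a prefix parser could look roughly like the sketch below. The function name `parse_leading_f64` is hypothetical, and the sketch relies on Rust's own `f64` parser, whose accepted syntax is not identical to what GNU `printf` accepts.

```rust
/// Parse the longest prefix of `input` that is a valid `f64` and return
/// the value together with the unparsed remainder.
/// Assumption: Rust's `str::parse::<f64>` decides what counts as a number,
/// which does not match GNU `printf` exactly.
fn parse_leading_f64(input: &str) -> Option<(f64, &str)> {
    // Try candidate prefixes from longest to shortest, cutting only at
    // character boundaries so that slicing never panics on UTF-8 input.
    (1..=input.len())
        .rev()
        .filter(|&end| input.is_char_boundary(end))
        .find_map(|end| {
            let (head, rest) = input.split_at(end);
            head.parse::<f64>().ok().map(|value| (value, rest))
        })
}
```

A caller could format the parsed value and use a non-empty remainder to decide whether to emit a warning. The quadratic retry loop is acceptable for short command-line arguments, but a hand-written scanner would be the better fit for long inputs.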

sylvestre · 2023-09-24T09:03:02Z

lot of conflicts, sorry!

tertsdiepraam · 2023-09-24T17:31:02Z

I think most are easy to solve because this is a full rewrite of the functionality. I'll look into it soon.

tertsdiepraam · 2023-11-13T16:38:25Z

So changing printf required me to change seq quite drastically... This is taking longer than expected.

cakebaker · 2023-11-21T14:23:54Z

tests/by-util/test_printf.rs

@@ -293,7 +298,7 @@ fn sub_num_float_e_no_round() {
 #[test]
 fn sub_num_float_round() {
    new_ucmd!()
-        .args(&["two is %f", "1.9999995"])
+        .args(&["two is %f", "1.9999996"])


That's cheating ;-)

tertsdiepraam · 2023-11-21

I thought it wasn't cheating because my printf said:

❯ printf "two is %f" 1.9999995
two is 1,999999

but I just tried with

❯ env printf "two is %f" 1.9999995
two is 2,000000

So yeah I should change that back 😄

tertsdiepraam · 2023-11-21

I've ignored this test for now. I did add a test for 0.9999995 as well so that we at least have a test which checks for rounding (and which does work).

sylvestre · 2023-11-21

Maybe check for the two potential values?
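
The difference between the two outputs can be reproduced with plain `f64` formatting. The sketch below uses only Rust's standard `format!` macro; the helper names `fixed6` and `stored_digits` are hypothetical and are not part of the PR.

```rust
/// Format a value with six fractional digits, matching the default
/// precision of a `%f` directive.
fn fixed6(x: f64) -> String {
    format!("{:.6}", x)
}

/// Print more digits than `%f` shows, to inspect the value that the
/// `f64` actually stores for a given decimal literal.
fn stored_digits(x: f64) -> String {
    format!("{:.20}", x)
}
```

With these helpers, `fixed6(1.9999995)` yields `1.999999`, because the nearest `f64` lies slightly below the decimal literal, while `fixed6(0.9999995)` yields `1.000000`, because there the nearest `f64` lies slightly above it. A formatter working at higher precision can round the first case up to `2.000000`, which matches the `env printf` output quoted in this thread.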

github-actions · 2023-11-21T16:29:05Z

GNU testsuite comparison:

Congrats! The gnu test tests/tail/inotify-dir-recreate is no longer failing!
GNU test failed: tests/tail/symlink. tests/tail/symlink is passing on 'main'. Maybe you have to rebase?
GNU test error: tests/cp/link-heap. tests/cp/link-heap is passing on 'main'. Maybe you have to rebase?

sylvestre · 2023-11-21T20:44:20Z

you fixed an issue with seq that a fuzzer found
current:

$  ./target/debug/coreutils seq 4 -9 4.8
4

yours:

$ seq 4 -9 4.8
[empty]

gnu:

$ seq 4 -9 4.8
[empty]
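
The empty output follows from the sign of the increment: with a negative increment, nothing is printed when the first value is already below the last. A minimal sketch of that rule on `f64` values is shown here. The function `seq_values` is hypothetical and is not the code in `src/uu/seq`; returning an empty list for a zero or non-finite input is an assumption made only to keep the loop finite.

```rust
/// Compute the values that `seq FIRST INCREMENT LAST` would print.
/// Assumption: zero increments and non-finite inputs yield no values,
/// purely so that this sketch always terminates.
fn seq_values(first: f64, increment: f64, last: f64) -> Vec<f64> {
    let mut values = Vec::new();
    let all_finite = first.is_finite() && increment.is_finite() && last.is_finite();
    if increment == 0.0 || !all_finite {
        return values;
    }
    let mut i = 0u64;
    loop {
        // Multiply by the step index instead of accumulating, so rounding
        // errors do not build up over many iterations.
        let value = first + increment * i as f64;
        // The stop condition depends on the direction of the increment.
        let past_end = if increment > 0.0 {
            value > last
        } else {
            value < last
        };
        if past_end {
            break;
        }
        values.push(value);
        i += 1;
    }
    values
}
```

For the fuzzer case above, `seq_values(4.0, -9.0, 4.8)` stops before pushing anything, since 4 is already less than 4.8 while counting downwards.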

cakebaker · 2023-11-25T15:04:46Z

src/uucore/src/lib/features/format/num_format.rs

+}
+
+#[derive(Clone, Copy, Debug)]
+


Suggested change

src/uu/printf/src/printf.rs

cakebaker · 2023-11-26T14:10:32Z

src/uu/seq/src/number.rs

@@ -3,79 +3,9 @@
 // For the full copyright and license information, please view the LICENSE
 // file that was distributed with this source code.
 // spell-checker:ignore extendedbigdecimal extendedbigint


Suggested change

// spell-checker:ignore extendedbigdecimal extendedbigint

// spell-checker:ignore extendedbigdecimal

cakebaker · 2023-11-26T14:26:19Z

src/uu/seq/src/seq.rs

@@ -4,26 +4,20 @@
 // file that was distributed with this source code.
 // spell-checker:ignore (ToDO) istr chiter argptr ilen extendedbigdecimal extendedbigint numberparse


Suggested change

// spell-checker:ignore (ToDO) istr chiter argptr ilen extendedbigdecimal extendedbigint numberparse

// spell-checker:ignore (ToDO) extendedbigdecimal numberparse

tertsdiepraam added 3 commits August 2, 2023 23:57

uucore: start work on a completely new printf implementation

a3e68d5

dd, printf, seq: update to new printf

66eb64e

some more work on printf spec

407bccc

tertsdiepraam added 8 commits October 28, 2023 16:35

Merge branch 'main' into printf-rewrite

2881090

printf rewrite: fix compilation

69b7095

printf rewrite: fix compilation

f117fc1

Merge branch 'printf-rewrite' of github.com:tertsdiepraam/coreutils i…

bdfe5f1

…nto printf-rewrite

printf: move number formatting to separate module

198f7c7

uucore/format: move types for num_format

39c6758

dd: use num_format::Float directly instead of printf

ee0e2c0

uucore/format: implement single specifier formats

6481d63

seq: simplify and use new printf implementation

e7d58f6

tertsdiepraam force-pushed the printf-rewrite branch from af492c7 to e7d58f6 Compare November 16, 2023 13:30

tertsdiepraam added 14 commits November 16, 2023 17:00

printf: parse arguments and handle escape codes

eaf5006

printf: more flexible parsing of unparsed arguments

a45ff8c

printf: implement %b

cd0c24a

printf: accept multiple length parameters

f83e0d1

printf: support precision for integers

f3da081

uucore/format: fix doctests

76eca8d

printf: exit correctly on \c

4aafb3f

printf: fix and test float formatting

955640a

printf: add emoji character test

fef84f7

printf: ignore hexadecimal floats test

ce18e0a

This can be un-ignored when it is implemented

printf: fix negative hex argument parsing

5f2374b

printf: allow precision in string

c43ee01

printf: coerce missing and invalid arguments to 0

066d8ba

printf: basic support for unicode escape sequences

68d036c

uutils deleted a comment from github-actions bot Nov 21, 2023

tertsdiepraam changed the title ~~printf rewrite~~ printf rewrite (with a lot of seq changes) Nov 21, 2023

uutils deleted a comment from github-actions bot Nov 21, 2023

Merge branch 'main' into printf-rewrite

07aaf61

uutils deleted a comment from github-actions bot Nov 21, 2023

cakebaker reviewed Nov 21, 2023

tertsdiepraam added 2 commits November 21, 2023 16:49

test/printf: ignoring rounding up to 2

0822511

This is a limitation of the current implementation, which should ultimately use "long double" precision instead of f64.

Merge branch 'printf-rewrite' of github.com:tertsdiepraam/coreutils i…

4b9fca8

…nto printf-rewrite

cakebaker reviewed Nov 22, 2023

src/uucore/src/lib/features/format/mod.rs (outdated)

cakebaker reviewed Nov 22, 2023

src/uucore/src/lib/features/format/mod.rs (outdated)

cakebaker reviewed Nov 22, 2023

src/uucore/src/lib/features/format/spec.rs (outdated)

tertsdiepraam force-pushed the printf-rewrite branch 2 times, most recently from 011f1c6 to 715857c Compare November 22, 2023 13:04

uucore/format: fix license headers and improve docs

e95add7

tertsdiepraam force-pushed the printf-rewrite branch from 715857c to e95add7 Compare November 22, 2023 13:06

uutils deleted a comment from github-actions bot Nov 22, 2023

cakebaker reviewed Nov 25, 2023

src/uucore/src/lib/features/format/num_format.rs (outdated)

cakebaker reviewed Nov 25, 2023

src/uu/printf/src/printf.rs

cakebaker reviewed Nov 26, 2023

printf: remove whitespace, remove redundant spelling ignore and rever…

8eb66ab

…t matching on result

tertsdiepraam force-pushed the printf-rewrite branch from 3c86940 to 8eb66ab Compare November 27, 2023 10:53

uutils deleted a comment from github-actions bot Nov 28, 2023

sylvestre merged commit 14a8e8a into uutils:main Nov 28, 2023
48 of 53 checks passed

tertsdiepraam mentioned this pull request Feb 7, 2024

printf: %a not supported (but partially implemented) #2776

Open
