reimplement PCRE2_UNREACHABLE() assertions with a slightly safer approach #490

carenas · 2024-09-22T10:27:49Z

Reimplement the original assertions to use different versions for debug and non debug builds.

Non debug asserts use compiler specific builtins that are not safe to invoke, and therefore their use has been audited to paths that should NEVER be executed.

Co-Authored-by: Alex Dowad alexinbeijing@gmail.com

alexdowad · 2024-09-22T11:08:43Z

One comment: the PCRE2_UNREACHABLE() macro was intended to serve 2 different purposes:

In debug builds, it's a debugging tool which can help fuzzers, etc. to find bugs where control flow goes somewhere it shouldn't;
In release builds, it's an optimization hint for the compiler.

I think the implementation in this PR only meets point 1 but not point 2; but if that is not correct, please mention.

carenas · 2024-09-22T11:19:42Z

You are correct, there is no production implementation and therefore there is no optimization hint either (except for DEBUG builds by the mentioned attribute that is expected from abort(), but that of course is implementation dependent).

Interestingly enough from all the places where it was used, the only place where it would make a difference was the function fixed in #489, and the difference wouldn't be good for our users.

If the assertion was ever reached, and their library was built with a recent enough compiler in an x86 system, they will get a crash, instead of getting an error, because as you clearly pointed out, the "return" call itself would be removed at compile time; was that behaviour what was expected from that "optimization"?

alexdowad · 2024-09-22T11:43:41Z

If the assertion was ever reached, and their library was built with a recent enough compiler in an x86 system, they will get a crash, instead of getting an error, because as you clearly pointed out, the "return" call itself would be removed at compile time; was that behaviour what was expected from that "optimization"?

In answer to that question, what was expected was that if we assert a certain path is unreachable, that path should really be unreachable, and because the compiler is told that is unreachable, it can (in some cases) generate better code. For example, if you tell the compiler that the default: label in a switch statement is unreachable, it can assume that there will always be a matching label and doesn't need to generate code to handle the "default" case.

If the unreachable path is really unreachable, then there will be no crashes in production (because it will never be reached). On the other hand, if we are not able to guarantee, through whatever method, that our "unreachable" code is really unreachable... that is a problem. 😦 Since PCRE2 has significant installed base, it would be nice if the codebase was at a high enough level of quality that we could have full confidence that the "unreachable" paths are really unreachable. In that case, they could safely benefit from __builtin_unreachable's optimization hint.

However, if the codebase is not at that level of quality yet, then you are very right that PCRE2_UNREACHABLE must not do anything unsafe in production builds.

In any case, thanks very much for looking into this issue.

Another thought: It would be nice if, in production builds, all instances of PCRE2_ASSERT and PCRE2_UNREACHABLE would make public API functions return PCRE2_INTERNAL_ERROR if an assertion fails. That could even mitigate possible vulnerabilities by turning them into error returns instead. However, it would probably require some trickery with longjmp, which seems inadvisable.

alexdowad · 2024-09-22T11:45:11Z

Just one suggestion: I think the assertion failure message in this PR would be easier to understand if it said something like "Execution must not reach this point" or "Execution reached unexpected point".

zherczeg · 2024-09-23T03:34:39Z

Reaching an unreachable code is always a big problem, since you cannot write tests to see what happens. Hence whatever we do, we probably have issues. I don't mind unreachable in matching code, because its purpose is high performance. For pattern compilation, we could use errors, although the problem is the same: we cannot test them.

carenas · 2024-09-23T04:24:21Z

I would prefer no unreachables in production, as they are additional code, their behaviour changes from compiler to compiler, and even between versions of the same compiler and have bugs, but agree that since they are meant to be really unreachable (assuming the original selection was done carefully enough) they could be used to maybe improve performance.

Anyway, as requested by Alex updated the series to also modify behaviour in non debug builds and cover both use cases and updated the PR subject to reflect that.

zherczeg · 2024-09-23T04:26:36Z

Btw I would prefer an assertion before all internal errors. When I develop code, and get those errors, it is too hard to find their sources.

.github/workflows/build.yml

src/pcre2_util.h

alexdowad · 2024-09-23T05:19:10Z

agree that since they are meant to be really unreachable (assuming the original selection was done carefully enough) they could be used to maybe improve performance.

Carlo, how would this be:

We agree that an UNREACHABLE assertion should only be used when 1) the code has been carefully audited to make sure that the "unreachable" path is really unreachable, and 2) the code in question is heavily tested by the existing test suite.
UNREACHABLE assertions compile to an optimization hint in release builds, a failing assertion (i.e. abort) in debug builds.
For now, all calls to PCRE2_UNREACHABLE can be converted to a comment saying /* We should not get here */ and PCRE2_ERROR_INTERNAL return. I will go through one by one and check each site carefully before converting it back to an assertion.

Just a suggestion.

alexdowad · 2024-09-23T05:21:11Z

Btw I would prefer an assertion before all internal errors. When I develop code, and get those errors, it is too hard to find their sources.

Suggestion: Aside from PCRE2_ASSERT, can we add another macro which expands to a failing assertion (abort) in debug builds, but return PCRE2_ERROR_INTERNAL; in release builds?

This macro must only be used in functions which return int status code (zero for success, non-zero for error code).

src/pcre2_util.h

carenas · 2024-09-23T06:13:08Z

can we add another macro

sure, fork away and go ahead, I promise not to squash this until all discussion is settled and review your code to suggest improvements.

note though that having different assertion per type of function is going to be messy, since each function has different ways to report those errors and even some have none, so IMHO might be easier to open code each as needed using PCRE2_ASSERT()

zherczeg · 2024-09-23T08:18:13Z

src/pcre2_util.h

 #else
-#define PCRE2_UNREACHABLE() do {} while(0)
+#warning "missing implementation of PCRE2_UNREACHABLE()"


This looks like a strong requirement

I know and I didn't like it either, but how else can we let people know that their compiler might silently continue running code after reaching something labeled as PCRE2_UNREACHABLE().

This whole funny business with #489 where we could reasonable think it was ok to delete the fail safe code twice in a month, makes me think this is too confusing and needs at least a stronger "heads up".

BTW removed it from the latest revisions, because it was too painful to have it when building on anything outside the 3 compilers we have implementations for, but that means we also have to be extra careful in the opposite direction, because something like:

case THIS: do_good(); PCRE2_UNREACHABLE(); case THAT: do_bad();

is perfectly valid code and will build cleanly on those 3 compilers we have in CI, but result in a dangerous fall through when built by a different compiler.

Carlo, just a thought... Aside from gcc, clang, and MS Visual C++, do you know any other C compilers which are commonly used to build PCRE2? If so, I can do a bit of research and see if they have a similar built-in.

I understand that even so, the issue will still remain that some users, somewhere may use a compiler for which PCRE2_UNREACHABLE is not implemented. But it wouldn't be bad to have implementations for as many compilers as possible.

do you know any other C compilers which are commonly used to build PCRE2?

Oracle's (AKA Sun) Studio, the latest release doesn't have this (with the C++ version implemented as a NOP internally, so it will even be detected and used but still behave as erratically as described above). IBM's xlc and DEC's for the VMS are also likely common.

The important point though is not that we "could" add support for them, but the fact that we are opening ourselves to issues like this by using this builtins() that are specifically described as triggering "undefined behaviour" and so will need to be more careful going forward on our code review (I guess) or have to invent something else to protect us from that.

carenas · 2024-09-23T16:50:31Z

We agree that an UNREACHABLE assertion should only be used when 1) the code has been carefully audited to make sure that the "unreachable" path is really unreachable, and 2) the code in question is heavily tested by the existing test suite.

The part we seem to be having difficulty getting to an understanding is that there are cases were some code that would be normally unreachable and might be at a specific point when audited, could be still affected by future changes and in those cases, removing apparent dead code that was there to be a safety net for those cases (either explicitly or even worst, implicitly through dead code elimination by a compiler) is not a good idea and need to be handled differently (at least in non debug builds)

UNREACHABLE assertions compile to an optimization hint in release builds, a failing assertion (i.e. abort) in debug builds.

this was implemented in the last revision, but with the caveat that we need the exception from the previous point in at least one case (again only needed for the non debug builds, although it could also be applied there for consistency)

For now, all calls to PCRE2_UNREACHABLE can be converted to a comment saying /* We should not get here */ and PCRE2_ERROR_INTERNAL return. I will go through one by one and check each site carefully before converting it back to an assertion.

returning PCRE2_ERROR_INTERNAL might not be what is required on all places were this assert was introduced originally, it would also seem like a lot of work for something that could be done without the intermediate step and accomplished at the end of this series if you just did the promised audit for each entry over it, or are you implying that none of those asserts were audited previously and are therefore unsafe?

as stated in the subject, my end objective is not to remove all those assertions but just make sure that it is clear how to use them going forward (even by changing their names so there is no longer confusion) and their use is safe from unexpected side effects in all the places were they are currently being used.

src/pcre2_study.c

src/pcre2_auto_possess.c

alexdowad · 2024-09-23T21:18:56Z

IMHO might be easier to open code each as needed using PCRE2_ASSERT()

Sure, let's do that... it was just an idea.

alexdowad · 2024-09-23T21:39:10Z

2 occurrences of PCRE2_UNREACHABLE in pcre2grep.c are safe. Same for pcre2test.c.

Recommend that PCRE2_MAYBE_UNREACHABLE should be used instead in pcre2_study.c; this one should be safe, but the function in question is long, complicated, and it's hard to see at a glance everything that is happening there. Likewise for pcre2_auto_possess.c.

Some occurrences in pcre2_compile.c are definitely safe. But as Zoltan mentioned in this thread, compilation is less performance-sensitive than matching... so maybe it's better to use PCRE2_MAYBE_UNREACHABLE anyways.

I do think that the name PCRE2_MAYBE_UNREACHABLE will confuse people who are reading the code for the first time. The natural question which will come to a reader's mind is: "This code may be unreachable? What does that mean?"

Other options include: PCRE2_ASSERT_UNREACHABLE, PCRE2_UNREACHABLE_SAFE, PCRE2_DEBUG_UNREACHABLE.

carenas · 2024-09-23T21:59:11Z

I do think that the name PCRE2_MAYBE_UNREACHABLE will confuse people who are reading the code for the first time

Fair enough, then will post a v4 that would use PCRE2_DEBUG_UNREACHABLE instead as it was the one you proposed and that I suggested as well before but discarded.

FWIW as per the "audit", I was expecting commits not comments, which might be better to hold until that update is pushed, so no unnecessary conflicts are created; presume you are ok with my "POC" audit in patch 4?, your comment would seem to imply you agree but it will be easier if the comment would be in the code instead; will hold othewise pushing an update if you would rather see changes so I can include them with it.

alexdowad · 2024-09-23T22:54:41Z

FWIW as per the "audit", I was expecting commits not comments, which might be better to hold until that update is pushed, so no unnecessary conflicts are created; presume you are ok with my "POC" audit in patch 4?

Sure, I will provide some commits then. Yes, patch 4 looks good.

zherczeg

LGTM. I usually prefer one patch.

zherczeg · 2024-09-25T16:06:59Z

Very nice patch! Shall we land it?

carenas · 2024-09-25T16:09:05Z

Will mark as ready, as soon as I complete testing

src/pcre2_substring.c

src/pcre2_study.c

src/pcre2_compile.c

src/pcre2_match.c

src/pcre2_compile.c

carenas · 2024-09-25T18:32:36Z

src/pcre2_auto_possess.c

@@ -456,6 +456,7 @@ switch(c)
       {
       /* Early return if there is not enough space. This should never
       happen, since all clists are shorter than 5 character now. */
+       PCRE2_DEBUG_UNREACHABLE();


this needs more than just an assert based on a comment IMHO, and changes not only here.

the data this is using is generated from the Unicode tables by maint/GenerateUcd.py so the comment is not completely correct and could change when Unicode is updated.

not sure what the "correct" thing to do should be but it at least includes:

GenerateUcd.py is made aware of this limit and either prints a warning if exceeded or refuses to create the table with a suitable sugestion on what to do next.

The limit is increased or a test is added showing what the behaviour is and this "stops" being "dead code"

in any case, included a little more information that could be used to start a fix for this and that could be also used IMHO as a good example on how to use these macros.

I see that Carlo has added a MAX_LIST variable in GenerateUcd.py, though it doesn't seem to be used yet in that script.

Would adding a check around line 851 that throws an exception if len(s) > (MAX_LIST - 3) be a good idea? Then anyone who is regenerating the property tables will notice if this is a problem.

The reason for comparing len(s) to MAX_LIST - 3 is because two slots in list are used for other purposes, and it needs to be terminated with the special value NOTACHAR, so the number of slots available for codepoints are MAX_LIST - 3.

If this ever happens, simply increasing the value of MAX_LIST in pcre2_auto_possess.c and GenerateUcd.py is all that is needed... then the stack-allocated arrays used to store those values will get bigger and everything will be OK.

I'm just thinking more about this... If we adjust GenerateUcd.py so that it crashes if ever ucd_caseless_sets are too big... do we still need the fail-safe in pcre2_auto_possess.c? Would it be enough to simply assert PCRE2_ASSERT(clist_dest < list + MAX_LIST)?

This is not simply a technical issue, but more of a "project policy" issue.

carenas · 2024-09-25T19:52:45Z

This whole time, there was (apparently) a bug in pcre2_convert() and fixing it simplifies the code and has been squashed with the rest as it is simple enough.

carenas · 2024-09-26T05:05:55Z

@alexdowad: do you have any comments on the additional points raised?, I think it would be nice to at least document them a little better if you want to keep them in this PR; otherwise I am tempted to (as you suggested) remove them and then you could pick them up and reintroduce them one by one after more careful consideration.

FWIW, I am trusting all the original ones (and maybe some of the new ones that got lost in the sea of changes) were audited and are safe, otherwise.

alexdowad · 2024-09-26T05:45:44Z

@alexdowad: do you have any comments on the additional points raised?, I think it would be nice to at least document them a little better if you want to keep them in this PR; otherwise I am tempted to (as you suggested) remove them and then you could pick them up and reintroduce them one by one after more careful consideration.

I'll be happy to go through and add more comments. Apologies, but my life has become a bit busy in the last couple of days, so it will be appreciated if you can allow 48 hours or so.

FWIW, I am trusting all the original ones (and maybe some of the new ones that got lost in the sea of changes) were audited and are safe, otherwise.

I would be happy to make a full pass through again and double-check all occurrences of PCRE2_{,DEBUG_}UNREACHABLE.

carenas · 2024-09-26T06:02:29Z

Apologies, but my life has become a bit busy in the last couple of days

Sorry, there is no need to worry or apologize about that, we are volunteers anyway and you can take as much time as needed.

alexdowad · 2024-09-27T13:03:39Z

Opened a PR improving comments in this PR a bit more; see PR 3 in carenas/debugass.

alexdowad · 2024-09-27T13:20:06Z

I have read through the entire PR diff again. Aside from a couple small improvements to comments, I don't personally see anything else to improve right now.

I think the rationale for DEBUG_UNREACHABLE at sites which return PCRE2_ERROR_INTERNAL is clear; this error should never really occur unless the library is buggy, and failing a debug assertion in such cases will help fuzzers, etc. to discover such cases.

Aside from PCRE2_ERROR_INTERNAL, some non-public, internal-only functions use other values to indicate "can't happen" conditions. For exactly the same reasons, it makes sense to use DEBUG_UNREACHABLE in such cases.

There a number of other UNREACHABLE and DEBUG_UNREACHABLE assertions in this PR which are not specifically commented. For most of them, the function has obviously been designed to exit from inside an "infinite loop", and if ever the statement after that loop is reached, then something which the author did not anticipate must have happened. Is there anything else in particular which we should say in the comments? I think Control flow should not reach here is fairly clear.

alexdowad · 2024-09-27T13:21:39Z

Ah, one more thing... should I fuzz this PR for a while to see if any of the assertions are triggered?

The original asserts weren't very useful in debug mode as they were lacking information on where they were being triggered and were also unreliable and dangerous as they could result in important code being removed and trigger crashes (even in non debug mode). Instead of implementing one generic assert for both modes, build a more useful one for each one, so PCRE2_UNREACHABLE() could be also used in non debug builds to help with optimization. Reinstate all original assertions to use the new versions, which will have the sideeffect of fixing indentation issues introduced in the original, and include additional asserts that were provided as the original ones were being audited for safety. Note that during such audit the use of the original asserts might had been refactored so it also includes all those relevant code changes. While at it, update cmake and CI to help with testing as well as other documentation. Co-authored-by: Alex Dowad <alexinbeijing@gmail.com>

carenas mentioned this pull request Sep 22, 2024

Implement PCRE2_UNREACHABLE assertion for MS Visual C++ #465

Merged

carenas force-pushed the debugass branch from 91961b3 to cb04c63 Compare September 23, 2024 04:05

carenas changed the title ~~reimplement assertions as a debug helper~~ reimplement assertions with a safer approach Sep 23, 2024

zherczeg reviewed Sep 23, 2024

View reviewed changes

.github/workflows/build.yml Outdated Show resolved Hide resolved

carenas commented Sep 23, 2024

View reviewed changes

src/pcre2_util.h Outdated Show resolved Hide resolved

alexdowad reviewed Sep 23, 2024

View reviewed changes

src/pcre2_util.h Outdated Show resolved Hide resolved

carenas changed the title ~~reimplement assertions with a safer approach~~ reimplement PCRE2_UNREACHABLE() assertions with a safer approach Sep 23, 2024

zherczeg reviewed Sep 23, 2024

View reviewed changes

carenas force-pushed the debugass branch from cb04c63 to f32f8ba Compare September 23, 2024 20:02

carenas requested review from zherczeg and alexdowad September 23, 2024 20:15

carenas commented Sep 23, 2024

View reviewed changes

src/pcre2_study.c Outdated Show resolved Hide resolved

carenas commented Sep 23, 2024

View reviewed changes

src/pcre2_auto_possess.c Outdated Show resolved Hide resolved

carenas force-pushed the debugass branch from f32f8ba to f297ce3 Compare September 24, 2024 00:20

zherczeg approved these changes Sep 24, 2024

View reviewed changes