Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix footnotes: nested, confused for link refs & mangled by the autolinker #229

Merged
merged 14 commits into from
Sep 16, 2021

Conversation

phillmv
Copy link
Member

@phillmv phillmv commented Aug 23, 2021

Hello!

This PR closes #121 and helps push https://github.com/github/coding/issues/2251 forward. It supersedes #227.

This PR fixes three different bug fixes in one relatively easy to read PR, combining two tiny changes with a slightly larger one:

I feel fairly confident functionality wise that this does the trick, but I invite more experienced C hands to tell me how I could improve or better allocate memory 😄.

This PR is a companion to/blocks/see also #230

Cheers,

When two footnote references are adjacent, the handle_close_bracket
function will first try to match the closing bracket to a link
reference. Now we reset the subject's state, so that the parser
correctly picks up both footnote references.
…ark_nodes.

Sometimes, the autolinker will go ahead and greedily split input into
multiple text nodes in the hopes of matching a hyperlink. This broke
footnotes, which expected a singular node. Instead of relying on the
tokenizing to have worked perfectly, when handling footnote references
we now simply insert the reference based on the closing bracket and
ignore and delete any existing and superfluous nodes.
@@ -468,7 +468,6 @@ static void process_footnotes(cmark_parser *parser) {
while ((ev_type = cmark_iter_next(iter)) != CMARK_EVENT_DONE) {
cur = cmark_iter_get_node(iter);
if (ev_type == CMARK_EVENT_EXIT && cur->type == CMARK_NODE_FOOTNOTE_DEFINITION) {
cmark_node_unlink(cur);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file support allowing nested foornotes.

We accomplish this by simply changing when we unlink an unused footnote node.

Whereas before, we iterate thru the parser and delete all footnote definitions as we see them

Now we only unlink the footnotes definitions after we've scanned them for more references.

// advanced the current state beyond our footnote's actual closing
// bracket, ie if it went looking for a `link_label`.
// Let's just rewind the subject's position:
subj->pos = initial_pos;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one liner fixes a single bug and is best understood in isolation with its test.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a single bug

For reference, that bug (and a similar fix) had been previously identified in #139.

src/inlines.c Outdated Show resolved Hide resolved
src/blocks.c Show resolved Hide resolved
src/inlines.c Show resolved Hide resolved
src/inlines.c Outdated Show resolved Hide resolved
@philipturnbull
Copy link

I ran the fuzzing harness, test/cmark-fuzz.c and it looks like there's a use-after-free when processing footnotes. I collected a few different crashes and they all trigger in process_emphasis. For example, an input like:

$ hexdump -C github/cmark-gfm/cmark_fuzzer/03c30ed80c1cd8b58ddc91481798ba544d22871b/3c52bca311ed83bf32d8641d0ab014c7ce1fb083/minimized_test_case
00000000  54 66 5b 5e 01 27 6e 6f  5b 5e 27 5d              |Tf[^.'no[^']|
0000000c

gives:

/work/work842279636/cmark_fuzzer -timeout_exitcode=77 -error_exitcode=78 -rss_limit_mb=1024 -max_len=256 -timeout=1 /work/corpus910357798 -exact_artifact_path=/work/libfuzzer609519651/testcase -dict=/work/work842279636/cmark_fuzzer.dict
Dictionary: 64 entries
INFO: Seed: 3999492952
INFO: Loaded 1 modules   (9519 guards): 9519 [0x9c4600, 0x9cdabc), 
INFO:       49 files found in /work/corpus910357798
INFO: seed corpus: files: 49 min: 5b max: 566b total: 5570b rss: 31Mb
Loaded 1024/1093 files from /work/corpus910357798
=================================================================
==17==ERROR: AddressSanitizer: heap-use-after-free on address 0x60e000257dbc at pc 0x0000005c8b40 bp 0x7ffe2ceb01d0 sp 0x7ffe2ceb01c8
READ of size 4 at 0x60e000257dbc thread T0
SCARINESS: 45 (4-byte-read-heap-use-after-free)
    #0 0x5c8b3f in cmark_chunk_free /src/octofuzz/src/./chunk.h:21:10
    #1 0x5c8b3f in process_emphasis /src/octofuzz/src/inlines.c:691
    #2 0x5c0c8c in handle_close_bracket /src/octofuzz/src/inlines.c:1203:7
    #3 0x5c0c8c in parse_inline /src/octofuzz/src/inlines.c:1397
    #4 0x5c0c8c in cmark_parse_inlines /src/octofuzz/src/inlines.c:1443
    #5 0x5aaefe in process_inlines /src/octofuzz/src/blocks.c:440:9
    #6 0x5aaefe in finalize_document /src/octofuzz/src/blocks.c:632
    #7 0x5aaefe in cmark_parser_finish /src/octofuzz/src/blocks.c:1497
    #8 0x58f2bc in LLVMFuzzerTestOneInput /src/octofuzz/test/cmark-fuzz.c:46:23
    #9 0x53911e in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:463:13
    #10 0x537826 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*) /src/libfuzzer/FuzzerLoop.cpp:392:3
    #11 0x53aac6 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:587:9
    #12 0x53d367 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:699:5
    #13 0x524d96 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:738:6
    #14 0x5181f8 in main /src/libfuzzer/FuzzerMain.cpp:20:10
    #15 0x7f99fa83283f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f)
    #16 0x41c058 in _start (/work/work842279636/cmark_fuzzer+0x41c058)
0x60e000257dbc is located 124 bytes inside of 152-byte region [0x60e000257d40,0x60e000257dd8)
freed by thread T0 here:
    #0 0x4dcf18 in __interceptor_cfree.localalias.0 /src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:76
    #1 0x59f6c7 in S_free_nodes /src/octofuzz/src/node.c:165:5
    #2 0x59f6c7 in cmark_node_free /src/octofuzz/src/node.c:173
    #3 0x5c0c32 in handle_close_bracket /src/octofuzz/src/inlines.c:1198:9
    #4 0x5c0c32 in parse_inline /src/octofuzz/src/inlines.c:1397
    #5 0x5c0c32 in cmark_parse_inlines /src/octofuzz/src/inlines.c:1443
    #6 0x5aaefe in process_inlines /src/octofuzz/src/blocks.c:440:9
    #7 0x5aaefe in finalize_document /src/octofuzz/src/blocks.c:632
    #8 0x5aaefe in cmark_parser_finish /src/octofuzz/src/blocks.c:1497
    #9 0x58f2bc in LLVMFuzzerTestOneInput /src/octofuzz/test/cmark-fuzz.c:46:23
    #10 0x53911e in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:463:13
    #11 0x537826 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*) /src/libfuzzer/FuzzerLoop.cpp:392:3
    #12 0x53aac6 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:587:9
    #13 0x53d367 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:699:5
    #14 0x524d96 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:738:6
    #15 0x5181f8 in main /src/libfuzzer/FuzzerMain.cpp:20:10
    #16 0x7f99fa83283f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f)

previously allocated by thread T0 here:
    #0 0x4dd2f0 in calloc /src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:97
    #1 0x59e513 in xcalloc /src/octofuzz/src/cmark.c:18:15
    #2 0x5bca49 in make_literal /src/octofuzz/src/inlines.c:80:33
    #3 0x5bca49 in handle_delim /src/octofuzz/src/inlines.c:548
    #4 0x5bca49 in parse_inline /src/octofuzz/src/inlines.c:1383
    #5 0x5bca49 in cmark_parse_inlines /src/octofuzz/src/inlines.c:1443
    #6 0x5aaefe in process_inlines /src/octofuzz/src/blocks.c:440:9
    #7 0x5aaefe in finalize_document /src/octofuzz/src/blocks.c:632
    #8 0x5aaefe in cmark_parser_finish /src/octofuzz/src/blocks.c:1497
    #9 0x58f2bc in LLVMFuzzerTestOneInput /src/octofuzz/test/cmark-fuzz.c:46:23
    #10 0x53911e in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:463:13
    #11 0x537826 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*) /src/libfuzzer/FuzzerLoop.cpp:392:3
    #12 0x53aac6 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:587:9
    #13 0x53d367 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:699:5
    #14 0x524d96 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:738:6
    #15 0x5181f8 in main /src/libfuzzer/FuzzerMain.cpp:20:10
    #16 0x7f99fa83283f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f)

SUMMARY: AddressSanitizer: heap-use-after-free /src/octofuzz/src/./chunk.h:21:10 in cmark_chunk_free
Shadow bytes around the buggy address:
  0x0c1c80042f60: fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa
  0x0c1c80042f70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c1c80042f80: 00 00 00 fa fa fa fa fa fa fa fa fa fd fd fd fd
  0x0c1c80042f90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x0c1c80042fa0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
=>0x0c1c80042fb0: fd fd fd fd fd fd fd[fd]fd fd fd fa fa fa fa fa
  0x0c1c80042fc0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c80042fd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c80042fe0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c80042ff0: fa fa fa fa fa fa fa fa fa fa fa fa fd fd fd fd
  0x0c1c80043000: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==17==ABORTING
MS: 3 ChangeByte-CopyPart-PersAutoDict- DE: "freedom planet 2"-; base unit: 1351aa79e45b28e5d15d0bc58ee55443269fedfd
0x54,0x66,0x72,0x65,0x65,0x64,0x6f,0x6d,0x20,0x70,0x6c,0x61,0x6e,0x65,0x74,0x20,0x32,0x2e,0x20,0x49,0x74,0x20,0x68,0x61,0x73,0x20,0x61,0x20,0x63,0x69,0x74,0x61,0x74,0x69,0x6f,0x6e,0x2e,0x5b,0x5e,0x63,0x69,0x74,0x61,0x74,0x69,0x6f,0x6e,0x5d,0xa,0xa,0x5b,0x5e,0x61,0x6e,0x6f,0x74,0x68,0x65,0x72,0x2d,0x63,0x69,0x74,0x61,0x74,0x69,0x69,0x74,0x61,0x74,0x69,0x6f,0x6e,0x2e,0xa,0xa,0x5b,0x5e,0x63,0x69,0x74,0x61,0x74,0x69,0x6f,0x6f,0x6e,0x5d,0x3a,0x20,0x4d,0x79,0x20,0x73,0x65,0x63,0x6f,0x6e,0x64,0x20,0x63,0x69,0x74,0x61,0x74,0x69,0x6f,0x6e,0x2e,0xa,0xa,0x5b,0x5e,0x63,0x69,0x74,0x61,0x74,0x69,0x6f,0x6e,0x5d,0x3a,0x20,0x54,0x68,0x69,0x73,0x20,0x69,0x73,0x20,0x61,0x20,0x6c,0x6f,0x6e,0x67,0x20,0x77,0x69,0x6e,0x64,0x65,0x64,0x20,0x70,0x61,0x72,0x61,0x70,0x67,0x72,0x61,0x70,0x68,0x20,0x74,0x68,0x61,0x74,0x20,0x61,0x6c,0x73,0x6f,0x20,0x68,0x61,0x73,0x20,0x61,0x6e,0x6f,0x74,0x68,0x65,0x72,0x20,0x63,0x69,0x74,0x61,0x74,0x69,0x6f,0x6e,0x2e,0x5b,0x5e,0x61,0x6e,0x6f,0x74,0x68,0x65,0x72,0x2d,0x63,0x69,0x74,0x61,0x74,0x27,0x6f,0x6e,0x5d,0xa,
Tfreedom planet 2. It has a citation.[^citation]\x0a\x0a[^another-citatiitation.\x0a\x0a[^citatioon]: My second citation.\x0a\x0a[^citation]: This is a long winded parapgraph that also has another citation.[^another-citat'on]\x0a
artifact_prefix='./'; Test unit written to /work/libfuzzer609519651/testcase
Base64: VGZyZWVkb20gcGxhbmV0IDIuIEl0IGhhcyBhIGNpdGF0aW9uLlteY2l0YXRpb25dCgpbXmFub3RoZXItY2l0YXRpaXRhdGlvbi4KClteY2l0YXRpb29uXTogTXkgc2Vjb25kIGNpdGF0aW9uLgoKW15jaXRhdGlvbl06IFRoaXMgaXMgYSBsb25nIHdpbmRlZCBwYXJhcGdyYXBoIHRoYXQgYWxzbyBoYXMgYW5vdGhlciBjaXRhdGlvbi5bXmFub3RoZXItY2l0YXQnb25dCg==

which corresponds to closer->inl_text->as.literal in:

cmark-gfm/src/inlines.c

Lines 690 to 691 in 993e869

} else if (closer->delim_char == '\'') {
cmark_chunk_free(subj->mem, &closer->inl_text->as.literal);


Similarly, an input like:

$ hexdump -C github/cmark-gfm/cmark_fuzzer/1253ff32ca7acf033392eb4721b44148c3e7aa07/0fba63ca46ff5e2fbab000345cecf182b87d9bd9/minimized_test_case
00000000  54 6f 74 82 bb e3 0a 0a  5b 5e 5f 61 5f 5d        |Tot.....[^_a_]|
0000000e

gives:

/work/work202675120/cmark_fuzzer -timeout_exitcode=77 -error_exitcode=78 -rss_limit_mb=1024 -max_len=256 -timeout=1 /work/corpus403151970 -exact_artifact_path=/work/libfuzzer395857231/testcase -dict=/work/work202675120/cmark_fuzzer.dict
Dictionary: 64 entries
INFO: Seed: 25948553
INFO: Loaded 1 modules   (9519 guards): 9519 [0x9c4600, 0x9cdabc), 
INFO:       49 files found in /work/corpus403151970
INFO: seed corpus: files: 49 min: 5b max: 566b total: 5570b rss: 31Mb
=================================================================
==18==ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0000085b8 at pc 0x0000005c8b45 bp 0x7ffd5322c850 sp 0x7ffd5322c848
READ of size 4 at 0x60e0000085b8 thread T0
SCARINESS: 45 (4-byte-read-heap-use-after-free)
    #0 0x5c8b44 in S_insert_emph /src/octofuzz/src/inlines.c:734:55
    #1 0x5c8b44 in process_emphasis /src/octofuzz/src/inlines.c:686
    #2 0x5c0c8c in handle_close_bracket /src/octofuzz/src/inlines.c:1203:7
    #3 0x5c0c8c in parse_inline /src/octofuzz/src/inlines.c:1397
    #4 0x5c0c8c in cmark_parse_inlines /src/octofuzz/src/inlines.c:1443
    #5 0x5aaefe in process_inlines /src/octofuzz/src/blocks.c:440:9
    #6 0x5aaefe in finalize_document /src/octofuzz/src/blocks.c:632
    #7 0x5aaefe in cmark_parser_finish /src/octofuzz/src/blocks.c:1497
    #8 0x58f2bc in LLVMFuzzerTestOneInput /src/octofuzz/test/cmark-fuzz.c:46:23
    #9 0x53911e in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:463:13
    #10 0x537826 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*) /src/libfuzzer/FuzzerLoop.cpp:392:3
    #11 0x53aac6 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:587:9
    #12 0x53d367 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:699:5
    #13 0x524d96 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:738:6
    #14 0x5181f8 in main /src/libfuzzer/FuzzerMain.cpp:20:10
    #15 0x7fc7b343f83f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f)
    #16 0x41c058 in _start (/work/work202675120/cmark_fuzzer+0x41c058)
0x60e0000085b8 is located 120 bytes inside of 152-byte region [0x60e000008540,0x60e0000085d8)
freed by thread T0 here:
    #0 0x4dcf18 in __interceptor_cfree.localalias.0 /src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:76
    #1 0x59f6c7 in S_free_nodes /src/octofuzz/src/node.c:165:5
    #2 0x59f6c7 in cmark_node_free /src/octofuzz/src/node.c:173
    #3 0x5c0c32 in handle_close_bracket /src/octofuzz/src/inlines.c:1198:9
    #4 0x5c0c32 in parse_inline /src/octofuzz/src/inlines.c:1397
    #5 0x5c0c32 in cmark_parse_inlines /src/octofuzz/src/inlines.c:1443
    #6 0x5aaefe in process_inlines /src/octofuzz/src/blocks.c:440:9
    #7 0x5aaefe in finalize_document /src/octofuzz/src/blocks.c:632
    #8 0x5aaefe in cmark_parser_finish /src/octofuzz/src/blocks.c:1497
    #9 0x58f2bc in LLVMFuzzerTestOneInput /src/octofuzz/test/cmark-fuzz.c:46:23
    #10 0x53911e in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:463:13
    #11 0x537826 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*) /src/libfuzzer/FuzzerLoop.cpp:392:3
    #12 0x53aac6 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:587:9
    #13 0x53d367 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:699:5
    #14 0x524d96 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:738:6
    #15 0x5181f8 in main /src/libfuzzer/FuzzerMain.cpp:20:10
    #16 0x7fc7b343f83f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f)

previously allocated by thread T0 here:
    #0 0x4dd2f0 in calloc /src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:97
    #1 0x59e513 in xcalloc /src/octofuzz/src/cmark.c:18:15
    #2 0x5bca49 in make_literal /src/octofuzz/src/inlines.c:80:33
    #3 0x5bca49 in handle_delim /src/octofuzz/src/inlines.c:548
    #4 0x5bca49 in parse_inline /src/octofuzz/src/inlines.c:1383
    #5 0x5bca49 in cmark_parse_inlines /src/octofuzz/src/inlines.c:1443
    #6 0x5aaefe in process_inlines /src/octofuzz/src/blocks.c:440:9
    #7 0x5aaefe in finalize_document /src/octofuzz/src/blocks.c:632
    #8 0x5aaefe in cmark_parser_finish /src/octofuzz/src/blocks.c:1497
    #9 0x58f2bc in LLVMFuzzerTestOneInput /src/octofuzz/test/cmark-fuzz.c:46:23
    #10 0x53911e in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:463:13
    #11 0x537826 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*) /src/libfuzzer/FuzzerLoop.cpp:392:3
    #12 0x53aac6 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:587:9
    #13 0x53d367 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:699:5
    #14 0x524d96 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:738:6
    #15 0x5181f8 in main /src/libfuzzer/FuzzerMain.cpp:20:10
    #16 0x7fc7b343f83f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f)

SUMMARY: AddressSanitizer: heap-use-after-free /src/octofuzz/src/inlines.c:734:55 in S_insert_emph
Shadow bytes around the buggy address:
  0x0c1c7fff9060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c7fff9070: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c1c7fff9080: fd fd fd fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c7fff9090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c7fff90a0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
=>0x0c1c7fff90b0: fd fd fd fd fd fd fd[fd]fd fd fd fa fa fa fa fa
  0x0c1c7fff90c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c7fff90d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c7fff90e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c1c7fff90f0: fd fd fd fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c7fff9100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==18==ABORTING
MS: 2 ShuffleBytes-CrossOver-; base unit: 9bd560bc9f34acce66a6694c78e223a79be55e9c
0x54,0x68,0x69,0x73,0x20,0x69,0x73,0x20,0x73,0x6f,0x6d,0x65,0x20,0x74,0x65,0x78,0x74,0x21,0x5b,0x5e,0x31,0x5d,0x2e,0x20,0x4f,0x74,0x68,0x65,0x72,0x20,0x74,0x65,0x78,0x74,0x2e,0x20,0x72,0x65,0x66,0x65,0x72,0x65,0x6e,0x74,0x5b,0x5e,0x6e,0x6f,0x70,0x65,0x5d,0x2e,0xa,0xa,0xa,0x5b,0x5e,0x6f,0x74,0x68,0x65,0x72,0x2d,0x6e,0x6f,0x74,0x65,0x5d,0x3a,0x20,0x20,0x20,0x20,0x20,0x20,0x20,0x6e,0x6f,0x20,0x63,0x6f,0x64,0x65,0x20,0x62,0x6c,0x6f,0x63,0x6b,0x72,0x20,0x74,0x68,0x69,0x6e,0x67,0x5b,0x5e,0x63,0x6f,0x64,0x65,0x62,0x6c,0x6f,0x67,0x6b,0x2d,0x6e,0x6f,0x74,0x48,0xff,0x7,0x65,0x6c,0x6c,0x6f,0x21,0xa,0x7d,0xa,0x7c,0x20,0x5f,0x61,0x62,0x5f,0x20,0x20,0xe3,0x7c,0x63,0x82,0xbb,0xe3,0x83,0xb3,0x29,0x7c,0xa,0x7c,0x20,0x2d,0x2d,0x2d,0x2d,0x2d,0x20,0x7c,0x20,0x6c,0x6c,0x6f,0x21,0xa,0x7d,0xa,0x7c,0x20,0x5f,0x9a,0x6d,0x65,0x6e,0x74,0x73,0x20,0x64,0x6f,0x2a,0x2a,0x5f,0x2e,0x20,0x7c,0x20,0x78,0x20,0x74,0x4a,0xa,0x65,0x5d,0x2e,0xa,0xa,0x54,0x68,0x69,0x73,0x20,0x64,0x6f,0x65,0x73,0x6e,0x27,0x74,0x20,0x68,0x61,0x76,0x65,0x20,0x61,0x20,0x72,0x65,0x66,0x65,0x72,0x65,0x20,0x2d,0x2d,0x2d,0x2d,0xdf,0x3a,0x9a,0x93,0x6e,0x74,0x5b,0x5e,0x6e,0x6f,0x70,0x65,0x5d,0x2e,0xa,0xa,0xa,0x5b,0x5e,0x6f,0x74,0x68,0x65,0x9a,0x6d,0x65,0x6e,0x72,
This is some text![^1]. Other text. referent[^nope].\x0a\x0a\x0a[^other-note]:       no code blockr thing[^codeblogk-notH\xff\x07ello!\x0a}\x0a| _ab_  \xe3|c\x82\xbb\xe3\x83\xb3)|\x0a| ----- | llo!\x0a}\x0a| _\x9aments do**_. | x tJ\x0ae].\x0a\x0aThis doesn't have a refere ----\xdf:\x9a\x93nt[^nope].\x0a\x0a\x0a[^othe\x9amenr
artifact_prefix='./'; Test unit written to /work/libfuzzer395857231/testcase
Base64: VGhpcyBpcyBzb21lIHRleHQhW14xXS4gT3RoZXIgdGV4dC4gcmVmZXJlbnRbXm5vcGVdLgoKClteb3RoZXItbm90ZV06ICAgICAgIG5vIGNvZGUgYmxvY2tyIHRoaW5nW15jb2RlYmxvZ2stbm90SP8HZWxsbyEKfQp8IF9hYl8gION8Y4K744OzKXwKfCAtLS0tLSB8IGxsbyEKfQp8IF+abWVudHMgZG8qKl8uIHwgeCB0SgplXS4KClRoaXMgZG9lc24ndCBoYXZlIGEgcmVmZXJlIC0tLS3fOpqTbnRbXm5vcGVdLgoKClteb3RoZZptZW5y

which corresponds to opener_inl->as.literal.len in:

cmark-gfm/src/inlines.c

Lines 728 to 734 in 993e869

static delimiter *S_insert_emph(subject *subj, delimiter *opener,
delimiter *closer) {
delimiter *delim, *tmp_delim;
bufsize_t use_delims;
cmark_node *opener_inl = opener->inl_text;
cmark_node *closer_inl = closer->inl_text;
bufsize_t opener_num_chars = opener_inl->as.literal.len;


So it looks like we are incorrectly handling footnotes that themselves contain markdown formatting? Perhaps we shouldn't be calling into process_emphasis from handle_close_bracket?

@phillmv
Copy link
Member Author

phillmv commented Sep 1, 2021

Very powerful find, thanks @philipturnbull!

I poked around in this, and I was able to reproduce an infinite hang using |[^_a_]| (but surprisingly, no issue with |Tf[^.'no[^']|) as an input string.

I strongly suspect I am either failing to reset some aspect of the parser state, or there's a dangling pointer elsewhere 🤔 I shall look at it soon.

@phillmv
Copy link
Member Author

phillmv commented Sep 2, 2021

@philipturnbull I may have figured it out in 6e186b3 ! go figure, trying to free things before calling the next function would lead to shenanigans. I should be able to clean it up, and add a regression test.

I then set up the fuzzer on my end and it found a subtle use-after-free stemming from 71e27f2 which I have also figured out! but like with above, shall have to write into a patch another time.

Fuzzers! What a great tool.

tldr, avoid freeing memory before passing it along to another function.
Sometimes, when cleaning up unusued footnotes, two footnote->nodes may
end up referencing each other. As they get free()'d up, this can lead to
problems. Instead, first we unlink every node and _then_ free them up.
@phillmv
Copy link
Member Author

phillmv commented Sep 2, 2021

I've got the fuzzer running the background trying to dredge up more hits, but afaict I have fixed both issues covered thus far in d3a819c and in a1d171a 😄.

Copy link

@brasic brasic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving from a functionality perspective with the understanding that fuzzing and review from others has addressed known memory safety issues.

src/inlines.c Outdated Show resolved Hide resolved
src/inlines.c Outdated Show resolved Hide resolved
@phillmv phillmv requested a review from bk2204 September 15, 2021 19:26
@phillmv phillmv merged commit eb5b719 into master Sep 16, 2021
@phillmv phillmv deleted the fix-footnotes-nested-linkrefs-autolinker branch September 16, 2021 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Autolink and footnote extensions incorrectly process footnotes with certain letters
6 participants