Segfault on FoLiA in to FoLiA out (speech data with events and utterances) #96

Closed
proycon opened this issue Nov 10, 2022 · 7 comments

@proycon
Member

proycon commented Nov 10, 2022

Frog (libfolia) segfaults on the attached FoLiA input upon FoLiA serialisation.

<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.5" xml:id="example">
  <metadata>
      <annotations>
          <text-annotation>
                         <annotator processor="p1" />
          </text-annotation>
          <utterance-annotation>
                         <annotator processor="p1" />
          </utterance-annotation>
          <event-annotation set="speech">
                         <annotator processor="p1" />
          </event-annotation>
      </annotations>
      <provenance>
         <processor xml:id="p1" name="proycon" type="manual" />
      </provenance>
  </metadata>
  <text xml:id="example.speech">
      <event xml:id="turn.1" class="turn" src="piet.wav" begintime="00:00:00.720" endtime="00:00:53.230">
        <utt xml:id="example.utt.1" speaker="Piet">
            <t>Het is vandaag 1 januari 2019. Mijn naam is Piet voor het project Diplomatieke Getuigenissen heb ik vandaag een gesprek met Piet. Ook met ons in de kamer is Piet die voor ons het geluid en de video verzorgt. Meneer Piet misschien dat we gewoon kunnen beginnen met dat u iets over uw opleiding vertelt en hoe u bij Buitenlandse Zaken bent komen te werken?</t>
        </utt>
        <utt xml:id="example.utt.2" speaker="Piet">
            <t>Ja ik ben geboren in 1936. Volgens de boeken het heilige jaar voor de Chinezen. 1936. In 2036 is er weer zo'n heilig jaar. Ik ben ... </t>
        </utt>
      </event>
  </text>
</FoLiA>

Call: frog --skip=pac -x anon_1.folia.xml -X anon_1.out.folia.xml

All actual processing goes fine, it is the FoLiA serialisation in the end that fails.

gdb backtrace:

Thread 1 "frog" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007fa4eae08999 in folia::AbstractElement::append (this=<optimized out>, this@entry=0x7fa4e700a580, child=<optimized out>, child@entry=0x7fa4e659a7f0) at folia_impl.cxx:3129
#2  0x00007fa4eae98ee2 in folia::AbstractStructureElement::append (this=0x7fa4e700a580, child=0x7fa4e659a7f0) at folia_subclasses.cxx:784
#3  0x00007fa4eae306fc in folia::AbstractElement::AbstractElement (this=this@entry=0x7fa4e659a7f0, __vtt_parm=__vtt_parm@entry=0x7fa4eb5abfc0 <VTT for folia::Paragraph+16>, p=..., el=el@entry=0x7fa4e700a580, __in_chrg=<optimized out>) at folia_impl.cxx:293
#4  0x00007fa4eb4cd949 in folia::AbstractStructureElement::AbstractStructureElement (p=0x7fa4e700a580, props=..., __vtt_parm=0x7fa4eb5abfb8 <VTT for folia::Paragraph+8>, this=0x7fa4e659a7f0, __in_chrg=<optimized out>)
    at /usr/local/include/libfolia/folia_subclasses.h:59
#5  folia::Paragraph::Paragraph (p=0x7fa4e700a580, a=..., this=0x7fa4e659a7f0, __in_chrg=<optimized out>, __vtt_parm=<optimized out>) at /usr/local/include/libfolia/folia_subclasses.h:626
#6  folia::FoliaElement::add_child<folia::Paragraph> (args=..., this=0x7fa4e700a580) at /usr/local/include/libfolia/folia_impl.h:125
#7  FrogAPI::handle_one_text_parent (this=0x7ffc1bc9e600, os=..., e=0x7fa4e700a580, sentence_done=<optimized out>) at FrogAPI.cxx:2567
#8  0x00007fa4eb4ce462 in FrogAPI::run_folia_engine (this=0x7ffc1bc9e600, infilename=..., output_stream=...) at FrogAPI.cxx:2661
#9  0x00007fa4eb4d0bf1 in FrogAPI::FrogFile (this=0x7ffc1bc9e600, infilename=...) at FrogAPI.cxx:2743
#10 0x00007fa4eb4d3cbd in FrogAPI::run_on_files (this=0x7ffc1bc9e600) at FrogAPI.cxx:1175
#11 0x000055c8b0feafd2 in main (argc=<optimized out>, argv=<optimized out>) at Frog.cxx:229
proycon added the bug label Nov 10, 2022
proycon self-assigned this Nov 10, 2022
@kosloot
Collaborator

kosloot commented Nov 10, 2022

Well, a quick analysis showed me that Frog creates a paragraph and then attempts to append it to the <utt>.
This is forbidden (folia_properties.cxx).
The append throws in libfolia, but the exception is then not handled correctly.
This needs more investigation.

Bottom line: do we want <p> nodes in an <utt>?

@proycon
Member Author

proycon commented Nov 10, 2022

No, we don't, just sentences and words... (ucto seems to do it properly)
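
A minimal sketch of how one could check an output file for this constraint, using only the Python standard library. The function name `paragraphs_in_utterances` and the inline `SAMPLE` document are illustrative assumptions, not part of Frog, ucto or libfolia; only the namespace URI is taken from the input above.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the FoLiA input of this issue.
FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def paragraphs_in_utterances(xml_text):
    """Return the xml:id of every <utt> that has a direct <p> child.

    Hypothetical helper for illustration; it does not validate FoLiA in general.
    """
    root = ET.fromstring(xml_text)
    offenders = []
    for utt in root.iter(f"{{{FOLIA_NS}}}utt"):
        # find() with a plain tag name only looks at direct children.
        if utt.find(f"{{{FOLIA_NS}}}p") is not None:
            offenders.append(utt.get(XML_ID))
    return offenders


# Hand-written sample (an assumption, not real Frog output):
# the first utterance wraps its text in <p>, the second uses <s> directly.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.5" xml:id="example">
  <text xml:id="example.speech">
    <event xml:id="turn.1" class="turn">
      <utt xml:id="example.utt.1"><p xml:id="example.utt.1.p.1"><t>Een. Twee.</t></p></utt>
      <utt xml:id="example.utt.2"><s xml:id="example.utt.2.s.1"><t>Een.</t></s></utt>
    </event>
  </text>
</FoLiA>"""

offending = paragraphs_in_utterances(SAMPLE)
```

An empty list would mean that no utterance contains a paragraph, which is the structure asked for here.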

@kosloot
Collaborator

kosloot commented Nov 10, 2022

OK, but in this example the <t> in the <utt> contains more than one sentence. Ergo a PARAGRAPH.
Not sure why ucto delivers only a sequence of <s> and NO paragraph.

@kosloot
Collaborator

kosloot commented Nov 10, 2022

So I added a small fix to libfolia. Now the exception is handled correctly.
That leaves the problem of Frog creating an unwanted <p>.
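
The behaviour asked for in this thread could be sketched roughly as follows. This is not Frog's actual code (Frog is C++); `add_sentences` is a hypothetical helper, and namespaces, IDs and tokens are left out for brevity.

```python
import xml.etree.ElementTree as ET


def add_sentences(parent, sentences):
    """Attach sentences to parent, wrapping them in <p> only outside an <utt>.

    Hypothetical illustration of the rule discussed above:
    an utterance gets <s> children directly, never a <p>.
    """
    target = parent
    if parent.tag != "utt" and len(sentences) > 1:
        # Multi-sentence text elsewhere is grouped in a paragraph.
        target = ET.SubElement(parent, "p")
    for text in sentences:
        s = ET.SubElement(target, "s")
        ET.SubElement(s, "t").text = text
    return parent


utt = add_sentences(ET.Element("utt"), ["Ja ik ben geboren in 1936.", "1936."])
div = add_sentences(ET.Element("div"), ["Een.", "Twee."])
```

Here `utt` ends up with two `<s>` children, while `div` gets a single `<p>` holding both sentences.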

kosloot added a commit that referenced this issue Nov 10, 2022
@kosloot
Collaborator

kosloot commented Nov 10, 2022

A small fix is now in Git. Seems to work.

@proycon
Member Author

proycon commented Nov 10, 2022

Thanks! That fixes it indeed. Are things ready enough for a new release? I see you've been hacking some more lately.

proycon added the ready label Nov 10, 2022
@kosloot
Collaborator

kosloot commented Feb 22, 2023

Assuming it is done.

kosloot closed this as completed Feb 22, 2023