POSTCLITIC gets generated in the list of word even though not present in source transcript #26
Comments
If you're accessing data from the Also, it sounds like you're interested in getting the transcription data that's cleaned up and without the CHAT annotations? The way the package does the utterance cleaning is to use the currently private _clean_utterance function. I think I can consider adding a new attribute to the
that would be great
actually, calling
and the POSTCLITIC is still there (eg and (optionally) an API to get raw transcript with just the rich annotations like breaths, laughs etc(I realize that might be hard to define, but something like "all the audible sounds"):
How do I get these back?
Describe the bug
POSTCLITIC is output as a word even though it does not appear in the source transcript. I wonder what other tokens get generated in a similar way; this makes the data harder to use for transcription purposes.
Relevant CHILDES or TalkBank data
https://sla.talkbank.org/TBB/homebank/Public/VanDam-5minute/ML77/ML77_020400a.cha
To reproduce
This shows a number greater than 0.
Expected behavior
No POSTCLITIC should be output.
Note
zooming in on where this occurs:
%mor: co|okay mod|got~inf|to v|put&ZERO pro:dem|these v|back .
=>
"okay gotta POSTCLITIC put these back ."
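To make the example above concrete, here is a minimal sketch of how a placeholder like this could arise. It is not the package's actual implementation. It assumes that each whitespace-separated %mor item lines up with one transcript word, that `~` inside a %mor item marks a postclitic, and that a placeholder word is inserted so the two lists stay the same length. The function name `align_words_with_mor` is hypothetical.

```python
def align_words_with_mor(words, mor_tier, placeholder="POSTCLITIC"):
    """Pair transcript words with %mor items (illustrative sketch only).

    A %mor item containing '~' is split into a head and its postclitics.
    Each postclitic has no word of its own in the transcript, so it is
    paired with a placeholder string instead.
    """
    mor_items = mor_tier.split()
    if len(words) != len(mor_items):
        raise ValueError("words and %mor items do not line up")

    pairs = []
    for word, mor in zip(words, mor_items):
        head, *clitics = mor.split("~")
        pairs.append((word, head))
        # One placeholder "word" per postclitic morpheme.
        pairs.extend((placeholder, clitic) for clitic in clitics)
    return pairs


words = ["okay", "gotta", "put", "these", "back", "."]
mor = "co|okay mod|got~inf|to v|put&ZERO pro:dem|these v|back ."
aligned = align_words_with_mor(words, mor)
print(" ".join(word for word, _ in aligned))
```

Under these assumptions, `gotta` carries two morphemes (`mod|got` and `inf|to`), so the word list gains one extra entry that never appeared in the transcript.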
Note 2
in #23 (comment) @jacksonllee mentions:
which makes me wonder: is this even intentional? How can a caller distinguish which tokens are actual words?
Should this code (to get the transcript) be replaced by something else?
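As a stopgap on the caller's side, one option is to filter the placeholder out of the word list after the fact. The sketch below works on a plain list of strings and does not call into the package. The helper name `drop_placeholders` is hypothetical, and the placeholder set contains only `POSTCLITIC` because that is the only such token observed in this issue; other placeholders may exist.

```python
# Only the token observed in this issue; extend if others turn up.
PLACEHOLDERS = frozenset({"POSTCLITIC"})


def drop_placeholders(words, placeholders=PLACEHOLDERS):
    """Return the words with alignment placeholders removed."""
    return [word for word in words if word not in placeholders]


words = "okay gotta POSTCLITIC put these back .".split()
cleaned = drop_placeholders(words)
print(" ".join(cleaned))
```

This cannot tell a placeholder apart from a transcript that genuinely contains the string `POSTCLITIC`, which is part of why a way to get the words without placeholders from the package itself would be preferable.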