HIPE-2022 data v2.0

@mromanello mromanello released this 22 Mar 10:45
· 52 commits to main since this release
165773f

Release notes

This release contains:

  • 📃 ajmc: full train and dev sets for fr, en, de.
  • 📃 ajmc: mappings [OCR-gold transcript] for ajmc entities (see README-ajmc)
  • 🐛 newseye: correction of document_id number in metadata line # hipe2022:document_id = + removal of unannotated documents from DE train set (see README-newseye)
  • 🐛 sonar: thorough revision of NER and NEL annotations + removal of unrevised materials from dev set (see README-sonar.md)
  • updated corpus statistics in the dedicated notebook
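
For anyone who wants to check the corrected document_id values after updating, here is a minimal sketch of reading a metadata line of the form # hipe2022:document_id = ... mentioned in the newseye note. It assumes that metadata lines follow a "# hipe2022:<key> = <value>" shape; the helper name parse_metadata_line and the sample value are hypothetical and not part of the release.

```python
import re

# Assumption: metadata lines look like "# hipe2022:<key> = <value>".
# Only the "# hipe2022:document_id = " prefix is taken from the release notes.
META_RE = re.compile(r"^#\s*hipe2022:(?P<key>[\w-]+)\s*=\s*(?P<value>.*)$")


def parse_metadata_line(line):
    """Return (key, value) for a hipe2022 metadata line, or None otherwise."""
    match = META_RE.match(line.strip())
    if match is None:
        return None
    return match.group("key"), match.group("value").strip()


if __name__ == "__main__":
    # Hypothetical sample line; real identifiers come from the data files.
    sample = "# hipe2022:document_id = example_doc_001"
    print(parse_metadata_line(sample))
```

Lines that do not match the assumed shape return None, so the helper can be applied to every line of a file to collect only the metadata entries.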