
What DH & Repository tooling is out there for working on a Corpus: Corpus Management, Packaging, Semantification, Data Analysis and Producing Research Outputs #1

Open
mrchristian opened this issue Oct 14, 2024 · 0 comments


Also posted to NFDI4Culture: https://tickets.nfdi4culture.de/work_packages/9750/activity

What DH & Repository tooling (software) is out there for working on a corpus: corpus management, packaging, semantification, data analysis and producing research outputs?

The reason for asking is to work out how an individual publication can be made usable and compatible with the standards used by existing systems for corpus packaging and data analysis.

The kinds of tasks, functions and capabilities being looked at are:

  • Collect the corpus into a single file system
  • Package the corpus with an inventory
  • Convert the corpus to an open, interoperable standard format, with validation against that format
  • Corpus versioning and forking
  • Semantification: annotate with Named Entity Recognition (NER)
  • Semantic concept annotation
  • Enable NLP analysis, e.g. word frequency
  • Enable syntactic and syntactic/semantic markup
  • Enable TDM (text and data mining)
  • Research outputs: allow analysis findings and results to be output as data, together with a copy of the corpus if needed, in a way that is compatible with Open Science practice
  • Reporting on the corpus: bibliometrics, presenting knowledge and ideas, statistics to back up findings, etc.
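To make two of the tasks above more concrete (packaging the corpus with an inventory, and a basic word-frequency analysis), here is a minimal Python sketch that uses only the standard library. It assumes the corpus is a directory of files with UTF-8 plain-text (`.txt`) documents. The function names are hypothetical, and the `<sha256>  <path>` manifest layout is only loosely modelled on BagIt manifests. It is not the API of any existing DH or repository tool.

```python
import hashlib
import re
from collections import Counter
from pathlib import Path

# Hypothetical helper names for illustration; not taken from any existing DH tool.


def build_inventory(corpus_dir):
    """Return (relative_path, size_in_bytes, sha256_hex) for every file, sorted by path."""
    root = Path(corpus_dir)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries.append((path.relative_to(root).as_posix(), path.stat().st_size, digest))
    return entries


def write_manifest(entries, manifest_path):
    """Write one '<sha256>  <relative path>' line per file (loosely BagIt-style, assumed layout)."""
    lines = [f"{digest}  {rel}\n" for rel, _size, digest in entries]
    Path(manifest_path).write_text("".join(lines), encoding="utf-8")


def word_frequencies(corpus_dir, pattern="*.txt"):
    """Count lower-cased word tokens across plain-text files using a naive regex tokeniser."""
    counts = Counter()
    for path in sorted(Path(corpus_dir).rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        counts.update(re.findall(r"\w+", text.lower()))
    return counts
```

A typical call sequence is `build_inventory("corpus")`, then `write_manifest(...)`, then `word_frequencies("corpus").most_common(10)`, where `corpus` is a placeholder directory name. Write the manifest outside the corpus directory. Otherwise it will appear in the next inventory. Dedicated tooling would replace the naive tokeniser and the ad hoc manifest format.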