Add additional categories #15

Open
duklin opened this issue Jan 17, 2020 · 3 comments
Comments

@duklin

duklin commented Jan 17, 2020

Is it possible to start training with additional categories such as: heading2, heading3, ..., image description, ...?

@zhxgj
Contributor

zhxgj commented Jan 19, 2020

It is possible to do that. Heading levels and the captions of images and tables are in the XML files, and they can be linked to the PDFs. However, we currently have no plan to do this due to other commitments.
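As a rough illustration of the point above, here is a minimal sketch of reading heading levels and figure/table captions out of an article XML. It assumes the files follow the JATS-style layout used by PubMed Central (nested `<sec>` elements with a `<title>`, and `<fig>` / `<table-wrap>` elements with a `<caption>`). The `extract_labels` function, the `SAMPLE` document, and the label names (`heading1`, `image description`, `table caption`) are hypothetical and not part of this repository. Linking the extracted text back to regions in the PDF is a separate step that is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in a JATS-like layout (not a real PMC article).
SAMPLE = """<article>
  <body>
    <sec>
      <title>Methods</title>
      <sec>
        <title>Data collection</title>
        <fig id="f1">
          <caption><p>Overview of the pipeline.</p></caption>
        </fig>
      </sec>
    </sec>
    <table-wrap id="t1">
      <caption><p>Dataset statistics.</p></caption>
    </table-wrap>
  </body>
</article>"""


def _text(elem):
    """Return all text inside an element with whitespace normalized."""
    return " ".join("".join(elem.itertext()).split())


def extract_labels(xml_text):
    """Return (label, text) pairs for section headings and captions.

    The heading level is taken from the nesting depth of <sec> elements
    (an assumption about how levels are encoded).
    """
    root = ET.fromstring(xml_text)
    labels = []

    def walk(elem, depth):
        for child in elem:
            if child.tag == "sec":
                title = child.find("title")
                if title is not None:
                    labels.append((f"heading{depth + 1}", _text(title)))
                walk(child, depth + 1)
            elif child.tag in ("fig", "table-wrap"):
                caption = child.find("caption")
                if caption is not None:
                    kind = "image description" if child.tag == "fig" else "table caption"
                    labels.append((kind, _text(caption)))
            else:
                walk(child, depth)

    walk(root, 0)
    return labels


if __name__ == "__main__":
    for label, text in extract_labels(SAMPLE):
        print(label, "|", text)
```

Real articles may vary in structure (for example, front matter, namespaced attributes, or captions with inline markup), so the traversal would likely need adjusting before being used on the full dataset.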

@dijana-sagit

Hi @zhxgj, thank you for your research and for providing such a useful resource! I was wondering if you would be allowed to share the original XMLs of the PDF files collected from PubMed, or the file IDs so I can re-collect them myself in order to add extra classes?
Regards

@zhxgj
Contributor

zhxgj commented Jun 26, 2020

Hi @dijana-sagit, thanks for your interest. The XML and PDF files can be downloaded directly from the PubMed Central Open Access Subset via FTP. Here is the link: https://www.ncbi.nlm.nih.gov/pmc/tools/ftp/
