Improve speed of converting SQLite to FHIR #64

Open
joeflack4 opened this issue Dec 15, 2022 · 2 comments

joeflack4 commented Dec 15, 2022

Overview

I tried to convert HPO to FHIR using semsql as an intermediary. After about 40 minutes, I gave up and switched to Obographs for speed. I think it took about 10 minutes to convert to a .db, and the rest of the time was spent by OAK trying to load the DB. Normally semsql is much faster to load than rdflib, but not in this case. My hpo.db was about 1GB, roughly 10x larger than my hpo.owl. Looking at some of my other conversions, a 5-10x increase in file size seems to be typical.

If I'm correct that the issue is not so much OAK performance as the file size itself, is there anything we can do to reduce these file sizes? Or perhaps the problem is not the size but the structure, which takes OAK a long time to parse downstream? If this is more of an OAK issue (or both an OAK issue and a semsql issue), I can open a ticket over there.

Potential causes

One or more of the following may be what's taking so much time:
a. Semsql: File size
b. Semsql: Non-optimal structures for downstream parsing
c. OAK: Not parsing optimally
d. OAK: Spending time doing things that are maybe not needed for my use case

joeflack4 changed the title File size optimization File size, structure, and downstream performance Dec 15, 2022
cmungall changed the title File size, structure, and downstream performance Improve speed of converting SQLite to FHIR Dec 16, 2022

cmungall (Collaborator)

I don't think it has anything to do with file size. More likely, it is iterating and performing multiple SQL queries. This should be easy to optimize.
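To illustrate the kind of pattern this comment refers to, here is a minimal Python sketch of "one query per term" (often called the N+1 query pattern) versus a single batched query. It is not OAK's actual code: the `statements` table with `subject`, `predicate`, and `value` columns is a hypothetical simplification built in memory for the example, not the real semsql schema, and the function names are illustrative only.

```python
import sqlite3
from collections import defaultdict


def make_db() -> sqlite3.Connection:
    # Hypothetical, simplified triple table; the real semsql schema differs.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE statements (subject TEXT, predicate TEXT, value TEXT)")
    rows = [
        ("HP:0000001", "rdfs:label", "All"),
        ("HP:0000118", "rdfs:label", "Phenotypic abnormality"),
        ("HP:0000118", "oio:hasExactSynonym", "Organ abnormality"),
        ("HP:0000707", "rdfs:label", "Abnormality of the nervous system"),
    ]
    conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", rows)
    return conn


def fetch_per_term(conn: sqlite3.Connection):
    """N+1 pattern: one query for the term list, then one query per term."""
    queries = 1
    terms = [r[0] for r in conn.execute(
        "SELECT DISTINCT subject FROM statements ORDER BY subject")]
    result = {}
    for term in terms:
        queries += 1  # one extra round trip for every term
        result[term] = list(conn.execute(
            "SELECT predicate, value FROM statements "
            "WHERE subject = ? ORDER BY predicate, value", (term,)))
    return result, queries


def fetch_batched(conn: sqlite3.Connection):
    """Single query; rows are grouped by subject in Python."""
    result = defaultdict(list)
    for subject, predicate, value in conn.execute(
            "SELECT subject, predicate, value FROM statements "
            "ORDER BY subject, predicate, value"):
        result[subject].append((predicate, value))
    return dict(result), 1


if __name__ == "__main__":
    conn = make_db()
    slow, n_slow = fetch_per_term(conn)
    fast, n_fast = fetch_batched(conn)
    assert slow == fast  # same data either way
    print(n_slow, n_fast)  # prints "4 1"
```

Both functions return the same data, but the per-term version issues one query per term plus one, so its query count grows with the size of the ontology, while the batched version always issues one. For an ontology with many thousands of terms, that per-query overhead is a plausible source of the slowdown described above, independent of the .db file size.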

joeflack4 (Author)

That sounds hopeful! Thanks Chris.
