In mem copy rdf graph #2619

Merged
merged 1 commit into from
Jan 1, 2024

andyfengHKU (Contributor) commented Dec 30, 2023

This PR adds an in_memory option for copying an RDF dataset. The in-memory version first caches the entire RDF dataset in memory and then copies it into the different tables. This avoids parsing the RDF dataset repeatedly, which is a significant overhead.
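To make the trade-off concrete, here is a minimal C++ sketch that assumes a toy line-based triple format rather than a real RDF parser. `Triple`, `parseTriples`, `scanTarget`, `copyOutOfMemory`, and `copyInMemory` are hypothetical names for illustration, not Kùzu's actual classes. The out-of-memory path re-parses the input once per target table, while the in-memory path parses once and re-scans the cached triples.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical triple type; a real RDF reader stores far more (datatypes, language tags, ...).
struct Triple {
    std::string subject, predicate, object;
    bool objectIsLiteral = false;
};

// Counts how often the raw input is parsed, so the two strategies can be compared.
static std::size_t parseCount = 0;

// Toy parser (assumption): one whitespace-separated "s p o" triple per line,
// no whitespace inside terms, and objects starting with '"' are literals.
std::vector<Triple> parseTriples(const std::string& input) {
    ++parseCount;
    std::vector<Triple> triples;
    std::istringstream lines(input);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        Triple t;
        if (!(fields >> t.subject >> t.predicate >> t.object)) continue;  // skip malformed lines
        t.objectIsLiteral = t.object.front() == '"';
        triples.push_back(t);
    }
    return triples;
}

// The four target tables of an RDF copy (toy definitions).
enum class Target { Resource, Literal, ResourceTriple, LiteralTriple };
const Target kTargets[] = {Target::Resource, Target::Literal, Target::ResourceTriple,
                           Target::LiteralTriple};

// Returns the number of rows a scan would emit for one target table.
// Node targets count distinct terms; triple targets count matching triples.
std::size_t scanTarget(const std::vector<Triple>& triples, Target target) {
    std::set<std::string> distinct;
    std::size_t rows = 0;
    for (const auto& t : triples) {
        switch (target) {
        case Target::Resource:  // distinct IRIs in any position
            distinct.insert(t.subject);
            distinct.insert(t.predicate);
            if (!t.objectIsLiteral) distinct.insert(t.object);
            break;
        case Target::Literal:  // distinct literal objects
            if (t.objectIsLiteral) distinct.insert(t.object);
            break;
        case Target::ResourceTriple:  // triples whose object is a resource
            rows += t.objectIsLiteral ? 0 : 1;
            break;
        case Target::LiteralTriple:  // triples whose object is a literal
            rows += t.objectIsLiteral ? 1 : 0;
            break;
        }
    }
    return rows + distinct.size();  // only one of the two counters is non-zero
}

// Out-of-memory strategy: every target table re-parses the whole input.
std::vector<std::size_t> copyOutOfMemory(const std::string& input) {
    std::vector<std::size_t> rows;
    for (Target target : kTargets) rows.push_back(scanTarget(parseTriples(input), target));
    return rows;
}

// In-memory strategy: parse once, cache the triples, then re-scan the cache.
std::vector<std::size_t> copyInMemory(const std::string& input) {
    const std::vector<Triple> cache = parseTriples(input);
    std::vector<std::size_t> rows;
    for (Target target : kTargets) rows.push_back(scanTarget(cache, target));
    return rows;
}
```

Both paths produce the same row counts, but `parseCount` ends at 4 for the out-of-memory path and at 1 for the in-memory path. The trade-off is that the in-memory path must hold every parsed triple in memory at once.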

Additional Changes

  • Remove ExtraCopierConfig and store the extra config in each scan function's own bind data instead. In fact, only the CSV reader needs additional config.
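  • Register 9 RDF reader functions: an out-of-memory and an in-memory variant of each of the first 4 scan functions below, plus the fifth. The out-of-memory version uses the first 4 scan functions. The in-memory version uses the fifth function to first scan the dataset into memory, and then re-scans memory with the in-memory variants of the first 4 functions.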
    • (in memory) scan resource
    • (in memory) scan literal
    • (in memory) scan rrr
    • (in memory) scan url
    • scan all
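The first change, keeping reader-specific config in each scan function's bind data instead of a shared ExtraCopierConfig, can be sketched as below. This is a hedged illustration, not Kùzu's code: `ScanBindData`, `CSVScanBindData`, and `CSVOption` (with its `delimiter` and `hasHeader` fields) are hypothetical names. Only the CSV reader's bind data carries extra options, and a virtual `copy()` preserves them without other readers knowing they exist.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical CSV-only options; the field names are illustrative assumptions.
struct CSVOption {
    char delimiter = ',';
    bool hasHeader = false;
};

// Bind data shared by every scan function (here, just the input file paths).
struct ScanBindData {
    std::vector<std::string> filePaths;

    explicit ScanBindData(std::vector<std::string> paths) : filePaths(std::move(paths)) {}
    virtual ~ScanBindData() = default;

    // Polymorphic copy so callers never need to know the concrete reader type.
    virtual std::unique_ptr<ScanBindData> copy() const {
        return std::make_unique<ScanBindData>(*this);
    }
};

// Only the CSV reader needs extra config, so only its bind data carries it.
struct CSVScanBindData : ScanBindData {
    CSVOption option;

    CSVScanBindData(std::vector<std::string> paths, CSVOption csvOption)
        : ScanBindData(std::move(paths)), option(csvOption) {}

    std::unique_ptr<ScanBindData> copy() const override {
        return std::make_unique<CSVScanBindData>(*this);
    }
};
```

Readers without extra options simply use the base bind data, so no shared config object has to carry fields that only one reader reads.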

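The second change, registering 9 RDF reader functions, can be pictured as a small registry. In this sketch, which assumes nothing about Kùzu's real function catalog, each of the four targets listed above gets an out-of-memory and an in-memory variant, plus one scan-all function, giving nine functions. The function names (`scan_resource`, `in_mem_scan_resource`, `scan_all`, ...) and the helpers `registerRdfScanFunctions` and `planCopy` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The four per-table targets as listed in the PR description.
const std::vector<std::string> kTargets = {"resource", "literal", "rrr", "url"};

// Where a scan function reads its triples from.
enum class Source { File, Memory };

struct ScanFunction {
    std::string name;  // hypothetical function name, not Kùzu's registered name
    Source source;
};

// Registers 4 targets x {file, in-memory} + 1 scan-all function = 9 functions.
std::vector<ScanFunction> registerRdfScanFunctions() {
    std::vector<ScanFunction> functions;
    for (const auto& target : kTargets) {
        functions.push_back({"scan_" + target, Source::File});
        functions.push_back({"in_mem_scan_" + target, Source::Memory});
    }
    functions.push_back({"scan_all", Source::File});  // parses the dataset once into memory
    return functions;
}

// Returns the function names a copy would run, in order, for the chosen mode.
std::vector<std::string> planCopy(bool inMemory) {
    std::vector<std::string> plan;
    if (inMemory) {
        plan.push_back("scan_all");  // populate the in-memory cache first
    }
    for (const auto& target : kTargets) {
        plan.push_back((inMemory ? "in_mem_scan_" : "scan_") + target);
    }
    return plan;
}
```

`planCopy(false)` returns the four file-based scans, while `planCopy(true)` returns `scan_all` followed by the four in-memory scans, mirroring the two modes described in the list.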
codecov bot commented Dec 30, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (9446b7b) 93.27% compared to head (0a35439) 93.26%.

Files                                                  Patch %   Lines
...cessor/operator/persistent/reader/rdf/rdf_scan.cpp  95.86%    5 Missing ⚠️
src/common/copier_config/rdf_reader_config.cpp         70.00%    3 Missing ⚠️
...rocessor/operator/persistent/reader/rdf/rdf_scan.h  92.30%    1 Missing ⚠️
...ssor/operator/persistent/reader/rdf/rdf_reader.cpp  97.95%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2619      +/-   ##
==========================================
- Coverage   93.27%   93.26%   -0.02%     
==========================================
  Files        1036     1041       +5     
  Lines       38870    39033     +163     
==========================================
+ Hits        36257    36405     +148     
- Misses       2613     2628      +15     

andyfengHKU merged commit c471abe into master on Jan 1, 2024
14 checks passed
andyfengHKU deleted the in-mem-copy-rdf-graph branch on January 1, 2024 14:20