Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update poi read streaming benchmark #181

Merged
merged 4 commits into from
Jan 4, 2022

Conversation

pjfanning
Copy link
Contributor

Thanks for maintaining this great project. I'm a POI PMC member and maintainer of a fork of monitorjbl's excel-streaming-reader.

I updated your benchmark for POI reading with streaming. The test was set to read in styles which is not needed for the test as it is written. Styles are only needed if you want to use POI DataFormatter to format the data. It has a material effect on the test to read that unnecessary data.

I commented out the monitorjbl test because it crashes with the POI version that is loaded. It can be re-enabled if the POI version is forced to version 4 (eg 4.1.2).

I plan on testing my fork of monitorjbl. I'm adding some extra tuning features to it (eg to optionally not load styles). It will probably not match fastexcel for speed (but has a few extra features - so far, I've prioritised features over performance).

Good news is fastexcel has best read result but the POI streaming catches up a bit.

Benchmark                                        Mode  Cnt  Score   Error  Units
ReaderBenchmark.apachePoi                          ss   15  3.259 ± 0.784   s/op
ReaderBenchmark.fastExcelReader                    ss   15  0.462 ± 0.069   s/op
ReaderBenchmark.streamingApachePoiWithStyles       ss   15  2.387 ± 0.088   s/op
ReaderBenchmark.streamingApachePoiWithoutStyles    ss   15  0.887 ± 0.045   s/op

e2e/pom.xml Outdated
@@ -14,12 +14,12 @@
<dependency>
<groupId>org.dhatim</groupId>
<artifactId>fastexcel</artifactId>
<version>0-SNAPSHOT</version>
<version>0.12.12</version>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep 0-SNAPSHOT to use the local artifact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

versions reverted

e2e/pom.xml Outdated
</dependency>
<dependency>
<groupId>org.dhatim</groupId>
<artifactId>fastexcel-reader</artifactId>
<version>0-SNAPSHOT</version>
<version>0.12.12</version>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@@ -121,22 +130,20 @@ public int streamingApachePoi() throws IOException, OpenXML4JException, SAXExcep
SheetContentHandler sheetHandler = new SheetContentHandler();
processSheet(styles, strings, sheetHandler, sheetStream);
assertEquals(RESULT, sheetHandler.result);
return sheetHandler.result;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we process only one sheet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    @Benchmark
    public long fastExcelReader() throws IOException {
        try (InputStream is = openResource(FILE); ReadableWorkbook wb = new ReadableWorkbook(is)) {
            Sheet sheet = wb.getFirstSheet();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fair that if one benchmark gets just the first sheet - that other benchmarks also only get first sheet

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed!

@ochedru ochedru merged commit 9b226e2 into dhatim:master Jan 4, 2022
@pjfanning pjfanning deleted the poi-streaming-bench branch January 21, 2022 10:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants