Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance of mappedColumnNames.contains #3023

Merged
merged 1 commit into from
Nov 30, 2023

Conversation

kelunik
Copy link
Contributor

@kelunik kelunik commented Nov 29, 2023

If the number of columns is very large, this is quite a performance hit, due to O(n) runtime.

@harawata
Copy link
Member

Hello @kelunik ,

Please answer the following questions:

  1. Are you actually experiencing a performance problem?
  2. If the answer to Q1 is "yes", did you verify this change resolves the problem?

I understand that Set#contains() is faster than List#contains(), but it seems unlikely that this change makes significant impact on real-world applications.
And if there are so many items in the list, creating a new set instance for each row must be harmful in terms of memory usage.

@kelunik
Copy link
Contributor Author

kelunik commented Nov 30, 2023

Hi @harawata,

Yes, this is an actually experienced performance problem. I'm debugging select performance for a mapper locally in isolation and the contains call is quite obvious there (measured in CPU time, not wall time):

Before

image

537ms for contains

After

image

54ms for HashSet init, 54ms for contains

The table I test with has 99 columns.

@harawata
Copy link
Member

Thank you, the data is helpful.
Let me check if it makes more sense to change the type of ResultSetWrapper.mappedColumnNamesMap from Map<String, List<String>> to Map<String, Set<String>>.

@harawata
Copy link
Member

It is definitely better to change the type of ResultSetWrapper.mappedColumnNamesMap from Map<String, List<String>> to Map<String, Set<String>>.

If you are keen to update this PR, please do.
If not, that is perfectly fine. I will do it later.

@kelunik
Copy link
Contributor Author

kelunik commented Nov 30, 2023

Yes, that's definitely better, I just wasn't sure whether this is an accepted breaking change / whether this is part of the public API of the package. I'll update the PR.

@kelunik kelunik force-pushed the fix-mappedColumnNames-contains branch from a739b90 to a2d6ab0 Compare November 30, 2023 10:12
@kelunik
Copy link
Contributor Author

kelunik commented Nov 30, 2023

Updated. 🚀

@harawata
Copy link
Member

Thanks!
I should've been clearer, but the unMappedColumnNamesMap should still use List (the order could matter here).
As contains() is not called against the "unmapped" list, it should not be a problem in terms of performance.
Please let me know if I am missing something.

If the number of columns is very large, this is quite a performance hit, due to O(n) vs. O(1) runtime on List vs. Set.
@kelunik kelunik force-pushed the fix-mappedColumnNames-contains branch from a2d6ab0 to 9f93220 Compare November 30, 2023 11:49
@kelunik
Copy link
Contributor Author

kelunik commented Nov 30, 2023

Oops, order indeed matters there, updated.

@coveralls
Copy link

Coverage Status

coverage: 86.606%. remained the same
when pulling 9f93220 on kelunik:fix-mappedColumnNames-contains
into f83188d on mybatis:master.

@harawata harawata merged commit 2d5358a into mybatis:master Nov 30, 2023
14 checks passed
@harawata harawata self-assigned this Nov 30, 2023
@harawata harawata added this to the 3.5.15 milestone Nov 30, 2023
@harawata harawata added the polishing Improve a implementation code or doc without change in current behavior/content label Nov 30, 2023
@harawata
Copy link
Member

Merged. It should be included in the latest 3.5.15-SNAPSHOT.
Thanks a lot for your contribution, @kelunik !

@kelunik kelunik deleted the fix-mappedColumnNames-contains branch November 30, 2023 12:31
@kelunik
Copy link
Contributor Author

kelunik commented Nov 30, 2023

Thanks, @harawata!

@kelunik
Copy link
Contributor Author

kelunik commented Dec 5, 2023

@harawata Are there any guidelines when a new release will be tagged?

@harawata
Copy link
Member

harawata commented Dec 5, 2023

There is no clear guideline, but we try to release new version every 2-4 months.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
polishing Improve a implementation code or doc without change in current behavior/content
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants