
Performance issue when having large object set to save #7514

Closed
jcgouveia opened this issue Jun 29, 2017 · 5 comments

jcgouveia commented Jun 29, 2017

OrientDB Version: 2.2.20

Java Version: 1.8

OS: Windows 8

I've found a performance issue when there are large sets of objects pending to save.
The problem occurs when a setter on a POJO is called with a value that is already connected to the DB (an OIdentifiable).

Consider the situation of having 10,000 objects pending to save.
In the following code, the operation this.newRecords.addAll(newRecords) will insert 10,000 objects (and growing) every time a setter is used on a POJO. This call ends up taking almost all of the processing time of building the object set.

public void merge(ODirtyManager toMerge) {
  if (isSame(toMerge))
    return;
  final Set<ORecord> newRecords = toMerge.getNewRecords();
  if (newRecords != null) {
    if (this.newRecords == null)
      this.newRecords = Collections.newSetFromMap(new IdentityHashMap<ORecord, Boolean>(newRecords.size()));
    this.newRecords.addAll(newRecords);
  }
  final Set<ORecord> updateRecords = toMerge.getUpdateRecords();
  if (updateRecords != null) {
    if (this.updateRecords == null)
      this.updateRecords = Collections.newSetFromMap(new IdentityHashMap<ORecord, Boolean>(updateRecords.size()));
    this.updateRecords.addAll(updateRecords);
  }
  ...

[attached image]

This situation (using a large set of pending objects) may not be the recommended one, but I was not expecting this behaviour on every setter. I suppose it can be optimized or implemented in another way.
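For reference, here is a minimal sketch of the access pattern that triggers it. The entity class and field names are made up for illustration, and it assumes the 2.2.x object API (OObjectDatabaseTx):

import com.orientechnologies.orient.object.db.OObjectDatabaseTx;

public class DirtyManagerRepro {

  // Hypothetical POJO registered with the object API.
  public static class Item {
    private Item parent;
    public Item getParent() { return parent; }
    public void setParent(Item parent) { this.parent = parent; }
  }

  public static void main(String[] args) {
    OObjectDatabaseTx db = new OObjectDatabaseTx("memory:perftest").create();
    try {
      db.getEntityManager().registerEntityClass(Item.class);

      // An object that is already connected to the DB (an OIdentifiable).
      Item persistent = db.save(db.newInstance(Item.class));

      db.begin();
      for (int i = 0; i < 10000; i++) {
        Item pending = db.newInstance(Item.class);
        // Linking a pending object to an already-persistent one goes through
        // ODirtyManager.merge(), which copies the whole set of pending
        // records, so the loop degrades quadratically as the set grows.
        pending.setParent(persistent);
        db.save(pending); // stays pending until commit
      }
      db.commit();
    } finally {
      db.close();
    }
  }
}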

tglman (Member) commented Jun 29, 2017

Hi @jcgouveia,

Yes, this is a good point. We rarely see really big transactions, but the case exists, and it can be optimized easily: move the whole collection over when the target collection is empty, and otherwise compare sizes and merge the smaller collection into the bigger one, swapping them if necessary. None of this code needs to be multi-thread aware, so it is easy to change.
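Something along these lines, as an untested sketch of the idea (the method name is illustrative, not the actual patch):

import java.util.Set;
import com.orientechnologies.orient.core.record.ORecord;

// Merge two record sets without always paying for a full copy:
// steal the source set when ours is empty, otherwise pour the
// smaller set into the bigger one, swapping references if needed.
// Safe here because the dirty manager is not multi-thread aware
// and the source manager is discarded after the merge.
private static Set<ORecord> mergeRecordSets(Set<ORecord> target, Set<ORecord> source) {
  if (source == null || source.isEmpty())
    return target;
  if (target == null || target.isEmpty())
    return source; // O(1): take ownership of the other collection
  if (target.size() < source.size()) {
    source.addAll(target); // copy cost bounded by the smaller side
    return source;
  }
  target.addAll(source);
  return target;
}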

In any case, for a few more versions OrientDB will need to keep the whole transaction in memory, so the list has to exist.

Thanks for the detailed report. I will work on some optimization soon; feel free to give more suggestions or propose a solution.

Regards

lvca added this to the 2.2.x (next hotfix) milestone on Aug 5, 2017

HuangKBAaron commented Aug 12, 2017

@tglman @lvca, any update on this issue? I am having a similar performance problem right now, importing around 370,000 records into OrientDB (v2.2.25 with the default configuration, Windows 10 x64, JDK 1.8). The first 2,000 records import really quickly, but after that it slows to 1 record every 3 seconds, and then stabilizes at 1 record every 4 seconds.

The script:

BEGIN;
LET vol=CREATE VERTEX v_volume SET f_year=2017,f_quantity=59,f_recorded='2017-08-08 15:58:44';
LET province=UPDATE v_province SET f_name='Guangdong', f_recorded='2017-08-08 15:58:44' UPSERT RETURN AFTER @rid WHERE f_name='Guangdong';
LET city=UPDATE v_city SET f_name='Guangzhou', f_recorded='2017-08-08 15:58:44' UPSERT RETURN AFTER @rid WHERE f_name='Guangzhou';
LET pr=UPDATE v_plan_round SET f_name='PR67', f_number=67, f_recorded='2017-08-08 15:58:44', f_updated='2017-08-08 15:58:44' UPSERT RETURN AFTER @rid WHERE f_number=67;
LET brand=UPDATE v_brand SET f_name='Bentley', f_recorded='2017-08-08 15:58:44' UPSERT RETURN AFTER @rid WHERE f_name='Bentley';
LET model=UPDATE v_model SET f_name='Continental GT/GTC/SSp', f_recorded='2017-08-08 15:58:44' UPSERT RETURN AFTER @rid WHERE f_name='Continental GT/GTC/SSp';
CREATE EDGE e_in_pr FROM $vol TO $pr SET f_recorded='2017-08-08 15:58:44';
CREATE EDGE e_has_type FROM $vol TO $brand SET f_recorded='2017-08-08 15:58:44';
CREATE EDGE e_has_type FROM $vol TO $model SET f_recorded='2017-08-08 15:58:44';
CREATE EDGE e_has_type FROM $vol TO [#41:1,#47:0,#18:0,#22:0,#38:0,#29:0,#25:0] SET f_recorded='2017-08-08 15:58:44';
COMMIT RETRY 100;

lvca (Member) commented Aug 12, 2017

Have you created indexes on all the fields you're looking up in the WHERE conditions?
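For example, something along these lines (index names, types, and uniqueness are illustrative; adjust them to your schema):

CREATE INDEX v_province.f_name ON v_province (f_name) UNIQUE;
CREATE INDEX v_city.f_name ON v_city (f_name) UNIQUE;
CREATE INDEX v_plan_round.f_number ON v_plan_round (f_number) UNIQUE;
CREATE INDEX v_brand.f_name ON v_brand (f_name) UNIQUE;
CREATE INDEX v_model.f_name ON v_model (f_name) UNIQUE;

With UPSERT, the WHERE lookup should be backed by an (ideally UNIQUE) index; otherwise every statement scans the whole class.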

tglman (Member) commented Sep 5, 2017

@jcgouveia,

I did some optimization for your case; it will be released in 2.2.27.

@HuangKBAaron, for your case I agree with Luca's suggestion to add indexes on the UPSERT/WHERE fields.

Regards

tglman (Member) commented Sep 11, 2017

Hi,

My fix should have solved this problem, so I am closing this issue.

Regards

tglman closed this as completed on Sep 11, 2017
robfrank modified the milestones: 2.2.x (next hotfix), 2.2.27 on Sep 13, 2017