
Performance issue when having large object set to save #7514

Closed
jcgouveia opened this issue Jun 29, 2017 · 5 comments

jcgouveia commented Jun 29, 2017

OrientDB Version: 2.2.20

Java Version: 1.8

OS: Windows 8

I've found a performance issue when there are large sets of objects pending to save.
The problem occurs when a setter on a POJO is called with a value that is already connected to the DB (an OIdentifiable).

Consider the situation of having 10,000 objects pending to save.
In the following code, the operation this.newRecords.addAll(newRecords) will insert 10,000 objects (and growing) every time a setter is used on a POJO. This call ends up taking almost all of the processing time of building the object set.

public void merge(ODirtyManager toMerge) {
  if (isSame(toMerge))
    return;
  final Set<ORecord> newRecords = toMerge.getNewRecords();
  if (newRecords != null) {
    if (this.newRecords == null)
      this.newRecords = Collections.newSetFromMap(new IdentityHashMap<ORecord, Boolean>(newRecords.size()));
    this.newRecords.addAll(newRecords);
  }
  final Set<ORecord> updateRecords = toMerge.getUpdateRecords();
  if (updateRecords != null) {
    if (this.updateRecords == null)
      this.updateRecords = Collections.newSetFromMap(new IdentityHashMap<ORecord, Boolean>(updateRecords.size()));
    this.updateRecords.addAll(updateRecords);
  }
  ...

[attached image]

This situation (using a large set of pending objects) may not be the recommended one, but I was not expecting this behaviour on every setter. I suppose it can be optimized or implemented in another way.
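For reference, here is a minimal sketch of the access pattern that triggers it. The entity class and field names are made up for illustration, and it assumes the 2.2.x object API (OObjectDatabaseTx):

import com.orientechnologies.orient.object.db.OObjectDatabaseTx;

public class DirtyManagerRepro {

  // Hypothetical POJO registered with the object API.
  public static class Item {
    private Item parent;
    public Item getParent() { return parent; }
    public void setParent(Item parent) { this.parent = parent; }
  }

  public static void main(String[] args) {
    OObjectDatabaseTx db = new OObjectDatabaseTx("memory:perftest").create();
    try {
      db.getEntityManager().registerEntityClass(Item.class);

      // An object that is already connected to the DB (an OIdentifiable).
      Item persistent = db.save(db.newInstance(Item.class));

      db.begin();
      for (int i = 0; i < 10000; i++) {
        Item pending = db.newInstance(Item.class);
        // Linking a pending object to an already-persistent one goes through
        // ODirtyManager.merge(), which copies the whole set of pending
        // records, so the loop degrades quadratically as the set grows.
        pending.setParent(persistent);
        db.save(pending); // stays pending until commit
      }
      db.commit();
    } finally {
      db.close();
    }
  }
}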

tglman (Member) commented Jun 29, 2017

Hi @jcgouveia,

Yes, this is a good point. We rarely see really big transactions, but the case exists, and it can be optimized easily: move the whole collection over when the target collection is empty, and otherwise compare sizes and merge the smaller collection into the bigger one, swapping them if necessary. None of this code needs to be multi-thread aware, so it is easy to change.
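Something along these lines, as an untested sketch of the idea (the method name is illustrative, not the actual patch):

import java.util.Set;
import com.orientechnologies.orient.core.record.ORecord;

// Merge two record sets without always paying for a full copy:
// steal the source set when ours is empty, otherwise pour the
// smaller set into the bigger one, swapping references if needed.
// Safe here because the dirty manager is not multi-thread aware
// and the source manager is discarded after the merge.
private static Set<ORecord> mergeRecordSets(Set<ORecord> target, Set<ORecord> source) {
  if (source == null || source.isEmpty())
    return target;
  if (target == null || target.isEmpty())
    return source; // O(1): take ownership of the other collection
  if (target.size() < source.size()) {
    source.addAll(target); // copy cost bounded by the smaller side
    return source;
  }
  target.addAll(source);
  return target;
}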

In any case, for a few more versions OrientDB will need to keep the whole transaction in memory, so the list has to exist.

Thanks for the detailed report. I will work on some optimization soon; feel free to give more suggestions or propose a solution.

Regards

lvca added this to the 2.2.x (next hotfix) milestone on Aug 5, 2017

HuangKBAaron commented Aug 12, 2017

@tglman @lvca, any update on this issue? I am having a similar performance problem right now, importing around 370,000 records into OrientDB (v2.2.25 with the default configuration, Windows 10 x64, JDK 1.8). The first 2,000 records import really quickly, but after that it slows to 1 record every 3 seconds, and then stabilizes at 1 record every 4 seconds.

The script:

BEGIN;
LET vol=CREATE VERTEX v_volume SET f_year=2017,f_quantity=59,f_recorded='2017-08-08 15:58:44';
LET province=UPDATE v_province SET f_name='Guangdong', f_recorded='2017-08-08 15:58:44' UPSERT RETURN AFTER @rid WHERE f_name='Guangdong';
LET city=UPDATE v_city SET f_name='Guangzhou', f_recorded='2017-08-08 15:58:44' UPSERT RETURN AFTER @rid WHERE f_name='Guangzhou';
LET pr=UPDATE v_plan_round SET f_name='PR67', f_number=67, f_recorded='2017-08-08 15:58:44', f_updated='2017-08-08 15:58:44' UPSERT RETURN AFTER @rid WHERE f_number=67;
LET brand=UPDATE v_brand SET f_name='Bentley', f_recorded='2017-08-08 15:58:44' UPSERT RETURN AFTER @rid WHERE f_name='Bentley';
LET model=UPDATE v_model SET f_name='Continental GT/GTC/SSp', f_recorded='2017-08-08 15:58:44' UPSERT RETURN AFTER @rid WHERE f_name='Continental GT/GTC/SSp';
CREATE EDGE e_in_pr FROM $vol TO $pr SET f_recorded='2017-08-08 15:58:44';
CREATE EDGE e_has_type FROM $vol TO $brand SET f_recorded='2017-08-08 15:58:44';
CREATE EDGE e_has_type FROM $vol TO $model SET f_recorded='2017-08-08 15:58:44';
CREATE EDGE e_has_type FROM $vol TO [#41:1,#47:0,#18:0,#22:0,#38:0,#29:0,#25:0] SET f_recorded='2017-08-08 15:58:44';
COMMIT RETRY 100;

lvca (Member) commented Aug 12, 2017

Have you created indexes on all the fields you're looking up in the WHERE conditions?
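For example, something along these lines (index names, types, and uniqueness are illustrative; adjust them to your schema):

CREATE INDEX v_province.f_name ON v_province (f_name) UNIQUE;
CREATE INDEX v_city.f_name ON v_city (f_name) UNIQUE;
CREATE INDEX v_plan_round.f_number ON v_plan_round (f_number) UNIQUE;
CREATE INDEX v_brand.f_name ON v_brand (f_name) UNIQUE;
CREATE INDEX v_model.f_name ON v_model (f_name) UNIQUE;

With UPSERT, the WHERE lookup should be backed by an (ideally UNIQUE) index; otherwise every statement scans the whole class.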

tglman (Member) commented Sep 5, 2017

@jcgouveia,

I did some optimization for your case; it will be released in 2.2.27.

@HuangKBAaron, for your case I agree with Luca's suggestion to add indexes on the UPSERT/WHERE fields.

Regards

tglman (Member) commented Sep 11, 2017

Hi,

My fix should have solved this problem, so I am closing this issue.

Regards

tglman closed this as completed on Sep 11, 2017
robfrank modified the milestones: 2.2.x (next hotfix), 2.2.27 on Sep 13, 2017