
oetl.sh gets stuck #7224

Closed
danciui opened this issue Mar 10, 2017 · 20 comments

Comments

@danciui

danciui commented Mar 10, 2017

OrientDB Version: orientdb-community-2.2.16

Java Version: Java 8 Update 91

OS: Mac Os Sierra 10.12.3

Expected behavior

oetl.sh finishes gracefully

Actual behavior

oetl.sh gets stuck

Steps to reproduce

![p1](https://cloud.githubusercontent.com/assets/15330145/23781627/5a19f2ec-0514-11e7-91e5-de4d501ecd46.png) ![p2](https://cloud.githubusercontent.com/assets/15330145/23781628/5a22890c-0514-11e7-97ab-213e63493aca.png)
@robfrank
Contributor

Can you provide the json conf file and a sample of your data? Thanks in advance.

@robfrank
Contributor

Without the JSON configuration and a sample of the data set, it is very difficult to understand why the ETL gets stuck.

@danciui
Author

danciui commented Mar 20, 2017

Good morning,
I tried to reproduce it, but now it doesn't get stuck; it just throws an error: "ETL process has problem: java.util.concurrent.TimeoutException"
Archive 2.zip

I also opened tickets for the other problems I had with oetl.sh.

@robfrank
Contributor

robfrank commented Mar 21, 2017

I wasn't able to reproduce the problem. The process ends without problems:

orientdb-community-2.2.17/bin/oetl.sh $(pwd)/loadr.json
OrientDB etl v.2.2.17-SNAPSHOT (build UNKNOWN@re7562eaab24ebfbebf10c2a819f660c3b584ac98; 2017-02-16 09:22:14+0000) www.orientdb.com
[orientdb] INFO Dropping existent database 'plocal:....../database'...
BEGIN ETL PROCESSOR
[file] INFO Reading from file .....ExampleSat34tb.json with encoding UTF-8
Started execution with 1 worker threads
+ extracted 641 entries (0 entries/sec) - 641 entries -> loaded 139 vertices (0 vertices/sec) Total time: 1003ms [0 warnings, 0 errors]
....
END ETL PROCESSOR
+ extracted 7,048 entries (0 entries/sec) - 7,048 entries -> loaded 7,048 vertices (53 vertices/sec) Total time: 130648ms [0 warnings, 0 errors]

@danciui
Author

danciui commented Mar 21, 2017

You're using plocal. Could this be a problem when using a remote server?

@robfrank
Copy link
Contributor

I use plocal because it's easier for me to test and because it is faster. If you need to load the data only once, prefer plocal; then, after the import, start up the server.

@danciui
Author

danciui commented Mar 21, 2017

My log looks like this:
"....
ETL process has problem: java.util.concurrent.TimeoutException
END ETL PROCESSOR

+ extracted 7,048 entries (0 entries/sec) - 7,048 entries -> loaded 6,732 vertices (4 vertices/sec) Total time: 1517051ms [0 warnings, 0 errors]"

@robfrank
Contributor

Can you test with 2.2.17? Are you on plocal or still on remote?

@danciui
Author

danciui commented Mar 21, 2017

That was remote. I am building 2.2.17 right now and will report back on behavior in ~1hr.

@robfrank
Contributor

Why "building"? Just download it!

@danciui
Author

danciui commented Mar 21, 2017

It's all done automatically via a script that:
- pulls the new code
- runs mvn clean install on it
- links the databases directory and the config
- sets ORIENTDB_HOME.

@danciui
Author

danciui commented Mar 21, 2017

"ETL process has problem: java.util.concurrent.TimeoutException
END ETL PROCESSOR

+ extracted 7,048 entries (0 entries/sec) - 7,048 entries -> loaded 6,756 vertices (5 vertices/sec) Total time: 1547593ms [0 warnings, 0 errors]
[6756:edge] DEBUG joinCurrentValue=id-0067-00004b18, lookupResult=[LocalCreateObjectSpeedTest is very very slow [moved] #19:1450]
Error in Pipeline execution: com.orientechnologies.orient.core.exception.ODatabaseException: Error during saving of record with rid #-1:-1"

@robfrank
Contributor

Is the dataset the same one you provided to me? And are you building from the master branch, from fetched tags, or from GitHub releases? I'm asking so I can reproduce the same environment.
Even though I develop OrientDB, I prefer to rely on the binaries.

@danciui
Author

danciui commented Mar 21, 2017

Same dataset. You can see we are trying to load the same number of entries: 7,048. Mine crashes at 6,756.

This is my .git/config for orientdb.

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "origin"]
	url = https://github.com/orientechnologies/orientdb.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master

@robfrank
Contributor

robfrank commented Apr 4, 2017

As with the other issues, my suggestion is to define an index on the property (or properties) used for the lookup.
This will speed up the ETL process.
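A rough illustration of why an index on the lookup property helps: without one, each lookup may have to scan the existing records, so the total work grows roughly quadratically with the number of entries, while with an index each lookup is close to constant time. This is a plain-Java analogy, not OrientDB code: `LookupCostDemo`, `Vertex`, and the `id` key are hypothetical names, and a `HashMap` stands in for the database index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogy, NOT OrientDB code: compares resolving lookups by scanning
// all records (no index) with a HashMap keyed on the lookup property (index-like).
class LookupCostDemo {

    // Hypothetical vertex holding only the property used for lookups.
    static final class Vertex {
        final String id;
        Vertex(String id) { this.id = id; }
    }

    static List<Vertex> makeVertices(int n) {
        List<Vertex> vertices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            vertices.add(new Vertex("id-" + i));
        }
        return vertices;
    }

    static List<String> makeKeys(int n) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add("id-" + i);
        }
        return keys;
    }

    // No index: every lookup scans records until it finds a match.
    // Returns the total number of id comparisons performed.
    static long scanLookups(List<Vertex> vertices, List<String> keys) {
        long comparisons = 0;
        for (String key : keys) {
            for (Vertex v : vertices) {
                comparisons++;
                if (v.id.equals(key)) {
                    break; // match found, stop scanning for this key
                }
            }
        }
        return comparisons;
    }

    // Index: build the map once, then resolve each key with one map lookup.
    // Returns the number of map lookups performed (one per key).
    static long indexedLookups(List<Vertex> vertices, List<String> keys) {
        Map<String, Vertex> index = new HashMap<>();
        for (Vertex v : vertices) {
            index.put(v.id, v);
        }
        long lookups = 0;
        for (String key : keys) {
            index.get(key); // result unused in this sketch
            lookups++;
        }
        return lookups;
    }

    public static void main(String[] args) {
        int n = 7_048; // entry count of the dataset discussed in this issue
        List<Vertex> vertices = makeVertices(n);
        List<String> keys = makeKeys(n);
        System.out.println("scan comparisons: " + scanLookups(vertices, keys));    // 24840676
        System.out.println("map lookups:      " + indexedLookups(vertices, keys)); // 7048
    }
}
```

With 7,048 entries the scan performs n(n+1)/2 = 24,840,676 comparisons versus 7,048 map lookups, which is the kind of gap an index on the lookup property is meant to close.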

@robfrank robfrank added this to the 2.2.x (next hotfix) milestone Apr 4, 2017
@danciui
Author

danciui commented Apr 6, 2017

I need the ETL into the remote DB to be as fast as possible. It's now at 1.5 minutes and it needs to be below 30 seconds, so I need to use parallel execution, preferably with more threads too. I am not sure how to change the number of threads.
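A quick worked estimate of what that target implies, assuming the 1.5 minutes refers to the same 7,048-entry dataset discussed above (the thread does not state this explicitly): going from about 90 s to under 30 s needs at least a 3x speedup, i.e. a sustained rate of roughly 235 entries/sec. `ThroughputTarget` is a hypothetical helper for the arithmetic only.

```java
import java.util.Locale;

// Worked estimate for the target above. Assumes the 1.5 min figure refers to
// the same 7,048-entry dataset (not confirmed in the thread).
class ThroughputTarget {

    // How many times faster the import must become.
    static double requiredSpeedup(double currentSeconds, double targetSeconds) {
        return currentSeconds / targetSeconds;
    }

    // Sustained rate needed to finish `entries` within `targetSeconds`.
    static double requiredEntriesPerSecond(long entries, double targetSeconds) {
        return entries / targetSeconds;
    }

    public static void main(String[] args) {
        System.out.println(requiredSpeedup(90, 30)); // 3.0
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        System.out.println(String.format(Locale.ROOT, "%.1f entries/sec",
                requiredEntriesPerSecond(7_048, 30))); // 234.9 entries/sec
    }
}
```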

@robfrank
Copy link
Contributor

robfrank commented Apr 6, 2017

OK for parallel, but please create the indexes to help the components during lookups.
The number of threads is the number of cores minus 1, because 1 thread is dedicated to the source reader (file/jdbc). Note that the ETL is a "generic" tool, so sometimes it may be better to write your own small data importer; just my 2 cents.
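The thread split described here (one reader thread for the file/jdbc source, the remaining cores for processing) is the classic producer/consumer pattern. The following is a minimal plain-Java sketch of that pattern, not the ETL's actual code; the class name, the queue size, the end-of-input marker, and the `Math.max(1, ...)` guard for single-core machines are all assumptions of the sketch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Producer/consumer sketch of "1 reader thread + (cores - 1) worker threads".
// NOT the OrientDB ETL implementation; names and sizes here are hypothetical.
class ReaderWorkersSketch {

    // Hypothetical end-of-input marker; one is sent per worker so each can stop.
    private static final String END = "__END__";

    // Cores minus the one reader thread; never fewer than one worker (sketch's own guard).
    static int workerCount(int cores) {
        return Math.max(1, cores - 1);
    }

    // Reads `records` fake entries on one thread and processes them on the workers.
    // Returns how many entries the workers processed.
    static int run(int records, int cores) {
        int workers = workerCount(cores);
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1_000);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers + 1);

        // Reader: stands in for the file/jdbc source.
        pool.submit(() -> {
            try {
                for (int i = 0; i < records; i++) {
                    queue.put("entry-" + i);
                }
                for (int w = 0; w < workers; w++) {
                    queue.put(END);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Workers: stand in for the transform + load steps.
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                try {
                    while (!queue.take().equals(END)) {
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("worker threads: " + workerCount(cores));
        System.out.println("entries processed: " + run(7_048, cores));
    }
}
```

Because the reader is a single thread, it can become the bottleneck if transformation and loading are cheap (for example once lookups are indexed); adding workers only helps while they are the slow side of the queue.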

@danciui
Author

danciui commented Apr 6, 2017

ok.

@danciui
Author

danciui commented Apr 6, 2017

thank you!

@venki-hiya

I am really annoyed by how scanty and non-intuitive this error message is. I ran into this issue while setting a link property before a vertex was created in the list of transformers. You have to create a vertex object if you are setting a link property later in the 'link' transformer. It took me a good 2 hours of wasted time.
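Based on the experience above, a cheap safeguard is to check the order of the transformers list before running the import. This hypothetical helper (`TransformerOrderCheck` is not part of OrientDB) only looks at transformer names in order and flags any `link` that appears before a `vertex`; it does not parse a real ETL JSON file.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check, based on the comment above: a "vertex" transformer
// should come before any "link" transformer. Not part of OrientDB; it only inspects
// transformer names in the order they are listed.
class TransformerOrderCheck {

    // Returns true if every "link" transformer is preceded by at least one "vertex" transformer.
    static boolean vertexBeforeLink(List<String> transformers) {
        boolean vertexSeen = false;
        for (String name : transformers) {
            if (name.equals("vertex")) {
                vertexSeen = true;
            } else if (name.equals("link") && !vertexSeen) {
                return false; // link would run before any vertex object exists
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(vertexBeforeLink(Arrays.asList("field", "vertex", "link"))); // true
        System.out.println(vertexBeforeLink(Arrays.asList("field", "link", "vertex"))); // false
    }
}
```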
