Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instability using tinkerpop remote client #4131

Closed
nmcintyre2014 opened this issue May 11, 2015 · 16 comments
Closed

Instability using tinkerpop remote client #4131

nmcintyre2014 opened this issue May 11, 2015 · 16 comments
Assignees
Milestone

Comments

@nmcintyre2014
Copy link

We're experiencing some instability in our orient remote client application. We're not sure of the root cause, but when it happens, our application is unable to connect to orientdb. The orientdb server seems healthy. The excpetion reported in our logs is: Caused by: com.orientechnologies.orient.core.exception.OStorageException: Error on executing command: sql.SELECT * FROM LandsatAcquisition WHERE timestamp >= date('2015-05-10 17:41:02.000','yyyy-MM-dd HH:mm:ss.SSS')
at com.orientechnologies.orient.client.remote.OStorageRemote.handleException(OStorageRemote.java:1501) ~[orientdb-client-2.0.1.jar:2.0.1]
at com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:1059) ~[orientdb-client-2.0.1.jar:2.0.1]
at com.orientechnologies.orient.client.remote.OStorageRemoteThread.command(OStorageRemoteThread.java:436) ~[orientdb-client-2.0.1.jar:2.0.1]
at com.orientechnologies.orient.core.command.OCommandRequestTextAbstract.execute(OCommandRequestTextAbstract.java:63) ~[orientdb-core-2.0.1.jar:2.0.1]
at com.tinkerpop.blueprints.impls.orient.OrientGraphCommand.execute(OrientGraphCommand.java:48) ~[orientdb-graphdb-2.0.1.jar:2.0.1]
at io.geobigdata.persistence.catalog.RecordDataAccessObject.executeTypedSearch(RecordDataAccessObject.java:285) ~[persistence-catalog-1.0-SNAPSHOT.jar:na]
at io.geobigdata.persistence.catalog.RecordDataAccessObject.search(RecordDataAccessObject.java:412) ~[persistence-catalog-1.0-SNAPSHOT.jar:na]
at io.geobigdata.services.catalog.SearchService.search(SearchService.java:34) ~[SearchService.class:na]
at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_75]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_75]
at org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:181) ~[cxf-core-3.0.3.jar:3.0.3]
at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:97) ~[cxf-core-3.0.3.jar:3.0.3]
... 36 common frames omitted
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: -1
at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[na:1.7.0_75]
at java.util.ArrayList.get(ArrayList.java:411) ~[na:1.7.0_75]
at com.orientechnologies.orient.client.remote.OStorageRemote.useNewServerURL(OStorageRemote.java:1687) ~[orientdb-client-2.0.1.jar:2.0.1]
at com.orientechnologies.orient.client.remote.OStorageRemote.getAvailableNetwork(OStorageRemote.java:1840) ~[orientdb-client-2.0.1.jar:2.0.1]
at com.orientechnologies.orient.client.remote.OStorageRemote.beginRequest(OStorageRemote.java:1806) ~[orientdb-client-2.0.1.jar:2.0.1]
at com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:970) ~[orientdb-client-2.0.1.jar:2.0.1]Any suggestions as to what is causing this?

@lvca
Copy link
Member

lvca commented May 11, 2015

Have you tied upgrading to a more recent hotfix version, like 2.0.8?

@lvca lvca added this to the 2.0.9 milestone May 11, 2015
@nmcintyre2014
Copy link
Author

We haven't tried upgrading yet. Would you recommend we upgrade both the server and client, or would just the client be sufficient?

@lvca
Copy link
Member

lvca commented May 12, 2015

Upgrade both.

@tglman
Copy link
Member

tglman commented May 13, 2015

@nmcintyre2014 you should have some additional stack traces in the log that should show in a better way the main cause of the error, by the way seems just a network problem.

@lvca lvca modified the milestones: 2.0.9, 2.0.10 May 14, 2015
@nricklin
Copy link

Hey guys, highly doubtful that this is a network problem as the client and server are both within the same VPC in AWS. We'll upgrade and provide more info.

@nricklin
Copy link

We are on 2.0.9 both client and server. Here is the stack trace we are getting on the client. Can you help explain what this means?

23:06:54.947 [http-bio-8080-exec-186] INFO i.g.p.c.RecordDataAccessObject - Executing SQL: SELECT * FROM [#18:65064,#18:5378,#18:15898,#18:58784,#18:10246,#18:3903,#18:14454,#18:79091,#18:8553,#18:2592,#18:13020,#18:67273,#18:7441,#18:8552,#18:57210,#18:79090,#18:11414,#18:13019,#18:7440,#18:67326,#18:14453,#18:3902,#18:65814,#18:105459,#18:106961,#18:108260,#18:109952,#18:121149,#18:127364,#18:129255,#18:129862,#18:131945,#18:132377,#18:133199,#18:133244,#18:134694,#18:136696,#18:137800,#18:139132,#18:139285,#18:140698,#18:110030,#18:108259,#18:131213,#18:122125,#18:6027,#18:16603,#18:137983,#18:141568,#17:1025504,#17:1014347,#17:1011254,#17:592589,#17:2404727,#17:2412061,#17:3195758,#17:3195799,#17:1054905,#17:1028615,#17:1014349,#17:832401,#17:981067,#17:600239,#17:831650,#17:1405913,#17:1741277,#17:763941,#17:876922,#17:1026578,#17:2415403,#17:1015497,#17:763840,#17:1016811,#17:1028018,#17:1490843,#17:1016845,#17:2415581,#17:784855,#17:2194143,#17:1961008,#17:994155,#17:918756,#17:786886,#17:1147297,#17:1055624,#17:1741416,#17:2192763,#17:1543173,#17:2647982,#17:3282029,#17:3282092,#17:3283731,#17:3283892,#17:50306,#18:295,#18:1572,]
23:06:54.947 [http-bio-8080-exec-186] INFO i.g.p.c.RecordDataAccessObject - Executing SQL: SELECT * FROM [#18:65064,#18:5378,#18:15898,#18:58784,#18:10246,#18:3903,#18:14454,#18:79091,#18:8553,#18:2592,#18:13020,#18:67273,#18:7441,#18:8552,#18:57210,#18:79090,#18:11414,#18:13019,#18:7440,#18:67326,#18:14453,#18:3902,#18:65814,#18:105459,#18:106961,#18:108260,#18:109952,#18:121149,#18:127364,#18:129255,#18:129862,#18:131945,#18:132377,#18:133199,#18:133244,#18:134694,#18:136696,#18:137800,#18:139132,#18:139285,#18:140698,#18:110030,#18:108259,#18:131213,#18:122125,#18:6027,#18:16603,#18:137983,#18:141568,#17:1025504,#17:1014347,#17:1011254,#17:592589,#17:2404727,#17:2412061,#17:3195758,#17:3195799,#17:1054905,#17:1028615,#17:1014349,#17:832401,#17:981067,#17:600239,#17:831650,#17:1405913,#17:1741277,#17:763941,#17:876922,#17:1026578,#17:2415403,#17:1015497,#17:763840,#17:1016811,#17:1028018,#17:1490843,#17:1016845,#17:2415581,#17:784855,#17:2194143,#17:1961008,#17:994155,#17:918756,#17:786886,#17:1147297,#17:1055624,#17:1741416,#17:2192763,#17:1543173,#17:2647982,#17:3282029,#17:3282092,#17:3283731,#17:3283892,#17:50306,#18:295,#18:1572,]
23:06:54.947 [http-bio-8080-exec-186] DEBUG i.g.p.o.OrientDbClientProvider - Getting orient graph. 43 instances available. 144 instances created.
23:06:54.947 [http-bio-8080-exec-186] DEBUG i.g.p.o.OrientDbClientProvider - Getting orient graph. 43 instances available. 144 instances created.
23:06:54.948 [http-bio-8080-exec-186] WARN o.a.c.j.i.WebApplicationExceptionMapper - javax.ws.rs.WebApplicationException: HTTP 500 Internal Server Error
at io.geobigdata.services.catalog.HeartbeatService.heartbeat(HeartbeatService.java:29)
at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:181)
at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:97)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:200)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:99)
at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:223)
at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:197)
at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:149)
at org.apache.cxf.transport.servlet.CXFNonSpringServlet.invoke(CXFNonSpringServlet.java:171)
at org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:290)
at org.apache.cxf.transport.servlet.AbstractHTTPServlet.doGet(AbstractHTTPServlet.java:214)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:620)
at org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:265)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:503)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:683)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1070)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.ws.rs.WebApplicationException: HTTP 500 Internal Server Error
at io.geobigdata.services.catalog.SearchService.search(SearchService.java:51)
at io.geobigdata.services.catalog.HeartbeatService.runSearch(HeartbeatService.java:38)
at io.geobigdata.services.catalog.HeartbeatService.heartbeat(HeartbeatService.java:26)
... 41 more
Caused by: com.orientechnologies.orient.core.exception.ODatabaseException: Cannot open database
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.open(ODatabaseDocumentTx.java:277)
at com.orientechnologies.orient.core.db.OPartitionedDatabasePool.openDatabase(OPartitionedDatabasePool.java:313)
at com.orientechnologies.orient.core.db.OPartitionedDatabasePool.acquire(OPartitionedDatabasePool.java:286)
at com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.(OrientBaseGraph.java:153)
at com.tinkerpop.blueprints.impls.orient.OrientTransactionalGraph.(OrientTransactionalGraph.java:78)
at com.tinkerpop.blueprints.impls.orient.OrientGraph.(OrientGraph.java:128)
at com.tinkerpop.blueprints.impls.orient.OrientGraphFactory.getTx(OrientGraphFactory.java:74)
at io.geobigdata.persistence.orient.OrientDbClientProvider.getOrientGraph(OrientDbClientProvider.java:50)
at io.geobigdata.persistence.catalog.RecordDataAccessObject.executeSpatialSearch(RecordDataAccessObject.java:247)
at io.geobigdata.persistence.catalog.RecordDataAccessObject.search(RecordDataAccessObject.java:432)
at io.geobigdata.services.catalog.SearchService.search(SearchService.java:36)
... 43 more
Caused by: java.lang.IllegalStateException: Database dgcat is already open
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.open(ODatabaseDocumentTx.java:238)
... 53 more

@tglman
Copy link
Member

tglman commented May 19, 2015

Hi @nricklin,
The original error just say that it cannot open the databases because is already open, the only reason i can think is that there are multiple OrientGraphFactory instances in your application, it should be only one, can you also post the snippet of code of:OrientDbClientProvider.getOrientGraph so i can check that there is all correct.

@nmcintyre2014
Copy link
Author

Here's the entire class. It's a spring singleton and the init method is called once, upon application startup.

package io.geobigdata.persistence.orient;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.tinkerpop.blueprints.impls.orient.OrientGraph;
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;
import com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx;

public class OrientDbClientProvider {

private static final Logger LOG = LoggerFactory.getLogger(OrientDbClientProvider.class);
private String databaseUrl = null;
private String user = null;
private String password = null;
private OrientGraphFactory orientGraphFactory = null;
private Boolean memory = Boolean.FALSE;
public OrientDbClientProvider(String dbURL) {
this.setDatabaseUrl(dbURL);
init();
}
public OrientDbClientProvider(){
}
public void init(){
if(databaseUrl.startsWith("memory")){
memory = Boolean.TRUE;
return;
}
// create the pool
if (getUser()==null) {
orientGraphFactory = new OrientGraphFactory(databaseUrl).setupPool(1, 50);
} else {
orientGraphFactory = new OrientGraphFactory(databaseUrl,getUser(), getPassword()).setupPool(1, 50);
}
}
// Standard transaction handle
public OrientGraph getOrientGraph(){
if(memory){
return new OrientGraph(databaseUrl);
} else {
LOG.debug("Getting orient graph.  "+
orientGraphFactory.getAvailableInstancesInPool()+" instances available.  "+
orientGraphFactory.getCreatedInstancesInPool()+" instances created.");
return orientGraphFactory.getTx();
}
}
public String getDatabaseUrl() {
return databaseUrl;
}

public void setDatabaseUrl(String databaseUrl) {
this.databaseUrl = databaseUrl;
}

public String getUser() {
return user;
}

public void setUser(String user) {
this.user = user;
}

public String getPassword() {
return password;
}

public void setPassword(String password) {
this.password = password;
}
}

@nmcintyre2014
Copy link
Author

We see this sort of error under extremely high load.

@tglman
Copy link
Member

tglman commented May 19, 2015

Ok, Thank you for the info, the OrientDbClientProvider look fine, i'll investigate inside the pool implementation to be sure that is everything right also under heavy load.

@tglman
Copy link
Member

tglman commented May 20, 2015

hi @nmcintyre2014, i checked the code and also ran a small concurrency test to check the pool with 10000 threads and 1M of tasks, but i wasn't able to reproduce the problem, the only thing i found that could cause this error was if was call by hand an 'open' from a database taken from the pool, i added some check for avoid that it will be out with the 2.0.10, in the meanwhile can you check if you call some open on a database from the pool?

thank you

@lvca lvca modified the milestones: 2.0.10, 2.0.11 May 22, 2015
@lvca lvca closed this as completed May 30, 2015
@lvca
Copy link
Member

lvca commented May 30, 2015

Issue closed because no answer. In case, please reopen/comment this issue.

@jamesconradshea
Copy link

We are definitely seeing a similar issue (IllegalStateException thrown from the same spot), rarely and irreproducibly although it also seems to happen under high load. I've verified that we are properly calling shutdown() within a finally block on every graph usage and that this is properly calling close() on the underlying database.

I do have a theory about what may be going on but I don't have the resources or understanding of the code to check it out. The database object in question is an instance of OPartitionedDatabasePool$DatabaseDocumentTxPolled. When close() is called It has a "status" field which is set to CLOSED before the db is returned to the pool, which is implemented as a Queue.

What I think might be happening is that under load the pool is becoming empty. So if a second thread comes in and attempts to acquire the database right after it has been added, the first thread's update to the status field may not yet be visible.

In short, I think the status field needs to be marked volatile.

@andrii0lomakin
Copy link
Member

It violates JMM but I think we need investigate why update to database are not visible

@andrii0lomakin andrii0lomakin modified the milestones: 2.2, 2.0.11 Sep 14, 2015
@prjhub
Copy link

prjhub commented Oct 7, 2015

The issue has been closed because no response has been received after a while. In case the issue is still valid, please reopen it or write a comment that explains why it's still valid and a way to reproduce it. Thank you.

@andrii0lomakin
Copy link
Member

Hi seems I found the reason if remote storage is closed when you call close on database instance then status of database instance itself is not updated as closed, so you put open database in pool. I update code accordingly and push the change if you still be able to reproduce issue please reopen it with updated stack trace.

andrii0lomakin added a commit that referenced this issue Nov 18, 2015
andrii0lomakin added a commit that referenced this issue Nov 18, 2015
(cherry picked from commit 1ab0fbe)
andrii0lomakin added a commit that referenced this issue Nov 18, 2015
(cherry picked from commit 1ab0fbe)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

8 participants