-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exceptions when adding vertices/edges that do on-demand schema creation #338
Comments
Thanks, I'll have a look at your test program. Are there any write transactions that take longer than 2 minutes? |
No, all write transactions should be quick, nothing complicated, either in our real app or this test client. It is just adding a vertex or edge and a property or two. We actually are committing fairly frequently, you can see the test client is not batching up any of the adds. |
Had a look. I can not get postgresql to not work. I ported your test to Sqlg so perhaps have a look if it fails by you on postgresql. |
Ok, made some progress but no solutions yet.
For the part where MariaDB hangs, its because it tried to execute an There is code to throw a As an aside the |
I did discover that due to another issue of our own, on-demand schema
creation was occurring that shouldn't have, so fixing that which should at
least mitigate the effects of this issue, but thanks for the update. I'll
share more as we observe anything new/interesting, but it seems the safest
approach is to try to get the schema created up front using Topology so
trying to do locks for the CREATE/ALTER statements isn't necessary.
…On Sun, Mar 17, 2019 at 12:15 PM pietermartin ***@***.***> wrote:
Ok, made some progress but no solutions yet.
java.lang.IllegalStateException: Edge vertex 'prior' does not exist in
schema 'PUBLIC' happens I think because of MVCC semantics. 2 transactions
run at the same time. transaction 1 created the edge 'prior' and commits
which puts it into Sqlg's global cache. Transaction 2 now wants to do
something to edge 'prior'. Sqlg says it exist but due to MVCC transaction 1
can not see it as it was not there when the transaction started. Secretly
committing and starting a new transaction after taking the topology lock
resolves the issue but is kinda unsatisfactory as it violates the users
transaction boundaries.
This is a rather serious issue. My first concern is why I have never seen
this happening before on Postgresql as it too has MVCC semantics. Kinda
late to only realize this now.
For the part where MariaDB hangs, its because it tried to execute an ALTER
STATEMENT which is waiting for another write thread that executed an
UPDATE to commit . The write thread is also trying to do a schema
modification and can not get the topology lock as its taken by the ALTER
STATEMENT's thread. Boom bam dead lock.
There is code to throw a deadLockDetected exception but its only
implemented for posgresql at present. I'll try to implement the same on
MariaDB.
As an aside the ALTER STATEMENT does not block for the UPDATE if you
disable foreign keys. I imagine this is why its not happening on postgresql
as its locking is probably finer grained with respect to foreign keys.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#338 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABHUDlEQrlxJRpjM9ea4CvJTrm34KNYUks5vXpRFgaJpZM4bxb4K>
.
|
I am planning on removing all lock mechanisms and tricks from Sqlg. To get it to work the granularity is to course grained. Schema creation can only proceed in one thread at a time and when no other thread is writing. So the idea to protect the user from dead locks did not really work. The pattern going forward is to use one thread for lazy creation or the schema creation api. I am going to close this for now but open it again if you want to. |
@pietermartin are you removing all locks entirely? Or just those related to schema creation? I want to make sure I understand the behavior when Sqlg is used from multiple threads or even multiple nodes. And we do use the schema creation API and try to avoid lazy creation, but what would happen if multiple threads try to do lazy creation now? |
Yes, the plan to remove all attempts by Sqlg to take locks. In my day job where we manage many large Sqlg databases the locking strategy has not really worked out. Its not that there are bugs but rather that its very slow, forcing everything into one thread, and often occasionally throws To overcome this we now pre-create the schema object using the topology api. This works fine but because of Sqlg's locking strategy this still only happens in one thread even is you are creating unrelated schema objects. Without any locking strategy the user will be free to create schema objects in many threads but is responsible for understanding postgresql/rdbms locks and will have to manually handle dead lock exceptions. When I get to this I'll see if there is any deadlock detection that Sqlg can maybe do. I you have ideas coming from your experience then please let me know as I have not started on this yet. Want to squash the outstanding bugs first. |
We use the schema creation API because we do want to avoid any complications of locks or contention for lazy creation as we have high volume transactions at times that may be stepping on each other trying to add new properties to the schema. Thus, the locking is just a failsafe should something slip through our schema creation. What would be useful is perhaps an option to disable lazy creation so we have to use the schema creation API and can avoid the issue of locking/contention of multiple threads modifying the schema. Is this something reasonably straight-forward to add? |
It should be fairly easy to disable lazy creation. Just to be clear it will be up to the client to ensure that using the schema creation api does not lock. So either just force it to be in one thread or as in our case we have disparate parts of the system which can run in parallel as there is no risk of locking. |
Yep, that is. fine. We do the schema creation from a single node/thread as part of config/setup. |
When using our full application which is processing a stream of messages in a multi-threaded fashion, if vertices/edges/properties are added that require schema creation, we have seen cases where we get deadlocks with MySQL/MariaDB. We've taken steps to try to validate requests and not allow on-demand creation, but we've still seen cases of deadlocks timing out.
Here is an example of the stacktrace we see:
20190313 08:07:57.722 [ERROR] Camel (ngms-mmsender-context) thread #67 - JmsConsumer[ngmsEventRegistryEnrichedFeed] | 413:com.nextgate.graph.ngg-ngms-feed | com.nextgate.graph.ngmsfeed.Feed | Exception occurred processing message
java.lang.RuntimeException: Timeout lapsed to acquire write lock for notification.
at org.umlg.sqlg.structure.topology.Topology.z_internalSqlWriteLock(Topology.java:619)
at org.umlg.sqlg.structure.topology.Topology.lock(Topology.java:584)
at org.umlg.sqlg.structure.topology.VertexLabel.ensurePropertiesExist(VertexLabel.java:411)
at org.umlg.sqlg.structure.topology.Schema.ensureVertexLabelExist(Schema.java:179)
at org.umlg.sqlg.structure.topology.Schema.ensureVertexLabelExist(Schema.java:156)
at org.umlg.sqlg.structure.topology.Topology.ensureVertexLabelExist(Topology.java:763)
at org.umlg.sqlg.structure.SqlgGraph.addVertex(SqlgGraph.java:382)
at org.umlg.sqlg.step.SqlgAddVertexStartStep.processNextStart(SqlgAddVertexStartStep.java:61)
at org.umlg.sqlg.step.SqlgAbstractStep.next(SqlgAbstractStep.java:106)
at org.umlg.sqlg.step.SqlgAbstractStep.next(SqlgAbstractStep.java:17)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:200)
This is not easily reproducible since it is in context of our full application, but in an effort to reproduce it, I wrote a simple client that adds vertices/properties/edges, some adds that will cause on-demand schema creation, and running multiple threads to try to cause problems to occur.
The client is available at https://github.com/ktschmidt/sqlgstuff in the populatetest module. Included there is an output file from using MySQL/MariaDB with the following exception:
Exception in thread "Thread-5" java.lang.IllegalStateException: Edge vertex 'prior' does not exist in schema 'PUBLIC'
at com.google.common.base.Preconditions.checkState(Preconditions.java:817)
at org.umlg.sqlg.structure.topology.TopologyManager.addLabelToEdge(TopologyManager.java:695)
at org.umlg.sqlg.structure.topology.EdgeLabel.ensureEdgeVertexLabelExist(EdgeLabel.java:768)
at org.umlg.sqlg.structure.topology.Schema.internalEnsureEdgeTableExists(Schema.java:344)
at org.umlg.sqlg.structure.topology.Schema.ensureEdgeLabelExist(Schema.java:251)
at org.umlg.sqlg.structure.topology.Topology.ensureEdgeLabelExist(Topology.java:836)
at org.umlg.sqlg.structure.topology.Topology.ensureEdgeLabelExist(Topology.java:809)
at org.umlg.sqlg.structure.SqlgVertex.addEdgeInternal(SqlgVertex.java:171)
at org.umlg.sqlg.structure.SqlgVertex.addEdge(SqlgVertex.java:137)
at com.nextgate.test.sqlgtest.CreateData.run(CreateData.java:47)
at java.lang.Thread.run(Thread.java:748)
Followed by the following for each other thread.
Exception in thread "Thread-9" java.lang.RuntimeException: Timeout lapsed to acquire write lock for notification.
at org.umlg.sqlg.structure.topology.Topology.z_internalSqlWriteLock(Topology.java:619)
at org.umlg.sqlg.structure.topology.Topology.lock(Topology.java:584)
at org.umlg.sqlg.structure.topology.VertexLabel.ensurePropertiesExist(VertexLabel.java:411)
at org.umlg.sqlg.structure.topology.Schema.ensureVertexColumnsExist(Schema.java:488)
at org.umlg.sqlg.structure.topology.Topology.ensureVertexLabelPropertiesExist(Topology.java:902)
at org.umlg.sqlg.structure.SqlgElement.property(SqlgElement.java:239)
at org.umlg.sqlg.structure.SqlgVertex.property(SqlgVertex.java:211)
at com.nextgate.test.sqlgtest.CreateData.run(CreateData.java:43)
at java.lang.Thread.run(Thread.java:748)
I'm not sure why the first problem occurs, but once it does the other threads all deadlock. So perhaps two separate issues.
The above is somewhat repeatable by just running the client, and is run from a completely clean database, e.g. no sqlg_schema or gui_schema and an empty public database.
I also ran the client with PostgreSQL and the same problems don't occur, but an odd NullPointerException did occur:
Exception in thread "Thread-9" java.lang.NullPointerException
at org.umlg.sqlg.structure.SqlgEdge.internalAddEdge(SqlgEdge.java:169)
at org.umlg.sqlg.structure.SqlgEdge.insertEdge(SqlgEdge.java:134)
at org.umlg.sqlg.structure.SqlgEdge.(SqlgEdge.java:57)
at org.umlg.sqlg.structure.SqlgVertex.addEdgeInternal(SqlgVertex.java:175)
at org.umlg.sqlg.structure.SqlgVertex.addEdge(SqlgVertex.java:137)
The text was updated successfully, but these errors were encountered: