Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support task auto manage by server role state machine #2130

Merged
merged 16 commits into from
Mar 8, 2023
Merged

Conversation

zyxxoo
Copy link
Contributor

@zyxxoo zyxxoo commented Feb 26, 2023

No description provided.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 893 files.

Valid Invalid Ignored Fixed
670 3 220 0
Click to see the invalid file list
  • hugegraph-api/src/main/java/org/apache/hugegraph/core/RoleTypeDataAdapterImpl.java
  • hugegraph-api/src/main/java/org/apache/hugegraph/core/StateMachineCallbackImpl.java
  • hugegraph-core/src/main/java/org/apache/hugegraph/election/HugeRoleStateMachineConfig.java

@@ -0,0 +1,146 @@
package org.apache.hugegraph.core;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
package org.apache.hugegraph.core;
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hugegraph.core;

@@ -0,0 +1,74 @@
package org.apache.hugegraph.core;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
package org.apache.hugegraph.core;
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hugegraph.core;

@@ -0,0 +1,53 @@
package org.apache.hugegraph.election;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
package org.apache.hugegraph.election;
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hugegraph.election;

@zyxxoo zyxxoo force-pushed the zy_dev branch 4 times, most recently from c822b43 to 78cd8cd Compare February 26, 2023 11:52
@codecov
Copy link

codecov bot commented Feb 26, 2023

Codecov Report

Merging #2130 (5523064) into master (76fa644) will increase coverage by 6.25%.
The diff coverage is 79.52%.

@@             Coverage Diff              @@
##             master    #2130      +/-   ##
============================================
+ Coverage     62.60%   68.86%   +6.25%     
- Complexity      728      977     +249     
============================================
  Files           481      488       +7     
  Lines         39704    40090     +386     
  Branches       5581     5608      +27     
============================================
+ Hits          24857    27607    +2750     
+ Misses        12389     9824    -2565     
- Partials       2458     2659     +201     
Impacted Files Coverage Δ
...a/org/apache/hugegraph/api/gremlin/GremlinAPI.java 100.00% <ø> (ø)
...ava/org/apache/hugegraph/api/job/AlgorithmAPI.java 0.00% <ø> (ø)
...java/org/apache/hugegraph/api/job/ComputerAPI.java 0.00% <ø> (ø)
.../java/org/apache/hugegraph/api/job/GremlinAPI.java 50.72% <ø> (ø)
.../java/org/apache/hugegraph/api/job/RebuildAPI.java 42.85% <ø> (ø)
...ain/java/org/apache/hugegraph/api/job/TaskAPI.java 52.08% <ø> (ø)
...in/java/org/apache/hugegraph/api/raft/RaftAPI.java 17.30% <ø> (ø)
...org/apache/hugegraph/api/schema/IndexLabelAPI.java 75.30% <0.00%> (-0.95%) ⬇️
...ava/org/apache/hugegraph/api/schema/SchemaAPI.java 100.00% <ø> (ø)
...rg/apache/hugegraph/api/schema/VertexLabelAPI.java 59.72% <0.00%> (-1.71%) ⬇️
... and 82 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more


private final TaskManager taskManager;

boolean isMaster = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mark private volatile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the filed only use on one thread, so don't volatile

import org.apache.hugegraph.util.Log;
import org.slf4j.Logger;

public class StateMachineCallbackImpl implements StateMachineCallback {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not submit StateMachineCallback file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is mainly responsible for state machine and external interactions, so it was not previously committed

import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.structure.Vertex;

public class RoleTypeDataAdapterImpl implements RoleTypeDataAdapter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RoleTypeDataAdapter means RoleTypeStore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the adapter responsibility mainly applies to storage

@zyxxoo zyxxoo force-pushed the zy_dev branch 3 times, most recently from f7c2e6a to 5f17594 Compare February 27, 2023 03:27
E.checkArgument(this.roleStateWorker == null, "Repetition init");
this.applyThread = Executors.newSingleThreadExecutor();
this.roleStateWorker = this.authenticator().graph().roleElectionStateMachine();
this.applyThread.execute(() -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can move applyThread to RoleElectionStateMachine class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easier to write a single test with asynchronous threads outside

@simon824
Copy link
Member

Hi @zyxxoo , in the previous version, master schedule tasks and if the task is sent to the worker, will cause Exception 'Can't schedule task on non-master server'.
Will the worker forward the task to the master in this feature? because the master role will change, it is difficult to determine the master ip.

@zyxxoo
Copy link
Contributor Author

zyxxoo commented Feb 28, 2023

Hi @zyxxoo , in the previous version, master schedule tasks and if the task is sent to the worker, will cause Exception 'Can't schedule task on non-master server'. Will the worker forward the task to the master in this feature? because the master role will change, it is difficult to determine the master ip.

yes, we will support redirect request to master node in this feature

@zyxxoo zyxxoo force-pushed the zy_dev branch 5 times, most recently from 32075c4 to fab5a17 Compare February 28, 2023 11:06

@Override
public void filter(ContainerRequestContext requestContext) throws IOException {
GlobalMasterInfo instance = GlobalMasterInfo.instance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get from graphManager.auth.masterInfo?

Copy link
Contributor Author

@zyxxoo zyxxoo Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have change to "graphManager.globalMasterInfo"

"server.role.node_external_url",
"The url of external accessibility",
disallowEmpty(),
"127.0.0.1:8080"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can pass it from restserver config to graph config, like rpc url

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about the question raised
now, I put the config to resetserver, but the stateMachine put in graph instance, so these config must be copy to graph config

@zyxxoo zyxxoo force-pushed the zy_dev branch 4 times, most recently from 5d065c8 to 85b722a Compare March 1, 2023 13:23
@@ -53,6 +53,7 @@ public class RebuildAPI extends API {
@Status(Status.ACCEPTED)
@Produces(APPLICATION_JSON_WITH_CHARSET)
@RolesAllowed({"admin", "$owner=$graph $action=index_write"})
@RedirectFilter.RedirectMasterRole
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe ACCEPTED and RedirectMasterRole often appear simultaneously

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, if ACCEPTED appear, the RedirectMasterRole also appear

@zyxxoo zyxxoo force-pushed the zy_dev branch 3 times, most recently from 68e186e to ae7954a Compare March 5, 2023 04:20
javeme
javeme previously approved these changes Mar 6, 2023
@@ -49,7 +53,11 @@ public void apply(StateMachineCallback stateMachineCallback) {
while (!this.shutdown) {
E.checkArgumentNotNull(this.state, "State don't be null");
try {
RoleState pre = this.state;
this.state = state.transform(context);
Copy link
Member

@imbajin imbajin Mar 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: state.transform is not an atomic operation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the machine run in single thread

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taskManager and globalMadterInfo are use lock to sync

@@ -234,7 +240,7 @@ public boolean updateIfNodePresent(RoleTypeData stateData) {
if (Objects.equals(value.node(), copy.node()) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the value could be null here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the value dont become null

@zyxxoo zyxxoo merged commit 31c6c6d into master Mar 8, 2023
@zyxxoo zyxxoo deleted the zy_dev branch March 8, 2023 15:20
@wuchaojing
Copy link
Contributor

wuchaojing commented Apr 11, 2023

image
image

use hubble, submit task still fail on the worker, but success on the master

@zyxxoo
Copy link
Contributor Author

zyxxoo commented Apr 12, 2023

image image

use hubble, submit task still fail on the worker, but success on the master

目前在拦截器这一层做了转发,但是这期间如果发生了 master 切换,因为缓存原因还是会转发到旧的节点,这时候就会有这个错误,这里考虑性能并不会转发多次,可以考虑重试下这个语句应该就不会报错了

@zyxxoo
Copy link
Contributor Author

zyxxoo commented Apr 12, 2023

不过目前来看, g.V().limit() 这条语句应该涉及不到 task, 这里应该有 bug

@wuchaojing
Copy link
Contributor

image image
use hubble, submit task still fail on the worker, but success on the master

目前在拦截器这一层做了转发,但是这期间如果发生了 master 切换,因为缓存原因还是会转发到旧的节点,这时候就会有这个错误,这里考虑性能并不会转发多次,可以考虑重试下这个语句应该就不会报错了

集群run起来之后没有发生master切换,多次尝试g.V().limit()还是会失败

@wuchaojing
Copy link
Contributor

wuchaojing commented Apr 12, 2023

不过目前来看, g.V().limit() 这条语句应该涉及不到 task, 这里应该有 bug

我理解执行任务就会生成一个task吧,跟语句本身没有关系吧

@zyxxoo
Copy link
Contributor Author

zyxxoo commented Apr 12, 2023

不过目前来看, g.V().limit() 这条语句应该涉及不到 task, 这里应该有 bug

我理解执行任务就会生成一个task吧,跟语句本身没有关系吧

ok, 看着这里执行任务应该创建了一个task

执行这条语句一直失败吗?后端用什么存储呢?

@zyxxoo
Copy link
Contributor Author

zyxxoo commented Apr 12, 2023

image image
use hubble, submit task still fail on the worker, but success on the master

目前在拦截器这一层做了转发,但是这期间如果发生了 master 切换,因为缓存原因还是会转发到旧的节点,这时候就会有这个错误,这里考虑性能并不会转发多次,可以考虑重试下这个语句应该就不会报错了

集群run起来之后没有发生master切换,多次尝试g.V().limit()还是会失败

估计是拦截器漏加了,这个是指定方法才会生效

@wuchaojing
Copy link
Contributor

不过目前来看, g.V().limit() 这条语句应该涉及不到 task, 这里应该有 bug

我理解执行任务就会生成一个task吧,跟语句本身没有关系吧

ok, 看着这里执行任务应该创建了一个task

执行这条语句一直失败吗?后端用什么存储呢?

一直失败,使用rocksdb存储。报错都是Can't schedule task on non-master server

@zyxxoo
Copy link
Contributor Author

zyxxoo commented Apr 12, 2023

不过目前来看, g.V().limit() 这条语句应该涉及不到 task, 这里应该有 bug

我理解执行任务就会生成一个task吧,跟语句本身没有关系吧

ok, 看着这里执行任务应该创建了一个task
执行这条语句一直失败吗?后端用什么存储呢?

一直失败,使用rocksdb存储。报错都是Can't schedule task on non-master server

image image
use hubble, submit task still fail on the worker, but success on the master

目前在拦截器这一层做了转发,但是这期间如果发生了 master 切换,因为缓存原因还是会转发到旧的节点,这时候就会有这个错误,这里考虑性能并不会转发多次,可以考虑重试下这个语句应该就不会报错了

集群run起来之后没有发生master切换,多次尝试g.V().limit()还是会失败

估计是拦截器漏加了,这个是指定方法才会生效

还有,必须配置了授权才能生效,配置auth.authenticator=org.apache.hugegraph.auth.StandardAuthenticator 了吗?

@zyxxoo
Copy link
Contributor Author

zyxxoo commented Apr 12, 2023

@wuchaojing , 可以在检查下上面配置,应该是 ok的,拦截器没有漏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants