Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nebd-server: drop failed reqeusts' rpc #212

Merged
merged 1 commit into from
Jan 28, 2021

Conversation

SeanHai
Copy link
Contributor

@SeanHai SeanHai commented Jan 6, 2021

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?

What's Changed:

How it Works:

Side effects(Breaking backward compatibility? Performance regression?):

Check List

  • Relevant documentation/comments is changed or added
  • I acknowledge that all my contributions will be made under the project's license

@@ -47,6 +47,9 @@ void NebdFileServiceCallback(NebdServerAioContext* context) {
response->set_retcode(RetCode::kNoOK);
LOG(ERROR) << "Read file failed. "
<< "return code: " << context->ret;
// drop the rpc to ensure not return ioerror
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里可以把所有的context->ret<0的分支单独提出来放到一起处理,然后再添加一个配置项,控制error的情况下,是否返回rpc。

@@ -358,18 +358,17 @@ TEST_F(FileServiceTest, CallbackTest) {
{
brpc::Controller cntl;
nebd::client::ReadResponse response;
FileServiceTestClosure done;
FileServiceTestClosure* done = new FileServiceTestClosure();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个添加配置后,两种都测试一下。

context->buf = new butil::IOBuf();
context->ret = -1;
NebdFileServiceCallback(context);
ASSERT_TRUE(done.IsRunned());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里可以判断一下ASSERT_FALSE吧

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里可以判断一下ASSERT_FALSE吧

因为 NebdFileServiceCallback(context); 这里就不done释放了

@@ -32,71 +32,62 @@ namespace server {

using nebd::client::RetCode;

bool returnRpcWhenIoError;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

尽量不要把控制参数用全局变量表示。
这里可以在NebdServerAioContext里面加一个变量,表示在io error的情况下,是否返回请求。
在new NebdServerAioContext的地方设置变量。
所以参数保存到NebdFileServiceImpl里面。
然后配置文件的参数,通过构造函数初始化NebdFileServiceImpl里面的参数。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

尽量不要把控制参数用全局变量表示。
这里可以在NebdServerAioContext里面加一个变量,表示在io error的情况下,是否返回请求。
在new NebdServerAioContext的地方设置变量。
所以参数保存到NebdFileServiceImpl里面。
然后配置文件的参数,通过构造函数初始化NebdFileServiceImpl里面的参数。

done

@@ -38,6 +38,8 @@ std::string NebdFileType2Str(NebdFileType type);

std::string NebdFileStatus2Str(NebdFileStatus status);

std::string Op2Str(LIBAIO_OP op);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个实现在哪里?
或者直接对LIBAIO_OP定义一个operator <<

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个实现在哪里?
或者直接对LIBAIO_OP定义一个operator <<

原来就有,在util.cpp中,但是头文件中没声明

} else {
// io error
if (context->ret < 0) {
LOG(ERROR) << Op2Str(context->op) << " file failed. "
Copy link
Contributor

@wu-hanqing wu-hanqing Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里可以直接用 LOG(ERROR) << *context吧?
util.h里面声明了std::ostream& operator<<(std::ostream& os, const NebdServerAioContext& c); 函数。
这个函数可以修改一下,加一下context->returnRpcWhenIoError的输出

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里可以直接用 LOG(ERROR) << *context吧?
util.h里面声明了std::ostream& operator<<(std::ostream& os, const NebdServerAioContext& c); 函数。
这个函数可以修改一下,加一下context->returnRpcWhenIoError的输出

done

Copy link
Contributor

@aspirer aspirer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not return io error to upper layer:这个标题改下?IoError应该是个专有名词吧?upper layer应该是指我们的client sdk吧?是不是改成 chunkserver: Never return IoError to client 比较好?

@SeanHai
Copy link
Contributor Author

SeanHai commented Jan 12, 2021

not return io error to upper layer:这个标题改下?IoError应该是个专有名词吧?upper layer应该是指我们的client sdk吧?是不是改成 chunkserver: Never return IoError to client 比较好?

这里主要是在nebd-server中将发生错误的rpc进行丢弃,目的是让上次IO卡住,等下层恢复后IO可以恢复,改成nebd-server: drop failed reqeusts' rpc

@SeanHai SeanHai changed the title not return io error to upper layer nebd-server: drop failed reqeusts' rpc Jan 12, 2021
} else {
// io error
if (context->ret < 0) {
LOG(ERROR) << *context;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*context能正常输出吗

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*context能正常输出吗

可以

return;
}
} else {
switch (context->op) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这下面都没有处理returnRpcWhenIoError为true的情况

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这下面都没有处理returnRpcWhenIoError为true的情况

done

// don't return rpc
if (!context->returnRpcWhenIoError) {
// drop the rpc to ensure not return ioerror
doneGuard.release();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不return rpc,error日志打一下

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不return rpc,error日志打一下

done

@SeanHai SeanHai force-pushed the not_return_ioerror branch 5 times, most recently from c2c27e5 to 1948e6f Compare January 20, 2021 12:14
LOG(ERROR) << "Read file failed. "
<< "return code: " << context->ret;
LOG(ERROR) << "Read file failed: " << *context;
// don't return rpc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里处理重复的很多,是否能封装一下

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里处理重复的很多,是否能封装一下

封装了一个函数SetResponse,主要是在正确返回(context->ret>0)和发生IOError时返回rpc时设置response

@SeanHai SeanHai force-pushed the not_return_ioerror branch 2 times, most recently from 01bf07b to 8086cb5 Compare January 22, 2021 02:54
brpc::ClosureGuard doneGuard(context->done);

// io error
if (context->ret < 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if(context->ret < 0 && !context->returnRpcWhenIoError) {
LOG(ERROR) << *context;
doneGuard.release();
delete context->done;
LOG(ERROR) << Op2Str(context->op)
<< " file failed and drop the request rpc.";
} else {
SetResponse(context, RetCode::kOK);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

想起来还有一个,为什么之前少了一个else分支,但是测试没测出来?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if(context->ret < 0 && !context->returnRpcWhenIoError) {
LOG(ERROR) << *context;
doneGuard.release();
delete context->done;
LOG(ERROR) << Op2Str(context->op)
<< " file failed and drop the request rpc.";
} else {
SetResponse(context, RetCode::kOK);
}

这里主要是两层判断:
1:如果context->ret < 0 说明发生了Error,会打印一条IO Error的错误日志;如果context->ret >= 0 说明没有发生错误直接SetResponse(context, RetCode::kOK);
2:在发生错误的分支里,如果不需要向上层返回Error则丢弃rpc,并打印一条丢弃rpc的错误日志;如果需要向上层返回则设置 SetResponse(context, RetCode::kNoOK);

这样改的话逻辑不对,针对发生错误但是需要向上返回的 response设置了SetResponse(context, RetCode::kOK);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

想起来还有一个,为什么之前少了一个else分支,但是测试没测出来?

因为单测里是判断返回的response->retcode为RetCode::kNoOK(-1),在没有返回的情况下,这个条件也是成立的,这次补充了测试(在发生错误后的返回码设置成别的值,单测能够判断到)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if(context->ret < 0 && !context->returnRpcWhenIoError) {
LOG(ERROR) << *context;
doneGuard.release();
delete context->done;
LOG(ERROR) << Op2Str(context->op)
<< " file failed and drop the request rpc.";
} else if (context->ret < 0) {
} else {
}

<< "return code: " << context->ret;
} else {
response->set_retcode(retCode);
if (context->ret >= 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

???漏删了

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

???漏删了

不是,在READ的正常返回除了需要设置response的retcode还需要把读取的数据拷贝进去

@SeanHai SeanHai force-pushed the not_return_ioerror branch 8 times, most recently from f994a42 to 8a73898 Compare January 27, 2021 03:40
@SeanHai SeanHai force-pushed the not_return_ioerror branch 2 times, most recently from f91b7df to 50a287e Compare January 27, 2021 12:29
@ilixiaocui ilixiaocui merged commit fef25e0 into opencurve:master Jan 28, 2021
ilixiaocui pushed a commit to ilixiaocui/curve that referenced this pull request Feb 6, 2023
add myself as co-mentor for retroactive rules project + add RW project
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants