Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QC-779 Multiple mesos executor fix #613

Merged
merged 2 commits into from
Sep 19, 2024
Merged

QC-779 Multiple mesos executor fix #613

merged 2 commits into from
Sep 19, 2024

Conversation

justonedev1
Copy link
Collaborator

sorry for some of the formatting fixes.
There was a race condition which was in method makeTaskForMesosResources where we sharing global object to write. This should create a deep copy of given object and it seems to fix the problem on staging.

} else {
log.Debug("scheduler quit, no errors")
state.sm.Event(context.Background(), "EXIT")
}
}()
}

func (state *schedulerState) CopyExecutor() *mesos.ExecutorInfo {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rename to CopyExecutorInfo for clarity?

also, perhaps proto.Clone() might be faster than marshalling and unmarshalling?

Copy link
Collaborator Author

@justonedev1 justonedev1 Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I did not use Clone originally was, that it did not build, hence the stupid marshalling/unmarshalling. I got error

*mesos.ExecutorInfo does not implement protoreflect.ProtoMessage (missing method ProtoReflect)
but the object in defined in filename.pb.go

but now it works... ffs literally the same code I used earlier today. I wonder what I did differently before (I even copied the same code as before :D)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could have been that you imported a wrong lib, I saw in GoLand that you could pick three different libraries with proto.Clone

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe... truth is, that I let lsp handle adding dependencies...

@knopers8
Copy link
Collaborator

Is there still a risk that we deep-copy ExecutorInfo late enough that it might have been modified by another goroutine?

@justonedev1
Copy link
Collaborator Author

Nope, this was the only place where the executor from state is written to. It is supposed to be this immutable object that is safe to read from, so it is just created as a part of schedulerState here and the executorInfo is created few lines above here ... So nothing now modifies original executorInfo, just the copies I create.

@justonedev1
Copy link
Collaborator Author

justonedev1 commented Sep 18, 2024

And one more thing... this is a problem of golang, as there is no const keyword like in eg. c++, rust, ... If this was correctly written c++ code using const or rust code, this bug would not be able to happen. So maybe it might be worth it to note this somewhere, when someone will be doing +- of golang language...

… not race condition

creating copy of ExecutorInfo for each mesos offer so there is not race condition
@knopers8
Copy link
Collaborator

Nope, this was the only place where the executor from state is written to. It is supposed to be this immutable object that is safe to read from, so it is just created as a part of schedulerState here and the executorInfo is created few lines above here ... So nothing now modifies original executorInfo, just the copies I create.

Indeed, I missed the fact that executor ID is set just a line after the (now) deep-copy:

	executor := state.CopyExecutor()
	executor.ExecutorID.Value = taskPtr.GetExecutorId()

before I thought that the right executor ID is already in state.executor when we receive it.

all clear, thanks!

@@ -36,6 +36,7 @@ import (
"github.com/AliceO2Group/Control/common/logger/infologger"
"github.com/AliceO2Group/Control/core/controlcommands"
"github.com/AliceO2Group/Control/core/task/schedutil"
"github.com/gogo/protobuf/proto"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gogo/protobuf is deprecated though, can you try using a different one?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is the official one now: https://github.com/golang/protobuf

Copy link
Collaborator Author

@justonedev1 justonedev1 Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it will not work, see previous build error.. but this package is used in mesos, so we are using it either way inherently. So I don't mind using it for cloning an object, that is already created by this package in mesos

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I personally also don't mind using marshall and unmarshall because we are using it once per run and per offer

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, since mesos brings it anyway, let's keep gogo/protobuf, thanks for checking.

@knopers8 knopers8 merged commit 09ec975 into master Sep 19, 2024
2 checks passed
@knopers8 knopers8 deleted the QC-779 branch September 19, 2024 07:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants