Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MONO][Marshal] Fix race condition in marshal callback installation #77383

Closed
wants to merge 2 commits into from

Conversation

naricc
Copy link
Member

@naricc naricc commented Oct 24, 2022

This fixes a race condition on ilgen_cb_inited; this flag was used to determine if callbacks were already installed, but had no synchronization around it. Replaced with a pointer to a dynamically allocated buffer and a cas.

Fixes: #74603
Fixes: #77090

@naricc
Copy link
Member Author

naricc commented Oct 24, 2022

/azp run runtime-extra-platforms

@naricc
Copy link
Member Author

naricc commented Oct 24, 2022

/azp run runtime-wasm

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).


void
mono_install_marshal_callbacks_ilgen (MonoMarshalIlgenCallbacks *cb)
{
g_assert (!ilgen_cb_inited);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the race occur ? This should be called during startup.

Copy link
Member Author

@naricc naricc Oct 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas Was reporting what looks like a race condition here: #77090. I'm not sure exactly what interleaving can cause this though, and have not been able to trigger it locally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vargaz @naricc since e107bcb, mono_marshal_ilgen_init may be called by embedders early, but it doesn't have to be called. In the case where the driver doesn't call it early (for example the normal desktop mono driver doesn't call it), get_marshal_cb will initialize lazily. The race is in the lazy case where one thread can see ilgen_cb_inited == TRUE, but the callbacks are all still null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it get called during runtime startup ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it get called during runtime startup ?

I was surprised, but apparently it doesn't. Going back to the very first PR it's always been lazy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now on wasm at least, all such initialization needs to happen before the runtime is initialized, i.e.
https://github.com/dotnet/runtime/blob/main/src/mono/wasm/runtime/driver.c#L593
When mono_jit_init_version () is called, it can check whenever the embedder has made some custom changes, and if not, initialize things the default way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I don't have any objection to just using mono_marshal_ilgen_init to set a flag and doing all the callback initialization late during startup. @naricc I think that's what you wanted to do for the component in the first place right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lambdageek That is what I am doing for the componentization change.

So is that what we should do here too, instead of the thing with atomics? mono_marshal_ilgen_init sets a flag; we check that flag during, say, mini_init(). At which point we do callback installation if it is set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in the case that we are using NOILGEN and no one has called mono_marshal_ilgen_init we just want to go ahead and install the noilgen callbacks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that sounds right.

So is that what we should do here too, instead of the thing with atomics? mono_marshal_ilgen_init sets a flag; we check that flag during, say, mini_init(). At which point we do callback installation if it is set?

Yea, basically at the point where in the componentized versino we'd call the component init function, here we'd just set up the callbacks.

I think writing down #77383 (comment) helped me to think about what the real problem is. It's not concurrency. It's that we have a confusing API for choosing between ilgen/noilgen. We should init the marshaling stack at startup - there's no need for it to be lazy. You were right in your proposal about mono_marshal_ilgen_init (just use it to set a flag), but I didn't understand the problem properly.

lambdageek
lambdageek previously approved these changes Oct 24, 2022
Copy link
Member

@lambdageek lambdageek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. needs barriers. also some style nits

memcpy (&ilgen_marshal_cb, cb, sizeof (MonoMarshalIlgenCallbacks));
ilgen_cb_inited = TRUE;
}
MonoMarshalIlgenCallbacks* local_cb = (MonoMarshalIlgenCallbacks*)malloc(sizeof(MonoMarshalIlgenCallbacks));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use g_malloc/g_free


if (mono_atomic_cas_ptr((void**)&ilgen_marshal_cb, local_cb, NULL ))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (mono_atomic_cas_ptr((void**)&ilgen_marshal_cb, local_cb, NULL ))
if (mono_atomic_cas_ptr((void**)&ilgen_marshal_cb, local_cb, NULL ) != NULL)

nit: usually the pattern is if (mono_atomic_cas_ptr (dest, newValue, expectedValue) != expectedValue)

ilgen_cb_inited = TRUE;
}
MonoMarshalIlgenCallbacks* local_cb = (MonoMarshalIlgenCallbacks*)malloc(sizeof(MonoMarshalIlgenCallbacks));
memcpy (local_cb, cb, sizeof (MonoMarshalIlgenCallbacks));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a memory barrier between the memcpy and the CAS

Suggested change
memcpy (local_cb, cb, sizeof (MonoMarshalIlgenCallbacks));
memcpy (local_cb, cb, sizeof (MonoMarshalIlgenCallbacks));
mono_memory_barrier ();

MonoMarshalLightweightCallbacks* local_cb = (MonoMarshalLightweightCallbacks*)malloc(sizeof(MonoMarshalLightweightCallbacks));
memcpy (local_cb, cb, sizeof (MonoMarshalLightweightCallbacks));

if (mono_atomic_cas_ptr((void**)&marshal_lightweight_cb, local_cb, NULL))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as the other method:

  1. use g_malloc/g_free
  2. barrier after memcpy
  3. add != NULL to the if condition

@lambdageek
Copy link
Member

Added fix for race condition.

Please use a better summary. something like [mono][marshal] Fix race condition in callback initialization

@lambdageek lambdageek dismissed their stale review October 24, 2022 19:24

I guess we're going with a flag instead of atomics

@naricc naricc changed the title Added fix for race condition. [MONO] Fix race condition in marshal callback installation Oct 24, 2022
@naricc naricc changed the title [MONO] Fix race condition in marshal callback installation [MONO][Marshal] Fix race condition in marshal callback installation Oct 24, 2022
@naricc
Copy link
Member Author

naricc commented Oct 25, 2022

I am abandoing this PR in favor of doing it the flag way ( see discussion above). That is a different change, so I will do it in a different PR.

@naricc naricc closed this Oct 25, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Nov 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
3 participants