Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests to check for DDSs sending duplicate attach ops on handle encode #20995

Merged
merged 21 commits into from
May 9, 2024

Conversation

jzaffiro
Copy link
Contributor

@jzaffiro jzaffiro commented May 6, 2024

Adds end to end tests to validate that each DDS does not send duplicate attach ops when handles are encoded. This bug was added in #19242, and deals with a second attach op being sent when there is a handle from one detached DDS to another and the DDS is then attached.

AB#7936

@github-actions github-actions bot added area: dds Issues related to distributed data structures area: tests Tests to add, test infrastructure improvements, etc base: main PRs targeted against main branch labels May 6, 2024
const dataObjectB = (await dataStoreB.entryPoint.get()) as ITestFluidObject;
idB = dataObjectB.context.id;

cellRoot.set(dataObjectB.handle);
Copy link
Contributor

@anthony-murphy anthony-murphy May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we need a test per dds, as grouping them could miss problems, as if any dds binds the handle it will work right, so we miss broken dds.

i would imagine this test using a for loop to run the same test for each dds.

for(const storeHandle in [...]){
  it(`store handle in dds: $(ddsName)`,()=>{

    //setup the detached stuff
    storeHandle(handle)
    // verify

  }
}

maybe the loop would run over an array of funtion that take the defaultDataStore, and store in the dds like:

(defaultDataStore, handle)=>{
const cellRoot = await defaultDataStore.getSharedObject(cellId);
cellRoot.set(handle);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, my impression was that the interesting part was not storing dataObjectB.handle, but rather the handle from B's root to the other DDSes in B. So the test case would be to store a handle from a particular DDS to some other DDS - all in a DataStore that is currently detached.

const defaultDataStore = (await container.getEntryPoint()) as ITestFluidObject;
const runtime = defaultDataStore.context.containerRuntime;

const dataStoreB = await runtime.createDataStore(["default"]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest using a simpler container schema / default data store, to be more explicit in the test code itself about what's going on. IIRC, the key part of the setup is that dataStoreB has internal handled between its DDSes (in a Map). Better to have that be explicit in the test code itself.

Probably will get here anyway by following Tony's suggestions below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably be resolved in a follow up, just so we can have some coverage immediately.

@github-actions github-actions bot removed the area: dds Issues related to distributed data structures label May 6, 2024
[cellId, SharedCell.getFactory()],
[counterId, SharedCounter.getFactory()],
[directoryId, SharedDirectory.getFactory()],
// [treeId, SharedTree.getFactory()],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tree implementations are not commented out, I get an error that SharedTree is invalid or not implemented. It might have to do with the fact that SharedTree is not in the test APIs, but I'm not sure yet.

@jzaffiro jzaffiro marked this pull request as ready for review May 7, 2024 19:34
await waitForContainerConnection(container2);
const default2 = (await container2.getEntryPoint()) as ITestFluidObject;

const handleB = (await handle.readHandle(default2)) as IFluidHandle<ITestFluidObject>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should avoid casting. in the stress test we use isFluidHandle, we should use that here like assert(isFuildHandle(handle2)) rather than casting

@msfluid-bot
Copy link
Collaborator

Could not find a usable baseline build with search starting at CI 36f78b9

Generated by 🚫 dangerJS against b3894e0

@anthony-murphy anthony-murphy merged commit 7119a13 into microsoft:main May 9, 2024
30 checks passed
@jzaffiro jzaffiro deleted the duplicateAttachOps branch May 9, 2024 20:10
anthony-murphy added a commit that referenced this pull request May 9, 2024
…ed (#20998)

the primary change is to move getting the attachment summary before
transitioning to attached for datastores. This also matches how the
container works, specifically, capture the summary, and then transition
to attaching. Since this code is all synchronous the risk of something
bad happening is small, or about the same as before.

Regardless of the order, having the summary retrieval decoupled from the
state transition leaves open the possibility for bugs. With this change
it is possible the dds state is modified after the summary is captured,
so it won't be in the summary or in ops, only in memory. Conversely, in
the existing code if a change happened after the attach state changed it
would get an op, but it would also be in the summary, so it gets
duplicated. Both of these are data corruption.

The better and much bigger fix would be to get the summary as part of
the transition, so as each ddses attachment summary is retrieved it then
transition to attaching. this would be a substantial refactoring, and
due to the code being synchronous the opportunity for these types of
issues is small, and would likely represent a framework issue.

Testing done as part of #20995

---------

Co-authored-by: Tony Murphy <anthonm@microsoft.com>
}

const container2 = await provider.loadTestContainer(testContainerConfig);
if (container2.deltaManager.lastSequenceNumber < seq) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, almost... it doesn't support passing a custom target in. Oh well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe if you do provider.ensuresynchronized before loading container2, it would work

anthony-murphy added a commit to anthony-murphy/FluidFramework-1 that referenced this pull request May 9, 2024
…ed (microsoft#20998)

the primary change is to move getting the attachment summary before
transitioning to attached for datastores. This also matches how the
container works, specifically, capture the summary, and then transition
to attaching. Since this code is all synchronous the risk of something
bad happening is small, or about the same as before.

Regardless of the order, having the summary retrieval decoupled from the
state transition leaves open the possibility for bugs. With this change
it is possible the dds state is modified after the summary is captured,
so it won't be in the summary or in ops, only in memory. Conversely, in
the existing code if a change happened after the attach state changed it
would get an op, but it would also be in the summary, so it gets
duplicated. Both of these are data corruption.

The better and much bigger fix would be to get the summary as part of
the transition, so as each ddses attachment summary is retrieved it then
transition to attaching. this would be a substantial refactoring, and
due to the code being synchronous the opportunity for these types of
issues is small, and would likely represent a framework issue.

Testing done as part of microsoft#20995

---------

Co-authored-by: Tony Murphy <anthonm@microsoft.com>
anthony-murphy added a commit to anthony-murphy/FluidFramework-1 that referenced this pull request May 9, 2024
…code (microsoft#20995)

Adds end to end tests to validate that each DDS does not send duplicate
attach ops when handles are encoded. This bug was added in microsoft#19242, and
deals with a second attach op being sent when there is a handle from one
detached DDS to another and the DDS is then attached.


[AB#7936](https://dev.azure.com/fluidframework/235294da-091d-4c29-84fc-cdfc3d90890b/_workitems/edit/7936)

---------

Co-authored-by: Mark Fields <markfields@users.noreply.github.com>
Co-authored-by: Tony Murphy <anthony.murphy@microsoft.com>
const dataStoreB = await runtime.createDataStore(["default"]);
const dataObjectB = (await dataStoreB.entryPoint.get()) as ITestFluidObject;

await handle.storeHandle(defaultDataStore, dataObjectB.handle);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jzaffiro - This is a different test case than what I remember seeing last week.

This:

  1. DefaultDataStore is attached (of course). Create DataStoreB (it's detached).
  2. DefaultDataStore's DDS of each type stores a handle to DataStoreB (thus attaching it)

What I remember:

  1. DefaultDataStore is attached (of course). Create DataStoreB (it's detached).
  2. DataStoreB's DDS of each type stores a handle to another DDS within DataStoreB (all the while it's detached)
  3. Then store a handle to DataStoreB from DefaultDataStore's root -- now DataStoreB (and all its children) attach.

Did you confirm that these tests fail without the bindHandles call and without Tony's fix?

Copy link
Contributor Author

@jzaffiro jzaffiro May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests do fail without calling bindHandles and without Tony's fix. I also might have gotten confused on the test scenario, but looking at the commit we protoyped it seems like the test case was:

  1. DefaultDataStore is attached (of course). Create DataStoreB (it's detached).
  2. Store a handle to DataObjectB in the root of DefaultDataStore (effectively storing it in a map)

I planned on doing a follow up anyway, so I can add the second test case you point out in that PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great to know you verified the test coverage.

As for the other test case - you're right that this is what the test was doing. But when we were debugging, it was the test fluid object initialization for DataStoreB that was storing handles between DDSes, and that was the part that was causing multiple attach ops. Super subtle.

My concern is that all these test cases are actually just testing Map under the hood, because it's in DataStoreB's root Map where the handles are stored that actually matter.

cc @anthony-murphy as FYI

anthony-murphy added a commit that referenced this pull request May 10, 2024
…ary retrieved (#21032)

Ports #20998 and some tests #20995

---------

Co-authored-by: Tony Murphy <anthonm@microsoft.com>
Co-authored-by: jzaffiro <110866475+jzaffiro@users.noreply.github.com>
Co-authored-by: Mark Fields <markfields@users.noreply.github.com>
jzaffiro added a commit that referenced this pull request May 28, 2024
With the change to the ordering of getting the attachment summary in
#20998 and the testing in #20995 and #21132, we can now remove the
workarounds that call bindHandles or makeSerializable when not in an
attached state.
Abe27342 added a commit that referenced this pull request Jul 26, 2024
## Description

Adds back `ConsensusQueue` to the handle validation tests. The test had
a few issues preventing it from working with ConsensusQueue. The direct
cause for the hang that the original bug reported is that container1 was
disposed before attempting to read the handle, and ConsensusQueue
requires ops to round-trip before its data can be read.

While working on this, I also refactored the affected test and suite a
fair bit for clarity, and tweaked what the test case was doing. I split
it into a couple different cases based on what the handle in the DDS was
referencing, as [previous
discussion](#20995 (comment))
on the topic mentioned the problematic case was something not being
explicitly tested.


[AB#8058](https://dev.azure.com/fluidframework/235294da-091d-4c29-84fc-cdfc3d90890b/_workitems/edit/8058)

---------

Co-authored-by: Abram Sanderson <absander@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: tests Tests to add, test infrastructure improvements, etc base: main PRs targeted against main branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants