
Add sync writes to multiMerge() in WebStorage. Fix caching when multiMerge() runs. #122

Merged: 9 commits into main, Mar 18, 2022

Conversation

marcaaron (Contributor)

cc @Julesssss @kidroca Second attempt at alleviating the IDB issues

Related Issues

#119

Automated Tests

Added a new automated test specifically for the weirdness reported here.

Linked PRs

Expensify/App#8053

@marcaaron marcaaron self-assigned this Mar 10, 2022
@kidroca kidroca (Contributor) left a comment

I think we still have a problem with multiMerge, though it might be more subtle now.

Comment on lines +771 to +776
// Prefill cache if necessary by calling get() on any existing keys and then merge original data to cache
// and update all subscribers
Promise.all(_.map(existingKeys, get)).then(() => {
    cache.merge(collection);
    keysChanged(collectionKey, collection);
});
Contributor:

This makes sense, but:

  1. It suffers from the same problem we discovered: if get has to read from disk, the data might not have been written yet (while a write is still pending), which can still result in partial data being added to cache
  2. Delaying the cache update opens the door for problems: other get/connect calls might use stale cache information
  3. It might delay the UI update a bit, since keysChanged is now deferred until get actually finishes reading

Point 2) is probably covered by keysChanged being delayed as well: anyone who read stale information would be notified of an update. But we'd still need to explain that and warn people to keep usages together in that order.

The goal with the cache is to update it synchronously, as fast as possible, to avoid concurrency issues. This solves problems like the following (sketched in code below):

  1. set starts and sets key: alpha
  2. It immediately updates cache, before the write to disk is complete
  3. get starts for key: alpha
  4. It receives the latest value, even before it's written to disk
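
A minimal sketch of that ordering, with assumed names: Storage stands in for the storage provider module and cache for a plain in-memory map, not Onyx's actual internals.

import Storage from './storage'; // assumed provider with promise-based getItem/setItem

const cache = new Map();

function set(key, value) {
    // Update the cache synchronously; don't wait for the disk write
    cache.set(key, value);
    return Storage.setItem(key, value);
}

function get(key) {
    // A concurrent get() sees the latest value from cache,
    // even while the write to disk is still pending
    if (cache.has(key)) {
        return Promise.resolve(cache.get(key));
    }
    return Storage.getItem(key).then((value) => {
        cache.set(key, value);
        return value;
    });
}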

Perhaps you'd like this alternative:

  1. Keep code here unchanged
  2. Update code in cache.merge to exclude merging any keys that are not currently present in cache

The idea is: if we lack data in cache, we skip merging potentially partial data. When someone actually needs that key, it will trigger a read from disk and re-populate cache with the correct data.

We don't have many usages of cache.merge, but this could address the partial-data problem for all such usages.

@marcaaron (Contributor Author), Mar 10, 2022

if get has to read from disk, the data might not have been written yet (while a write is still pending), which can still result in partial data being added to cache

If you can, please explain exactly how it can result in partial data being added to the cache? I would love to write a test for it so we can come up with a solution if it's possible, but I want to make sure we understand how it can happen.

Delaying the cache update opens the door for problems: other get/connect calls might use stale cache information

One alternative I can think of is to prefill the cache with all data before Onyx does any work at all (I'm guessing most front-end caches work this way).

It might delay the UI update a bit, since keysChanged is now deferred until get actually finishes reading

Yes, but it's better for something to be delayed than to send partial data without the previous data that was initially in storage.

The goal with the cache is to update it synchronously, as fast as possible, to avoid concurrency issues

I think I understand this argument, but I'm not sure that's how merge() works today. Here's the implementation of merge(), which also potentially blocks with a get() call:

return get(key)
    .then((data) => {
        try {
            const modifiedData = applyMerge(key, data);

            // Clean up the write queue so we
            // don't apply these changes again
            delete mergeQueue[key];
            return set(key, modifiedData);

set() is the only thing that modifies the data in the cache first, before reading existing values:

cache.set(key, value);
// Optimistically inform subscribers on the next tick
Promise.resolve().then(() => keyChanged(key, value));

Perhaps you'd like this alternative:
Keep code here unchanged
Update code in cache.merge to exclude merging any keys that are not currently present in cache

I do like this idea and it solves the problem in a different way, but we would defer the filling of the cache. I'm not sure whether that's bad, good, or makes no difference, and I'd be willing to try it if it sounds simpler. Thanks!

@marcaaron (Contributor Author), Mar 10, 2022

OK, so pulling on that last thread: I can't get it working. It leads to a stale cache that's missing data. Here are the changes I made:

  • Reverted the change above.
  • Added this so we would not set data if no existing key was in the cache:
@@ -107,11 +110,19 @@ class OnyxCache {
     }

     /**
-     * Deep merge data to cache, any non existing keys will be created
+     * Deep merge data to cache, any non existing keys will not be merged
      * @param {Record<string, *>} data - a map of (cache) key - values
      */
     merge(data) {
-        this.storageMap = lodashMerge({}, this.storageMap, data);
+        // Filter any keys we have not yet cached
+        const newData = _.reduce(data, (prev, curr, key) => {
+            if (this.hasCacheForKey(key)) {
+                return {...prev, [key]: curr};
+            }
+            return prev;
+        }, {});
+
+        this.storageMap = lodashMerge({}, this.storageMap, newData);

         const storageKeys = this.getAllKeys();
  • Re-ran tests and added a subscriber (which should trigger the cache fill)
  • Storage ends up correct, but cache is wrong

I think it's because a subscriber can be added at any time while we're adding new data. We already skip the cache for any consecutive calls to mergeCollection(), and yes, they are making it to storage correctly. But if you add a subscriber while this process is happening, it will only get the result of what has been written so far and save that to the cache.

Contributor:

I think it's because a subscriber can be added at any time while we're adding new data. We already skip the cache for any consecutive calls to mergeCollection(), and yes, they are making it to storage correctly. But if you add a subscriber while this process is happening, it will only get the result of what has been written so far and save that to the cache.

Yeah, this case slipped my mind.

If you can, please explain exactly how it can result in partial data being added to the cache? I would love to write a test for it so we can come up with a solution if it's possible, but I want to make sure we understand how it can happen.

I think you answered that yourself at the end, or at least the reason is similar:

  1. We have a mergeCollection call and then another mergeCollection call (or calls) in close proximity
  2. Writes are scheduled and happen one by one
  3. Promise.all(_.map(existingKeys, get)) is invoked before the scheduled writes complete
  4. Some of the existingKeys are not available in cache
  5. A get from disk is triggered; it reads older data and puts it in cache

To capture that in a test, we're looking for a way to:

  1. Have storage in a state where it has { a: 1, b: 1, c: 1 } for a given key
  2. Have a queue of writes happening, with one particular key written at the end of the queue, changing b: 2
  3. Have cache in a state where it lacks the given key
  4. Call mergeCollection to merge multiple updates, one of them setting c: 2 on that key

I think that what we have here with Promise.all would result in cache ending up with

{ a: 1, b: 1, c: 2 }

vs

{ a: 1, b: 2, c: 2 }

Contributor Author:

Hey @kidroca, I couldn't figure this one out, but I got your other idea working. Let me know if you have any other thoughts here.

@kidroca (Contributor), Mar 10, 2022

I think prefilling the cache might be our safest and most optimal option here.

It seems that for Native:

  1. We call AsyncStorage.get
  2. Then we call AsyncStorage.set before the get has completed
  3. The get resolves with the value from step 2)

I'm not certain this would work out the same way on Web/Desktop. It used to work that way when we used window.localStorage, because localStorage calls are sync and happen sequentially.

So we're back to: "Oh, Cache, save us from concurrency issues" 😄
And to do that, it might be best to leverage multiGet and load everything except the safe-eviction keys into cache during init. Then we don't have to delay updates to cache, and cache should never become stale. A rough sketch follows.
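
This is only a sketch of that init-time prefill; the import paths, the getAllKeys/multiGet provider methods, and the safeEvictionKeys argument are assumptions, not the actual Onyx API.

import _ from 'underscore';
import Storage from './storage';
import cache from './OnyxCache';

// Hypothetical init-time prefill: read everything except the safe
// eviction keys in one multiGet and merge it all into cache, so later
// reads never have to race pending writes.
function prefillCache(safeEvictionKeys) {
    return Storage.getAllKeys()
        .then(keys => _.filter(keys, key => !_.contains(safeEvictionKeys, key)))
        .then(keysToCache => Storage.multiGet(keysToCache))
        .then((pairs) => {
            // multiGet is assumed to resolve with [key, value] pairs
            cache.merge(_.object(pairs));
        });
}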


Another strategy could be to recreate the above 3 steps ourselves (a sketch follows the list):

  • Capture pending writes (we already have a queue)
  • When a read happens, check whether we're writing or about to write the requested key
  • If we are, resolve the get with the value from the write queue; otherwise read from disk
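
A minimal sketch of that read-through-the-queue idea; readFromDisk and writeToDisk are hypothetical stand-ins for the provider's real disk calls.

// Pending writes keyed by storage key; stands in for the queue we already keep.
const pendingWrites = new Map();

function setItem(key, value) {
    pendingWrites.set(key, value);
    return writeToDisk(key, value).finally(() => {
        // Only clear the entry if a newer write hasn't replaced it
        if (pendingWrites.get(key) === value) {
            pendingWrites.delete(key);
        }
    });
}

function getItem(key) {
    // If this key is queued or mid-write, resolve with the queued value
    // instead of reading possibly-stale data from disk
    if (pendingWrites.has(key)) {
        return Promise.resolve(pendingWrites.get(key));
    }
    return readFromDisk(key);
}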

lib/storage/providers/LocalForage.js (outdated comment, resolved)
Comment on lines 30 to 31
it('mergeCollection', () => {
expect(Storage.name).toBe('WebStorage');
Contributor:

Does the success of the test depend on the underlying storage provider? It shouldn't.
mergeCollection should be considered to work correctly if it correctly delegates work to a storage provider.

Testing that mergeCollection interacts correctly with cache is valuable.

Contributor Author:

Since the implementation we're having issues with is the storage provider, I want to make sure things work correctly (it also took me some time to get this working). Testing against AsyncStorage would not touch any of the code we're proposing to add. Sorry, I think you are trying to say that we should maybe:

  1. Have one set of tests
  2. Run them against both storage providers

But I'm not totally sure. Please clarify if there is a specific suggestion, thanks!

Contributor:

Actually, I don't think it would be worth it to run against all storage providers.

We've introduced two changes:

  1. Queued LocalForage writes, which can be tested outside of mergeCollection
  2. Changed the way mergeCollection updates cache, which shouldn't depend on a specific storage provider

Specifically for the bug with partial cache, I think the issue would happen with any storage provider, because we evicted cache keys.

Comment on lines 64 to 67
// Check the storage to make sure our data has been saved
expect(localforageMock.storageMap.test_1).toEqual(finalObject);
expect(localforageMock.storageMap.test_2).toEqual(finalObject);
expect(localforageMock.storageMap.test_3).toEqual(finalObject);
Contributor:

What I'm saying is that the changes made around mocking are not necessary here, because which mock is used shouldn't matter.
We can assert that Storage.multiSet and/or Storage.multiMerge are getting called correctly.

The changes made to __mocks__/localforage.js can help us test that LocalForage.multiSet and LocalForage.multiMerge work correctly with the queues, in LocalForageProviderTest.

Contributor Author:

I'm probably not understanding the message. The test helps me verify the changes by using a mock that works similarly to storage. Why is it not necessary? The previous tests did not cover this scenario and things were broken, so I'm adding a case to cover the thing that was breaking. What should we do instead?

@kidroca (Contributor), Mar 10, 2022

By testing what's stored inside the storageMap of the mock, we're asserting that WebStorage works correctly when Storage.multiMerge is called within mergeCollection.

We could just have a test where we import LocalForage and invoke multiMerge on it directly.

Contributor Author:

I like this test because it more accurately represents the code flow that went wrong here. It's a combination of Onyx changes and storage-provider changes, so I think it's reasonable to test them both. But I am possibly being lazy 😄

@kidroca (Contributor), Mar 14, 2022

Yeah, that makes sense; it's just that we're not doing it anywhere else, i.e. using a specific storage provider and mock.

So it's not clear when or why tests need to use a specific storage provider, e.g. why this test does but others don't.

What changed in Onyx.mergeCollection should work regardless of the storage provider.
We could write the tests twice, once per provider, but then the extra maintenance would outweigh the value of the tests.

I think we can achieve the same test by the following (sketched after the list):

  1. Mock Storage.getItem
  2. Verify Storage.getItem is called when a given key is not available in cache
  3. Verify the returned mock data is used in cache, e.g. cache matches the expected final value

To me this gives us more information about what we expect to happen.
(The first time here, I didn't even realize we were testing the case where we don't have data in cache but do have it on disk.)
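
A hypothetical Jest sketch of those three steps; the import paths, cache.getValue, and the waitForPromisesToResolve helper are assumptions, not the exact test-suite API.

import Onyx from '../../lib/Onyx';
import Storage from '../../lib/storage';
import cache from '../../lib/OnyxCache';
import waitForPromisesToResolve from '../utils/waitForPromisesToResolve';

// ONYX_KEYS is assumed to be defined as in the existing Onyx tests
it('reads uncached keys from storage and merges the result into cache', () => {
    // Given a key that exists on disk but was evicted from cache
    jest.spyOn(Storage, 'getItem').mockResolvedValue({a: 1, b: 1});

    Onyx.mergeCollection(ONYX_KEYS.COLLECTION.TEST_KEY, {test_1: {b: 2}});

    return waitForPromisesToResolve().then(() => {
        // Storage was consulted for the uncached key...
        expect(Storage.getItem).toHaveBeenCalledWith('test_1');

        // ...and cache ends up with the merged final value
        expect(cache.getValue('test_1')).toEqual({a: 1, b: 2});
    });
});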

lib/storage/providers/LocalForage.js (outdated comment, resolved)
@Julesssss (Contributor)

Sorry, I didn't get to this today -- I stopped due to a family issue. Adding it to my list for tomorrow.


Comment on lines 39 to 51
const additionalDataOne = {b: 'b', c: 'c'};
Onyx.mergeCollection(ONYX_KEYS.COLLECTION.TEST_KEY, {
    test_1: additionalDataOne,
    test_2: additionalDataOne,
    test_3: additionalDataOne,
});

const additionalDataTwo = {d: 'd'};
Onyx.mergeCollection(ONYX_KEYS.COLLECTION.TEST_KEY, {
    test_1: additionalDataTwo,
    test_2: additionalDataTwo,
    test_3: additionalDataTwo,
});
@kidroca (Contributor), Mar 10, 2022

I'm not sure this captures the bug with cache.
If you revert the mergeCollection changes, would the test fail because finalObject doesn't match OnyxCache?

I think that to cover the cache bug, the test should (see the sketch after this list):

  1. Merge additionalDataOne
  2. Wait for the merge to complete
  3. Evict test_1, test_2 from cache
  4. Merge additionalDataTwo
  5. Assert finalObject is in cache
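
Roughly like this, reusing the test's existing additionalDataOne/additionalDataTwo/finalObject variables; it's only a sketch, and cache.drop as the eviction API plus the waitForPromisesToResolve helper are assumptions:

Onyx.mergeCollection(ONYX_KEYS.COLLECTION.TEST_KEY, {
    test_1: additionalDataOne,
    test_2: additionalDataOne,
});

return waitForPromisesToResolve()
    .then(() => {
        // Evict the keys so the data now lives only on disk
        cache.drop('test_1');
        cache.drop('test_2');

        return Onyx.mergeCollection(ONYX_KEYS.COLLECTION.TEST_KEY, {
            test_1: additionalDataTwo,
            test_2: additionalDataTwo,
        });
    })
    .then(() => {
        // Cache should hold the full merged object, not partial data
        expect(cache.getValue('test_1')).toEqual(finalObject);
        expect(cache.getValue('test_2')).toEqual(finalObject);
    });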

Contributor Author:

That sounds like a good additional test. Just to clarify, this covers the case where we are not first grabbing existing data and merging it into the cache: we start with some initial data, then call mergeCollection, then call it again. Before the changes, we would end up skipping the initial value and merging partial data into the cache.

@marcaaron (Contributor Author)

I still need to make sure there are no major issues for native with this change, but the performance is pretty nice on web and the cache issues seem resolved. Gonna take it out of draft.

@marcaaron marcaaron marked this pull request as ready for review March 11, 2022 01:17
@marcaaron marcaaron requested a review from a team as a code owner March 11, 2022 01:17
@MelvinBot MelvinBot requested review from stitesExpensify and removed request for a team March 11, 2022 01:17
@Julesssss (Contributor)

Seems like we're still discussing solutions. @marcaaron, is this still the blocking PR?

@marcaaron (Contributor Author)

@Julesssss Sorry, I'm not sure which solutions you're referring to. I think this PR is ready to be reviewed/tested; we're just discussing improvements to the tests at this point.

@thienlnam thienlnam (Contributor) left a comment

Changes look good! Just had one small question/comment.

const {data, resolve, reject} = this.queue.shift();
this.run(data)
    .then(resolve)
    .catch(reject);
Contributor:

Should we log or do anything with failed requests? Or I guess we'll have server logs.

Contributor Author:

We do have some handling here (but it could be better). In this case, we're just letting the original caller handle the error and don't need more specific handling, at least not for now, since the error will reach this code here:

.catch(error => evictStorageAndRetry(error, mergeCollection, collection));

@marcaaron (Contributor Author)

I'm gonna move ahead with this, with the caveat that there's probably more work to be done here. Performance is suffering a lot for web/desktop users internally, so this should be a welcome temporary improvement.

Things to follow up on:

  • Look into improving the tests further
  • Explore a more robust solution to fix IndexedDB slow transactions

@marcaaron marcaaron merged commit 2d42566 into main Mar 18, 2022