
Virtual objects should use WeakRefs to enable === #1968

Closed
FUDCo opened this issue Nov 3, 2020 · 11 comments
Assignees: FUDCo
Labels: devex (developer experience), enhancement (New feature or request), SwingSet (package: SwingSet)

Comments


FUDCo commented Nov 3, 2020

In the discussion of #1960 (comment) @erights points out that using WeakRefs to associate virtual objects with their data would allow us to eliminate the virtual object representative vs. inner self distinction, letting us have a single, unique representative for each active virtual object. This would enable vat code to compare virtual objects for equivalence using object identity (===) comparison instead of forcing them to use a special comparator such as sameKey. This would (1) simplify the application developer's model of the world to be more consistent with their expectations based on regular JavaScript experience and (2) make for fewer complicated weird rules that we have to explain and justify. The tradeoff is that this will require an engine that supports WeakRef, forcing us to dictate Node 14 as the minimum supported Node version.
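As a hypothetical illustration of the developer-facing difference (the purse objects here are placeholders, not actual APIs being proposed):

// today: two representatives of the same virtual object may be distinct
// JS objects, so vat code needs a special comparator
sameKey(purseFromMessage, purseFromStore); // true
purseFromMessage === purseFromStore;       // possibly false

// with a single unique representative per active virtual object,
// plain identity comparison is enough
purseFromMessage === purseFromStore;       // true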

FUDCo added the enhancement and SwingSet labels Nov 3, 2020
FUDCo self-assigned this Nov 3, 2020
dckc added the devex label Mar 26, 2021

warner commented Mar 26, 2021

That's a great (and nicely simplifying) feature to provide. And I'm completely fine with relying upon a WeakRef-enabled engine, and allowing liveslots (and the virtual object manager it contains) access to WeakRef, since we must do that anyway to enable GC.

My biggest concerns are to make sure this doesn't expose non-determinism to vats (which sounds unlikely), and to understand how it interacts with the GC issues that #2724 raises. I don't understand the latter at all yet, but I know that one of our likely tools is to forbid userspace access to a raw WeakMap or WeakSet and instead give them a version that can recognize certain liveslots objects (definitely Presences, maybe Representatives too) and either use a different key or forbid their use entirely.


FUDCo commented Mar 26, 2021

The trick that WeakStore uses for virtual objects is to keep a separate Map indexed by object ID string. Operations check whether the key is a virtual object; if so, they use this side map keyed by the virtual object's ID, otherwise they look the key up in a regular WeakMap. This way we don't sacrifice WeakMap benefits for regular objects. The same trick should work for wrapping WeakMaps in general.
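A minimal sketch of that trick, assuming helper functions isVirtualObject(key) and vrefOf(key) exist in some form (the names, and the use of harden from the SES environment, are illustrative rather than the actual makeWeakStore implementation):

function makeHybridWeakStore() {
  const virtualEntries = new Map(); // vref string -> value, for virtual-object keys
  const regularEntries = new WeakMap(); // ordinary object key -> value

  return harden({
    has: key =>
      isVirtualObject(key) ? virtualEntries.has(vrefOf(key)) : regularEntries.has(key),
    get: key =>
      isVirtualObject(key) ? virtualEntries.get(vrefOf(key)) : regularEntries.get(key),
    set: (key, value) => {
      if (isVirtualObject(key)) {
        virtualEntries.set(vrefOf(key), value);
      } else {
        regularEntries.set(key, value);
      }
    },
    delete: key =>
      isVirtualObject(key)
        ? virtualEntries.delete(vrefOf(key))
        : regularEntries.delete(key),
  });
}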


warner commented Apr 15, 2021

@dtribble asked me how quickly we could implement this, so we could move Zoe APIs to the better model now instead of causing churn later. I think we can do it before the #2724 infrastructure, but we need to be careful about non-determinism, and that may require giving up some performance.

The issue is that we must do a syscall.vatstoreGet, at some point, to satisfy userspace reading a property off a virtual object. This syscall has a return value, which means it must be recorded in the transcript (so that a future replay knows what value to return, even without interacting with the kernelDB, which will have evolved into a much different state by that time). And our plan is to include the transcript in the consensus state, so followers and new validators can verify the contents of individual vats. That means the sequence, keys, and values of every vatstoreGet are part of consensus, and must be a deterministic function of userspace activity.

The lifecycle of a Representative is: creation, use, drop, collection. We need a Representative each time an inbound message references the virtual object, and each time userspace reads from a virtual collection and the hitherto-offline data becomes online. In our current implementation, we make a new Representative each time. In the proposed change, we use a mapping from vref to WeakRef to re-use an existing Representative, so that if userspace still holds on to one, they'll always get that same JS Object in any other pathway that references the same vref.
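A minimal sketch of that reuse, with vrefToRepresentative as the vref -> WeakRef table (the helper names are illustrative, not the actual liveslots code); whether and when the new Representative's data gets populated is the determinism question discussed below:

const vrefToRepresentative = new Map(); // vref -> WeakRef of the current Representative

function provideRepresentative(vref) {
  const wr = vrefToRepresentative.get(vref);
  const existing = wr && wr.deref();
  if (existing !== undefined) {
    // userspace always sees the same JS Object for this vref
    return existing;
  }
  const rep = makeRepresentative(vref); // hypothetical constructor
  vrefToRepresentative.set(vref, new WeakRef(rep));
  return rep;
}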

In this proposed world, the creation of new Representatives for a single vref may become nondeterministic. We must ensure that this does not result in a nondeterministic sequence of vatstoreGet calls (or vatstoreSet, or vatstoreDelete), or anything else that lives in the transcript and is part of consensus. So, for example, the following approach would not work:

  • when we need a Representative (message unserialization, virtual collection read), check the table; if an entry is present, check the WeakRef; if the WeakRef still points to something, return it
  • if not, make a new Representative, do a vatstoreGet to populate its data, update the table with a new WeakRef(representative), and return the new Representative

because the following might happen within a single crank:

  • crank starts with a delivery that references the virtual object, Representative is created
  • userspace stores it in a virtual collection, drops the Representative
  • userspace does a lot more work
  • userspace reads the virtual collection, needs the Representative again

If the "lot more work" triggers engine-level GC, it might collect the Representative, and then the second reference will need to create a new one. But if it doesn't, the Representative will still be available (not reachable from userspace code, only reachable through our WeakRef), and we won't make a new one. If these two cases cause different sequences of syscalls, we've exposed nondeterminism.

We've decided that we're willing to rely upon engines doing a complete GC (with prompting) at the end of a crank. But the property being exercised here (not doing GC in the middle of a crank) is beyond that, and I don't think we're comfortable relying upon it.

So we need a solution that keeps the syscalls deterministic. Userspace will (by definition) provide a deterministic sequence of interactions with our virtual objects. That will consist of:

  • acquisition of a Representative via an incoming message or promise resolution
  • acquisition of a Representative by reading data from a virtual collection
  • dropping a reference to a Representative
  • reading/writing properties of a Representative

By virtue of the first three, the transition back and forth between REACHABLE and UNREACHABLE (using #1872 terminology) is a deterministic function of userspace activity; however, liveslots cannot distinguish the two. The transition between UNREACHABLE and COLLECTED can be sensed by liveslots (by querying a WeakRef), but only by polling. The transition between COLLECTED and FINALIZED is sensed by a FinalizationRegistry callback being invoked (so "interrupt driven" instead of "polling").

So whatever our virtual object manager does, it must not reveal (through syscalls) the difference between UNREACHABLE/COLLECTED/FINALIZED. It is allowed to reveal the difference between REACHABLE and UNREACHABLE, but that isn't helpful because it cannot even sense that difference.

Solutions

I see two approaches that might work. Both give up some performance, specifically they give up some of the benefits of caching.

vatstoreGet on every property access

We could create Representatives non-deterministically, as long as the resulting syscalls are still deterministic. Userspace will read properties deterministically, so this solution is to maintain a one-to-one relationship between userspace doing a property read and liveslots doing a vatstoreGet.

One implementation would make the getters that live on the Representative's state object each perform the same vatstoreGet every time any one of them is invoked. We'd do the same for vatstoreSet. Something like the following:

for (const prop of props) {
  Object.defineProperty(state, prop, {
    // each read performs its own vatstoreGet, so syscalls track userspace reads
    get: () => unserialize(syscall.vatstoreGet(vref))[prop],
    // each write re-reads, merges (new value last so it wins), and stores
    set: value =>
      syscall.vatstoreSet(
        vref,
        serialize({ ...unserialize(syscall.vatstoreGet(vref)), [prop]: value }),
      ),
  });
}

We can't cache the data, because the presence of the cache is nondeterministic. We can't even cache the get data for use during set, for the same reason (userspace might retrieve a Representative and then do a set without doing any intermediate get).

In this approach, we can create a Representative at any time: it does not need to be deterministic. But we must be careful to not perform any syscalls when the Representative is created. We cannot use vatstoreGet to pre-populate the data. We can't even use it to confirm that the vref is valid: that must be deferred until someone actually tries to touch the state object. (There should be no way for userspace to submit an invalid vref, so this shouldn't matter).

The performance consequences are:

  • the number of syscalls is proportional to the number of reads+writes on Representatives, so multiple reads in a row are no longer the same cost as a single read
    • if the object has a large state, this would encourage a usage pattern where you read the property off .state just once, and use the read value multiple times
    • making multiple writes to the object is now more expensive (measured in syscalls) than making a single write

Pin the Representative until the end of the crank

When we get more of the GC work done, liveslots will be aware of end-of-crank: it will regain control after userspace has lost agency, so it can provoke engine-level GC and give all finalizers a chance to run.

We could have liveslots maintain a strong reference to every Representative from the moment of instantiation (start of crank as the delivery is unserialized, or retrieval from a virtual store, both of which are deterministic), until this end-of-crank release phase (which is also deterministic). We'd just have a Set of all Representatives created during the crank. Then we could safely cache the read data, knowing that the cache will survive until a deterministic point.

At end of crank, we empty this Set, removing liveslots' strong reference (leaving only userspace references, if any), moving any unreferenced Representatives from REACHABLE to UNREACHABLE. Then we trigger GC, which (since we will rely upon it being complete) moves them from UNREACHABLE to COLLECTED. (And we allow finalizers to run, moving them to FINALIZED, and letting us delete the Map entries for the now-dead WeakRefs). This puts the Map, and any cached data, back into a state that's a deterministic function of userspace activity.

We could defer doing secondary-store writes until this end-of-crank phase, if we wanted, which would amortize multiple writes during a single crank.
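A minimal sketch of the pinning scheme, assuming liveslots gets an end-of-crank hook; endOfCrank(), gcAndFinalize(), and dirtyState are illustrative names, not existing APIs:

const pinnedRepresentatives = new Set(); // strong refs keep them REACHABLE
const dirtyState = new Map(); // vref -> cached state record awaiting write

function pinRepresentative(rep) {
  // called whenever a Representative is created during the crank
  pinnedRepresentatives.add(rep);
}

async function endOfCrank() {
  // flush deferred writes, amortizing multiple sets within one crank
  for (const [vref, record] of dirtyState) {
    syscall.vatstoreSet(vref, serialize(record));
  }
  dirtyState.clear();
  // drop liveslots' strong references, then force a complete GC so the
  // UNREACHABLE -> COLLECTED -> FINALIZED transitions happen at a
  // deterministic point
  pinnedRepresentatives.clear();
  await gcAndFinalize();
}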

The performance consequences are:

  • we do at most one read and (optionally) at most one write syscall per virtual object per crank
    • so multiple reads or writes are amortized, even more so than the current design
  • all virtual objects used during a single crank are kept around until (at least) the end of crank
    • so cranks that e.g. iterate a large table of virtual objects will consume RAM proportional to the size of the table, not a constant "max representatives in memory at a time" limit, until the end of crank releases everything

First steps

I'm inclined to go with the "immediate read/write" approach for now, because I haven't finished the "liveslots gets control at end-of-crank" GC work necessary for the second solution. I know it would be disappointing to give up the cache, but I think the design will be nicely simple.

I do need to apply a tiny bit of the GC work, just to make sure WeakRef is available to liveslots in all worker types. If this isn't already the case, it should only be a few lines of code.

We must also think about how to get rid of the Map entries. I think it is safe to use a FinalizationRegistry, with a callback that checks vrefToRepresentative.has(vref) && vrefToRepresentative.get(vref).deref() and deletes the entry if that comes back falsy. This will happen at nondeterministic times, but as long as we're careful to not do any syscalls on the creation side, I think that's ok. We can either implement that, or punt and say that we keep these Map entries around forever (until we get to the next phase). Neither option requires the deeper end-of-crank changes I have in the works.
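A minimal sketch of that cleanup path (vrefToRepresentative is the vref -> WeakRef table from above; the registration details are illustrative):

const droppedRegistry = new FinalizationRegistry(vref => {
  const wr = vrefToRepresentative.get(vref);
  // delete only if the entry is still present and its WeakRef is now empty;
  // a live WeakRef means a newer Representative has already replaced it
  if (wr && wr.deref() === undefined) {
    vrefToRepresentative.delete(vref);
  }
  // no syscalls here, so the nondeterministic timing stays invisible
});

// at Representative creation:
//   vrefToRepresentative.set(vref, new WeakRef(rep));
//   droppedRegistry.register(rep, vref);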

After the GC improvements, we'll have the option of using the second approach. I don't have a clear prediction as to which performance characteristics will be better. My hunch is that we're more likely to touch a lot of virtual objects in a single crank (iterating through tables) than we are to do a lot of reads/writes to any single object, so we should stick with the "do all the reads" approach rather than "pin the representatives until end-of-crank". The "do all the reads" approach is also far simpler.


FUDCo commented Apr 15, 2021

I like the "pin representatives until end of crank" approach much more. The "read every time" approach essentially sacrifices 90% of the benefit of having representatives in the first place; if we were going to pay the cost of a read every time, I'd just redesign the virtual object API entirely (making it much less "virtual"). Also, the pin approach seems like less work to implement.


warner commented Apr 15, 2021

I thought you'd say that :). I kinda think the primary benefit of virtual objects is the ability to move them out of RAM, which happens somewhat more in the first approach (GC could run in the middle of a crank). I consider consolidation of reads/writes to be a secondary benefit, but of course it depends upon how userspace code actually uses it.

I currently like the bare simplicity of the first approach: I think we'd delete at least half of virtualObjectManager.js (I appreciate that you wrote all of that, and I don't want to celebrate undoing your hard work, but I do love deleting complex code). And we can't implement the second approach until we have enough of the GC plan implemented to give us an end-of-crank place to stand, so the first approach would let us get something built faster.


erights commented Apr 15, 2021

I'm caught up. It sounds like we're agreed that

  • The pinning approach is what we actually want long term
  • Either approach gives us what we need in the short term
  • We can't always get what we want, but we can get what we need
  • We should do first whatever is simplest to implement in the short term
  • Pinning would require more mechanism than we have right now

Is this right?


erights commented Apr 15, 2021

Just noting that JS standard weakrefs already do something like pinning for the rest of the turn. But since this is per turn rather than per crank, it probably doesn't help.


warner commented Apr 16, 2021

yeah, that sounds about right

I'm not convinced that pinning is what we want long-term. It's a tradeoff that optimizes for different things.

The pinning approach is bad for RAM usage when we use a large number of virtual objects within a single crank: memory usage will grow to O(num_used) by the time the crank finishes (and then drop back down to O(1)). But the DB usage is constant whether you make a lot of reads/writes or just one.

The no-cache approach is bad for DB calls when we make a lot of reads/writes. But it's great for RAM usage in all cases, because it never keeps anything in RAM: O(0).

It would be lovely if we could split the difference with an LRU cache, but that won't get us determinism.


warner commented Apr 16, 2021

Oh, and I should point out that any of these approaches will still expose non-determinism to user code until we fix #2724, because it might use a Representative as the key of a WeakMap and then drop all strong references to the Representative. The map entry will disappear, and when someone later sends the same virtual object back to them (e.g. to look up a Purse balance), the lookup will get nothing.

I think we can implement this in stages, since current non-adversarial code is restricting itself to putting Representatives in our special makeWeakStore (it already has access to surprising, buggy, deterministic behavior by putting them in a normal WeakMap). The sequence would be:


warner commented Apr 17, 2021

@FUDCo and I talked this through, and we think we can retain the LRU cache, as long as we maintain the property that the creation of a Representative does not cause any reads (in fact no syscalls of any kind).

Userspace will read/write properties of a Representative's state in a deterministic fashion. Userspace will cause calls to provideRepresentative in a deterministic fashion (unserializing the delivery, or reading from a virtual collection, or populating a .state that references some other virtual object). The nondeterminism will be in the implementation of provideRepresentative, where it either re-uses an existing object (if it's in the REACHABLE or UNREACHABLE state), or finds the WeakRef is empty/missing and must make a new object (in the COLLECTED/FINALIZED/UNKNOWN state).

So the approach will be (a sketch of how the pieces fit together follows the list):

  • we keep an LRU cache of state objects, indexed by vref. The "U" (recency update) happens when someone reads or writes the state object.
  • we use slotToVal (suitably modified to hold WeakRefs instead of the values themselves, as part of #2660, "change liveslots to use WeakRefs to track objects and report their disuse") to keep track of existing Representatives
    • convertSlotToVal is used to provide a Representative, which makes a new one if necessary
    • there will be at-most-one Representative for any given vref
    • the Representative may or may not have a state object. (I mean, it always has a .state property, but there may or may not be an entry in the LRU cache for it). We do not add something to the LRU cache when creating a Representative: we wait until someone reads or writes the state.
  • we do not populate the state object until someone does a read, triggering the getter, causing a syscall
  • we can defer writing the data to secondary storage until the state object is evicted from the LRU cache, if we want to be lazy, or we could write it immediately
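A minimal sketch of how these pieces might fit together; lruCache, makeRepresentative, and the lazy state loading are illustrative names and shapes, not the actual virtual object manager:

function provideState(vref) {
  let record = lruCache.get(vref); // the "U": reads and writes refresh recency
  if (record === undefined) {
    // not in the cache: this is the only point where a read syscall happens
    record = unserialize(syscall.vatstoreGet(vref));
    lruCache.set(vref, record); // eviction (and any deferred write) happens here
  }
  return record;
}

function convertSlotToVal(vref) {
  // slotToVal holds WeakRefs; reuse the existing Representative if it's alive
  const wr = slotToVal.get(vref);
  const existing = wr && wr.deref();
  if (existing !== undefined) {
    return existing;
  }
  const rep = makeRepresentative(vref); // no syscalls here; state loads lazily
  slotToVal.set(vref, new WeakRef(rep));
  return rep;
}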

To reiterate:

  • the presence or absence of a Representative for any given vref is nondeterministic
    • userspace cannot sense this (once we fix WeakMap/WeakSet)
    • no syscalls depend upon this distinction
  • the contents of the LRU state cache are a (weird, but) deterministic function of the reads and writes performed by userspace
    • this state is sensitive to all userspace activity: it reveals something about access patterns of all objects in the same vat
    • therefore userspace should not be able to sense it, to avoid an ambient communication channel
  • the read/write syscalls made are a deterministic function of the state of the LRU cache

In terms of scheduling, we think:

  • first, I need to land the 266-add-weakref branch, or at least the subset that includes slotToVal using WeakRefs
  • @FUDCo will change the virtual object manager to avoid eager pre-population of the state data (which means read during Representative creation, which will be a problem in the new scheme)
  • he'll also adapt his hybrid makeWeakStore object to provide the standard WeakMap API, instead of our local init/get/set/delete API
    • then he'll throw that object at @erights for conversion into something that can be subclassed
    • we'll use this to replace the global WeakMap object, so it must be usable in all the same ways, and neither of us knows how to perform such magic
  • then we'll circle back and figure out integration

We'll be ok with landing the persistent-Representative changes before replacing WeakMap, if it comes out that way, but we'll need to remember that it exposes nondeterminism to adversarial code. But we really want to get quickly to the point where WeakMap can be used with both Presences and Representatives, because that's the most natural way for developers to program, and we don't want to have to teach developers one thing and then replace it with something better a month later. Best to make it better first, and then only teach them a single thing.


FUDCo commented Apr 17, 2021

As I was refamiliarizing myself with the virtual object code in preparation for making the changes to ensure that no vatstore access happens until deterministic vat code tries to access the state on purpose, it struck me that using WeakRefs to hold onto the representatives will allow us to eliminate one layer of indirection in the underlying implementation and simplify the virtual object machinery as a result.

Currently there are three layers: the representatives, the "inner self", and the state. The state is a separate layer because it can come and go from memory, so it must be a separate thing that can sometimes not be there. The inner self is a separate thing because we need a single common entity that all the representatives can point to in order to share the state, since there can be multiple representatives. However, the WeakRef-based implementation means that when the virtual object is actively being referenced in memory, there will always be exactly one representative instead of possibly many, and thus that representative and the inner self it points to will always be in one-to-one correspondence. The distinction between the representative and the inner self can go away. All the code that currently works with the inner self can instead use the representative directly, and a layer of mechanism can be removed.

warner added a commit that referenced this issue Apr 27, 2021
This will be used by the virtual object manager when creating a
Representative, so that both inbound and outbound tables are prepared for it.

refs #1968
warner added a commit that referenced this issue Apr 28, 2021
This will be used by the virtual object manager when creating a
Representative, so that both inbound and outbound tables are prepared for it.

refs #1968
FUDCo closed this as completed in 4fee636 May 18, 2021