Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed server starting with errored entities (1.7.1.0) #2165

Open
wants to merge 4 commits into
base: 1.7.0.0
Choose a base branch
from

Conversation

uGuardian
Copy link

Fixed server starting with errored entities, most notably the issue where occasionally an entity will have a NaN location, causing the entire save file to be unloadable.
Refactored sorting into a single enumeration pass, in part to allow easy insertion of the try-catch.

I've noticed several people having this issue occasionally on the current version, and this is intended to be a small patch to allow the current public version to not occasionally end up losing entire saves due to "corrupted" data when the user has no idea of how to find or fix it.

Streamlined sorting into a single enumeration pass
@tornac1234
Copy link
Collaborator

Those entities we are throwing away might be important and we can't just delete them like it's nothing.
I'd like to have some "manual recovery mode" which happens when server starts to fix all entities that might be affected by the said NaN error. You'd have the possibility to either: delete the entity or move it to some coordinates (0, 0, 0 by default) while having minimal information about the entity (e.g. its type, if it contains players). For VehicleWorldEntities and PlayerWorldEntities we'd also have the possibility to "naturally put them back in the world" while simulating Subnautica's way of doing it (see OutOfBoundsWarp.cs). There would be a random position generated in a safe zone (once again check the script it contains the right procedure) and the entity would simply be moved there.

@uGuardian
Copy link
Author

uGuardian commented Aug 14, 2024

That's actually a very good point, and I was too preoccupied with making sure the entire save doesn't get discarded and thinking about it as a stopgap fix while waiting for the Living Large version that I didn't consider that it could be say... a player, a cyclops, or an immediately critical vehicle in deep water getting deleted, and implementing a location fix for it would be far better.

Want me to look into making a fix for that immediately? I'm not sure how fast I'd get it done around work and other IRL obligations, but it shouldn't be too difficult, especially if it's just correcting the data during the loading process. (I'm going to at least look into it a bit right now)

As a side note I'm not exactly sure what causes the NaN location entities to come into existence, I'd assume it's something like a failed spawn or something, but I'm far from certain on that. In the case of the server I'm part of it was a random Stalker.

…d on other entities in parent, which is generally a cell.
@uGuardian
Copy link
Author

uGuardian commented Aug 14, 2024

I realized that the parent data is usually still present, and location can be approximated from that.

Also flagging previous comments as resolved as the index based access has to be where it is now with current design, and a foreach statement wouldn't work as the for loop now backsteps one after fixing things.

EDIT: I also thought wrong, it could still be put in the try/catch, just not changed to a foreach. If people would prefer the index accessor to be put inside the try/catch feel free to say something.

@Coding-Hen
Copy link
Collaborator

Do we want to add a choice for the user in saying a non debug mode? I am just thinking that the normal users will at this point go to the help channel and start saying it doesn't work, I think we should come up with a default and then let advanced users fix it themselves, what do you think?

@uGuardian
Copy link
Author

Hmm, maybe?
Right now it just says to press a button to continue and if you press anything other than D it attempts to fix it. If you press D is deletes it, so even if it's someone who would just mash the enter key, it'd still "work" for them, and they'd at least get the warning. It basically defaults to automatic repair, it just makes sure you actually have the chance to override that.
I'm not sure, what do you think tornac?

@tornac1234
Copy link
Collaborator

With that in mind, I'd just have a message at the beginning saying
"X entities were found to be broken, do you want to manually fix them (m) or to apply a default fix for all (a) ?"
Default fix would be to do the OutOfBoundsWarp I mentionned above for all entities

@uGuardian
Copy link
Author

You sure? The NaN error is very rare, and when it occurs the new warning tells you what kind of entity it happened on. If you ever get the error on more than one or two entities max there's probably something far more significant wrong with your save.
If you'd really prefer it the other way though I can do that.

Also for the default, I currently have it put it back in the parent it was already part of, the parent most likely being a cell or other type of container (I can't think of anything else it even could be, the one I've seen being an entity with no type that contains several fish entities).
It grabs the average position of all entities in the cell and drops it there. This is a lot more accurate than just plopping it in the middle of the map. If it fails to find any other entities, it'll drop it at 0,0,0 relative to the parent, which will still be more accurate on average than dropping it somewhere in the middle of the map.
It also admittedly doesn't make me have to figure out the parent's position relative to the center of the map, and then figure out the semi-random location based on that offset.
Also thinking about it now, I have no idea what would happen if it tried to load an entity parented to a chunk in a completely different location. If it was breaking, I'd also need to figure out what to reparent it to, and also alter both relevant children lists as well.

@tornac1234
Copy link
Collaborator

Generally it'd happen to players which were launched into space because of some Subnautica wrong collision, but it can probably happen to any entity, for which the best fix would simply to put them back into the world.

When you say it's pretty rare of an error, what do you mean ? What other entity errors are you talking about ?

If you spawn an entity in it's original cell while messing up the positions, there are good chances that they'll just end up being under the ground. I even think no computation is better than any computation around other entities' positions

@uGuardian
Copy link
Author

Sorry that this took so long, work got in my way for several days straight.

Anyways, first of all getting launched into space should never cause the NaN error. Positional data is handled as a float, meaning that no matter how far they go it should only result in floating point rounding errors. The only circumstances I can think of that would ever result in a NaN are dividing the values by zero, or it never getting initialized or serialized properly in the first place.

As for saying it's a pretty rare error, this is specifically the error mentioned in #2097 and also happened to the server I'm part of, but does not seem to occur to many people and not very often. When it does occur though it's a pretty severe bug.
It's rare enough that it should only ever happen to one or two entities at most. If more than that have the same error, there's probably a much bigger problem with the server.

Here is the relevant world data for my server: MP.zip
The specific offending entity in my case is this one:
{"SpawnedByServer":true,"Transform":{"LocalPosition":{"X":"NaN","Y":"NaN","Z":"NaN"},"LocalRotation":{"X":"NaN","Y":"NaN","Z":"NaN","W":"NaN"},"LocalScale":{"X":1.0,"Y":1.0,"Z":1.0}},"TechType":"Stalker","Id":"9ff6bab0-42be-862f-7262-cc29be1a10ee","Level":1,"ClassId":"cf8794a1-5cd6-492e-8acf-7da7c940ef70","WaterParkId":null,"SerializedGameObject":null,"ExistsInGlobalRoot":false,"ParentId":"8d5f104b-08ff-48bb-c8b6-07694a74bbc9","Metadata":null,"ExistingGameObjectChildIndex":null}
The rotation data also being NaN and the scale data just being a flat default of 1.0 seems to support the idea of it never being initialized/serialized correctly.

As for the location setting back to an estimated position based on surrounding entities in parent, I created a test set of 100 theoretical repositions and manually visited each of them precisely in-game, which I recorded in test footage here: https://youtu.be/D-Jkua4cfG8?si=vSqnkAClfveR-md4
The very first tested reposition is the reposition of the stalker that actually caused the error on my server, the few after that were specifically random leviathans, then specifically creatures of a dozen or so creature types since creatures seems to be what it most likely happens to, and then the last 50 were completely random entities.
Any entity that's labelled as having no siblings is simply centered on the parent.
Also for quick review and/or scanning through the video, here's the results sheet: EntityFixTest.txt

The biggest thing that concerns me about putting things back in the center of the world is... well, let's just say I have this image of this bug happening to a Leviathan, and it getting dropped right on top of the escape pod. Alternatively, this somehow happening to a vehicle during world save, loading back in, and finding out your vehicle is now all the way back up in the shallows while you're somewhere like the lost river. Lastly the slight chance of an object getting permanently stuck up in the air that isn't supposed to be seems almost as bad as ending up under the ground to me, although creatures would be more likely to find their way back to water than they would be to swim the correct direction to exit the ground, even if they're right next to it. (Although let's be honest, creatures clipping into surfaces is already something that happens quite a bit in Subnautica)
Also to be completely honest implementing an emulated OutOfBoundsWarp would take more work since it also would involve reparenting the entity and altering the child lists of the new and old entities just to make absolute certain there wouldn't be any weird bugs, so it also has a tiny bit of laziness based motivation, although that's far from the primary reason.

Comparatively, centering it in the parent does have a fairly high chance of ending up in the ground, but if there are other entities in the parent they'll be physics bound as well, meaning if they all cluster to a specific side or avoid a specific part of the area, then there's probably an obstacle there, and centering it on sibling positions will nudge it the correct direction to not end up in the ground most of the time. Also if it were something like a plant (which I don't think this bug really happens to but it's good to be sure) it'll generally put it back into the plant cluster around the right spot.

If the above tests still produce too many bad results in your opinion, I will try to implement an emulated OutOfBoundsWarp, I'm not the one in charge here after all and will defer to your judgement on the subject, I just genuinely think putting them back roughly where they belong is the better option.

@killzoms
Copy link
Collaborator

The issue of the NaN numbers in positions is an issue with Nitrox's calculations regarding object transformations... its a bug that I myself struggle to wrap my brain around due to lack of enough math education regarding that particular topic.

@uGuardian
Copy link
Author

Do you know offhand where in the code the bugged transformation calculations take place? I'll also take a look at that.

@killzoms
Copy link
Collaborator

This is the class that handles transformations in Nitrox, it could be data that is fed to it improperly or it could originate here.

@killzoms
Copy link
Collaborator

spawning in Nitrox where the data originates from is here fyi

@killzoms
Copy link
Collaborator

for the sources of the data if curious, here and here

@killzoms
Copy link
Collaborator

killzoms commented Aug 21, 2024

Perhaps targeting the main branch of Nitrox would be better for this particular issue? The reason is that Nitrox is getting ready to release a new version soon(TM), and the team doesn't plan on releasing any more patches for the 1.7 version.

Releasing another patch on 1.7 for one fix could give the impression that progress has stalled. Plus, the team wants to prioritize getting Nitrox in top shape for its upcoming release, while also sinking any final regressions that surfaced after the Subnautica Living Large update.

@uGuardian
Copy link
Author

Fair enough, I just had no idea exactly how long it'd be until the 2.0 version of Nitrox came out, and the bug seemed severe enough to warrant a hotfix. (Being completely unable to load the server's world is… notable to say the least)
That and me not being sure if the underlying systems in the main branch had undergone major changes are why I'd submitted a pull request to this branch specifically. If you don't want to give false impressions though it's understandable.

@dartasen
Copy link
Member

@uGuardian Indeed we're not going to release another 1.7 version. Though help is always good to take and that would be a pleasure if you're willing to get this sorted out for 1.8. Eve, though we've reworked a lot of thing for 1.8, issue is still the same.

The current fix you gave looks like a help tool rather than a proper fix. What comes to my mind is to prevent such entities to be serialized into the save file with weird values like NaN or Infinite, before anything else. Rather than dealing with the issue when you're loading the server

@uGuardian
Copy link
Author

Yup, that'd be preferable, although it's always good to have error checking in the loader.
I will try looking into the core saving issue itself over the next few weeks. If I manage to find the source of the issue I'll submit fixes.
I'll also look into adding the error checking in the main branch when I get a chance, just because caution never hurts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants