Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loops w/ 100% CPU usage caused by stack overflows #367

Closed
MaxDesiatov opened this issue Jan 21, 2021 · 10 comments · Fixed by #371
Closed

Infinite loops w/ 100% CPU usage caused by stack overflows #367

MaxDesiatov opened this issue Jan 21, 2021 · 10 comments · Fixed by #371
Assignees
Labels
bug Something isn't working DOM/HTML renderer Tokamak in the browser

Comments

@MaxDesiatov
Copy link
Collaborator

MaxDesiatov commented Jan 21, 2021

Describe the bug
There's a certain snippet of code, which when rendererd in the browser triggers an infinite loop after clicking on a NavigationLink.

To Reproduce
Build and run this snippet with carton dev:

import TokamakShim

struct TokamakApp: App {
  var body: some Scene {
    WindowGroup("Spooky Hanger") {
      NavigationView {
        List {
          ForEach(["Section"], id: \.self) { section in
            Section(header: Text(section).font(.headline)) {
              ForEach(["Item 1"], id: \.self) { childRow in
                NavigationLink(
                  destination: Text(childRow).padding([.leading, .trailing])
                ) {
                  Text(childRow)
                }
                .padding(.leading)
              }
            }
          }
        }
        .listStyle(SidebarListStyle())
      }
    }
  }
}

TokamakApp.main()

Open it in any browser a tap on "Item 1". Sometimes it loads the destination, but frequently it just hangs. If it doesn't hang, reload the page and try again. It's usually reproducible at least one in five attempts. I've reproduced in Safari, Firefox, and Chrome.

Expected behavior
NavigationLink always opens the destination without any problems.

Screenshots
Screenshot 2021-01-21 at 20 22 35

Additional context

This was reported by @foscomputerservices in a bigger project, but I've managed to reduce it to this snippet. I've tried to remove seemingly unrelated modifiers or views, like listStyle or padding, but removing them seems to make the issue go away or reduce the probability or reproduction steps to something I can't trigger anymore.

@MaxDesiatov MaxDesiatov added the bug Something isn't working label Jan 21, 2021
@MaxDesiatov MaxDesiatov self-assigned this Jan 21, 2021
@MaxDesiatov MaxDesiatov added the DOM/HTML renderer Tokamak in the browser label Jan 21, 2021
@MaxDesiatov
Copy link
Collaborator Author

Occasionally (but not always!) I'm able to catch an OOB with this in Chrome:

Screenshot 2021-01-21 at 21 17 43

@MaxDesiatov
Copy link
Collaborator Author

MaxDesiatov commented Jan 21, 2021

I've been sticking print(#function); defer { print("exit \(#function)") } in too many places in the reconciler and the DOM renderer, but the actual source of the loop is very elusive. With every added print the place of OOB changes, or the OOB itself goes away. The only thing that gives me hope is that this is stably reproducible (even OOBs) with the same source code in all browsers, so this isn't some kind of Wasm VM optimizer issue, but certainly something with Tokamak, SwiftWasm, or the Runtime library.

@kateinoigakukun please help, my debugging skills are not good enough for this. My only hypothesis is that something's weird with the metadata (maybe the Runtime library misreads some fields?) and this somehow messes everything up.

@MaxDesiatov
Copy link
Collaborator Author

MaxDesiatov commented Jan 21, 2021

Also, I've verified that this is reproducible in both main and 0.6.1, so it was introduced some time ago even if it wasn't present right from the start. I haven't checked with 0.5.3 yet as it requires an older version of carton.

MaxDesiatov added a commit that referenced this issue Jan 23, 2021
We had some issues in this code area previously and I'm thinking of refactoring it in attempt to fix #367. Would be great to increase the test coverage here before further refactoring.
MaxDesiatov added a commit that referenced this issue Jan 24, 2021
This should allow us to remove the Runtime dependency eventually, which seems to be unstable, especially across different platforms and Swift versions.

Seems to resolve in one instance #367. There are a few other places where `typeInfo` is still used, I'll clean that up in a follow-up PR.

* Replace uses of the Runtime library with stdlib

* Remove irrelevant Runtime library imports

* Add TokamakCoreBenchmark target
MaxDesiatov added a commit that referenced this issue Jan 25, 2021
* Add a test for environment injection

We had some issues in this code area previously and I'm thinking of refactoring it in attempt to fix #367. Would be great to increase the test coverage here before further refactoring.

* Update copyright years in `MountedElement.swift`

* Update copyright years in the rest of the files
MaxDesiatov added a commit that referenced this issue Jan 27, 2021
Our OpenCombine fork no longer depends on Runtime, and we don't need much from it other than struct metadata. I removed the unused bits and bobs and kept only a minimal subset of it that we really need. This should make it easier for us to test and debug, as #367 has shown that some weird stuff may still lurk in that area.

* Add a test for environment injection

We had some issues in this code area previously and I'm thinking of refactoring it in attempt to fix #367. Would be great to increase the test coverage here before further refactoring.

* Update copyright years in `MountedElement.swift`

* Update copyright years in the rest of the files

* Vend the Runtime library directly

* Remove unused class, enum, tuple, func reflection

* Remove unused models and protocol metadata

* Remove unused MetadataType and NominalMetadataType

* Remove unused protocols, rename RelativePointer

* Remove more unused protocols

* Use immutable pointers for reflection

* Update copyright headers
@MaxDesiatov
Copy link
Collaborator Author

MaxDesiatov commented Feb 5, 2021

Reopening, as this issue is still reproducible in main with just a few print statements added across the codebase 😢

Thanks to @foscomputerservices for uncovering this!

@MaxDesiatov MaxDesiatov reopened this Feb 5, 2021
@MaxDesiatov
Copy link
Collaborator Author

I've pushed the changes to a branch if anyone's interested in reproducing it on their side.

@MaxDesiatov
Copy link
Collaborator Author

MaxDesiatov commented Apr 9, 2021

TWIMC, all points to this being a stack overflow issue, the immediate workaround is to add these flags to your Package.swift:

linkerSettings: [
  .unsafeFlags(["-Xlinker", "--stack-first", "-Xlinker", "-z", "-Xlinker", "stack-size=16777216"]),
]

The required stack size may be bigger to accommodate for bigger view hierarchies.

The most probable long-term fix will probably require Tokamak to allocate some more on the heap. As @kateinoigakukun said:

TokamakCore.MountedHostView.update consumes 864 bytes for each call, and the deep view-tree accelerates the consumption. Then the total consumed space is nearly 64KB, which is a default stack size wasm-ld allocates.

@foscomputerservices
Copy link
Contributor

This solves the issues that I was having. Thank you for all of your efforts!

@MaxDesiatov MaxDesiatov changed the title Mysterious non-deterministic infinite loop w/ navigation Infinite loops w/ 100% CPU usage caused by stack overflows Apr 29, 2021
@MaxDesiatov
Copy link
Collaborator Author

linkerSettings: [
  .unsafeFlags(["-Xlinker", "--stack-first", "-Xlinker", "-z", "-Xlinker", "stack-size=16777216"]),
]

Had a question asked about it, wanted to clarify that linkerSettings should be passed as the last argument to .target: https://developer.apple.com/documentation/swift_packages/target

@MaxDesiatov
Copy link
Collaborator Author

We have a sanitizer shipped and enabled by default for debug builds in carton 0.10.0. I think we can consider the issue to be resolved now.

MaxDesiatov added a commit that referenced this issue Jun 15, 2021
Most of the changes are related to the use of OpenCombineShim (available in upstream OpenCombine now) instead of CombineShim. But there is also a new test added during the investigation of #367, where an app is rendered end-to end, which is a good way to expand our test suite I think.

* Use immediate scheduler in TestRenderer

This allows running our test suite on WASI too, which doesn't have Dispatch and also can't wait on XCTest expectations. Previously none of our tests (especially runtime reflection tests) ran on WASI.

* Run `carton test` and `carton bundle` in separate jobs

* Bump year in the `LICENSE` file

* Add reconciler stress tests for elaborate testing

* Move default App implementation to TestRenderer

* Use OpenCombineShim instead of CombineShim
@gregcotten
Copy link
Contributor

Hey wait I just had this issue!

I was getting it when an ObservableObject was subscribed to an Electron IPC listener and was making @published updates. The 100% CPU on the renderer process only showed up when the chromium window became completely occluded, strangely. Anyway, using the aforementioned flags worked for me:

linkerSettings: [
  .unsafeFlags(["-Xlinker", "--stack-first", "-Xlinker", "-z", "-Xlinker", "stack-size=16777216"], .when(platforms: [.wasi]),
]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working DOM/HTML renderer Tokamak in the browser
3 participants