Sentry SDK Crash Detection for Cocoa #44342

Closed
24 tasks done
philipphofmann opened this issue Feb 9, 2023 · 7 comments

Comments

@philipphofmann
Member

philipphofmann commented Feb 9, 2023

Problem Statement

As an APM company, the reliability of our SDKs is one of our most essential quality goals. If our SDK breaks a customer's application, we fail. Our SDK philosophy refers to this as "degrade gracefully".

For some SDKs, like mobile SDKs, we primarily rely on users to report SDK crashes because we don't operate them in production. If users don't report them, we are unaware. Instead, we should detect crashes caused by our SDKs when they happen so we can proactively fix them.

This solution doesn't seek to detect severe bugs, such as the transport layer breaking or the SDK continuously crashing. CI or other quality mechanisms should find such severe bugs. Furthermore, the solution only targets SDKs maintained by us, Sentry.

The goal is to start with Cocoa SDK crashes only. If that works properly, we can extend the algorithm to find crashes in other Sentry SDKs.

Solution

We want to detect SDK crashes during event processing as decided in the RFC.

Tasks for Alpha Phase

  1. Component: Event Pipeline Component: Monitoring Component: SDK Status: Backlog Team: Mobile Platform
    philipphofmann
  2. Component: Event Pipeline Component: Monitoring Component: SDK Status: In Progress Team: Mobile Platform
    philipphofmann
  3. Component: Event Pipeline Component: Monitoring Component: SDK Status: Backlog Team: Mobile Platform
    philipphofmann
  4. Component: Event Pipeline Component: Monitoring Component: SDK Status: Backlog Team: Mobile Platform
    philipphofmann
  5. Scope: Backend
    philipphofmann
  6. 14 of 14
    Platform: Cocoa Scope: Backend
    philipphofmann
  7. Platform: Cocoa Scope: Backend
    philipphofmann
  8. Scope: Backend
    philipphofmann
  9. Scope: Backend Status: Backlog
    philipphofmann
  10. Scope: Backend
    philipphofmann
  11. Scope: Backend Status: Backlog
    philipphofmann
  12. Scope: Backend Status: Backlog
    philipphofmann
  13. 2 of 2
    Scope: Backend Status: Backlog
    philipphofmann
  14. Scope: Backend
    philipphofmann
  15. Scope: Backend
    philipphofmann
  16. Scope: Backend
    philipphofmann

Other open tasks

  1. Scope: Backend
    philipphofmann
  2. Scope: Backend
    philipphofmann

Rollout PRs

@getsantry
Contributor

getsantry bot commented Feb 9, 2023

Routing to @getsentry/team-mobile for triage, due by Friday, February 10th at 5:00 pm (sfo). ⏲️

@philipphofmann
Member Author

The current state is here: https://github.com/getsentry/sentry/tree/feat/sdk-crash-monitoring

@philipphofmann
Member Author

philipphofmann commented Jun 14, 2023

Collected Data

The current state as of June 14th, 2023.

I have a POC tested locally in this PR #49928. Nothing is merged to master yet. We decided to split the PR into multiple smaller PRs so it's easier to review. You can have a look at a clean diff here.

The feature will be called in post_process:

sdk_crash_detection.detect_sdk_crash(
    event=event, event_project_id=settings.SDK_CRASH_DETECTION_PROJECT_ID
)

Then it takes the event data and only keeps properties of the event based on an allow list. We must still remove debug images and define an allow list for exceptions (#50710):

sdk_crash_event_data = strip_event_data(event.data, self.cocoa_sdk_crash_detector)

EVENT_DATA_ALLOWLIST = {
    "type": Allow.SIMPLE_TYPE,
    "datetime": Allow.SIMPLE_TYPE,
    "timestamp": Allow.SIMPLE_TYPE,
    "platform": Allow.SIMPLE_TYPE,
    "sdk": {
        "name": Allow.SIMPLE_TYPE,
        "version": Allow.SIMPLE_TYPE,
        "integrations": Allow.NEVER.with_explanation("Users can add their own integrations."),
    },
    "exception": Allow.ALL.with_explanation("We strip the exception data separately."),
    "debug_meta": Allow.ALL,
    "contexts": {
        "device": {
            "family": Allow.SIMPLE_TYPE,
            "model": Allow.SIMPLE_TYPE,
            "arch": Allow.SIMPLE_TYPE,
        },
        "os": {
            "name": Allow.SIMPLE_TYPE,
            "version": Allow.SIMPLE_TYPE,
            "build": Allow.SIMPLE_TYPE,
        },
    },
}
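
To make the allow-list idea concrete, here is a minimal, self-contained sketch of how such a stripper could walk the event data. It is not the code from the PR: the Allow enum below is a simplified stand-in without with_explanation, and strip_by_allowlist, ALLOWLIST, and the sample event are hypothetical names and data made up for illustration.

```python
from enum import Enum, auto
from typing import Any, Mapping


class Allow(Enum):
    """Hypothetical, simplified stand-in for the Allow markers shown above."""

    SIMPLE_TYPE = auto()  # keep str, int, float, and bool values only
    ALL = auto()  # keep the value unchanged, whatever its type
    NEVER = auto()  # always drop the value


def strip_by_allowlist(data: Mapping[str, Any], allowlist: Mapping[str, Any]) -> dict:
    """Return a copy of data that contains only what the allowlist permits."""
    stripped: dict = {}
    for key, rule in allowlist.items():
        if key not in data:
            continue
        value = data[key]
        if isinstance(rule, Mapping):
            # Nested allowlist: recurse, and drop the key if nothing survives.
            if isinstance(value, Mapping):
                nested = strip_by_allowlist(value, rule)
                if nested:
                    stripped[key] = nested
        elif rule is Allow.ALL:
            stripped[key] = value
        elif rule is Allow.SIMPLE_TYPE and isinstance(value, (str, int, float, bool)):
            stripped[key] = value
        # Allow.NEVER and values of an unexpected type are dropped.
    return stripped


ALLOWLIST = {
    "platform": Allow.SIMPLE_TYPE,
    "sdk": {
        "name": Allow.SIMPLE_TYPE,
        "version": Allow.SIMPLE_TYPE,
        "integrations": Allow.NEVER,
    },
    "contexts": {"os": {"name": Allow.SIMPLE_TYPE}},
}

event = {
    "platform": "cocoa",
    "user": {"email": "someone@example.com"},  # not on the allowlist, so dropped
    "sdk": {"name": "sentry.cocoa", "version": "8.1.0", "integrations": ["Custom"]},
    "contexts": {"os": {"name": "iOS", "kernel": "Darwin"}},
}

stripped_event = strip_by_allowlist(event, ALLOWLIST)
```

The point of iterating over the allowlist rather than over the event is that any key nobody thought about is dropped by default, which is the safer failure mode when the goal is to avoid collecting customer data.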

Currently, the code only keeps non-in-app frames and SDK frames. We still have to refine this to use an allow list (#50916), because if users change the in-app logic, we might collect customer frames.

def _strip_frames(
    frames: Sequence[Mapping[str, Any]], sdk_crash_detector: SDKCrashDetector
) -> Sequence[Mapping[str, Any]]:
    """
    Only keep SDK frames or non in app frames.
    """
    return [
        frame
        for frame in frames
        if sdk_crash_detector.is_sdk_frame(frame) or frame.get("in_app", None) is False
    ]
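
The concern about changed in-app logic can be shown with a small standalone sketch. It uses the same predicate as _strip_frames, but replaces the detector with a plain function; is_sdk_frame here is an assumption for illustration (it simply matches the "Sentry" package), and the customer frames are invented.

```python
from typing import Any, Mapping, Sequence


def is_sdk_frame(frame: Mapping[str, Any]) -> bool:
    # Assumption for illustration: frames from the "Sentry" package are SDK frames.
    return frame.get("package") == "Sentry"


def strip_frames(frames: Sequence[Mapping[str, Any]]) -> list:
    # Same predicate as _strip_frames, with the detector replaced by a function.
    return [f for f in frames if is_sdk_frame(f) or f.get("in_app") is False]


frames = [
    {"function": "-[UIViewController __viewDidAppear:]", "package": "UIKitCore", "in_app": False},
    {"function": "CustomerViewController.viewDidAppear", "package": "CustomerApp", "in_app": True},
    # A customer frame that custom in-app rules have marked as not in-app.
    {"function": "CustomerHelper.log", "package": "CustomerKit", "in_app": False},
    {
        "function": "__47-[SentryBreadcrumbTracker swizzleViewDidAppear]_block_invoke_2",
        "package": "Sentry",
        "in_app": False,
    },
]

kept_packages = [f["package"] for f in strip_frames(frames)]
```

In this made-up input, the CustomerKit frame survives the filter only because its in_app value is False, which is exactly the case the allow list in #50916 is meant to rule out.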

Finally, it saves the stripped event data to a dedicated project:

manager = EventManager(dict(event_data))
manager.normalize()
return manager.save(project_id=event_project_id)

Sample Output JSON of the Event Stripper

We still have to remove the debug_meta if possible.

{
  "type": "error",
  "platform": "cocoa",
  "timestamp": 1686731069.355087,
  "contexts": {
    "device": { "family": "iOS", "model": "iPhone14,8", "arch": "arm64e" },
    "os": { "name": "iOS", "version": "16.3", "build": "20D47" },
    "sdk_crash_detection": { "detected": true }
  },
  "exception": {
    "values": [
      {
        "type": "SIGABRT",
        "stacktrace": {
          "frames": [
            {
              "function": "__49-[UINavigationController _startCustomTransition:]_block_invoke",
              "symbol": "__49-[UINavigationController _startCustomTransition:]_block_invoke",
              "package": "UIKitCore",
              "in_app": false,
              "image_addr": "0x1a4e8f000"
            },
            {
              "function": "-[UINavigationController navigationTransitionView:didEndTransition:fromView:toView:]",
              "symbol": "-[UINavigationController navigationTransitionView:didEndTransition:fromView:toView:]",
              "package": "UIKitCore",
              "in_app": false,
              "image_addr": "0x1a4e8f000"
            },
            {
              "function": "-[UIViewController _endAppearanceTransition:]",
              "symbol": "-[UIViewController _endAppearanceTransition:]",
              "package": "UIKitCore",
              "in_app": false,
              "image_addr": "0x1a4e8f000"
            },
            {
              "function": "-[UIViewController __viewDidAppear:]",
              "symbol": "-[UIViewController __viewDidAppear:]",
              "package": "UIKitCore",
              "in_app": false,
              "image_addr": "0x1a4e8f000"
            },
            {
              "function": "-[UIViewController _setViewAppearState:isAnimating:]",
              "symbol": "-[UIViewController _setViewAppearState:isAnimating:]",
              "package": "UIKitCore",
              "in_app": false,
              "image_addr": "0x1a4e8f000"
            },
            {
              "function": "__47-[SentryBreadcrumbTracker swizzleViewDidAppear]_block_invoke_2",
              "package": "Sentry",
              "in_app": false,
              "image_addr": "0x100304000"
            }
          ]
        },
        "mechanism": {
          "type": "generic",
          "data": { "handled": false }
        }
      }
    ]
  },
  "debug_meta": {
    "images": [
      {
        "code_file": "/private/var/containers/Bundle/Application/9EB557CD-D653-4F51-BFCE-AECE691D4347/iOS-Swift.app/Frameworks/Sentry.framework/Sentry",
        "debug_id": "e2623c4d-79c5-3cdf-90ab-2cf44e026bdd",
        "arch": "arm64",
        "image_addr": "0x100304000",
        "image_size": 802816,
        "type": "macho"
      },
      {
        "code_file": "/System/Library/PrivateFrameworks/UIKitCore.framework/UIKitCore",
        "debug_id": "b0858d8e-7220-37bf-873f-ecc2b0a358c3",
        "arch": "arm64e",
        "image_addr": "0x1a4e8f000",
        "image_size": 25309184,
        "image_vmaddr": "0x188ff7000",
        "type": "macho"
      }
    ]
  },
  "sdk": { "name": "sentry.cocoa", "version": "8.1.0" }
}

@HazAT
Member

HazAT commented Jun 14, 2023

if sdk_crash_detector.is_sdk_frame(frame) or frame.get("in_app", None) is False
Will it help to know which non-in_app frames were called?

I would propose removing it so we don't have to deal with what you mentioned, a user changing in_app.
So I would just do if sdk_crash_detector.is_sdk_frame(frame)

wdyt?

@philipphofmann
Member Author

@HazAT, the SDK frames alone sometimes won't be enough to know what's happening. System frames would be helpful as well. So my plan is to add an allow list with items such as CoreFoundation, UIKit, UIKitCore, GraphicsServices, etc.

philipphofmann added a commit that referenced this issue Jun 14, 2023
This PR adds a feature flag disabled by default, the project id for the
feature in the settings, and calls the placeholder class for the SDK
crash detection (#44342) in
post processing.

This is the first PR of splitting up the POC for SDK crash detection
(#49928) into multiple PRs.

---------

Co-authored-by: Iker Barriocanal <32816711+iker-barriocanal@users.noreply.github.com>
Co-authored-by: getsantry[bot] <66042841+getsantry[bot]@users.noreply.github.com>
@philipphofmann
Member Author

philipphofmann commented Jun 15, 2023

We (Daniel, Armin, Karl, and I) decided to move away from the in-app logic and only keep SDK frames and system library frames instead. For more detail see #50916.
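
A rough sketch of what that decision could look like in code, assuming the frame's package field holds a short library name as in the sample output above. The names SYSTEM_LIBRARY_PACKAGES, SDK_PACKAGE, and keep_sdk_and_system_frames are hypothetical, and the set only contains the libraries named earlier in this thread; the actual list is defined in #50916.

```python
from typing import Any, Mapping, Sequence

# Only the libraries mentioned in this thread; the real allow list may differ.
SYSTEM_LIBRARY_PACKAGES = frozenset(
    {"CoreFoundation", "UIKit", "UIKitCore", "GraphicsServices"}
)
# Assumption for illustration: SDK frames are identified by this package name.
SDK_PACKAGE = "Sentry"


def keep_sdk_and_system_frames(frames: Sequence[Mapping[str, Any]]) -> list:
    """Keep SDK frames and allow-listed system library frames, ignoring in_app."""
    return [
        frame
        for frame in frames
        if frame.get("package") == SDK_PACKAGE
        or frame.get("package") in SYSTEM_LIBRARY_PACKAGES
    ]


frames = [
    {"function": "-[UIViewController __viewDidAppear:]", "package": "UIKitCore", "in_app": False},
    # Invented customer frame marked as not in-app; it must not be kept.
    {"function": "CustomerHelper.log", "package": "CustomerKit", "in_app": False},
    {
        "function": "__47-[SentryBreadcrumbTracker swizzleViewDidAppear]_block_invoke_2",
        "package": "Sentry",
        "in_app": False,
    },
]

kept_packages = [f["package"] for f in keep_sdk_and_system_frames(frames)]
```

Because the filter never reads in_app, a customer changing the in-app rules has no effect on which frames end up in the dedicated project.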

@philipphofmann
Member Author

The SDK Crash detection is up and running for the Cocoa SDK. We will open more issues in the future to roll it out to more SDKs.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 29, 2023