Infrastructure connectivity checks #29439
Comments
How do you handle the fact that checkConnectivity may hang for a "long" time? Are the calls to checkConnectivity made at the same time or one after another, and what does the page show while waiting for the timeouts?
Yes, I'd say components know better how to configure the timeouts for their libraries (e.g. Redis), and they need to catch the matching exceptions to track those timeouts.
I will bring one in soon for the currently running case; at the moment I have a simple patch (amongst others) that needs to be polished.
Shouldn't one per host be sufficient? That is, if a meaningful connection can be made without auth (it should be possible, right?). Otherwise it will be just terrible.
Once per host would be nice, but it would only work when no auth is set or if the credentials are identical.
For a connectivity test it should be enough to see if the server replies reasonably, i.e. by requiring auth. 30k checks against a service might look like a DoS attempt :)
Problem
We receive reports that are not actually bugs but connectivity issues with components, for example slow LDAP servers or unstable connections to the database or Redis.
Solution
Put automated checks in place that make it possible for admins to verify their infrastructure connectivity before reporting issues. The goal is to prevent people from filing issues that are not actual bugs.
Implementation idea
Nextcloud already has a "setup check" section in the settings where various settings are verified.
This section should be extended to include a table with two columns, connectivity and reliability, and with rows representing the various components that are to be checked.
By connectivity we mean "is it accessible at all", and by reliability we mean "how many connection failures / timeouts occurred in past time intervals (hours, days, weeks)".
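As a rough illustration of the reliability column, the sketch below counts failure timestamps per time window. The window labels, the `count_failures` helper and the sample data are assumptions made for this example; the proposal only names hours, days and weeks as intervals.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical windows; the proposal only mentions hours, days and weeks.
WINDOWS = {
    "last hour": timedelta(hours=1),
    "last day": timedelta(days=1),
    "last week": timedelta(weeks=1),
}


def count_failures(failures, now):
    """Return how many failure timestamps fall inside each window."""
    return {
        label: sum(1 for ts in failures if timedelta(0) <= now - ts <= span)
        for label, span in WINDOWS.items()
    }


# Made-up sample data: four failures at increasing distance from "now".
now = datetime(2021, 10, 26, 12, 0, tzinfo=timezone.utc)
failures = [
    now - timedelta(minutes=5),
    now - timedelta(hours=3),
    now - timedelta(days=2),
    now - timedelta(days=10),
]
print(count_failures(failures, now))
# {'last hour': 1, 'last day': 2, 'last week': 3}
```

The windows are cumulative here (a failure five minutes ago also counts for the day and the week); disjoint buckets would be an equally valid reading of the proposal.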
Possible components to check that would appear as rows / grouped rows:
Implementation details
Status provider service
getType(): string, for displaying in the table as a prefix for a component type (and for grouping)
getDisplayName(): string, for displaying in the table; it must be useful enough for the admin to find out which exact component needs attention
checkConnectivity(): void, does an immediate connectivity check to one given component, for example a specific LDAP server. Only a connection is made; the measurement is done by the caller.
getFailures(): array, returns an array of timestamps of the last failures
Intermittent failure tracking
Every app that manages components (e.g. LDAP) must catch connection failures such as timeouts and send them to the status provider service.
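To make the division of responsibilities concrete, here is a minimal sketch of the four methods and of the failure reporting, written in Python rather than Nextcloud's PHP. `StatusProvider`, `TcpProvider`, `record_failure` and `run_check` are hypothetical names for this illustration and are not an existing Nextcloud API; a real provider would use the component's own client library instead of a bare TCP connect.

```python
import socket
import time
from abc import ABC, abstractmethod


class StatusProvider(ABC):
    """Python rendering of the four methods listed in the proposal."""

    @abstractmethod
    def get_type(self) -> str: ...

    @abstractmethod
    def get_display_name(self) -> str: ...

    @abstractmethod
    def check_connectivity(self) -> None:
        """Connect once and raise on failure; timing is left to the caller."""

    @abstractmethod
    def get_failures(self) -> list: ...


class TcpProvider(StatusProvider):
    """Hypothetical provider that only opens a TCP connection to host:port."""

    def __init__(self, kind, host, port, timeout=2.0):
        self.kind, self.host, self.port, self.timeout = kind, host, port, timeout
        self._failures = []

    def get_type(self):
        return self.kind

    def get_display_name(self):
        return f"{self.kind} {self.host}:{self.port}"

    def check_connectivity(self):
        # The component chooses its own timeout, as suggested in the comments.
        with socket.create_connection((self.host, self.port), timeout=self.timeout):
            pass

    def record_failure(self, when=None):
        # Assumed hook: the owning app calls this when it catches a connection error.
        self._failures.append(time.time() if when is None else when)

    def get_failures(self):
        return list(self._failures)


def run_check(provider):
    """Caller side: measure the duration and record a failure on error."""
    start = time.monotonic()
    try:
        provider.check_connectivity()
        ok = True
    except OSError:  # covers refused connections and socket timeouts
        provider.record_failure()
        ok = False
    return ok, time.monotonic() - start
```

Because `check_connectivity` only raises or returns, the caller stays in control of timing and can run the checks one after another or concurrently, which is the question raised in the comments.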
Component types (raise tickets when ready)
Development phases
Open issues