add observability ui operator proposal #1494

Closed

jgbernalp

This proposal by the Console UI team aims to improve the way we manage observability plugins in the OpenShift console. The focus is on reducing complexity and unifying the Observability experience in the OpenShift console.

@jgbernalp
Author

/assign @stleerh @eparis

@jgbernalp
Author

/retest


- As an OpenShift user, I want an operator from the Red Hat catalog that can deploy various observability UI components so that all signals supported by the cluster are easily accessible and can be used for troubleshooting.

- As an OpenShift administrator, I want a centralized operator for observability UI components so that I can streamline console requirements and integrate diverse signals effectively.

Contributor

My main concern is that this requires the user to install another operator just to get the UI portion, unless more work is done to install it automatically alongside a component such as OpenShift Logging or Network Observability.

A unique console plugin for all observability operators would solve that point without the need to install an extra operator. As soon as you have at least one installed, the observability plugin is there.
Behind the scenes, the plugin can feature-gate the exposed pages according to the available metrics, logs, etc.

The challenge with this approach would be compatibility between the plugin version and each operator version, but there are ways to mitigate that.

Member

> A unique console plugin for all observability operators would solve that point without the need to install an extra operator. As soon as you have at least one installed, the observability plugin is there.

@jpinsonneau But how does the observability plugin get installed? Is the suggestion for all observability operators to include a copy of the plugin?

Yes, all operators could embed the same plugin, and the most up-to-date copy would be used. This needs a mechanism to identify which operator is responsible for it, using owner/version labels for example. The code managing that must be shared between all operators to avoid any reconcile issues. It's a cross-team effort!

As an alternative, the plugin could be embedded in the Monitoring Operator or the OCP Console directly, and enabled when needed. The downside of this approach is the update cycle, which is tied to the owner.

Both of these approaches seem to be a better match for the objectives listed in this doc than creating yet another operator.
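
To make the owner/version label idea concrete, here is a minimal sketch in Python. The label keys and function names are hypothetical and not an existing convention. It assumes each operator advertises the plugin version it embeds, and that shared code picks the operator with the highest version as the one responsible for deploying the plugin.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label keys; nothing in this thread defines them.
OWNER_LABEL = "observability.example/owner"
VERSION_LABEL = "observability.example/plugin-version"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "1.10.2" or "v1.10.2" into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def pick_plugin_owner(candidates: List[Dict[str, str]]) -> Optional[str]:
    """Return the owner whose embedded plugin copy is the most recent.

    Each candidate is the label set published by one operator. Ties are
    broken by owner name so that every operator reaches the same decision.
    """
    valid = [c for c in candidates if OWNER_LABEL in c and VERSION_LABEL in c]
    if not valid:
        return None
    best = max(valid, key=lambda c: (parse_version(c[VERSION_LABEL]), c[OWNER_LABEL]))
    return best[OWNER_LABEL]
```

Because every operator would run the same selection over the same labels, they should agree on a single owner without coordinating directly, which is the property the shared code needs in order to avoid reconcile fights.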

Member

> Yes, all operators could embed the same plugin, and the most up-to-date copy would be used. This needs a mechanism to identify which operator is responsible for it, using owner/version labels for example. The code managing that must be shared between all operators to avoid any reconcile issues. It's a cross-team effort!

This sounds pretty involved and easy to get wrong. If the goal is to simplify the signal operators, is it possible we're actually making things harder?

Author

@stleerh I believe this will give customers the flexibility to add visualization based on their needs. If in some cases metrics or logs are just forwarded, the visualization piece might not be needed, which saves resources. For multi-cluster setups, it should be enabled only in the hub cluster. If visualization should be installed by default, this can come from a meta-operator like OBO that enables the signal operator and enables that piece in the Observability UI operator.

So to me it's still a bit heavy to create a dedicated operator, not including monitoring, just to avoid update cycles.

It would make more sense if the target was:

  • including monitoring dashboards / metrics query pages
  • sharing storages (Prometheus, Loki etc)
  • managing gateway + roles
  • enabling correlation between any metrics / storages

Your plugin should rely on the logStore you configured in the ClusterLogging CR, so even if it doesn't consume the logging services, it relies on its configuration.

Our console plugin is deployed by the Network Observability Operator with Loki storage behind the scenes, just the same as yours:

Author

The correlation is given by korrel8r and enhanced customizable dashboards by Perses. So the observability operator is intended to do this heavy lifting of configuring and installing the Perses operator and korrel8r for correlation.

The logging plugin could be connected to any Loki store without even having cluster logging in a cluster, and the only task for the logging operator is to enable it. So it currently relies on the CR configuration but this is not ideal as any other operator can do the job.

IIUC, even though the net observe plugin is optional, it is coupled to the net observe operator backend, regardless of the store. This is one reason why we won't include the network observability plugin: it has a clear 1:1 relationship with its operator.

In the current implementation, yes. However, that 1:1 relationship may change in the future, depending on multi-cluster needs.
I think that's the reason why the logging plugin can be installed without the operator, right?

Author

Yes, this would be ideal in a multi-cluster scenario, so that visualization is present only where it is needed.


The Observability UI Operator will be available in the Red Hat catalog.

It will manage the deployment of several components which will be added incrementally based on priority, as shown in the table below:

Contributor

@stleerh Oct 24, 2023

If there is only one Observability UI Operator, all of the components must be compatible with a specific version of the UI operator. Will the Observability UI Operator guarantee backward compatibility?

**OpenShift cluster administrator** is responsible for installing, enabling, configuring, and managing the plugins and operators within the OpenShift environment.
**OpenShift user** is the end-user interfacing with the OpenShift console and making use of the observability signals presented by the dynamic console plugins.

1. The cluster administrator installs the ObservabilityUI operator from the Red Hat catalog.

Contributor

We need to be able to install operator dependencies automatically and not expect the user to do this, similar to all package managers like apt, rpm, dnf, npm, pip, go get, etc.  Let's do some investigation on what OLM supports.


- As an OpenShift administrator, I want a centralized operator for observability UI components so that I can streamline console requirements and integrate diverse signals effectively.

- As an OpenShift user, I want to customize observability dashboards with various signals so that I can quickly identify and resolve issues.

@jpinsonneau Oct 24, 2023

Will this replace monitoring?
The opposite is mentioned in the non-goals section, so the difference between monitoring dashboards and the plugin ones should be made clear.
Unifying monitoring in a single place would be of real value to the user. It makes no sense to have dashboards separated per team today.

Author

This will add new customizable dashboards, as the dashboards should not be specific to a single datasource. Totally agree that dashboards per team create disparity. Hence the goal is to add new dashboards that can consume multiple datasources and have richer charts so teams are not constrained in visualization. This is described in the design section: Dashboards console plugin.

So it definitely makes sense for monitoring to be part of it and take advantage of the improvements?
See #1494 (comment)

Author

Monitoring currently is more than only dashboards as it includes alerting rules and service monitors. So the part that is not coupled (dashboards) and offers other teams more advantages is the part we are extracting.

Member

@spadgett left a comment

My biggest concern is that the transition to this model sounds pretty rough for cluster admins with several steps and a lot of manual configuration. If those steps aren't completed, the observability UI will simply disappear for users. We should look if there are ways to streamline the transition and make things just work.

The other major drawback is that plugins will now have to deal with version skew between the observability UI operator and the signal operators. If the plugin is packaged with the signal operator, this isn't an issue. We should decide whether we only want this for cross-cutting plugins.
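
The version-skew concern can be pictured with a small compatibility check. This is only a sketch in Python under stated assumptions: the idea that a plugin declares a supported range of signal operator versions, and the function names used here, are hypothetical and not part of the proposal.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Reduce "5.8.1" to (5, 8); patch releases are assumed not to break the UI."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_compatible(operator_version: str, min_supported: str, max_supported: str) -> bool:
    """Check whether a signal operator version falls inside the range
    that a given plugin build declares as supported (hypothetical metadata)."""
    return major_minor(min_supported) <= major_minor(operator_version) <= major_minor(max_supported)
```

When the plugin ships inside the signal operator, this check is trivially true. Once the plugin moves to a separate operator, something like it would have to run (or be documented) for every supported pairing.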

**OpenShift user** is the end-user interfacing with the OpenShift console and making use of the observability signals presented by the dynamic console plugins.

1. The cluster administrator installs the ObservabilityUI operator from the Red Hat catalog.
2. If there is an existing observability UI plugin deployed by another operator, the cluster administrator disables it.

Member

This doesn't necessarily need to be a manual process. One pattern we've taken when phasing out static plugins is for the new plugin to set a console feature flag like OBSERVABILITY_PLUGIN, then the old plugin will disable all extension points when that flag is present. That way if the admin doesn't disable the old plugin, you won't have duplicate pages.
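
The flag-based hand-off described above can be modelled in a few lines. This is a simplified Python model and not the console's actual extension API: the `Extension` type, its `disallowed_flags` field, and the page names are assumptions made for illustration. Only the flag name `OBSERVABILITY_PLUGIN` comes from the comment.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Extension:
    """Simplified stand-in for a console extension point."""
    name: str
    # Flags that must NOT be set for this extension to stay active.
    disallowed_flags: Set[str] = field(default_factory=set)


def active_extensions(extensions: List[Extension], flags: Set[str]) -> List[str]:
    """Return the names of extensions that remain enabled for the given flags."""
    return [e.name for e in extensions if not (e.disallowed_flags & flags)]


# The old plugin declares that its page is disabled once the new plugin's flag is set.
OLD_PLUGIN = [Extension("old-logs-page", {"OBSERVABILITY_PLUGIN"})]
NEW_PLUGIN = [Extension("new-logs-page")]
```

With no flag set, only the old plugin is present and its page is shown. Once the new plugin sets `OBSERVABILITY_PLUGIN`, `active_extensions(OLD_PLUGIN + NEW_PLUGIN, {"OBSERVABILITY_PLUGIN"})` keeps only the new page, so an admin who forgets to disable the old plugin does not end up with duplicate pages.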

Author

Thx for the suggestion, will update accordingly


1. The cluster administrator installs the ObservabilityUI operator from the Red Hat catalog.
2. If there is an existing observability UI plugin deployed by another operator, the cluster administrator disables it.
3. The cluster administrator configures the operator by adding a custom resource (CR) to deploy the desired plugins and link them with the corresponding signal operators.

Member

Is the presence of the observability operators something we can detect ourselves using console feature flags so that this step isn't required?


### API Extensions

This enhancement introduces a new CRD to represent observability UI console plugins. The `ObservabilityUIConsolePlugin` CR for a plugin will be defined as follows:

Member

It feels like we're asking a lot of administrators to set this up. If they forget to do it or get it wrong, the UI will be missing or non-functional. I think we should look at whether we can discover these services on the cluster instead of requiring manual configuration.
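
As a rough illustration of discovery instead of manual configuration, the sketch below (Python) derives the plugins to enable from the API groups served on the cluster. The mapping from API group to plugin name is a hypothetical assumption, not something defined by the proposal, and how the list of served groups is obtained is left out.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from an API group served on the cluster to the UI
# plugin that would be useful for it. These pairs are illustrative only.
SIGNAL_API_GROUPS: Dict[str, str] = {
    "loki.grafana.com": "logging",
    "monitoring.coreos.com": "dashboards",
}


def plugins_to_enable(served_api_groups: Iterable[str]) -> List[str]:
    """Return the sorted plugin names whose backing API group is present."""
    served = set(served_api_groups)
    return sorted({plugin for group, plugin in SIGNAL_API_GROUPS.items() if group in served})
```

An operator running this on each reconcile could create or remove plugin deployments as signal operators come and go, so a missing CR would no longer mean a missing UI.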

Contributor

Is it the role of the admin, or the role of the signal operator, to create this CR?

Author

It will be the role of the admin, but as @spadgett suggests above, this might come also from the operator discovering the services and enabling the plugins accordingly.


### Open Questions

- How does the operator enable the plugins from the Observability UI Operator without having to patch the console operator? Answer from the console team: Plugins signed by Red Hat can be enabled by default.

Member

Red Hat plugins aren't enabled by default. If it's a Red Hat operator, we do default the radio button to enabled when installing the operator in the UI. This isn't done through signing, but by looking at the catalog source IIRC.

Author

I adjusted the phrasing as I wanted to describe what could be a solution in the future.


### Why:

The current state of observability signals in the OpenShift console has each operator responsible for its own console plugin. This sometimes results in operators deploying plugins outside their primary scope. As the requirements for the console's UI grow, there's a clear need for a centralized system that can manage diverse UI components spanning across various signals to offer a unified observability experience in the console.

Member

Is the intent to move all observability UIs into this common plugin or only plugins that don't have a clear 1:1 relationship with an existing operator?

Author

Only plugins that do not have a clear 1:1 relationship, as they are currently misplaced inside other operators, like the cluster logging operator.


# Observability UI Operator

The Observability UI Operator aims to manage dynamic console plugins for observability signals inside the OpenShift console, ensuring a consistent user experience and efficient management of UI plugins.

Contributor

If I understand the proposal, the Observability UI operator itself would contain the implementation of the UI plugins. Is that correct?

If so, it might be good to clarify that right at the beginning, as one could still read this proposal as the plugins being installed from separate components/repositories.


Decouple the responsibility of managing observability UI from operators, enabling each operator to focus solely on its primary functionalities.

Enhance the observability experience on the console by providing components like [Perses](https://github.com/perses/perses) for customizable dashboards and [korrel8r](https://github.com/korrel8r/korrel8r) for correlation.

Contributor

@jgbernalp just to clarify, after our discussion I still have a doubt: if an operator like NetObserv wants to deploy Perses dashboards (and/or datasources), does it also need a new plugin integrated via the ObservabilityUIConsolePlugin CRD? Or can these be two independent things, so that deployed Perses dashboards go into a "generic pool" like the one that exists today under the "Observe > Dashboards" menu, without the need for a dedicated plugin?

Author

@jotak I believe these are independent things. When an operator uses a Perses dashboard, the Observability UI operator must enable the necessary plugins/proxies in the console so the user can see the dashboard. Ideally this will also fall under Observe > Dashboards, with an enhanced UI.

This aligns with what Sam suggested about having less manual configuration for the admins or in this case other operators.

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 6, 2023
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 19, 2023
@jgbernalp
Author

/remove-lifecycle stale

@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Dec 28, 2023
Contributor

openshift-ci bot commented Dec 28, 2023

@openshift-bot: Closed this PR.

@jgbernalp
Author

/reopen

@openshift-ci openshift-ci bot reopened this Jan 2, 2024
Contributor

openshift-ci bot commented Jan 2, 2024

@jgbernalp: Reopened this PR.

@jgbernalp
Author

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 4, 2024
@openshift-bot

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 2, 2024
@openshift-bot

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 9, 2024
@periklis
Contributor

periklis commented Feb 9, 2024

/remove-lifecycle stale

@periklis
Contributor

periklis commented Feb 9, 2024

/remove-lifecycle rotten

@dhellmann
Contributor

#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.

@openshift-bot

/close

@openshift-ci openshift-ci bot closed this Feb 21, 2024
Contributor

openshift-ci bot commented Feb 21, 2024

@openshift-bot: Closed this PR.

@periklis
Contributor

/reopen

@openshift-ci openshift-ci bot reopened this Feb 21, 2024
Contributor

openshift-ci bot commented Feb 21, 2024

@periklis: Reopened this PR.

Contributor

openshift-ci bot commented Feb 21, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jwmatthews for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

openshift-ci bot commented Feb 21, 2024

@jgbernalp: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | 164c9a7 | link | true | `/test markdownlint` |

@openshift-bot

/close

@openshift-ci openshift-ci bot closed this Feb 28, 2024
Contributor

openshift-ci bot commented Feb 28, 2024

@openshift-bot: Closed this PR.

@dhellmann
Contributor

(automated message) This pull request is closed with lifecycle/rotten. The associated Jira ticket, OU-204, has status "New". Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future.

@dhellmann
Contributor

(automated message) This pull request is closed with lifecycle/rotten. The associated Jira ticket, OU-204, has status "In Progress". Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future.

Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
9 participants