Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Kibana notification service #90297

Closed
wants to merge 21 commits into from
Closed

Conversation

mshustov
Copy link
Contributor

@mshustov mshustov commented Feb 4, 2021

RFC for a Kibana notification service. The notification service improves UX by providing a way of drawing the user's attention to an event that occurred in Kibana or another Elastic product.

Open questions:

@mshustov mshustov changed the title [RFCadd notification service [RFC] Kibana notification service Feb 4, 2021
@mshustov mshustov added RFC backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc labels Feb 4, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@mshustov mshustov added the v8.0.0 label Feb 4, 2021
@mshustov mshustov requested review from kobelb, legrego and a team February 4, 2021 16:50
@mshustov mshustov mentioned this pull request Feb 4, 2021
26 tasks
@jowiho
Copy link

jowiho commented Feb 5, 2021

About the icon being a public URL: I'm not sure if that will be feasible in all situations. I'm thinking about on-prem ECE installations where the icon would have to be hosted by the ECE UI (because some customers don't like ECE making outside calls, or run ECE in an air-gapped env, so using an elastic.co URL won't work). But the user-visible ECE UI URL isn't always known to us, and can change. For instance, customers can decide to move their ECE UI behind a reverse proxy.

Even without ECE, how would this work for an air-gapped on-prem Kibana?

Would it make sense to support in-line svg icons? Perhaps combined with a set of predefined icon names (like we have for the Kibana login selector icons)?

@jowiho
Copy link

jowiho commented Feb 5, 2021

I've left a few other comments in the gdoc (https://docs.google.com/document/d/1iIjL1CzlYQ6EcVzOY_5_vgDsGetnxw7Xtp_9PUUJXrs/edit#) because GH comments aren't ideal for discussing review feedback.

@mshustov
Copy link
Contributor Author

mshustov commented Feb 8, 2021

Would it make sense to support in-line svg icons? Perhaps combined with a set of predefined icon names (like we have for the Kibana login selector icons)?

yes, euiIconType: string supports SVG URL and data URL out of the box. I will clarify it in the RFC

Comment on lines 94 to 99
- `message: { i18n_key: string, values: object } | { text: string }` - a notification message is subject to i18n.
Locale might be changed on the Space level (on the User level in the future).
Thus, we cannot determine a message locale during notification creation.
Considering that Kibana notifications might be rendered outside of Kibana UI (in Cloud UI, for example), the message must contain translations for all the supported locales.
- `title: { i18n_key: string, values: object, url?: string } | { text: string, url?: string }` - notification header.
- `action: { i18n_key: string, values: object, url: string } | { text: string, url: string }` - notification CTA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still concerned about i18n keys changing between releases. We don't have enough rigor or tooling in place to be sure that the keys don't change and then break notifications created before the change (or sourced from different Stack versions). I also think we'd need to be able to add new i18n keys to old Kibana versions so that newer notifications could be rendered in the UI of older Kibanas. Or are we expecting the UNS to handle this for us?

If we have decided that locale config changes should also affect notifications that already exist, we need to account for taking on the work to make sure i18n keys are stable. Would it be acceptable to delay this work and do the i18n translating at creation time until we enforce that?

Also, what do you think about always providing a fallback message string even when i18n_key is provided? That way we can make changes to how we handle this in the future without ever breaking older Kibana instances.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still concerned about i18n keys changing between releases. We don't have enough rigor or tooling in place to be sure that the keys don't change and then break notifications created before the change (or sourced from different Stack versions).

That's a valid concern. We are going to store translations for a given notification:

Considering that Kibana notifications might be rendered outside of Kibana UI (in Cloud UI, for example), the message must contain translations for all the supported locales.

we need i18n_key to retrieve all the translations from @kbn/i18n before storing them in an index or sending them to UNS.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we simplify things by only storing the translation for the users current language preference? Old notifications won't change when a user changes their language prefs, but that seems like an acceptable tradeoff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it seems simpler to store the translated text. Spaces does throw an interesting wrench in that plan though... Perhaps if the source of the notification event is some space-specific entity, we use the Space's locale if the user hasn't specified one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we simplify things by only storing the translation for the users current language preference? Old notifications won't change when a user changes their language prefs, but that seems like an acceptable tradeoff.

@alexfrancoeur How do you feel about this from a product point of view? It's not the best UX, but might be an acceptable trade-off.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked about this with @alexfrancoeur a bit yesterday. In general, I don't like when surprising things happen in our product and having UI be in multiple languages depending on when the state of a cluster setting changes is surprising in my opinion. That being said, language setting changes are probably infrequent and it is my understanding that our internationalization sophistication is still maturing today, so we agreed this would be okay.

If we put ourselves in a user's shoes, the worst case scenario is probably something like: English is the default language for a deployment, I bring a deployment up, the deployment experiences problems and generates notifications about what went wrong, I do not read English, I switch my deployment language expecting to be able to return and read those messages, I am frustrated that this expectation was not met.

It will be a while before the notification system and internationalization is capable of producing an experience like that, but it's good to paint a real picture.

In case we want to look back at this for context, are our concerns around storing multiple languages in a notification driven by disk concerns or network concerns?

Copy link
Contributor Author

@mshustov mshustov Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case we want to look back at this for context, are our concerns around storing multiple languages in a notification driven by disk concerns or network concerns?

mostly yes.

I agree that it seems simpler to store the translated text. Spaces does throw an interesting wrench in that plan though... Perhaps if the source of the notification event is some space-specific entity, we use the Space's locale if the user hasn't specified one?

@kobelb That will require extending User profile API to retrieve user-specific settings (once we have them). Worth noting that Kibana doesn't have access to user credentials at notification creation time. Probably we can mark some settings as public not to apply the security model to them.

In general, this approach might not give us a lot of benefits if several users set different locales because we have to duplicate a notification for every locale.
For example, Alert is triggered for UserA, UserB, UserC:
UserA and UserB have locale en: we store a notification with a message in English.
UserC has locale de: we store the same notification with the message in German.

So instead of storing the single notification object with en and de locales, we persist two notification objects, which ends up consuming even more disk space.

interface TranslatedContent {
  translations: {
    [locale: string]: string;
  }
  arguments: TranslateArguments;
}

interface Notification {
  recipient_id: string[];
  priority?: 'normal' | 'high' | 'critical';
  created_at: number;
  expire_at?: number;
  is_pinned?: boolean;
  source_type: string; 
  group_id?: string;
  content: {
    icon?: {
      euiIconType: string;
    } 
    severity?: 'normal' | 'high' | 'critical';
    message: { 
      text: TranslatedContent;
    }
    title: {
      url: string;
      text: TranslatedContent;
    }
    action: {
      url: string;
      text: TranslatedContent;
  }
}

// vs several objects
interface Notification {
  recipient_id: string[];
  priority?: 'normal' | 'high' | 'critical';
  created_at: number;
  expire_at?: number;
  is_pinned?: boolean;
  source_type: string; 
  group_id?: string;
  content: {
    icon?: {
      euiIconType: string;
    } 
    severity?: 'normal' | 'high' | 'critical';
    text: string;
    title: {
      url: string;
      text: string;
    }
    action: {
      url: string;
      text: string;
  }
}

Also, it's unclear how to store notifications without a particular recipient, addressed to all the users: creating a copy of notification per a locale?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jowiho @kobelb What do you think?

Considering that at the current moment there is no way to get user locale without user credentials, I'd suggest either starting with storing all possible translations for a message or storing translations for en locale only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@restrry as we discussed over Zoom, I think that both options are fine for the short-term. Long-term, I think that we're going to end up having to store all possible translations to provide the ideal UX.

rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
This option consumes less storage space but might lead to expensive read operations due to an additional JOIN call to
search for notification state objects. We don’t expect Kibana to read more than 10-20 notifications at once, so the overhead might be acceptable.
- Create a copy for every notification in a user-specific list of notifications. It’s [the recommended way to store data](https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html),
but it leads to duplicating notification messages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to do some rough math on the scale of data we'd be duplicating before choosing an option here.

Copy link
Contributor Author

@mshustov mshustov Feb 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My calculation is based on https://www.javamex.com/tutorials/memory/ and https://www.javamex.com/tutorials/memory/object_memory_usage.shtml:
The single notification object:

interface Notification { // 16 bytes
  recipient_id: string[]; // the single uuid 36 char, so use StringBuilder = 144 bytes + 16 bytes for Array = 88 bytes
  priority?: 'normal' | 'high' | 'critical'; // max. 16 char. = 56 bytes 
  created_at: number; // 8 bytes
  expire_at?: number; // 8 bytes
  is_pinned?: boolean; // 8 bytes
  source_type: string; // max. 16 chars = 56 bytes
  group_id?: string // StringBuilder max 100 char. = 144 bytes
  content: { // 16 bytes
    icon?: { // 16 bytes
      euiIconType: string; // StringBuilder = 144 bytes. URL or inlined SVG can take up to several KB
    } 
    severity?: 'normal' | 'high' | 'critical' // max. 16 char. = 56 bytes 
    message: { // 16 bytes
      translations: { // 16 bytes
        [locale: string]: string; // Unicode, max. 80 char. x 2bytes = 160 bytes x N locales
      }
      arguments: { // 16 bytes
       values: object; // assume max. 100 bytes
       defaultMessage: string; // StringBuilder max 100 char. = 144 bytes
       description: string; // StringBuilder max 100 char. = 144 bytes
      }
    }
    title: { // 16 bytes
      url: string; // url max. 200 char x 2 bytes = 400 bytes.
      translations: { // 16 bytes
        [locale: string]: string; // Unicode, max. 30 char. x 2bytes = 60 bytes x N locales
      }
      arguments: { // 16 bytes
       values: object;  // assume max. 100 bytes
       defaultMessage: string; // StringBuilder max 100 char. = 144 bytes
       description: string; // StringBuilder max 100 char. = 144 bytes
      }
    }
    action: {
      url: string; // url max. 200 char x 2 bytes = 400 bytes.
      translations: { // 16 bytes
        [locale: string]: string; // Unicode, max. 30 char. x 2bytes = 60 bytes x N locales
      }
      arguments: { // 16 bytes
       values: object; // assume max. 100 bytes
       defaultMessage: string; // StringBuilder max 100 char. = 144 bytes
       description: string; // StringBuilder max 100 char. = 144 bytes
      }
    }
  }
}

with N = 3 supported locales, one message consumes ~ 3.7KB. (It will a bit more since I have no idea how JSON object keys are stored in java)

for an state with interface:

interface  NotificationState { // 16 bytes
  notification_id: string; // the single uuid 36 char, so use StringBuilder = 144 bytes
  recipient_id: string; // StringBuilder = 144 bytes
  is_read: boolean;  // 8 bytes
  is_pinned: boolean; // 8 bytes
}

one state object consumes ~0.3KB.

let's assume that the deployment has a message addressed to 10 users, then the scenario with storing notification content with the state will require (3.7 + 0.3) x 10 ~ 40KB, while the case storing message content separately from the state requires 3.7 + (0.3 x 10) ~ 6.7 KB.

- Change a notification state.

Kibana-specific entities shouldn’t leak to UNS:
- Kibana sends message translations for all the supported locales.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can elaborate on what this means? We would send the fully translated string, one for each locale?

Copy link
Contributor Author

@mshustov mshustov Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. If we want to support i18n on Cloud and show notifications across deployments with different Kibana versions.
@jowiho Can you see a better option? This one consumes a lot of disk space and network bandwidth, but a notification contains everything that's needed to show it in UI.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just commented above that we could store the notification in a single language, according to the user's preference at that time. Not sure what the best interface would be. Either the notification source would first lookup the user's language preference and then generate a notification in that language. Or the source would generate a notification in all languages, and the notification service would lookup op the preferred language and store the notification in that language.

rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
Comment on lines 292 to 293
Writing to the same Notification storage might cause conflict on the `write`. If the storage mechanism implementation
doesn’t make them impossible, we will have to use a single place to change notification status (via the [queue](#localrepository)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that we should leverage the ES-backed queue to determine the order of state changes? So the flow would be:

  • NotificationService.markAsRead() is called
    • NotificationEvent (or similar) is put into ES queue
    • markAsRead returns (even before the change has been fully processed)
  • Later on, the background task picks up events form the queue:
    • Take the latest X events in queue and process them serially

This should eliminate write conflicts because only one instance of the background task is running (a guarantee provided by Task Manager).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that we should leverage the ES-backed queue to determine the order of state changes?

It depends on the data storage implementation. If we end up storing a notification state object per user, the probability of writing conflicts is reduced significantly.

A notification cannot have a particular recipient in the lack of a way to identify a user in the system. A notification will be shown to all the users instead.
User-specific notification state (*read_status*) is not stored in Kibana but browser Local Storage.
### Phase II
When [user profiles](https://github.com/elastic/kibana/issues/17888) are supported, Kibana UI starts showing user-specific notifications.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern I have is how we will handle the case of when security is disabled. How do you imagine the storage mechanism working in that case? Do we know if we could just not support notifications in that type of deployment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know if we could just not support notifications in that type of deployment?

This is my vote. The onboarding notifications potentially make sense w/o security, and I could make an argument for others, but this could also be an enticing forcing function to get our clusters secured.

We are still planning on enabling security by default starting in 8.0, so the burden of getting security setup will be greatly diminished.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern I have is how we will handle the case of when security is disabled. How do you imagine the storage mechanism working in that case? Do we know if we could just not support notifications in that type of deployment?

The storage mechanism doesn't depend on user_id as long as a notification source passes it during notification creation.

We need user_id when retrieving notifications addressed to a particular recipient.
Lack of user_id is not an obstacle to show notifications addressed to all the users (referred to as global notifications in the RFC).

Also, the ability to disable Security plugin has been deprecated #85159

cc @alexfrancoeur as it is a product decision as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the ability to disable Security plugin has been deprecated #85159

Note that enabling Kibana's security plugin !== having stack security features enabled. Future versions of the stack will still be able to run without security features -- they just won't be able to explicitly disable Kibana's security plugin anymore.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should prefer for "global" cluster notifications to still function even if the parts of our stack that allow us to differentiate users is unavailable. That being said, it isn't yet clear to me how complex bringing user identity into Kibana and this system will prove to be (very is my guess). So we should keep an eye on this and I'd like to understand the details of this better myself as we go.

Copy link
Member

@afharo afharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @restrry for putting together this great RFC! I love the structure and the thinking process! Also the design, of course :)

I added some NIts and questions. Another thing I wanted to point out: the rendered link in the description looks outdated to me.

rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
Kibana HTTP API:
- Endpoint: `GET /notifications/`
- `search_after?: number` - Unix timestamp to start searching from.
- `type_id?: string[]` - list of source types to filter by.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this meant to be an allow list? If so, as explained later on in the Filtering section, when a user hides a specific channel, we'll store in the local storage all the types the user wants to see. What happens when a new channel pops up (i.e.: after an upgrade, or a new type that the user never saw before, so it didn't add it to the list)?
Should we add the query parameter exclude_type_id?: string[]?

Copy link
Contributor Author

@mshustov mshustov Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when a new channel pops up

Maybe, it is ok to show the new type?
If we hide it by default, the user will not even know that a new type is available.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we hide it by default, the user will not even know that a new type is available.

That's why I'm asking, the suggested behaviour in this RFC makes me think we'll leave out any new types if we follow the type_id list based on the local storage "known subscription list".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking more, I see the point to add exclude_type_id, but I don't think it justifies the complexity.
We will have to support include / exclude options in filtering for all the parameters. It's hard to reason about what is the final combination of filters will be applied, esp. if exclude_type_id and include_type_id contain the same id.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, as long as we keep this limitation in mind when handling the local storage to handle the unsubscribed channels in our first implementation.

- The service relies on the yet-to-be-added ability to uniquely identify the user, enumerate users with a given set of permissions.
- The service doesn't provide a built-in security mechanism but relies on a notification source implementation.
- Lack of a centralized i18n service makes message transferring over-bloated due to the necessity to send all the possible translations over the wire.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- We are relying on X-Pack features (like security and task manager), so we are tied to that. Having to maintain both: the current Newsfeed and NC as an X-Pack replacement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current Newsfeed and NC as an X-Pack replacement.

We should support Notifications for plugins outside of x-pack. With the latest licensing changes, it's likely a subset of security plugin functionality ends up outside of the x-pack folder. But still waiting for an announcement from Tech Leads. @kobelb

rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
- `pinned_status?: boolean` - a flag to pin a notification at top of the list of notifications. Used by a source to draw a user's attention to a notification.
- `source_type: string` - source type. the same as source domain name: `cloud`, `alerting`.
- `source_subtype: string` - source-specific type (`cloud.insufficient_funds`, `alerting.something`)
- `group_id?: string` - identifier to associate several notifications in the UI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the grouping be only based on group_id / can distinct sources use the same group_id and have their notifications grouped together under the same group?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the grouping be only based on group_id

group_id per a source_type. Alerting will leverage it to group notifications for an event: an alert triggered, an alert acknowledged, etc. With this grouping, UI provides a sort of a domain event timeline.

In the MVP implementation, a click on + 3 messages shows them in the flyout.
2021-02-11_18-37-30

Later, when we have Notification Center UI, a click on + 3 messages navigates to the NC UI and applies a filter by group_id.

rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
Comment on lines 94 to 97
- `message: { i18n_key: string, values: object } | { text: string }` - a notification message is subject to i18n.
Locale might be changed on the Space level (on the User level in the future).
Thus, we cannot determine a message locale during notification creation.
Considering that Kibana notifications might be rendered outside of Kibana UI (in Cloud UI, for example), the message must contain translations for all the supported locales.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the message must contain translations for all the supported locales

Not sure to understand that, given the described format?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was wondering the same, I assumed this would be something like

message: { i18n_key: string; values: Record<string, string> }

Where Record<string, string> is { [languageCode]: 'translation of message' }

Copy link
Contributor Author

@mshustov mshustov Feb 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where values: { dictionary, defaultMessage, values, description }.
Ok, I updated the section with

import type { TranslateArguments } from '@kbn/i18n';

interface TranslatedContent {
  translations: {
    [locale: string]: string;
  }
  arguments: TranslateArguments;
}

interface NotificationContent {
...
  message: TranslatedContent;
  title: TranslatedContent & { url?: string };
  action: TranslatedContent & { url: string };
}

We need to extend @kbn/i18n with a method returning all the translations for a given string.

notification.create({
  message: i18n.getAllForKey('hello.wonderful.world', {
    defaultMessage: '...',
    values: {...},
    description: '...'
  });
});

Comment on lines 181 to 182
- `expired_at` if not specified. Kibana allows administrators to set how long notifications will be stored. Falls back to 30 - 90 days be default.
- Writing new incoming messages to the Notification storage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kibana allows administrators to set how long notifications will be stored

How will that work once we switch to UNS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect administrators will configuration a retention period:

  • on Cloud via Cloud UI
  • on-prem via kibana.yml

rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
Comment on lines 302 to 303
Filtering allows users to hide notifications they aren’t interested in. Kibana UI attaches a list of filters when requesting notifications.
Kibana stores filter as a part of User settings data to provide a seamless experience between different deployments on Cloud and user sessions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What 'User settings' refers to exactly ? If it's stored using the local Kibana's user data / user persisted data, how will this be accessible from cloud / other deployments?

Copy link
Contributor Author

@mshustov mshustov Feb 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What 'User settings' refers to exactly ?

Right now, it refers to filters only.

If it's stored using the local Kibana's user data / user persisted data, how will this be accessible from cloud / other deployments?

The notification system writes them to the same storage when notifications are persisted. On Cloud, it will be stored in UNS to provide the same settings across different deployments.

rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
A notification might be addressed to:
- a user
- a specific group of users
- all the users
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we represent a notification that's addressed to all users? I don't expect we'd want to actually place all users within the recipient list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it initially, but removed later. Notifications addressed to all users will be created via a special method

createForAllUsers(events: Omit<NotificationEvent, 'recipient_id'>): void;

Kibana reuses Notification service implementation on Cloud but enhances it with additional pluggable storage of external notifications.
![img](../images/0014/alternative_implementation.png)

The main benefit of this approach is that Kibana always uses the same service in any environment.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few of other benefits of this approach come to mind for me:

Graceful degradation

If the UNS is unavailable for any reason (even a planned/scheduled outage), then all cloud hosted instances will be unable to publish or retrieve notifications, even if these notifications were generated by that very Kibana instance.

If we took this pluggable storage approach, then locally sourced notifications could still work, as they wouldn't rely on the UNS for storage or retrieval.

To me, this is similar to your Netflix feed: the different categories are managed by different services to provide a single UX. If one of those services is unavailable, then the rest of your feed continues to work.

Extensibility

Once solutions start taking advantage of the notification service, we might uncover new requirements, which may in turn require changes to the data model. If we are coupled to the UNS's representation of the data model, then we are potentially limited by what we can do (short of a _meta dumping ground).

If, on the other hand, we store locally sourced events locally, then we have full control over the data model, and we can perform data migrations as needed.

We would still need to support the UNS's data model, but the types of events created by external sources may not require the same amount of functionality as locally sourced events.

Flexible authorization model

We could choose to authorize locally sourced notifications differently from externally sourced notifications. The UNS (i.e. externally sourced notification) is likely in a better position to perform auth on their own, but locally sourced notifications could have a stack-specific ACL attached.

Migration to ESS?

If an on-prem user wants to move their cluster to the Cloud, then a snapshot/restore would allow them to retain their existing notifications and notification history.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having thought about this some more, I don't think many of these benefits really apply if locally sourced notifications have to be available consistently in other Elastic products

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graceful degradation
If the UNS is unavailable for any reason (even a planned/scheduled outage), then all cloud hosted instances will be unable to publish or retrieve notifications, even if these notifications were generated by that very Kibana instance.

Later, when UNS is available, it should synchronize notifications and their state between the remote and local storage. For sure, it's undoable, but it's not simple.

Extensibility
Once solutions start taking advantage of the notification service, we might uncover new requirements, which may in turn require changes to the data model.

Agree. It might be useful if we are ready to put up with the cost of data and UI inconsistency between a notification shown in Kibana UI and Console UI.

Flexible authorization model
We could choose to authorize locally sourced notifications differently from externally sourced notifications. The UNS (i.e. externally sourced notification) is likely in a better position to perform auth on their own, but locally sourced notifications could have a stack-specific ACL attached.

Based on the requirements that Cloud must show Kibana notifications in Cloud Console UI, how would it implement the Kibana-compatible authorization model?

Migration to ESS?
If an on-prem user wants to move their cluster to the Cloud, then a snapshot/restore would allow them to retain their existing notifications and notification history.

That's a valid point as long as we are sure that a user migrates to deployment with the same configuration as the local one.

A notification cannot have a particular recipient in the lack of a way to identify a user in the system. A notification will be shown to all the users instead.
User-specific notification state (*read_status*) is not stored in Kibana but browser Local Storage.
### Phase II
When [user profiles](https://github.com/elastic/kibana/issues/17888) are supported, Kibana UI starts showing user-specific notifications.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know if we could just not support notifications in that type of deployment?

This is my vote. The onboarding notifications potentially make sense w/o security, and I could make an argument for others, but this could also be an enticing forcing function to get our clusters secured.

We are still planning on enabling security by default starting in 8.0, so the burden of getting security setup will be greatly diminished.

rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
- Security solutions plugin - Cases
- Alerting plugin - Alerts

The notification system supports a user-specific state for notifications.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not already done for Phase I, then it sounds like Phase II would need space-awareness? Or will you always see notifications from all of your spaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I'm concerned, notifications are space-agnostic.

Spaces are difficult to use on the Cloud because a user does not have an active Space during the session in Cloud Console UI.

For on-prem Kibana, it makes sense to show notifications from all the user's Spaces. Say, you have an alert triggered in Space-A, why shouldn't you receive a notification even if your active Space is Space-B?

@alexfrancoeur could you please confirm these conclusions?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirming the current notification UX/UI is space agnostic, and the spirit of the feature is to provide visibility across our stack/products where ever you are (as you illustrate @restrry ).

That being said, I could see filtering by space making sense at some point. If we wanted to include the concept of a space as an optional part of the notification data model (intended as a way to organize notifications) what challenges would we run into?

Copy link
Contributor Author

@mshustov mshustov Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted to include the concept of a space as an optional part of the notification data model (intended as a way to organize notifications) what challenges would we run into?

Let's continue with Alerting example. When an alert notification is created, it stores in what Space the alert has been created. https://github.com/elastic/kibana/blob/master/x-pack/plugins/alerts/server/alerts_client/alerts_client.ts#L1335
It can pass spaceId as a part of Notification content. Notification service itself cannot get this information in runtime.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can pass spaceId as a part of Notification content. Notification service itself cannot get this information in runtime.

Agreed that the default experience should be showing all notifications across all spaces, but if we wanted to support filtering by space, couldn't we automatically source the current space from the request context when a notification is created? There may be some licensing and/or technical limitations that make this difficult, but it may be worth solving those if we want this experience.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't we automatically source the current space from the request context when a notification is created?

Sure, we can do it when we have a request context (as in the case of Background search). but it's not always the case - we don't have request context when an alert is triggered. I'd consider SpaceId to be passed to provide A-Single-Way-To-Do-Things™.

although there are some odd instances with >100k users. So we consider two options at the moment (see the [workflow](https://docs.google.com/document/d/18s-BTHog_oPaQKf871gzuhxkah4pK1GYJM9DCLceaLo/edit#)):
- Store notification state separately from a notification. State for new notifications created for every user when they change the state.
```typescript
interface NotificationState {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we want audit logging for this state? Would administrators find it useful to know that a specific user read a specific notification?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of such a requirement at the moment. @alexfrancoeur

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say the answer is "no" for the MVP and we can think about it more when read-state moves server-side. I don't know enough about our audit logging myself, so feel free to speak up if anyone disagrees.

If read-state ends up being saved-object backed, will we kindof get this for free with audit logging or are system-managed saved objects exempt?

Copy link
Contributor Author

@mshustov mshustov Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If read-state ends up being saved-object backed, will we kindof get this for free with audit logging or are system-managed saved objects exempt?

That's correct, we will have a record that someone updates SO (saved_object_update event, the full list of events). But we will need to add integration if we want to provide a more meaningful event on notification state change: notification_marked_read, notification_marked_unread, etc.

Kibana-specific entities shouldn’t leak to UNS:
- Kibana sends message translations for all the supported locales.
- Kibana-specific groups of recipients unfolded to a list of *recipient_id*.
- *recipient_id* is set to *elastic_id*.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the distinction between a recipient_id and an elastic_id? Will all users on ESS have a unique elastic_id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the distinction between a recipient_id and an elastic_id?

I suspect it will be elastic_id as long as we need to unambiguously identify a user across the Stack and Cloud.
In any case, Notification plugin relies on:

  • recipient_id passed in from outside when creating the notification. So recipient_id form is defined by Security plugin eventually.
  • getting user id from Security plugin when requesting notifications from Kibana UI.

I'd refer to Security team whether:

  • Will all users on ESS have a unique elastic_id?
  • Will elastic_id be created for on-prem Kibana users or user id will be in another format?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the answer here, but I do think it's important to not overlook. As far as I'm aware, we don't have precedent to rely on when figuring out how to translate from "Cloud users" to "Stack users".

- `created_at: number` - Unix timestamp in UTC timezone when a notification has been created.
- `expired_at?: number` - Unix timestamp in UTC timezone when notification can be removed from the system.
- `pinned_status?: boolean` - a flag to pin a notification at top of the list of notifications. Used by a source to draw a user's attention to a notification.
- `source_type: string` - source type. the same as source domain name: `cloud`, `alerting`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably just thrown off by the examples here, but I assume the plugins/sources relationship is one:many? (That is, a plugin could create notifications as multiple sources). Or are you suggesting that for internal sources, source type === plugin name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, a plugin could create notifications as multiple sources

I assumed source_type to be a domain name. Sometimes source type === plugin name, sometimes source type might be the same for several plugins in the common domain.

OTOH the only use case for source_type usage at the moment is filtering.

Comment on lines 94 to 97
- `message: { i18n_key: string, values: object } | { text: string }` - a notification message is subject to i18n.
Locale might be changed on the Space level (on the User level in the future).
Thus, we cannot determine a message locale during notification creation.
Considering that Kibana notifications might be rendered outside of Kibana UI (in Cloud UI, for example), the message must contain translations for all the supported locales.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was wondering the same, I assumed this would be something like

message: { i18n_key: string; values: Record<string, string> }

Where Record<string, string> is { [languageCode]: 'translation of message' }

rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
mshustov and others added 7 commits February 15, 2021 14:14
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
Comment on lines 94 to 99
- `message: { i18n_key: string, values: object } | { text: string }` - a notification message is subject to i18n.
Locale might be changed on the Space level (on the User level in the future).
Thus, we cannot determine a message locale during notification creation.
Considering that Kibana notifications might be rendered outside of Kibana UI (in Cloud UI, for example), the message must contain translations for all the supported locales.
- `title: { i18n_key: string, values: object, url?: string } | { text: string, url?: string }` - notification header.
- `action: { i18n_key: string, values: object, url: string } | { text: string, url: string }` - notification CTA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it seems simpler to store the translated text. Spaces does throw an interesting wrench in that plan though... Perhaps if the source of the notification event is some space-specific entity, we use the Space's locale if the user hasn't specified one?

rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
Locale might be changed on the Space level (on the User level in the future).
Thus, we cannot determine a message locale during notification creation.
Considering that Kibana notifications might be rendered outside of Kibana UI (in Cloud UI, for example), the message must contain translations for all the supported locales.
- `title: TranslatedContent & { url?: string }` - notification header.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming these URLs are for Kibana, the backward compatibility guarantees for URLs don't cross major versions. To side-step this complication, would it make sense for us to drop all Kibana-generated notifications from the prior major version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To side-step this complication, would it make sense for us to drop all Kibana-generated notifications from the prior major version?

In this case, the user will lose access to all notifications created in the previous version.
If we store notifications with the SO mechanism, might it be better to implement migrations for notifications? Plugins can extend notification service migrations with custom logic. We have something similar for embeddable plugin https://github.com/restrry/kibana/blob/099e9bc50fafa89301d438ae7112330da5ab421c/src/plugins/kibana_utils/common/persistable_state/index.ts#L60

Kibana-specific entities shouldn’t leak to UNS:
- Kibana sends message translations for all the supported locales.
- Kibana-specific groups of recipients unfolded to a list of *recipient_id*.
- *recipient_id* is set to *elastic_id*.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the answer here, but I do think it's important to not overlook. As far as I'm aware, we don't have precedent to rely on when figuring out how to translate from "Cloud users" to "Stack users".

rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved

# Drawbacks

- It is not entirely clear what functionality will be supported by UNS and when this will be known.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The uncertainty around UNS does concern me because it's going to be a drop-in replacement for the local implementation. There's the potential that both systems won't integrate seamlessly because we know so little about it.

rfcs/text/0014_notification_service.md Outdated Show resolved Hide resolved
Integration with UNS is something that is not going to happen any time soon. The work can be divided into the following phases:

### Phase I
Kibana shows notifications created by the Newsfeed Kibana plugin in Notification flyout.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how we're going to "pull" from the newsfeed and persist these notifications locally? Will this be using TaskManager? Are there any situations where we currently delete newsfeed items that complicate this implementation?

Copy link
Contributor Author

@mshustov mshustov Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any situations where we currently delete newsfeed items that complicate this implementation?

The current Newsfeed implementation doesn't store any data. It fetches notifications from the browser on every visit and shows them in UI.
On Phase I we start with moving the existing Newsfeed plugin logic to the Notification service and add transformation layer to convert https://github.com/elastic/newsfeeds/blob/master/data/kibana.yaml#L10-L17 to the notification compatible interface. News is shown in Notification UI, but not stored in Notification storage.

Do you know how we're going to "pull" from the newsfeed and persist these notifications locally? Will this be using TaskManager? Are there any situations where we currently delete newsfeed items that complicate this implementation?

Newsfeed plugin can fetch notifications and create a notification for every news item. Every news has hash property, which is recommended to use as id(see output fields). We can make notification creation idempotent operation and use hash as id to avoid creating duplicates.

Or we have to track manually what news already created notifications.

Integration with UNS is something that is not going to happen any time soon. The work can be divided into the following phases:

### Phase I
Kibana shows notifications created by the Newsfeed Kibana plugin in Notification flyout.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the Kibana server pull the notifications from the remote newsfeed service won't work if the user has a network topology where Kibana can't access the newsfeed on the public internet. It's my understanding that this is a significant number of users. @afharo do you happen to know a rough percentage of users that have Kibana firewalled off from the public internet?

Copy link
Member

@afharo afharo Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kobelb I think this is a key discussion you raised here (and I think it extends to more than the Newsfeed context):

To answer your question, as of today: close to 70% of the clusters that reported in the last month, did so from the server at least once. We enabled server-side reporting ON by default on 7.9.0. If we filter the stacks from that version, the percentage grows to ~90%. Disclaimer: this is only for those clusters opted-in telemetry

Newsfeed nowadays runs entirely on the public side: each user's browser fetches the newsfeed as they load the page (and keeps the data in the browser's memory, not even the local storage). The Kibana server is not involved in the process: it doesn't fetch/store anything. Even when the trend looks like we're heading to more relaxed firewall rules, I think we should consider UI-sourced notifications, and the implications they might have. In my experience, if we don't provide a common way to do so, plugins will follow their own way, potentially leading to security issues.

If we solve how UI-sourced notifications may work, I don't think we need to worry about the implementation of the Newsfeed fetcher.

Apart from working around the potential firewall issue, other UI-sourced notifications I can think of:

  • Accumulate toasts? i.e.: Errors, "Upgrade your license", ...
  • Search session (although this could be generated from the server)
  • "Upgrade your browser for a better experience", "Use this feature to save you these 10 clicks you always go through" (maybe they are more related to Suggestion service though).

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consider UI-sourced notifications, and the implications they might have. In my experience, if we don't provide a common way to do so, plugins will follow their own way, potentially leading to security issues.

This RFC doesn't distinguish between Kibana client-side and server-side sources.
But I would prefer not to introduce the concept of UI-sourced notifications as long as possible because of:

  • the security concern. The browser environment is open, it much easier for an intruder to use API. For example, if we provided an HTTP endpoint to create a notification within the system, someone would use it to spam the users with notifications.
  • proper separation of concerns. It's almost always a good idea to keep business logic out of UI: it's easier to reason about and test it.
  • coordination problem: a user can have Kibana open in several tabs (and on different devices). We have to come up with a solution to coordinate notification creation to avoid duplicate records.

Copy link
Contributor Author

@mshustov mshustov Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we filter the stacks from that version, the percentage grows to ~90%. Disclaimer: this is only for those clusters opted-in telemetry

and exclude Elastic Cloud the percentage on Kibana instances not hidden behind the firewall is ~91%

Copy link
Contributor

@kobelb kobelb Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~90% is a lot better than I expected...@restrry do you think it's going to make the architecture a lot more complicated to get the last 10% working with the newsfeed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kobelb It's doable if we are okay to live with API that has the potential problems outlined above #90297 (comment)

Are there other alternatives? What if we reconsider the decision to display News as part of the Notification experience? @thesmallestduck

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other alternatives? What if we reconsider the decision to display News as part of the Notification experience?

Newsfeed notifications are paused. We are going to wait for more use-cases of UI-sourced notifications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes RFC Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants