ECS audit events for alerting #84113

Merged
merged 19 commits into from
Dec 4, 2020

Contributor

@thomheymann thomheymann commented Nov 23, 2020

Closes #80288
Closes #49823

Release Note

Adds ECS audit events for alerts and actions plugins.

Docs

https://kibana_84113.docs-preview.app.elstc.co/guide/en/kibana/master/xpack-security-audit-logging.html

Summary

The following events are now logged in ECS format when the new audit logger is enabled:

NOTE: @LeeDr - some of these event names changed, see https://www.elastic.co/guide/en/kibana/7.11/xpack-security-audit-logging.html

  • connector_create
  • connector_update
  • connector_delete
  • connector_get
  • connector_find
  • alert_rule_create
  • alert_rule_update
  • alert_rule_update_api_key
  • alert_rule_enable
  • alert_rule_disable
  • alert_rule_delete
  • alert_rule_get
  • alert_rule_find
  • alert_rule_mute
  • alert_rule_unmute
  • alert_instance_mute
  • alert_instance_unmute

Approach

  • Even though alerts are basically saved objects and the events share a lot of similarities with saved object CRUD operations, we decided to create dedicated events for alert rules and connectors. They have their own authentication model and are hidden saved objects, so they won't be visible to users as saved objects in the Stack Management screen.
  • The event names align with the alerting working group terminology rather than the code (e.g. action=connector, alert=alert_rule).
  • Write operations should be logged after any authorisation checks have been carried out but before the operation starts, to ensure that a record is persisted even in case of a fatal crash. To log the saved object ID alongside the create operation, we generate an ID using the core generateSavedObjectId function directly within the alerts and actions clients (based on Allow predefined ids for encrypted saved objects #83482).
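The ordering described in the last bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `AuditEvent`, `AuditLogger`, `SavedObjectsStore`, and `createAlertRule` are hypothetical names, the ID generator is passed in as a stand-in for the core helper, and the store is synchronous only to keep the example short.

```typescript
// Hypothetical, simplified shapes; the real Kibana interfaces are richer.
interface AuditEvent {
  action: string;
  outcome: 'unknown' | 'success' | 'failure';
  savedObject: { type: string; id: string };
}

interface AuditLogger {
  log(event: AuditEvent): void;
}

interface SavedObjectsStore {
  // Synchronous here for brevity; the real saved objects client is asynchronous.
  create(type: string, attributes: object, options: { id: string }): void;
}

function createAlertRule(
  store: SavedObjectsStore,
  auditLogger: AuditLogger | undefined,
  generateId: () => string, // stand-in for the core ID helper (assumption)
  isAuthorized: boolean,
  attributes: object
): string {
  // 1. The authorisation check comes first.
  if (!isAuthorized) {
    throw new Error('Unauthorized to create alert rule');
  }

  // 2. Generate the ID up front so the audit record can reference it.
  const id = generateId();

  // 3. Log before the write starts, so a record exists even if the write crashes.
  auditLogger?.log({
    action: 'alert_rule_create',
    outcome: 'unknown',
    savedObject: { type: 'alert', id },
  });

  // 4. Only now perform the write, reusing the pre-generated ID.
  store.create('alert', attributes, { id });
  return id;
}
```

Passing the generator in as a parameter keeps the sketch free of dependencies; in the PR the ID comes from the core saved objects helper.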

Known issues

  • Any event that is logged as part of a background task / fake request will not have the user field populated. While this is not the best user experience, it does not create blind spots in the audit trail, since the user who created or updated the object can be looked up by querying the audit log for previous alert_rule_create or alert_rule_update events. This should be tackled in a separate PR once we have worked out how to support a concept of "authenticated fake requests". (Related to Scope-able elasticsearch clients #39430)
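The lookup mentioned above could be done along these lines. This is a sketch only: the `AuditRecord` shape is an assumed subset of an ECS-style record (the `kibana.saved_object` field names in particular are an assumption here), and `findLastEditor` is a hypothetical helper, not part of Kibana.

```typescript
// Assumed subset of an ECS-style audit record; field names are illustrative.
interface AuditRecord {
  '@timestamp': string;
  event: { action: string };
  user?: { name: string };
  kibana?: { saved_object?: { type: string; id: string } };
}

// Returns the most recent user who created or updated the given rule,
// or undefined if no attributed write event exists.
function findLastEditor(
  records: AuditRecord[],
  ruleId: string,
  writeActions: string[] = ['alert_rule_create', 'alert_rule_update']
): string | undefined {
  let latest: AuditRecord | undefined;
  for (const record of records) {
    const isWrite = writeActions.indexOf(record.event.action) !== -1;
    const matchesRule = record.kibana?.saved_object?.id === ruleId;
    // Skip reads, other objects, and events without a user (background tasks).
    if (!isWrite || !matchesRule || record.user === undefined) {
      continue;
    }
    // ISO 8601 UTC timestamps of identical format compare correctly as strings.
    if (latest === undefined || record['@timestamp'] > latest['@timestamp']) {
      latest = record;
    }
  }
  return latest?.user?.name;
}
```

The action names are a parameter because they were still under discussion in this PR.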

Out of scope

  • The legacy audit events are still logged. I will remove them as part of a separate PR together with the rest of the legacy audit logger.
  • connector_execute events are out of scope as the code path is currently different between direct execution and async execution (background tasks) which would cause duplicate or missing events. @gmmorris confirmed that the code can be refactored to support audit logging.

How to test

  1. Enable audit logging:

      xpack.security.audit.enabled: true
      xpack.security.audit.appender:
        kind: file
        path: /kibana/audit.log
        layout:
          kind: json

  2. Start Kibana and Elasticsearch in SSL mode and watch the audit log:

      yarn es snapshot --license trial --ssl
      yarn start --ssl
      tail -f /kibana/audit.log

  3. Create a connector / alert via Stack Management in Kibana

@thomheymann thomheymann added the release_note:enhancement, v8.0.0, Team:ResponseOps, and v7.11.0 labels Nov 23, 2020
@thomheymann thomheymann requested review from a team as code owners November 23, 2020 16:18
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@@ -311,6 +311,7 @@ export {
exportSavedObjectsToStream,
importSavedObjectsFromStream,
resolveSavedObjectsImportErrors,
generateSavedObjectId,
Contributor

@pgayvallet pgayvallet Nov 24, 2020

We should reduce non-contract APIs exposed from core to a minimum (the import/export static functions are remnants of legacy code that will be moved to the SO service in the mid term). Is there any reason generateSavedObjectId is exposed as a static function instead of being provided by the savedObjects service contract?

Contributor Author

It's a pure function that doesn't require any shared state, so it didn't make sense to me to add it to the SavedObjectsSerializer. Maybe we could add it as a static method to SavedObjectsServiceSetup, if that addresses your concern?

Contributor

cc @rudolf wdyt? is exposing generateSavedObjectId statically from the index acceptable to you, or do you think this should be exposed from the SO service contract?

Contributor Author

Another option could be to put this into kbn-utils package.

Contributor

> Another option could be to put this into kbn-utils package.

I don't think we should. Separating by domain is one of our main principles, so it should belong to saved objects.
++ to adding it as a setup/start contract property. ID creation doesn't sound like a serializer responsibility.
Btw, why does other code in the SO service still call uuid to generate IDs?

if (object.id == null) object.id = uuid.v1();

Contributor

> However in that case, I'd say that the best option may be to stick to using uuid.v1 from the consuming code until we do implement this new API. @rudolf @restrry wdyt?

👍

Member

> However in that case, I'd say that the best option may be to stick to using uuid.v1 from the consuming code until we do implement this new API. @rudolf @restrry wdyt?

I believe the logic for ID generation is changing in the very near future (👀 @jportner), so I feel like it'd be safer to keep the logic consolidated in the static helper. Asking consumers to use uuid.v1 is leaking an implementation detail of the SO service, which feels worse to me than exposing a static function from the service.

Contributor

> I believe the logic for ID generation is changing in the very near future (👀 @jportner)

Sharing Saved Objects will implement ID (re-)generation when existing objects are converted, but I have no plans to change "regular" ID generation.

Contributor

Let's keep it in SavedObjectsUtils then. It will be easier to track than grepping for uuid anyway when we need to move that to the service contract.

Contributor Author

All done. Please give it another pass.

@thomheymann thomheymann requested a review from a team as a code owner November 30, 2020 11:15
@thomheymann thomheymann requested a review from a team November 30, 2020 11:15
@thomheymann thomheymann requested a review from a team as a code owner November 30, 2020 11:15
@gmmorris gmmorris self-requested a review November 30, 2020 11:43
Contributor

@gmmorris gmmorris left a comment

(approved changes to Alerts & Actions)
LGTM, thanks for taking on this mammoth of a task!

It looks good and I'm happy to approve, but there's one thing that worries me (would be good to hear more from @elastic/kibana-alerting-services on this), which is that the audit logging concerns are yet another thing the AlertsClient does... I was really hoping we could hide that implementation inside of the AlertsAuthorization class (same thing for Actions).

I'm happy to go ahead and merge this with the hope that we find a way of decoupling these things later... but I thought it might be worth flagging in case anyone on the team feels otherwise.

x-pack/plugins/actions/server/lib/audit_events.ts (outdated, resolved)
x-pack/plugins/actions/server/lib/audit_events.ts (outdated, resolved)
this.auditLogger?.log(
alertRuleEvent({
action: AlertRuleAction.CREATE,
outcome: EventOutcome.UNKNOWN,
Contributor

Is this "unknown" because we haven't actually tried to create yet? If so, shouldn't the event say explicitly that no outcome has been reached yet, rather than that the outcome is unknown?

I might be misunderstanding the reasons it's unknown 🤔
Just wondering if 'Unknown' is conceptually the right way of thinking about this event as it could be confusing if we see this in the log...

Contributor Author

You're correct, the outcome for write operations is logged as unknown since we don't know whether the operation succeeded or failed at that point in time.

I wanted to be explicit about it so that it's clear to the user that the operation has not completed yet, which is also why the message is phrased in present progressive rather than past tense.

I also wanted to keep the option open to add logging the outcome as a separate event if users require that information for auditing purposes.

My concern with omitting the outcome field would be that users might not fully realise what the event means and when it was logged.
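The relationship between outcome and message tense described in this reply might look something like the sketch below. It assumes a hypothetical `alertRuleEvent` builder and verb table; the actual helper in `audit_events.ts` may be structured differently.

```typescript
type Outcome = 'unknown' | 'success' | 'failure';

// Hypothetical verb table: [infinitive, present progressive, past participle].
const VERBS: Record<string, [string, string, string]> = {
  alert_rule_create: ['create', 'creating', 'created'],
  alert_rule_delete: ['delete', 'deleting', 'deleted'],
};

interface AuditEvent {
  message: string;
  event: { action: string; outcome: Outcome };
  error?: { message: string };
}

function alertRuleEvent(params: {
  action: string;
  id: string;
  outcome?: Outcome;
  error?: Error;
}): AuditEvent {
  const verbs = VERBS[params.action];
  if (verbs === undefined) {
    throw new Error(`Unknown action: ${params.action}`);
  }
  const [infinitive, progressive, past] = verbs;
  const doc = `alert rule [id=${params.id}]`;

  // An error always means failure; otherwise default to success unless told otherwise.
  const outcome: Outcome = params.error ? 'failure' : (params.outcome ?? 'success');

  // Present progressive signals "logged before the operation completed".
  const message = params.error
    ? `Failed attempt to ${infinitive} ${doc}`
    : outcome === 'unknown'
    ? `User is ${progressive} ${doc}`
    : `User has ${past} ${doc}`;

  return {
    message,
    event: { action: params.action, outcome },
    error: params.error ? { message: params.error.message } : undefined,
  };
}
```

With this shape, a write operation is logged as `outcome: 'unknown'` with an "is creating" message, which leaves room for a later follow-up event carrying the final outcome.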

Contributor

Fair enough. I guess the very existence of an unknown outcome suggests to me that there was an outcome, we just couldn't identify it...
I can go either way, which is why I was happy to approve 👍

x-pack/plugins/alerts/server/alerts_client/audit_events.ts (outdated, resolved)
@mikecote
Contributor

Hi @thomheymann, regarding the terminology, I know we've been back and forth on what terms to use, so the alerting team had a discussion on what to use. We've decided against gradually introducing the new terms into our codebase (it would cause more confusion than changing everything at once), as it seems the new terminology won't be used just yet.

I hope this doesn't cause too many changes in your PR but please let me know if I can help out. I did a pass at the events in the description and came up with a map of what I think they would now become:

  • connector_create
  • connector_update
  • connector_delete
  • connector_get
  • connector_find
  • alert_create
  • alert_update
  • alert_update_api_key
  • alert_enable
  • alert_disable
  • alert_delete
  • alert_get
  • alert_find
  • alert_mute
  • alert_unmute
  • alert_instance_mute
  • alert_instance_unmute

@mikecote mikecote self-requested a review December 1, 2020 16:01
Contributor

@mikecote mikecote left a comment

I started my review but ran short on time to finish. I finished looking at the implementation for actions plugin and left a few comments. I'm sure some of these would apply to the alerts plugin as well but I haven't looked at it yet.

x-pack/plugins/actions/server/lib/audit_events.ts (outdated, resolved)
x-pack/plugins/actions/server/actions_client.ts (outdated, resolved)
this.preconfiguredActions.find((preconfiguredAction) => preconfiguredAction.id === id) !==
undefined
) {
throw new PreconfiguredActionDisabledModificationError(
Contributor

Is it intentional to capture the errors thrown when the action type is disabled? If so, there's a few other places (this.actionTypeRegistry.ensureActionTypeEnabled) that will throw similar errors.

Contributor Author

When does this error occur? Is it role based or just some kind of global configuration?

I thought it was an authorisation error but looks like I made a mistake here.

Contributor

> When does this error occur? Is it role based or just some kind of global configuration?

There are two types of actions in the system, pre-configured and dynamic / user created actions. This code will throw whenever the update API is called for a pre-configured action. Since those actions are configured via kibana.yml, we don't support updates and throw the error.

So I think we may be ok here to skip auditing this error?

Contributor Author

I've double checked with Oleg and we think this is audit worthy. As an auditor you'd want to know why someone is trying to update preconfigured connectors.

I'll make sure ensureActionTypeEnabled errors are included in the audit log.

Are there any other errors I should include?

Contributor Author

Actually, looking at the errors thrown inside ensureActionTypeEnabled, I'm not sure they're audit worthy.

The first type of error is thrown when an action type is disabled. That just feels like a validation error.

The second type of error is thrown when the license doesn't allow a certain type. That's definitely not security related / audit worthy.

I'd say let's leave these as is until we get a requirement to include those errors.

Contributor

👍 works for me!

this.preconfiguredActions.find((preconfiguredAction) => preconfiguredAction.id === id) !==
undefined
) {
throw new PreconfiguredActionDisabledModificationError(
Contributor

Same question here:

> Is it intentional to capture the errors thrown when the action type is disabled? If so, there's a few other places (this.actionTypeRegistry.ensureActionTypeEnabled) that will throw similar errors.

Contributor

@flash1293 flash1293 left a comment

KibanaApp changes LGTM, just adding ids to the saved object migration tests which should have been there anyway

Contributor

@mikecote mikecote left a comment

I reviewed the remaining changes, LGTM once the requests below are complete! Great work!

@legrego legrego requested a review from jportner December 3, 2020 14:52
@jportner
Contributor

jportner commented Dec 3, 2020

@elasticmachine merge upstream

Contributor

@jportner jportner left a comment

Looking great, mostly nits below!

Comment on lines +88 to +90
public static isRandomId(id: string | undefined) {
return typeof id === 'string' && UUID_REGEX.test(id);
}
Contributor

Nit

Suggested change
- public static isRandomId(id: string | undefined) {
-   return typeof id === 'string' && UUID_REGEX.test(id);
- }
+ public static isRandomId(id: string | undefined) {
+   return typeof id === 'string' && UUID_REGEX.test(id.toLowerCase());
+ }

Alternatively you could change the regex to search for upper-case characters 🤷‍♂️

Contributor Author

I don't think UUIDs are case insensitive: https://github.com/uuidjs/uuid/blob/master/src/validate.js

I'd prefer to use the package directly but upgrading to latest version is not trivial and should be a PR in its own right.

Contributor

I think the uuid package is wrong. See the spec:

 Each field is treated as an integer and has its value printed as a
 zero-filled hexadecimal digit string with the most significant
 digit first.  The hexadecimal values "a" through "f" are output as
 lower case characters and are case insensitive on input.

But I suppose if we only ever care to validate IDs that were generated with the uuid package, the current implementation is correct 😄
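The difference between the two readings can be made concrete with a small sketch. The pattern below is an assumed RFC 4122-style regex, not necessarily the `UUID_REGEX` used in the PR, and `isUuid` is a hypothetical helper.

```typescript
// Assumed pattern for RFC 4122 UUIDs (versions 1-5); the PR's UUID_REGEX may differ.
const UUID_PATTERN =
  '^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$';

// Strict: accepts only lower-case hex, i.e. the canonical output form.
const STRICT_LOWERCASE = new RegExp(UUID_PATTERN);

// Lenient: also accepts upper-case hex, as the spec allows on input.
const CASE_INSENSITIVE = new RegExp(UUID_PATTERN, 'i');

function isUuid(id: string | undefined, caseInsensitive: boolean): boolean {
  const regex = caseInsensitive ? CASE_INSENSITIVE : STRICT_LOWERCASE;
  return typeof id === 'string' && regex.test(id);
}
```

If only IDs produced by the generator are ever validated, the strict form is sufficient; the lenient form matters once IDs can arrive from other sources.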

Comment on lines 395 to 402
ids.forEach((id) =>
this.auditLogger?.log(
connectorAuditEvent({
action: ConnectorAuditAction.GET,
savedObject: { type: 'action', id },
})
)
);
Contributor

I know this is not intuitive, but when executing a bulk operation, the entire operation may succeed even though there are one or more errors for individual objects. For example, if an invalid type was specified (bad request), or if the object was not found.

To be consistent with our other access logs, we don't want to create an audit record for an object that wasn't actually accessed by the user. You could account for these cases with the following change:

Suggested change
- ids.forEach((id) =>
-   this.auditLogger?.log(
-     connectorAuditEvent({
-       action: ConnectorAuditAction.GET,
-       savedObject: { type: 'action', id },
-     })
-   )
- );
+ bulkGetResult.saved_objects.forEach(({ id, error }) => {
+   if (!error && this.auditLogger) {
+     this.auditLogger.log(
+       connectorAuditEvent({
+         action: ConnectorAuditAction.GET,
+         savedObject: { type: 'action', id },
+       })
+     );
+   }
+ });

Note, we should make the same change in x-pack/plugins/security/server/saved_objects/secure_saved_objects_client_wrapper.ts too.
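Stripped of the client internals, the rule being proposed is "audit only the objects that came back without an error". A self-contained sketch, assuming a simplified `BulkGetResult` shape and a hypothetical `accessedIds` helper:

```typescript
// Simplified, assumed shape of a bulk-get response.
interface BulkGetResult {
  saved_objects: Array<{
    id: string;
    type: string;
    error?: { statusCode: number; message: string };
  }>;
}

// Returns the IDs of objects that were actually returned to the caller.
// Entries carrying an error (not found, bad request) are skipped, so no
// audit record is written for an object that was never accessed.
function accessedIds(result: BulkGetResult): string[] {
  return result.saved_objects
    .filter((obj) => obj.error === undefined)
    .map((obj) => obj.id);
}
```

The caller would then log one GET event per returned ID rather than one per requested ID.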

Comment on lines 69 to 75
  const canSpecifyID =
    (options.overwrite && options.version) || SavedObjectsUtils.isRandomId(options.id);
  if (options.id && !canSpecifyID) {
    throw new Error(
-     `Predefined IDs are not allowed for encrypted saved objects of type "${type}".`
+     'Predefined IDs are not allowed for saved objects with encrypted attributes, unless the ID has been generated using `SavedObjectsUtils.generateId`.'
    );
  }
Contributor

Nit: the same code is used in two different places, could be abstracted out to a function like

private throwIfIdIsNotAllowed(id: string | undefined, overwrite: boolean, version: string) {
  const canSpecifyID = (overwrite && version) || SavedObjectsUtils.isRandomId(id);
  if (id && !canSpecifyID) {
    throw new Error(
      'Predefined IDs are not allowed for saved objects with encrypted attributes, unless the ID has been generated using `SavedObjectsUtils.generateId`.'
    );
  }
}

Contributor Author

I have added a getValidId method that handles all that logic and also generates an id if none has been specified:
459e0c4#diff-06a76df7e9f18756a4bcd62574dd9d6fd505a9ed7c8f275a1bdabd52a81a4bb6R298-R320
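For readers who don't follow the commit link, a method with the behaviour described here (validate a supplied ID, generate one when none is given) might look roughly like this. The signature is a guess, and `generateId` is a dependency-free stand-in built on `Math.random`; the PR's `SavedObjectsUtils.generateId` uses `uuid.v1()` instead.

```typescript
// Assumed UUID pattern; the PR's UUID_REGEX may differ.
const UUID_REGEX =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;

// Stand-in generator producing a v4-shaped ID (not cryptographically secure).
function generateId(): string {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    const v = c === 'x' ? r : (r & 0x3) | 0x8; // 'y' carries the variant bits
    return v.toString(16);
  });
}

function isRandomId(id: string | undefined): boolean {
  return typeof id === 'string' && UUID_REGEX.test(id);
}

// Hypothetical signature; the actual getValidId in the PR may differ.
function getValidId(id: string | undefined, overwrite?: boolean, version?: string): string {
  // No ID supplied: generate one.
  if (!id) {
    return generateId();
  }
  // A supplied ID is accepted when overwriting a specific version,
  // or when it looks like it came from the generator.
  const canSpecifyId = Boolean(overwrite && version) || isRandomId(id);
  if (!canSpecifyId) {
    throw new Error(
      'Predefined IDs are not allowed for saved objects with encrypted attributes, unless the ID has been generated using `SavedObjectsUtils.generateId`.'
    );
  }
  return id;
}
```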

Comment on lines +78 to +80
public static generateId() {
return uuid.v1();
}
Contributor

Perhaps we should use uuid.v4() here instead, considering that the ESO wrapper now relies on this?

I know that you should not rely on UUIDs being hard to guess, but UUIDv1 is very predictable where UUIDv4 is not. I don't see any practical benefit to using v1 over v4.

@pgayvallet do you have any thoughts on this either way?

Contributor

I believe we already had this discussion a while ago (probably with @rudolf), and I can't exactly remember the outcome, but to restart the conversation:

  1. Objectively, v4 is better than v1
  2. Still, I think v1 is good enough, it 'just work (tm)', so there is no urge to change it (or do you see one?)
  3. I don't think we should switch from one to the other during a minor
  4. Even if we do, it should probably not be in this PR

Do you want me to open an issue to discuss this change?

Contributor

> Still, I think v1 is good enough, it 'just work (tm)', so there is no urge to change it (or do you see one?)

My only justification is that the ESO wrapper used to generate its own IDs upon object creation, and it used v4. Now it's relying on the core generateId function which uses v1.

I think it's worth discussing further. I'm curious to know why we're hesitant to change to v4 in a minor.
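As background to the v1 versus v4 question: a version 1 UUID is derived from a timestamp and a node identifier, while a version 4 UUID is built from random numbers, and the version is recorded in the first hex digit of the third group. A small sketch for telling them apart, using a hypothetical `uuidVersion` helper:

```typescript
// Extracts the version digit from a UUID string, or returns undefined
// if the input is not in the 8-4-4-4-12 hex layout.
function uuidVersion(id: string): number | undefined {
  const match =
    /^[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f])[0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$/i.exec(id);
  const digit = match?.[1];
  return digit === undefined ? undefined : parseInt(digit, 16);
}
```

Because `isRandomId` only checks the format, IDs of either version pass it, so a switch from v1 to v4 would not by itself invalidate existing IDs under that check.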

Contributor

@pgayvallet pgayvallet left a comment

LGTM

Comment on lines 82 to 90
/**
* Validates that a saved object ID matches UUID format.
*
* @param {string} id The ID of a saved object.
* @todo Use `uuid.validate` once upgraded to v5.3+
*/
public static isRandomId(id: string | undefined) {
return typeof id === 'string' && UUID_REGEX.test(id);
}
Contributor

I think this naming is misleading. We are asserting that the ID matches the UUID format, not that it was randomly generated (which is properly explained in the TSDoc, but not really reflected in the method name).

Contributor Author

The purpose of the isRandomId method is to allow plugin developers to assert whether an ID has been randomly generated or whether it's a static ID. In order to do so the function checks whether a string matches UUID format but that's an implementation detail that consumers don't need to be aware of in the same way that they don't need to know that generateId generates a UUID.

Would isRandomlyGeneratedId be more explicit?

Contributor

That would just be a rephrasing.

In that case, change "Validates that a saved object ID matches UUID format." to "Validates that a saved object ID has been randomly generated".

Contributor

@jportner jportner left a comment

LGTM, great stuff!

@thomheymann thomheymann merged commit f413957 into master Dec 4, 2020
@thomheymann thomheymann deleted the alerting/audit-logging branch December 4, 2020 19:13
thomheymann added a commit that referenced this pull request Dec 5, 2020
* ECS audit events for alerts plugin

* added api changes

* fixed linting and testing errors

* fix test

* Fixed linting errors after prettier update

* Revert "Allow predefined ids for encrypted saved objects (#83482)"

This reverts commit 7d929fe.

* Added suggestions from code review

* Fixed unit tests

* Added suggestions from code review

* Changed names of alert events

* Changed naming as suggested in code review

* Added suggestions from PR

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>


Successfully merging this pull request may close these issues.

  • ECS audit events for alerts plugin
  • Write audit log entries for action and alert CRUD requests