
Prevent legacy url conflicts #116007

Merged

Conversation

Contributor

@jportner jportner commented Oct 21, 2021

Resolves #113335.

See the commit messages for details; this involved some refactoring, so it may be easier to review on a per-commit basis.

I changed the SOR (saved objects repository) create/bulkCreate methods to use a new preflightCheckForCreate function. From the TS docs:

Conducts pre-flight checks before object creation. Consumers should only check eligible objects (multi-namespace types). For each object that the consumer intends to create, we check for three potential error cases in all applicable spaces:

  1. 'aliasConflict' - there is already an alias that points to a different object.
  2. 'unresolvableConflict' - this object already exists in a different space and it cannot be overwritten with the given parameters.
  3. 'conflict' - this object already exists (and the given options include overwrite=false).

Objects can be created in 1-N spaces, and for each object+space permutation we need to check if a legacy URL alias exists. This function attempts to optimize by defining an "alias threshold": if we need to check for more aliases than that threshold, instead of attempting to bulk-get each one, we find (search for) them. This is intended to strike an acceptable performance balance, and finding is necessary when creating objects in "*" (all current and future spaces) because we don't want to attempt to enumerate all spaces here.
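The threshold decision described above can be sketched roughly as follows. `FIND_ALIASES_THRESHOLD = 3` is the value used in this PR; the other names are illustrative, and the real `preflightCheckForCreate` handles more cases than this simplified sketch:

```typescript
// Threshold above which we search (PIT/find) for aliases instead of bulk-getting them.
const FIND_ALIASES_THRESHOLD = 3;

interface ObjectToCreate {
  type: string;
  id: string;
  spaces: string[]; // may contain '*' (all current and future spaces)
}

// Returns 'find' when we should search for potential aliases with a PIT/find,
// or 'bulkGet' when an mget of the individual alias documents is cheaper.
function chooseAliasCheckStrategy(obj: ObjectToCreate): 'find' | 'bulkGet' {
  // Creating in '*' means we cannot enumerate the alias IDs up front,
  // so we must search rather than bulk-get.
  if (obj.spaces.includes('*')) {
    return 'find';
  }
  // One potential legacy URL alias per object+space permutation.
  return obj.spaces.length > FIND_ALIASES_THRESHOLD ? 'find' : 'bulkGet';
}
```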

TODO: address upsert in update and incrementCounter (see #116007 (comment)). Edit: done in this PR.

In creating this PR, I identified a few other situations where a legacy URL alias conflict scenario could be created, which this PR does not account for. TODO (in a follow up bugfix PR):

  • Delete aliases when objects are deleted?
  • Delete aliases when objects are unshared?

^ Basically, if there is ever an object change that causes an alias target to point to nothing, we should probably just delete the alias. That's what I'm thinking right now. Update: tracking this in a separate issue, #116235

Move/rename several functions out of collect_multi_namespace_references:
 * getKey -> getObjectKey
 * parseKey -> parseObjectKey
 * checkLegacyUrlAliases -> findLegacyUrlAliases
All of these are now available for other internal consumers. Also made
minor type changes for brevity.
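The renamed key helpers can be sketched as a pair like the following; `getObjectKey` matches the diff shown later in this PR, while `parseObjectKey` is an illustrative inverse (the real implementation may differ):

```typescript
// Matches the implementation quoted in the diff below.
function getObjectKey({ type, id }: { type: string; id: string }) {
  return `${type}:${id}`;
}

// Illustrative inverse of getObjectKey (hypothetical; signature may differ).
function parseObjectKey(key: string): { type: string; id: string } {
  // Split on the first ':' only, since object IDs may themselves contain colons.
  const index = key.indexOf(':');
  if (index === -1) {
    throw new Error(`Malformed object key: ${key}`);
  }
  return { type: key.slice(0, index), id: key.slice(index + 1) };
}
```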
@jportner jportner added v8.0.0 release_note:skip Skip the PR/issue when compiling release notes backport:skip This commit does not require backporting labels Oct 21, 2021
@jportner jportner requested a review from legrego October 21, 2021 19:32
Member

@legrego legrego left a comment


Preliminary review: overall this is looking good, and the unit test coverage is great. The logic is a bit hard to follow, especially with the number of `isLeft` and `isRight` logic branches we have here. I had to pick this up and put it down a couple of times to fully understand what was going on.

I noticed that we didn't change the upsert functionality within the repository's update function -- is that intentional, or should we also be preventing legacy URL conflicts when an upsert request results in a create?

Had to make a small tweak to preflightCheckForCreate because the
existing implementation caused a 500 error. Since the bulk-get call is
optimized to only retrieve the `legacy-url-alias.disabled` field of the
raw alias documents, if the `disabled` field does not exist in
the raw document then the entire source object of the result is empty.
@jportner jportner force-pushed the issue-113335-prevent-legacy-url-conflicts branch from 50e0c52 to 286d3ff Compare October 25, 2021 08:18
@jportner jportner marked this pull request as ready for review October 25, 2021 08:22
@jportner jportner requested review from a team as code owners October 25, 2021 08:22
@jportner
Contributor Author

I noticed that we didn't change the upsert functionality within the repository's update function -- is that intentional, or should we also be preventing legacy URL conflicts when an upsert request results in a create?


Yeah, I just fixed an upsert bug last week, not sure how I overlooked this. I think we should change update and incrementCounter, both of which provide upsert functionality.

Contributor

@pgayvallet pgayvallet left a comment


Overall looking good for now.

I guess it's not totally done though, given #116007 (comment)?

Comment on lines +52 to +54
} catch (e) {
error = e;
}
Contributor

Should we accumulate the errors, or exit the loop early when an error is encountered? ATM only the last error will be reported, but we're not failing fast when encountering errors, which seems wrong?

Contributor Author

I'm not sure I follow your concern. The whole loop is enclosed in the try/catch, so one error will break the loop. The only reason we have this catch is so that we can attempt to close the finder if we encountered an error. I don't think we need an accumulator, we just need to close the finder and throw the error that we encountered.

Can you clarify with a diff of your suggested changes?

*/
export async function findLegacyUrlAliases(
createPointInTimeFinder: CreatePointInTimeFinderFn,
objects: Array<{ type: string; id: string }>
Contributor

NIT: Should we re-use SavedObjectsCollectMultiNamespaceReferencesObject or a similar type? We have 99 `{ type: string; id: string }` type definitions, and the same amount of inlined `{ type: string; id: string }`.

Contributor Author

Well, this is used for more than collectMultiNamespaceReferences now. I can add a new type for this, though.

Just out of curiosity, are these type definitions a performance issue? Or just the principle of the thing?

Comment on lines +251 to +253
export function getObjectKey({ type, id }: { type: string; id: string }) {
return `${type}:${id}`;
}
Contributor

🙈 (I know this was already present)

export const getObjKey = (obj: SavedObject) => `${obj.type}|${obj.id}`;

Same logic, different separator.

Contributor Author

@jportner jportner Oct 25, 2021


Changed in 86ebd8c

Edit: this change was causing tests to fail, I reverted it in 327de44. I think this is fine to leave as-is.

Comment on lines +44 to +47
export type CreatePointInTimeFinderFn = <T = unknown, A = unknown>(
findOptions: SavedObjectsCreatePointInTimeFinderOptions
) => ISavedObjectsPointInTimeFinder<T, A>;

Contributor

I only see this used in added code. I wonder if we have existing code that should use this new type?

Contributor Author

Good question! I have already changed collect_multi_namespace_references.ts to use it too. I looked around and didn't see anything else that looks like it could use this type.

Contributor

@rudolf rudolf left a comment


My biggest question is around the performance here. I kind of doubt that we have places in Kibana that create large batches across many namespaces. So I assume performance isn't critical, but I think it's still worth understanding the worst case performance.

* If the object does not exceed this threshold, we use mget to fetch aliases instead.
* This threshold is a bit arbitrary, but it is intended to optimize so we don't make expensive PIT/find operations unless it is necessary.
*/
const FIND_ALIASES_THRESHOLD = 3;
Contributor

We should probably introduce a limit to the bulk create API to make it easier to reason about the possible performance.

If we assume a limit of 10k saved objects per batch, and all (or the majority of) objects are only created in 2 spaces, it would create an mget with number_of_spaces*object_count + object_count IDs, so trying to get 30k _id's, which feels very inefficient, but maybe it's performant anyway.

If this bulk create has just one object with 4 namespaces we will also create a PIT and do a find in addition to this large mget.

Since the performance tradeoff is likely between how large the mget request becomes vs. the time to create a PIT and page through results, I think we should select just one strategy per batch, and the heuristic should be the number of _id's in the mget for the entire bulk instead of the number of spaces per object.

So we would loop through all objects and sum up the number of spaces that each object will be created in. If we end up with e.g. > 1000 potential legacy URL alias IDs that need to be mget'd, we switch to using PIT/find (then we could try to profile it and see if 1000 is the right threshold or not).

If there's one object with a * namespace we could let the entire batch fall back to PIT/find.

(this might also simplify the code since we're not having to create so many discriminated unions with a left and right that has a different meaning in different contexts)
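The batch-level heuristic suggested above can be sketched like this (the threshold and names are illustrative, not an actual implementation):

```typescript
// Illustrative threshold: maximum number of alias _id's we are willing to mget.
const BULK_MGET_ID_THRESHOLD = 1000;

// Pick one strategy for the whole batch: 'mget' if the total number of
// potential alias IDs is small enough, otherwise a single PIT/find.
function chooseBatchStrategy(objects: Array<{ spaces: string[] }>): 'find' | 'mget' {
  let potentialAliasIds = 0;
  for (const { spaces } of objects) {
    // Any '*' namespace means we cannot enumerate alias IDs at all,
    // so the entire batch falls back to PIT/find.
    if (spaces.includes('*')) {
      return 'find';
    }
    potentialAliasIds += spaces.length;
  }
  return potentialAliasIds > BULK_MGET_ID_THRESHOLD ? 'find' : 'mget';
}
```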

Contributor Author

@jportner jportner Oct 25, 2021


This implementation is just a first stab at optimizing performance and I agree it surely can be improved. I went into this with the primary assumption that it's generally more performant to fetch aliases with mget than find.

I guess the big variable here is, how many aliases do we actually have?

Here's my concern:
Assume that we have Kibana 7.16 with 1k objects in each of 100 spaces (100k objects total). You upgrade Kibana to 8.0, which means you also have 99k aliases.
If you do a bulkCreate for 1k objects, and it includes just one object with 4 namespaces:

  • With the current implementation: we create a PIT and do a find for one alias, then do mget for 1999 documents (1k objects + 999 aliases)
  • With your suggestion: we create a PIT and do a find for all aliases, we have to page through 99k search results, then do mget for 1000 documents

Aside from that: when using find, we have to add a new compound query clause for each object. I wonder if Elasticsearch could handle a single find with 1k or 10k such clauses? I haven't done any testing; I guess we could find out.
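The per-object compound clauses mentioned above could look roughly like the following sketch. The field names are assumptions based on the `legacy-url-alias` type discussed in this PR, and the real implementation builds its filter through the saved objects find API rather than raw query DSL:

```typescript
// Sketch: one should-clause per object, each matching an alias whose target
// is that object. Field paths are illustrative assumptions.
function buildAliasQuery(objects: Array<{ type: string; id: string }>) {
  return {
    bool: {
      minimum_should_match: 1,
      should: objects.map(({ type, id }) => ({
        bool: {
          must: [
            { term: { 'legacy-url-alias.targetType': type } },
            { term: { 'legacy-url-alias.targetId': id } },
          ],
        },
      })),
    },
  };
}
```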


Given that most object types aren't shareable yet, I'm thinking we don't have to worry about these performance edge cases just yet. Do you feel comfortable with me/us tackling this in a follow-on (perhaps as part of #113743)?

Contributor

Yeah happy to defer performance to a follow-up.

Consumers can now define how many aliases they want to search for per
page:
* collectMultiNamespaceReferences uses 100 because it's less important
* preflightCheckForCreate uses 1000 because it's more important and it
  could be paging through potentially many thousands of results
The current implementation causes an extra round trip to ES if a
new object would be created, but given the infrequency of the
affected functions and our current time constraints, it seems
acceptable because it is the simplest path forward.
@jportner
Contributor Author

I updated this to include an alias check when an upsert would create a new object (during update or incrementCounter). Includes unit tests, going to work on integration tests now but I think this is ready for another round of reviews.

Comment on lines +1268 to +1273
if (upsert && preflightResult.checkResult === 'not_found') {
// If an upsert would result in the creation of a new object, we need to check for alias conflicts too.
// This takes an extra round trip to Elasticsearch, but this won't happen often.
// TODO: improve performance by combining these into a single preflight check
await this.preflightCheckForUpsertAliasConflict(type, id, namespace);
}
Contributor Author

@jportner jportner Oct 25, 2021


I really don't like having an extra preflight check, but I think that it's reasonable for now given our timetable and the low frequency of upsert operations. I'd like to revisit this as part of #113743 too.

It turns out this is exposed to external consumers and caused some
integration tests to fail. The unrelated saved objects service getObjKey
function is internal only and uses a different delimiter.
Also updated dev docs.
@jportner jportner enabled auto-merge (squash) October 26, 2021 15:38
@jportner jportner merged commit 3c9a656 into elastic:master Oct 26, 2021
@kibanamachine
Contributor

💛 Build succeeded, but was flaky

Test Failures

  • [job] [logs] OSS Misc Functional Tests / Saved Objects Management saved objects management with hidden types Delete modal should display a warning then trying to delete hidden saved objects

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@jportner jportner deleted the issue-113335-prevent-legacy-url-conflicts branch October 26, 2021 16:16
@jportner
Contributor Author

I believe the flaky test above is unrelated to the changes in this PR.

Instead, it looks like it could be related to this open issue: #115303

Successfully merging this pull request may close these issues.

Prevent creating legacy URL conflicts