
Prevent legacy url conflicts #116007

Merged

Conversation

Contributor

@jportner jportner commented Oct 21, 2021

Resolves #113335.

See the commit messages for details; this involved some refactoring, so it may be easier to review on a per-commit basis.

I changed the SOR (saved objects repository) create/bulkCreate methods to use a new preflightCheckForCreate function. From the TS docs:

Conducts pre-flight checks before object creation. Consumers should only check eligible objects (multi-namespace types). For each object that the consumer intends to create, we check for three potential error cases in all applicable spaces:

  1. 'aliasConflict' - there is already an alias that points to a different object.
  2. 'unresolvableConflict' - this object already exists in a different space and it cannot be overwritten with the given parameters.
  3. 'conflict' - this object already exists (and the given options include overwrite=false).

Objects can be created in 1-N spaces, and for each object+space permutation we need to check if a legacy URL alias exists. This function attempts to optimize by defining an "alias threshold": if we need to check for more aliases than that threshold, instead of attempting to bulk-get each one, we find (search for) them. This is intended to strike an acceptable performance balance, and finding is necessary when creating objects in "*" (all current and future spaces) because we don't want to attempt to enumerate all spaces here.
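The threshold decision described above can be sketched roughly as follows. `FIND_ALIASES_THRESHOLD = 3` is the value used in this PR; the other names are illustrative, and the real `preflightCheckForCreate` handles more cases than this simplified sketch:

```typescript
// Threshold above which we search (PIT/find) for aliases instead of bulk-getting them.
const FIND_ALIASES_THRESHOLD = 3;

interface ObjectToCreate {
  type: string;
  id: string;
  spaces: string[]; // may contain '*' (all current and future spaces)
}

// Returns 'find' when we should search for potential aliases with a PIT/find,
// or 'bulkGet' when an mget of the individual alias documents is cheaper.
function chooseAliasCheckStrategy(obj: ObjectToCreate): 'find' | 'bulkGet' {
  // Creating in '*' means we cannot enumerate the alias IDs up front,
  // so we must search rather than bulk-get.
  if (obj.spaces.includes('*')) {
    return 'find';
  }
  // One potential legacy URL alias per object+space permutation.
  return obj.spaces.length > FIND_ALIASES_THRESHOLD ? 'find' : 'bulkGet';
}
```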

TODO: address upsert in update and incrementCounter (see #116007 (comment)). Edit: done in this PR.

In creating this PR, I identified a few other situations where a legacy URL alias conflict scenario could be created, which this PR does not account for. TODO (in a follow up bugfix PR):

  • Delete aliases when objects are deleted?
  • Delete aliases when objects are unshared?

^ Basically, if there is ever an object change that causes an alias target to point to nothing, we should probably just delete the alias. That's what I'm thinking right now. Update: tracking this in a separate issue, #116235

Move/rename several functions out of collect_multi_namespace_references:
 * getKey -> getObjectKey
 * parseKey -> parseObjectKey
 * checkLegacyUrlAliases -> findLegacyUrlAliases
All of these are now available for other internal consumers. Also made
minor type changes for brevity.
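The renamed key helpers can be sketched as a pair like the following; `getObjectKey` matches the diff shown later in this PR, while `parseObjectKey` is an illustrative inverse (the real implementation may differ):

```typescript
// Matches the implementation quoted in the diff below.
function getObjectKey({ type, id }: { type: string; id: string }) {
  return `${type}:${id}`;
}

// Illustrative inverse of getObjectKey (hypothetical; signature may differ).
function parseObjectKey(key: string): { type: string; id: string } {
  // Split on the first ':' only, since object IDs may themselves contain colons.
  const index = key.indexOf(':');
  if (index === -1) {
    throw new Error(`Malformed object key: ${key}`);
  }
  return { type: key.slice(0, index), id: key.slice(index + 1) };
}
```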
@jportner jportner added v8.0.0 release_note:skip Skip the PR/issue when compiling release notes backport:skip This commit does not require backporting labels Oct 21, 2021
@jportner jportner requested a review from legrego October 21, 2021 19:32
Member

@legrego legrego left a comment


Preliminary review: overall this is looking good, and the unit test coverage is great. The logic is a bit hard to follow, especially with the number of `isLeft` and `isRight` logic branches we have here. I had to pick this up and put it down a couple of times to fully understand what was going on.

I noticed that we didn't change the upsert functionality within the repository's update function -- is that intentional, or should we also be preventing legacy URL conflicts when an upsert request results in a create?

Had to make a small tweak to preflightCheckForCreate because the
existing implementation caused a 500 error. Since the bulk-get call is
optimized to only retrieve the `legacy-url-alias.disabled` field of the
raw alias documents, if the `disabled` field does not exist in
the raw document then the entire source object of the result is empty.
@jportner jportner force-pushed the issue-113335-prevent-legacy-url-conflicts branch from 50e0c52 to 286d3ff Compare October 25, 2021 08:18
@jportner jportner marked this pull request as ready for review October 25, 2021 08:22
@jportner jportner requested review from a team as code owners October 25, 2021 08:22
@jportner
Contributor Author

I noticed that we didn't change the upsert functionality within the repository's update function -- is that intentional, or should we also be preventing legacy URL conflicts when an upsert request results in a create?


Yeah, I just fixed an upsert bug last week, not sure how I overlooked this. I think we should change update and incrementCounter, both of which provide upsert functionality.

Contributor

@pgayvallet pgayvallet left a comment


Overall looking good for now.

I guess it's not totally done though, given #116007 (comment)?

Comment on lines +52 to +54
} catch (e) {
error = e;
}
Contributor

Should we accumulate the errors, or exit the loop early when an error is encountered? ATM only the last error will be reported, but we're not failing fast when encountering errors, which seems wrong?

Contributor Author

I'm not sure I follow your concern. The whole loop is enclosed in the try/catch, so one error will break the loop. The only reason we have this catch is so that we can attempt to close the finder if we encountered an error. I don't think we need an accumulator, we just need to close the finder and throw the error that we encountered.

Can you clarify with a diff of your suggested changes?

*/
export async function findLegacyUrlAliases(
createPointInTimeFinder: CreatePointInTimeFinderFn,
objects: Array<{ type: string; id: string }>
Contributor

NIT: Should we re-use SavedObjectsCollectMultiNamespaceReferencesObject or a similar type? We have 99 `{ type: string; id: string }` type definitions, and the same amount of inlined `{ type: string; id: string }`.

Contributor Author

Well, this is used for more than collectMultiNamespaceReferences now. I can add a new type for this, though.

Just out of curiosity, are these type definitions a performance issue? Or just the principle of the thing?

Comment on lines +251 to +253
export function getObjectKey({ type, id }: { type: string; id: string }) {
return `${type}:${id}`;
}
Contributor

🙈 (I know this was already present)

export const getObjKey = (obj: SavedObject) => `${obj.type}|${obj.id}`;

Same logic, different separator.

Contributor Author

@jportner jportner Oct 25, 2021


Changed in 86ebd8c

Edit: this change was causing tests to fail, I reverted it in 327de44. I think this is fine to leave as-is.

Comment on lines +44 to +47
export type CreatePointInTimeFinderFn = <T = unknown, A = unknown>(
findOptions: SavedObjectsCreatePointInTimeFinderOptions
) => ISavedObjectsPointInTimeFinder<T, A>;

Contributor

I only see this used in added code. I wonder if we have existing code that should use this new type?

Contributor Author

Good question! I have already changed collect_multi_namespace_references.ts to use it too. I looked around and didn't see anything else that looks like it could use this type.

Contributor

@rudolf rudolf left a comment


My biggest question is around the performance here. I kind of doubt that we have places in Kibana that create large batches across many namespaces. So I assume performance isn't critical, but I think it's still worth understanding the worst case performance.

* If the object does not exceed this threshold, we use mget to fetch aliases instead.
* This threshold is a bit arbitrary, but it is intended to optimize so we don't make expensive PIT/find operations unless it is necessary.
*/
const FIND_ALIASES_THRESHOLD = 3;
Contributor

We should probably introduce a limit to the bulk create API to make it easier to reason about the possible performance.

If we assume a limit of 10k saved objects per batch, and all (or the majority of) objects are only created in 2 spaces, it would create an mget with number_of_spaces*object_count + object_count IDs, so trying to get 30k _id's, which feels very inefficient, but maybe it's performant anyway.

If this bulk create has just one object with 4 namespaces we will also create a PIT and do a find in addition to this large mget.

Since the performance tradeoff is likely between how large the mget request becomes vs. the time to create a PIT and page through results, I think we should select just one strategy per batch, and the heuristic should be the number of _id's in the mget for the entire bulk instead of the number of spaces per object.

So we would loop through all objects and sum up the number of spaces that each object will be created in. If we end up with e.g. > 1000 potential legacy URL alias IDs that need to be mget'd, we switch to using PIT/find (then we could try to profile it and see if 1000 is the right threshold or not).

If there's one object with a * namespace we could let the entire batch fall back to PIT/find.

(this might also simplify the code since we're not having to create so many discriminated unions with a left and right that has a different meaning in different contexts)
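The batch-level heuristic suggested above can be sketched like this (the threshold and names are illustrative, not an actual implementation):

```typescript
// Illustrative threshold: maximum number of alias _id's we are willing to mget.
const BULK_MGET_ID_THRESHOLD = 1000;

// Pick one strategy for the whole batch: 'mget' if the total number of
// potential alias IDs is small enough, otherwise a single PIT/find.
function chooseBatchStrategy(objects: Array<{ spaces: string[] }>): 'find' | 'mget' {
  let potentialAliasIds = 0;
  for (const { spaces } of objects) {
    // Any '*' namespace means we cannot enumerate alias IDs at all,
    // so the entire batch falls back to PIT/find.
    if (spaces.includes('*')) {
      return 'find';
    }
    potentialAliasIds += spaces.length;
  }
  return potentialAliasIds > BULK_MGET_ID_THRESHOLD ? 'find' : 'mget';
}
```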

Contributor Author

@jportner jportner Oct 25, 2021


This implementation is just a first stab at optimizing performance and I agree it surely can be improved. I went into this with the primary assumption that it's generally more performant to fetch aliases with mget than find.

I guess the big variable here is, how many aliases do we actually have?

Here's my concern:
Assume that we have Kibana 7.16 with 1k objects in each of 100 spaces (100k objects total). You upgrade Kibana to 8.0, which means you also have 99k aliases.
If you do a bulkCreate for 1k objects, and it includes just one object with 4 namespaces:

  • With the current implementation: we create a PIT and do a find for one alias, then do mget for 1999 documents (1k objects + 999 aliases)
  • With your suggestion: we create a PIT and do a find for all aliases, we have to page through 99k search results, then do mget for 1000 documents

Aside from that: when using find, we have to add a new compound query clause for each object. I wonder if Elasticsearch could handle a single find with 1k or 10k such clauses? I haven't done any testing; I guess we could find out.
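The per-object compound clauses mentioned above could look roughly like the following sketch. The field names are assumptions based on the `legacy-url-alias` type discussed in this PR, and the real implementation builds its filter through the saved objects find API rather than raw query DSL:

```typescript
// Sketch: one should-clause per object, each matching an alias whose target
// is that object. Field paths are illustrative assumptions.
function buildAliasQuery(objects: Array<{ type: string; id: string }>) {
  return {
    bool: {
      minimum_should_match: 1,
      should: objects.map(({ type, id }) => ({
        bool: {
          must: [
            { term: { 'legacy-url-alias.targetType': type } },
            { term: { 'legacy-url-alias.targetId': id } },
          ],
        },
      })),
    },
  };
}
```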


Given that most object types aren't shareable yet, I'm thinking we don't have to worry about these performance edge cases just yet. Do you feel comfortable with me/us tackling this in a follow-on (perhaps as part of #113743)?

Contributor

Yeah happy to defer performance to a follow-up.

Consumers can now define how many aliases they want to search for per
page:
* collectMultiNamespaceReferences uses 100 because it's less important
* preflightCheckForCreate uses 1000 because it's more important and it
  could be paging through potentially many thousands of results
The current implementation causes an extra round trip to ES if a
new object would be created, but given the infrequency of the
affected functions and our current time constraints, it seems
acceptable because it is the simplest path forward.
@jportner
Contributor Author

I updated this to include an alias check when an upsert would create a new object (during update or incrementCounter). Includes unit tests, going to work on integration tests now but I think this is ready for another round of reviews.

Comment on lines +1268 to +1273
if (upsert && preflightResult.checkResult === 'not_found') {
// If an upsert would result in the creation of a new object, we need to check for alias conflicts too.
// This takes an extra round trip to Elasticsearch, but this won't happen often.
// TODO: improve performance by combining these into a single preflight check
await this.preflightCheckForUpsertAliasConflict(type, id, namespace);
}
Contributor Author

@jportner jportner Oct 25, 2021


I really don't like having an extra preflight check, but I think that it's reasonable for now given our timetable and the low frequency of upsert operations. I'd like to revisit this as part of #113743 too.

It turns out this is exposed to external consumers and caused some
integration tests to fail. The unrelated saved objects service getObjKey
function is internal only and uses a different delimiter.
Also updated dev docs.
@jportner jportner enabled auto-merge (squash) October 26, 2021 15:38
@jportner jportner merged commit 3c9a656 into elastic:master Oct 26, 2021
@kibanamachine
Contributor

💛 Build succeeded, but was flaky

Test Failures

  • [job] [logs] OSS Misc Functional Tests / Saved Objects Management saved objects management with hidden types Delete modal should display a warning then trying to delete hidden saved objects

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@jportner jportner deleted the issue-113335-prevent-legacy-url-conflicts branch October 26, 2021 16:16
@jportner
Contributor Author

I believe the flaky test above is unrelated to the changes in this PR.

Instead, it looks like it could be related to this open issue: #115303

Successfully merging this pull request may close these issues.

Prevent creating legacy URL conflicts