Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the signature of vector script functions. #48260

Closed
wants to merge 2 commits into from

Conversation

jtibshirani
Copy link
Contributor

@jtibshirani jtibshirani commented Oct 18, 2019

NOTE: not ready for review. This PR is meant to help our discussion around the best way to deprecate + remove these functions. The straightforward deprecation path is difficult, because Painless doesn't allow having two methods defined with the same name and arity.

Previously the functions accepted a doc values reference, whereas they now
accept the name of the vector field. Here's an example of how a vector function
was called before and after the change.

Before: cosineSimilarity(params.query_vector, doc['vector'])
After: cosineSimilarity(params.query_vector, 'vector')

This seems more intuitive, since we don't allow direct access to vector doc
values and the the meaning of doc['vector_field'] is a bit unclear.

It also allows us to avoid retrieving the vector doc values on every function
invocation. This can give a tiny speed-up, some example results on the
glove-100-angular dataset:

Algorithm                        Recall    QPS
EsBruteforce()                   1.000     5.390
EsBruteforce(cache_docvalues)    1.000     5.564

@jtibshirani jtibshirani added >breaking :Search/Search Search-related issues that do not fall into other categories >refactoring v8.0.0 v7.6.0 labels Oct 18, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@jtibshirani jtibshirani added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache and removed :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache labels Oct 18, 2019
@elastic elastic deleted a comment from elasticmachine Oct 18, 2019
Previously the functions accepted a doc values reference, whereas they now
accept the name of the vector field. Here's an example of how a vector function
was called before and after the change.

Before: `cosineSimilarity(params.query_vector, doc['vector'])`
After: `cosineSimilarity(params.query_vector, 'vector')`

This seems more intuitive, since we don't allow direct access to vector doc
values and the the meaning of `doc['vector_field']` is a bit unclear.

It also allows us to avoid retrieving the vector doc values on every function
invocation. This can give a tiny speed-up, some example results on the
glove-100-angular dataset:

```
Algorithm                        Recall    QPS
EsBruteforce()                   1.000     5.390
EsBruteforce(cache_docvalues)    1.000     5.564
```
@jtibshirani
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-sample-matrix

@rjernst
Copy link
Member

rjernst commented Oct 25, 2019

I'm glad we are moving towards making the doc values lookup an implementation detail. However, this would be a hard break for all current users of cosineSimilarity. Could we possibly instead deprecate the current method and create a new method (completely different name)? Do we know there are sufficiently small numbers of this method that the break is "ok", or are we considering this break ok because vectors are experimental?

@jdconrad
Copy link
Contributor

jdconrad commented Oct 25, 2019

I echo @rjernst 's concerns here. We should follow a deprecation model in the same way as we do with anything in Elastic. There was some work to make this possible to deprecate whitelisted methods previously, but it still needs to be completed. So I would add a new method and deprecate this one in the docs to begin with, and we can talk about finishing the ability to deprecate whitelisted methods with an "annotation" on the whitelist.

@jtibshirani
Copy link
Contributor Author

@rjernst @jdconrad Sorry for the confusion, I should have marked this PR as 'draft/ up for discussion'. I agree we should try to follow the usual deprecation model as much as possible. We are discussing our options in the next search / kNN meeting and I'll loop back after that with a proposal.

@jtibshirani
Copy link
Contributor Author

After gathering some suggestions on how to handle the deprecation, I opened #48604 as a proposal. I'm closing this PR, it would be great to get your feedback on the new one.

@jtibshirani jtibshirani deleted the vector-functions branch October 29, 2019 00:13
@jtibshirani jtibshirani added :Search Relevance/Vectors Vector search and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants