Deprecate the sparse_vector field type.

We have not seen much adoption of this experimental field type, and don't see a clear use case as it's currently designed. This PR deprecates the field type in 7.x. It will be removed from 8.0 in a follow-up PR.
elastic · Oct 21, 2019 · 1372639 · 1372639
1 parent 458de91
commit 1372639
Show file tree

Hide file tree

Showing 10 changed files with 193 additions and 78 deletions.
diff --git a/docs/reference/mapping/types/sparse-vector.asciidoc b/docs/reference/mapping/types/sparse-vector.asciidoc
@@ -6,6 +6,7 @@
 <titleabbrev>Sparse vector</titleabbrev>
 ++++
 
+deprecated[7.6, The `sparse_vector` type is deprecated and will be removed in 8.0.]
 experimental[]
 
 A `sparse_vector` field stores sparse vectors of float values.
@@ -38,7 +39,11 @@ PUT my_index
     }
   }
 }
+--------------------------------------------------
+// TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
 
+[source,console]
+--------------------------------------------------
 PUT my_index/_doc/1
 {
   "my_text" : "text1",
@@ -50,8 +55,8 @@ PUT my_index/_doc/2
   "my_text" : "text2",
   "my_vector" : {"103": 0.5, "4": -0.5,  "5": 1, "11" : 1.2}
 }
-
 --------------------------------------------------
+// TEST[continued]
 
 Internally, each document's sparse vector is encoded as a binary
 doc value. Its size in bytes is equal to

diff --git a/docs/reference/vectors/vector-functions.asciidoc b/docs/reference/vectors/vector-functions.asciidoc
@@ -5,16 +5,14 @@
 
 experimental[]
 
-These functions are used for
-for <<dense-vector,`dense_vector`>>  and
-<<sparse-vector,`sparse_vector`>> fields.
-
 NOTE: During vector functions' calculation, all matched documents are
-linearly scanned. Thus, expect the query time grow linearly 
+linearly scanned. Thus, expect the query time grow linearly
 with the number of matched documents. For this reason, we recommend
 to limit the number of matched documents with a `query` parameter.
 
-Let's create an index with the following mapping and index a couple
+====== `dense_vector` functions
+
+Let's create an index with a `dense_vector` mapping and index a couple
 of documents into it.
 
 [source,console]
@@ -27,9 +25,6 @@ PUT my_index
         "type": "dense_vector",
         "dims": 3
       },
-      "my_sparse_vector" : {
-        "type" : "sparse_vector"
-      },
       "status" : {
         "type" : "keyword"
       }
@@ -40,21 +35,21 @@ PUT my_index
 PUT my_index/_doc/1
 {
   "my_dense_vector": [0.5, 10, 6],
-  "my_sparse_vector": {"2": 1.5, "15" : 2, "50": -1.1, "4545": 1.1},
   "status" : "published"
 }
 
 PUT my_index/_doc/2
 {
   "my_dense_vector": [-0.5, 10, 10],
-  "my_sparse_vector": {"2": 2.5, "10" : 1.3, "55": -2.3, "113": 1.6},
   "status" : "published"
 }
 
+POST my_index/_refresh
+
 --------------------------------------------------
 // TESTSETUP
 
-For dense_vector fields, `cosineSimilarity` calculates the measure of
+The `cosineSimilarity` function calculates the measure of
 cosine similarity between a given query vector and document vectors.
 
 [source,console]
@@ -90,8 +85,8 @@ GET my_index/_search
 NOTE: If a document's dense vector field has a number of dimensions
 different from the query's vector, an error will be thrown.
 
-Similarly, for sparse_vector fields, `cosineSimilaritySparse` calculates cosine similarity
-between a given query vector and document vectors.
+The `dotProduct` function calculates the measure of
+dot product between a given query vector and document vectors.
 
 [source,console]
 --------------------------------------------------
@@ -109,18 +104,24 @@ GET my_index/_search
         }
       },
       "script": {
-        "source": "cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector']) + 1.0",
+        "source": """
+          double value = dotProduct(params.query_vector, doc['my_dense_vector']);
+          return sigmoid(1, Math.E, -value); <1>
+        """,
         "params": {
-          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
+          "query_vector": [4, 3.4, -0.2]
         }
       }
     }
   }
 }
 --------------------------------------------------
 
-For dense_vector fields, `dotProduct` calculates the measure of
-dot product between a given query vector and document vectors.
+<1> Using the standard sigmoid function prevents scores from being negative.
+
+The `l1norm` function calculates L^1^ distance
+(Manhattan distance) between a given query vector and
+document vectors.
 
 [source,console]
 --------------------------------------------------
@@ -138,23 +139,28 @@ GET my_index/_search
         }
       },
       "script": {
-        "source": """
-          double value = dotProduct(params.query_vector, doc['my_dense_vector']);
-          return sigmoid(1, Math.E, -value); <1>
-        """,
+        "source": "1 / (1 + l1norm(params.queryVector, doc['my_dense_vector']))", <1>
         "params": {
-          "query_vector": [4, 3.4, -0.2]
+          "queryVector": [4, 3.4, -0.2]
         }
       }
     }
   }
 }
 --------------------------------------------------
 
-<1> Using the standard sigmoid function prevents scores from being negative.
+<1> Unlike `cosineSimilarity` that represent similarity, `l1norm` and
+`l2norm` shown below represent distances or differences. This means, that
+the more similar the vectors are, the lower the scores will be that are
+produced by the `l1norm` and `l2norm` functions.
+Thus, as we need more similar vectors to score higher,
+we reversed the output from `l1norm` and `l2norm`. Also, to avoid
+division by 0 when a document vector matches the query exactly,
+we added `1` in the denominator.
 
-Similarly, for sparse_vector fields, `dotProductSparse` calculates dot product
-between a given query vector and document vectors.
+The `l2norm` function calculates L^2^ distance
+(Euclidean distance) between a given query vector and
+document vectors.
 
 [source,console]
 --------------------------------------------------
@@ -172,26 +178,77 @@ GET my_index/_search
         }
       },
       "script": {
-        "source": """
-          double value = dotProductSparse(params.query_vector, doc['my_sparse_vector']);
-          return sigmoid(1, Math.E, -value);
-        """,
-         "params": {
-          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
+        "source": "1 / (1 + l2norm(params.queryVector, doc['my_dense_vector']))",
+        "params": {
+          "queryVector": [4, 3.4, -0.2]
         }
       }
     }
   }
 }
 --------------------------------------------------
 
-For dense_vector fields, `l1norm` calculates L^1^ distance
-(Manhattan distance) between a given query vector and
-document vectors.
+NOTE: If a document doesn't have a value for a vector field on which
+a vector function is executed, an error will be thrown.
+
+You can check if a document has a value for the field `my_vector` by
+`doc['my_vector'].size() == 0`. Your overall script can look like this:
+
+[source,js]
+--------------------------------------------------
+"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, doc['my_vector'])"
+--------------------------------------------------
+// NOTCONSOLE
+
+====== `sparse_vector` functions
+
+deprecated[7.6, The `sparse_vector` type is deprecated and will be removed in 8.0.]
+
+Let's create an index with a `sparse_vector` mapping and index a couple
+of documents into it.
 
 [source,console]
 --------------------------------------------------
-GET my_index/_search
+PUT my_sparse_index
+{
+  "mappings": {
+    "properties": {
+      "my_sparse_vector": {
+        "type": "sparse_vector"
+      },
+      "status" : {
+        "type" : "keyword"
+      }
+    }
+  }
+}
+--------------------------------------------------
+// TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
+
+[source,console]
+--------------------------------------------------
+PUT my_sparse_index/_doc/1
+{
+  "my_sparse_vector": {"2": 1.5, "15" : 2, "50": -1.1, "4545": 1.1},
+  "status" : "published"
+}
+
+PUT my_sparse_index/_doc/2
+{
+  "my_sparse_vector": {"2": 2.5, "10" : 1.3, "55": -2.3, "113": 1.6},
+  "status" : "published"
+}
+
+POST my_sparse_index/_refresh
+--------------------------------------------------
+// TEST[continued]
+
+The `cosineSimilaritySparse` function calculates cosine similarity
+between a given query vector and document vectors.
+
+[source,console]
+--------------------------------------------------
+GET my_sparse_index/_search
 {
   "query": {
     "script_score": {
@@ -205,31 +262,24 @@ GET my_index/_search
         }
       },
       "script": {
-        "source": "1 / (1 + l1norm(params.queryVector, doc['my_dense_vector']))", <1>
+        "source": "cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector']) + 1.0",
         "params": {
-          "queryVector": [4, 3.4, -0.2]
+          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
         }
       }
     }
   }
 }
 --------------------------------------------------
+// TEST[continued]
+// TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
 
-<1> Unlike `cosineSimilarity` that represent similarity, `l1norm` and
-`l2norm` shown below represent distances or differences. This means, that
-the more similar the vectors are, the lower the scores will be that are
-produced by the `l1norm` and `l2norm` functions.
-Thus, as we need more similar vectors to score higher,
-we reversed the output from `l1norm` and `l2norm`. Also, to avoid
-division by 0 when a document vector matches the query exactly,
-we added `1` in the denominator.
-
-For sparse_vector fields, `l1normSparse` calculates L^1^ distance
+The `dotProductSparse` function calculates dot product
 between a given query vector and document vectors.
 
 [source,console]
 --------------------------------------------------
-GET my_index/_search
+GET my_sparse_index/_search
 {
   "query": {
     "script_score": {
@@ -243,23 +293,27 @@ GET my_index/_search
         }
       },
       "script": {
-        "source": "1 / (1 + l1normSparse(params.queryVector, doc['my_sparse_vector']))",
-        "params": {
-          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
+        "source": """
+          double value = dotProductSparse(params.query_vector, doc['my_sparse_vector']);
+          return sigmoid(1, Math.E, -value);
+        """,
+         "params": {
+          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
         }
       }
     }
   }
 }
 --------------------------------------------------
+// TEST[continued]
+// TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
 
-For dense_vector fields, `l2norm` calculates L^2^ distance
-(Euclidean distance) between a given query vector and
-document vectors.
+The `l1normSparse` function calculates L^1^ distance
+between a given query vector and document vectors.
 
 [source,console]
 --------------------------------------------------
-GET my_index/_search
+GET my_sparse_index/_search
 {
   "query": {
     "script_score": {
@@ -273,22 +327,24 @@ GET my_index/_search
         }
       },
       "script": {
-        "source": "1 / (1 + l2norm(params.queryVector, doc['my_dense_vector']))",
+        "source": "1 / (1 + l1normSparse(params.queryVector, doc['my_sparse_vector']))",
         "params": {
-          "queryVector": [4, 3.4, -0.2]
+          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
         }
       }
     }
   }
 }
 --------------------------------------------------
+// TEST[continued]
+// TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
 
-Similarly, for sparse_vector fields, `l2normSparse` calculates L^2^ distance
+The `l2normSparse` function calculates L^2^ distance
 between a given query vector and document vectors.
 
 [source,console]
 --------------------------------------------------
-GET my_index/_search
+GET my_sparse_index/_search
 {
   "query": {
     "script_score": {
@@ -311,15 +367,5 @@ GET my_index/_search
   }
 }
 --------------------------------------------------
-
-NOTE: If a document doesn't have a value for a vector field on which
-a vector function is executed, an error will be thrown.
-
-You can check if a document has a value for the field `my_vector` by
-`doc['my_vector'].size() == 0`. Your overall script can look like this:
-
-[source,js]
---------------------------------------------------
-"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, doc['my_vector'])"
---------------------------------------------------
-// NOTCONSOLE
+// TEST[continued]
+// TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]