Range query support (#55)
Implement range query support. PR includes:


- [x] new `RangeQuery` class
- [x] updated tests
- [x] updated docs, readme, and doc strings
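
A KNN `VectorQuery` always returns up to k closest records, however far away they are. A `RangeQuery` instead returns every record within `distance_threshold` of the query vector. The pure-Python sketch below illustrates only those semantics. It does not use `redisvl` or Redis, and the records, the `cosine_distance` helper, and the `knn`/`range_search` functions are hypothetical. It assumes cosine distance (1 minus cosine similarity) as the metric, and the optional `predicate` stands in for a filter expression.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (metric assumed for this sketch)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical in-memory records: (id, job, embedding)
records = [
    ("doc1", "engineer", [0.1, 0.1, 0.5]),
    ("doc2", "doctor", [0.1, 0.1, 0.5]),
    ("doc3", "engineer", [0.1, 0.4, 0.5]),
    ("doc4", "doctor", [0.7, 0.1, 0.5]),
]

def knn(query, k):
    """Top-k search: returns the k closest records, no matter how distant."""
    scored = sorted((cosine_distance(query, emb), rid) for rid, _, emb in records)
    return scored[:k]

def range_search(query, distance_threshold, predicate=None):
    """Range search: returns every record within distance_threshold,
    optionally restricted by a predicate on the job field (a stand-in filter)."""
    hits = []
    for rid, job, emb in records:
        if predicate is not None and not predicate(job):
            continue
        d = cosine_distance(query, emb)
        if d <= distance_threshold:
            hits.append((d, rid))
    return sorted(hits)

query = [0.1, 0.1, 0.5]
print([rid for _, rid in knn(query, k=4)])           # ['doc1', 'doc2', 'doc3', 'doc4']
print([rid for _, rid in range_search(query, 0.2)])  # ['doc1', 'doc2', 'doc3']
print([rid for _, rid in range_search(query, 0.1)])  # ['doc1', 'doc2']
print([rid for _, rid in range_search(query, 0.1, lambda job: job == "engineer")])  # ['doc1']
```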

---------

authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>
co-authored-by: Sam Partee <sam.partee@redis.com>
Sam Partee authored Aug 29, 2023
1 parent 57e5c89 commit e5c7579
Showing 4 changed files with 385 additions and 65 deletions.
177 changes: 163 additions & 14 deletions docs/user_guide/hybrid_queries_02.ipynb
@@ -5,7 +5,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Complex Queries\n",
"# Query\n",
"\n",
"In this notebook, we will explore more complex queries that can be performed with ``redisvl``\n",
"\n",
@@ -95,8 +95,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32m19:55:11\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n",
"\u001b[32m19:55:11\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. user_index\n"
"\u001b[32m17:09:16\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n",
"\u001b[32m17:09:16\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. user_index\n"
]
}
],
@@ -120,7 +120,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Executing Hybrid Queries\n",
"## Hybrid Queries\n",
"\n",
"Hybrid queries are queries that combine multiple types of filters. For example, you may want to search for a user that is a certain age, has a certain job, and is within a certain distance of a location. This is a hybrid query that combines numeric, tag, and geographic filters."
]
@@ -544,6 +544,155 @@
"result_print(index.query(v))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Filter Queries\n",
"\n",
"In some cases, you may not want to run a vector query at all, but only apply a ``FilterExpression``, similar to a SQL ``WHERE`` clause. The ``FilterQuery`` class enables this. It is similar to the ``VectorQuery`` class but takes only a ``FilterExpression`` (no query vector)."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><tr><th>user</th><th>credit_score</th><th>age</th><th>job</th></tr><tr><td>derrick</td><td>low</td><td>14</td><td>doctor</td></tr><tr><td>taimur</td><td>low</td><td>15</td><td>CEO</td></tr></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from redisvl.query import FilterQuery\n",
"\n",
"has_low_credit = Tag(\"credit_score\") == \"low\"\n",
"\n",
"filter_query = FilterQuery(\n",
" return_fields=[\"user\", \"credit_score\", \"age\", \"job\", \"location\"],\n",
" filter_expression=has_low_credit\n",
")\n",
"\n",
"results = index.query(filter_query)\n",
"\n",
"result_print(results)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Range Queries\n",
"\n",
"Range queries perform a vector search that returns only the results whose vector distance from the query vector is within ``distance_threshold``. This lets you find all records in a dataset that are similar to a query vector, where \"similar\" is defined by a numeric distance cutoff rather than by a fixed number of nearest neighbors."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><tr><th>vector_distance</th><th>user</th><th>credit_score</th><th>age</th><th>job</th></tr><tr><td>0</td><td>john</td><td>high</td><td>18</td><td>engineer</td></tr><tr><td>0</td><td>derrick</td><td>low</td><td>14</td><td>doctor</td></tr><tr><td>0.109129190445</td><td>tyler</td><td>high</td><td>100</td><td>engineer</td></tr><tr><td>0.158809006214</td><td>tim</td><td>high</td><td>12</td><td>dermatologist</td></tr></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from redisvl.query import RangeQuery\n",
"\n",
"range_query = RangeQuery(\n",
" vector=[0.1, 0.1, 0.5],\n",
" vector_field_name=\"user_embedding\",\n",
" return_fields=[\"user\", \"credit_score\", \"age\", \"job\", \"location\"],\n",
" distance_threshold=0.2\n",
")\n",
"\n",
"# run it exactly like a vector query or filter query\n",
"results = index.query(range_query)\n",
"\n",
"result_print(results)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The distance threshold of an existing query object can also be changed between uses. Here we set it to ``0.1`` with ``set_distance_threshold``, so the query returns only records within a vector distance of 0.1 from the query vector. Because this threshold is tighter, we expect fewer matches than before."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><tr><th>vector_distance</th><th>user</th><th>credit_score</th><th>age</th><th>job</th></tr><tr><td>0</td><td>john</td><td>high</td><td>18</td><td>engineer</td></tr><tr><td>0</td><td>derrick</td><td>low</td><td>14</td><td>doctor</td></tr></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"range_query.set_distance_threshold(0.1)\n",
"\n",
"result_print(index.query(range_query))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Range queries can also be combined with filters, like any other query type. The following restricts the results to records whose ``job`` is ``engineer`` and whose vector distance is within the threshold (still 0.1 from the previous cell)."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><tr><th>vector_distance</th><th>user</th><th>credit_score</th><th>age</th><th>job</th></tr><tr><td>0</td><td>john</td><td>high</td><td>18</td><td>engineer</td></tr></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"is_engineer = Text(\"job\") == \"engineer\"\n",
"\n",
"range_query.set_filter(is_engineer)\n",
"\n",
"result_print(index.query(range_query))"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -559,7 +708,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 23,
"metadata": {},
"outputs": [
{
@@ -598,7 +747,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 24,
"metadata": {},
"outputs": [
{
@@ -607,7 +756,7 @@
"'@credit_score:{high}'"
]
},
"execution_count": 20,
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
@@ -620,17 +769,17 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'id': 'v1:dc45946a8bc74f47858617c91d593b43', 'payload': None, 'user': 'john', 'age': '18', 'job': 'engineer', 'credit_score': 'high', 'office_location': '-122.4194,37.7749', 'user_embedding': '==\\x00\\x00\\x00?'}\n",
"{'id': 'v1:5c628fdfbba247c6843955de04e3a00c', 'payload': None, 'user': 'nancy', 'age': '94', 'job': 'doctor', 'credit_score': 'high', 'office_location': '-122.4194,37.7749', 'user_embedding': '333?=\\x00\\x00\\x00?'}\n",
"{'id': 'v1:4f1cb6dd167149d59c9c108e09407fc9', 'payload': None, 'user': 'tyler', 'age': '100', 'job': 'engineer', 'credit_score': 'high', 'office_location': '-122.0839,37.3861', 'user_embedding': '=>\\x00\\x00\\x00?'}\n",
"{'id': 'v1:f1720dbeb81c4316bedf21ca60357fdf', 'payload': None, 'user': 'tim', 'age': '12', 'job': 'dermatologist', 'credit_score': 'high', 'office_location': '-122.0839,37.3861', 'user_embedding': '>>\\x00\\x00\\x00?'}\n"
"{'id': 'v1:d78adb45342c4404a9c40afd4e65f51b', 'payload': None, 'user': 'john', 'age': '18', 'job': 'engineer', 'credit_score': 'high', 'office_location': '-122.4194,37.7749', 'user_embedding': '==\\x00\\x00\\x00?'}\n",
"{'id': 'v1:a0a202b6398840c5ab2263b1fd4e704a', 'payload': None, 'user': 'nancy', 'age': '94', 'job': 'doctor', 'credit_score': 'high', 'office_location': '-122.4194,37.7749', 'user_embedding': '333?=\\x00\\x00\\x00?'}\n",
"{'id': 'v1:1f3b15dfb4ed490186859c1b2cb3df82', 'payload': None, 'user': 'tyler', 'age': '100', 'job': 'engineer', 'credit_score': 'high', 'office_location': '-122.0839,37.3861', 'user_embedding': '=>\\x00\\x00\\x00?'}\n",
"{'id': 'v1:465de540d9d54501b09b8e47a0116620', 'payload': None, 'user': 'tim', 'age': '12', 'job': 'dermatologist', 'credit_score': 'high', 'office_location': '-122.0839,37.3861', 'user_embedding': '>>\\x00\\x00\\x00?'}\n"
]
}
],
@@ -653,7 +802,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 26,
"metadata": {},
"outputs": [
{
@@ -662,7 +811,7 @@
"'((@credit_score:{high} @age:[18 +inf]) @age:[-inf 100])=>[KNN 10 @user_embedding $vector AS vector_distance] RETURN 6 user credit_score age job office_location vector_distance SORTBY vector_distance ASC DIALECT 2 LIMIT 0 10'"
]
},
"execution_count": 22,
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
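
The `VectorQuery` string printed in the notebook uses Redis KNN syntax (`=>[KNN 10 @user_embedding $vector AS vector_distance]`). Redis also provides a `VECTOR_RANGE` clause for radius-based search, with `$YIELD_DISTANCE_AS` to expose the computed distance. The hypothetical helper below sketches how such a string could be assembled. The parameter names `$distance_threshold` and `$vector`, and the parenthesized way the filter is combined with the range clause, are assumptions for illustration, not necessarily what `RangeQuery` emits. As with the KNN string, the parameters would be supplied via `PARAMS` under `DIALECT 2`.

```python
def build_range_query_string(vector_field, filter_expression="*",
                             distance_alias="vector_distance"):
    """Assemble a Redis VECTOR_RANGE query string (hypothetical helper).

    The radius and the query vector are referenced as query parameters
    ($distance_threshold and $vector) instead of being inlined.
    """
    range_clause = (
        f"@{vector_field}:[VECTOR_RANGE $distance_threshold $vector]"
        f"=>{{$YIELD_DISTANCE_AS: {distance_alias}}}"
    )
    if filter_expression == "*":
        # No filter: the range clause alone selects the results
        return range_clause
    # Assumed form: intersect the range clause with the filter expression
    return f"({range_clause} {filter_expression})"


print(build_range_query_string("user_embedding"))
# @user_embedding:[VECTOR_RANGE $distance_threshold $vector]=>{$YIELD_DISTANCE_AS: vector_distance}
print(build_range_query_string("user_embedding", "@credit_score:{high}"))
# (@user_embedding:[VECTOR_RANGE $distance_threshold $vector]=>{$YIELD_DISTANCE_AS: vector_distance} @credit_score:{high})
```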
7 changes: 2 additions & 5 deletions redisvl/query/__init__.py
@@ -1,6 +1,3 @@
from redisvl.query.query import FilterQuery, VectorQuery
from redisvl.query.query import FilterQuery, VectorQuery, RangeQuery

__all__ = [
"VectorQuery",
"FilterQuery",
]
__all__ = ["VectorQuery", "FilterQuery", "RangeQuery"]
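
Adding `RangeQuery` to `__all__` matters for wildcard imports. `from redisvl.query import *` binds only the names listed in `__all__`, while an explicit `from redisvl.query import RangeQuery` works as long as the name is imported into the package namespace. The sketch below demonstrates this Python behavior with a throwaway in-memory module. The module name `fake_query_pkg` and its empty placeholder classes are hypothetical.

```python
import sys
import types

# Build a throwaway module that mimics the package's public surface
mod = types.ModuleType("fake_query_pkg")
exec(
    "class VectorQuery: pass\n"
    "class FilterQuery: pass\n"
    "class RangeQuery: pass\n"
    "__all__ = ['VectorQuery', 'FilterQuery']\n",  # RangeQuery deliberately left out
    mod.__dict__,
)
sys.modules["fake_query_pkg"] = mod

ns = {}
exec("from fake_query_pkg import *", ns)
print("RangeQuery" in ns)  # False: not listed in __all__

ns = {}
exec("from fake_query_pkg import RangeQuery", ns)
print("RangeQuery" in ns)  # True: explicit imports ignore __all__

# Mirror the change in this commit: export RangeQuery as well
mod.__all__ = ["VectorQuery", "FilterQuery", "RangeQuery"]
ns = {}
exec("from fake_query_pkg import *", ns)
print("RangeQuery" in ns)  # True once it is listed in __all__
```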