Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds semantic cache scoped access and additional functionality #180

Merged
merged 15 commits into from
Aug 1, 2024

Conversation

justin-cechmanek
Copy link
Collaborator

This PR is the first of a few PRs that adds to RedisVL’s existing semantic cache class more functionality around dropping and updating cache entries. It also adds scoped access control and meta data fields.

Args:
document_ids (Union[str, List[str]]): The document ID or IDs to remove from the cache.
"""
if isinstance(document_ids, List):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a similar theme to the work around clear(), to me this feels like an addition to that method by supporting an optional list of ids? by default clear removes everything. Maybe if this list is provided, instead of clearing everything it will just focus on using this pipeline code to clear the subset?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of combining methods. What about something like

clear(keys: Optional[List[str], str]):
"""Remove all or specific entries from the cache by it's ID.
    Args:
        document_ids (Union[str, List[str]]): The document ID or IDs to remove from the cache. If none are provided then all documents are removed.
"""

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move this pipeline code into index.

@@ -16,6 +18,9 @@ class SemanticCache(BaseLLMCache):
entry_id_field_name: str = "id"
prompt_field_name: str = "prompt"
vector_field_name: str = "prompt_vector"
inserted_at_field_name: str = "inserted_at"
updated_at_field_name: str = "updated_at"
tag_field_name: str = "scope_tag"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to understand this data model a bit more. As it stands here, it looks like there would be 1 field for scoping (scope_tag) and filtering. What if I have both year, location, and user ID data that I'd like to filter my cache check on?

I think we might need to generalize a bit more and support a set of optional/customizable filterable fields (providable to the class on init??)

So if staying with Hash, the schema could be something like:

  • id
  • prompt
  • prompt_vector
  • response
  • inserted_at
  • updated_at
  • metadata
  • scope_tag???
  • ?

Just thinking out loud!

@justin-cechmanek justin-cechmanek added enhancement New feature or request feature labels Jul 10, 2024
redisvl/extensions/llmcache/semantic.py Outdated Show resolved Hide resolved
redisvl/extensions/llmcache/semantic.py Outdated Show resolved Hide resolved
redisvl/extensions/llmcache/semantic.py Outdated Show resolved Hide resolved
redisvl/extensions/llmcache/semantic.py Show resolved Hide resolved
redisvl/extensions/llmcache/semantic.py Outdated Show resolved Hide resolved
redisvl/extensions/llmcache/semantic.py Show resolved Hide resolved
Copy link
Collaborator

@tylerhutcherson tylerhutcherson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment, but I will address it later in my PR this evening!

@@ -182,6 +194,14 @@ def delete(self) -> None:
index."""
self._index.delete(drop=True)

def drop(self, document_ids: Union[str, List[str]]) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for this, are we expecting users to pass the full redis key? or just the id portion (without prefix)? I think the terminology we try to use for this throughout the library is id when we are referring to the part without the prefix. So maybe we just use ids instead of document_ids?

@tylerhutcherson tylerhutcherson merged commit d1bd692 into main Aug 1, 2024
20 checks passed
@tylerhutcherson tylerhutcherson deleted the feat/RAAE-71/semantic-cache-updates branch August 1, 2024 00:24
@justin-cechmanek justin-cechmanek restored the feat/RAAE-71/semantic-cache-updates branch August 1, 2024 04:11
@justin-cechmanek justin-cechmanek deleted the feat/RAAE-71/semantic-cache-updates branch August 1, 2024 04:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants