Feature - Ollama Embeddings #639
Conversation
Hi, can anyone trigger the workflow & review?
Looks good 👍. A small suggestion.
You should update the docs here: https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/docs/user_guides/configuration-guide.md#supported-embedding-providers to mention Ollama as well.
And last but not least, you should rebase on top of develop to sign all the commits.
Thanks!
@drazvan thanks for reviewing & the comments, really appreciate that. Regarding the commit signing, it seems like something is off at my end, not sure what: some commits are getting signed, and on rebase I'm also getting some errors. Will try again or, if push comes to shove, will close this PR (& the other one too) & create a fresh PR from a proper branch with signed commits.
@drazvan please ignore the review request, it was triggered erroneously.
@kaushikabhishek87 : yes, feel free to create a new one and link these in the comment for reference.
I noticed that you are currently passing provider-specific parameters as explicit arguments. By using `**kwargs`, any additional parameters can be forwarded to the embedding model without changing the index's signature. Here's an example of how you can implement this:

```python
class BasicEmbeddingsIndex(EmbeddingsIndex):
    def __init__(
        self,
        embedding_model: str,
        embedding_engine: str,
        search_threshold: float = None,
        cache_config: Union[Dict, EmbeddingsCacheConfig] = None,
        use_batching: bool = False,
        max_batch_size: int = 10,
        max_batch_hold: float = 0.01,
        **kwargs,
    ):
        """Initialize the BasicEmbeddingsIndex."""
        self.embedding_model = embedding_model
        self.embedding_engine = embedding_engine
        self._embedding_size = 0
        self.search_threshold = search_threshold or float("inf")
        # Keep any extra provider-specific parameters so they can be forwarded
        # to the embedding model when it is initialized.
        self._parameters = kwargs
        if isinstance(cache_config, Dict):
            self._cache_config = EmbeddingsCacheConfig(**cache_config)
        else:
            self._cache_config = cache_config

    def _init_model(self):
        """Initialize the model used for computing the embeddings."""
        self._model = init_embedding_model(
            embedding_model=self.embedding_model,
            embedding_engine=self.embedding_engine,
            **self._parameters,
        )
```

Additionally, we should update the embedding model classes so they also accept `**kwargs`, e.g. `OpenAIEmbeddingModel`:

```python
class OpenAIEmbeddingModel(EmbeddingModel):
    """Embedding model using OpenAI API.

    Args:
        embedding_model (str): The name of the embedding model.

    Attributes:
        model (str): The name of the embedding model.
        embedding_size (int): The size of the embeddings.

    Methods:
        encode: Encode a list of documents into embeddings.
    """

    engine_name = "openai"

    def __init__(
        self,
        embedding_model: str,
        **kwargs,
    ):
        # do nothing with kwargs
        self.model = embedding_model
```

This way, any additional parameters can be passed through `**kwargs`. Let me know what you think! Please let me know if I'm overlooking or missing something. Also, if it is something that will take much of your time, we can do it later in a separate PR. Thanks! @drazvan, what do you think?
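For illustration, here is a minimal usage sketch of the suggestion above; the Ollama model name and the `base_url` parameter are assumptions made for the example, not something defined in this PR:

```python
# Hypothetical usage: any provider-specific argument not in the explicit
# signature (here, base_url for a locally running Ollama server) is captured
# by **kwargs, stored in self._parameters, and forwarded to
# init_embedding_model() when the model is initialized.
index = BasicEmbeddingsIndex(
    embedding_model="nomic-embed-text",   # assumed Ollama embedding model name
    embedding_engine="ollama",
    base_url="http://localhost:11434",    # extra parameter, ends up in **kwargs
)
```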
(force-pushed from d094f34 to dc509f6)
@Pouyanpi thanks for the review & comments, makes sense to me. I am closing this PR as per the above discussion in the comments & will create a fresh one. I will include the suggested changes as this conversation unfolds.
Hi, thank you for all the work around Guardrails; I am using NeMo Guardrails in my project & it is really awesome. This is my first contribution to the repo, happy to learn. I am using Ollama as the LLM with NeMo and also wanted to use Ollama embeddings for the embedding part.
This PR adds Ollama embeddings as a choice among the supported embedding models. With this PR, users can use a configuration structure like the one sketched below with Ollama.
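A sketch of what such a configuration might look like; the engine key `ollama` follows this PR's intent, while the model names and the `RailsConfig.from_content` usage are illustrative assumptions:

```python
from nemoguardrails import LLMRails, RailsConfig

# Assumptions: the embeddings engine key is "ollama" and "nomic-embed-text"
# is an embedding model served by a local Ollama instance; adjust to your setup.
YAML_CONFIG = """
models:
  - type: main
    engine: ollama
    model: llama3

  - type: embeddings
    engine: ollama
    model: nomic-embed-text
"""

config = RailsConfig.from_content(yaml_content=YAML_CONFIG)
rails = LLMRails(config)
```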