
Feature - Ollama Embeddings #639

Closed

Conversation

kaushikabhishek87

@kaushikabhishek87 commented Jul 21, 2024

Hi, thank you for all the work around guardrails. I am using NeMo in my project and it is really awesome. This is my first contribution to the repo; happy to learn. I am using Ollama as the LLM with NeMo and also wanted to use Ollama Embeddings for the embedding part.

This PR facilitates using Ollama Embeddings as a choice among the embedding models. With this PR, users can use the configuration below with Ollama:

models:
  - type: embeddings
    engine: OllamaEmbed
    model: nomic-embed-text
    parameters:
        base_url: http://0.0.0.0:11434
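For context, a provider behind such a configuration could look roughly like the sketch below. This is a minimal, hypothetical illustration, not the PR's actual code: the class name, the `encode` interface, and the use of Ollama's `POST /api/embeddings` endpoint (JSON body with `model` and `prompt`, response with an `embedding` field) are assumptions drawn from the PR context and Ollama's public API.

```python
import json
from typing import Dict, List
from urllib import request


def build_embed_payload(model: str, prompt: str) -> Dict[str, str]:
    """Build the JSON body for Ollama's /api/embeddings endpoint (assumed shape)."""
    return {"model": model, "prompt": prompt}


class OllamaEmbeddingModel:
    """Hypothetical minimal Ollama embedding client, for illustration only."""

    engine_name = "OllamaEmbed"

    def __init__(
        self,
        embedding_model: str,
        base_url: str = "http://0.0.0.0:11434",
        **kwargs,
    ):
        self.model = embedding_model
        # Normalize the base URL so endpoint paths can be appended safely.
        self.base_url = base_url.rstrip("/")

    def encode(self, documents: List[str]) -> List[List[float]]:
        """Encode each document by calling the Ollama embeddings endpoint."""
        embeddings = []
        for doc in documents:
            req = request.Request(
                f"{self.base_url}/api/embeddings",
                data=json.dumps(build_embed_payload(self.model, doc)).encode(),
                headers={"Content-Type": "application/json"},
            )
            with request.urlopen(req) as resp:
                embeddings.append(json.load(resp)["embedding"])
        return embeddings
```

The sketch requires a running Ollama server at `base_url`; only the payload construction and URL handling are exercised without one.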

@kaushikabhishek87
Author

Hi, can anyone trigger the workflow & review?

Collaborator

@drazvan left a comment


Looks good 👍. A small suggestion.
You should update the docs here: https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/docs/user_guides/configuration-guide.md#supported-embedding-providers to mention Ollama as well.

And last but not least, you should rebase on top of develop to sign all the commits.

Thanks!

nemoguardrails/embeddings/providers/__init__.py (review thread, outdated; resolved)
@kaushikabhishek87
Author

@drazvan thanks for reviewing & the comments, really appreciate that. Regarding commit signing, something seems off at my end, not sure what: some commits are getting signed, and I am also getting some errors on rebase. I will try again or, if push comes to shove, will close this PR (& the other one too) and create a fresh PR from a proper branch with signed commits.

@kaushikabhishek87
Author

kaushikabhishek87 commented Aug 20, 2024

@drazvan please ignore the review request, it was triggered erroneously.

@kaushikabhishek87
Author

kaushikabhishek87 commented Aug 22, 2024

Hi @drazvan, is it okay if I close this PR & #666 to create fresh ones? I am able to resolve most of the comments but am having issues with signing commits. I will create fresh PRs with properly signed commits.

@drazvan
Collaborator

drazvan commented Aug 22, 2024

@kaushikabhishek87 : yes, feel free to create a new one and link these in the comment for reference.

@Pouyanpi
Collaborator

Hi @kaushikabhishek87,

I noticed that you are currently using base_url as an explicit parameter in the __init__ method. While this works for now, I suggest using **kwargs to allow for greater flexibility and future-proofing.

By using **kwargs, we can easily accommodate additional parameters that might be needed for different models without having to modify the method signature each time. This makes the code more maintainable and extensible.

Here's an example of how you can implement this:

from typing import Dict, Union

class BasicEmbeddingsIndex(EmbeddingsIndex):
    def __init__(
        self,
        embedding_model: str,
        embedding_engine: str,
        search_threshold: float = None,
        cache_config: Union[Dict, EmbeddingsCacheConfig] = None,
        use_batching: bool = False,
        max_batch_size: int = 10,
        max_batch_hold: float = 0.01,
        **kwargs,
    ):
        """Initialize the BasicEmbeddingsIndex."""
        self.embedding_model = embedding_model
        self.embedding_engine = embedding_engine
        self._embedding_size = 0
        self.search_threshold = search_threshold or float("inf")
        # Engine-specific extras (e.g. base_url) are kept and forwarded
        # to the embedding model in _init_model.
        self._parameters = kwargs
        self._use_batching = use_batching
        self._max_batch_size = max_batch_size
        self._max_batch_hold = max_batch_hold
        if isinstance(cache_config, Dict):
            self._cache_config = EmbeddingsCacheConfig(**cache_config)
        else:
            self._cache_config = cache_config

    def _init_model(self):
        """Initialize the model used for computing the embeddings."""
        self._model = init_embedding_model(
            embedding_model=self.embedding_model,
            embedding_engine=self.embedding_engine,
            **self._parameters,
        )

Additionally, we should update the __init__ method of all EmbeddingModel concrete classes to accept **kwargs. Here's an example for the OpenAIEmbeddingModel class:

class OpenAIEmbeddingModel(EmbeddingModel):
    """Embedding model using OpenAI API.

    Args:
        embedding_model (str): The name of the embedding model.

    Attributes:
        model (str): The name of the embedding model.
        embedding_size (int): The size of the embeddings.

    Methods:
        encode: Encode a list of documents into embeddings.
    """

    engine_name = "openai"

    def __init__(
        self,
        embedding_model: str,
        **kwargs,
    ):
        # Extra kwargs are accepted for interface compatibility but ignored here.
        self.model = embedding_model

This way, any additional parameters can be passed through kwargs and handled appropriately.
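To illustrate the pass-through in isolation, here is a minimal, self-contained sketch. The names mirror the snippets above but are simplified stand-ins, not the actual NeMo Guardrails implementation; a real factory would dispatch on embedding_engine.

```python
from typing import Any


class FakeEmbeddingModel:
    """Stand-in for a concrete EmbeddingModel that accepts engine-specific kwargs."""

    def __init__(self, embedding_model: str, **kwargs: Any):
        self.model = embedding_model
        # An Ollama-style provider might read base_url from kwargs like this.
        self.base_url = kwargs.get("base_url", "http://localhost:11434")


def init_embedding_model(embedding_model: str, embedding_engine: str, **kwargs: Any):
    """Simplified factory: extra parameters flow through **kwargs untouched."""
    # A real implementation would pick the class based on embedding_engine.
    return FakeEmbeddingModel(embedding_model, **kwargs)


# base_url reaches the model without appearing in any method signature.
model = init_embedding_model(
    "nomic-embed-text", "OllamaEmbed", base_url="http://0.0.0.0:11434"
)
```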

Let me know what you think! Please let me know if I'm overlooking or missing something.

Also if it is something that will take much of your time, we can do it later in a separate PR. Thanks!

@drazvan what do you think?

@kaushikabhishek87
Author

@Pouyanpi thanks for the review & comments, makes sense to me. I am closing this PR per the discussion in the comments above & will create a fresh one. I will include the suggested changes as this conversation unfolds.
