
Instructions for the configuration on MacOS with llama.cpp #211

Closed

a-rbts opened this issue Apr 15, 2024 · 3 comments


a-rbts commented Apr 15, 2024

Greetings, and thanks for your hard work! I am trying to set up the extension as instructed in the README.md, but the UI does not seem to match what is described there.
I am running the llama.cpp server, which offers an OpenAI-compatible API.
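
For reference, this is roughly how I am talking to the server; a minimal sketch against the OpenAI-compatible chat endpoint, where the host, port and model name are just my local setup, not anything twinny-specific:

```python
# Minimal sketch of a chat request against the llama.cpp server's
# OpenAI-compatible endpoint. Host, port and model name are assumptions
# from my local setup.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "deepseek-coder",  # placeholder name for the served model
        "messages": [
            {"role": "user", "content": "Write a hello world in Python."}
        ],
        "temperature": 0.2,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```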

  1. My first question is whether I need to run two instances, one with an instruct version of the model for chat and one with the base version for FIM, or whether the extension uses the same model for both tasks.
     The instructions advise deepseek BASE for chat, but also deepseek base for completion with a good GPU. I am not sure why the chat function would require a base model.
  2. The README says: "From the top ⚙️ icon open the settings page and in the Api Provider panel change from ollama to llamacpp (or others respectively)."
     However, when I open the side panel and choose the configuration, there is no Api Provider field; there are only fields for the Ollama Hostname and Ollama API Port, and I am not using ollama (screenshot below). How/where can we select llama.cpp as per the instructions?
     [Screenshot 2024-04-15 at 11 48 04]
  3. Finally (and maybe a related issue), in the side panel, clicking on the robot emoji shows two dropdown boxes for chat and FIM, but only one option is present there, tagged "ollama" (screenshot below). No other option is displayed.
     [Screenshot 2024-04-15 at 11 53 08]

rjmacarthy (Collaborator) commented

Hello, thanks for your interest. Please allow me to answer your questions from a personal perspective.

  1. I usually run two different models (code and instruct, or base and instruct) because they give better results: the models are trained for specific tasks. You might be able to run only codellama:7b, but depending on the API it may perform badly on one task or the other (chat or FIM) for a number of reasons, often because the prompt templates differ or are automatically formatted by some providers (see the sketch at the end of this comment).

  2. The ollama settings in the settings menu only point towards ollama so that I can fetch the models from the API; that is really their only purpose. Providers should be added under the menu here.
     [image: providers menu]

This UI allows you to set up different providers, and you can switch between them in the model selection chat interface.

Hope that helps!
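
To illustrate the point about prompt templates, here is a rough sketch; the template strings and sentinel tokens below are illustrative placeholders, not the exact templates twinny or any particular model uses:

```python
# Illustrative only: chat and FIM prompts are shaped very differently,
# which is why a model tuned for one often performs poorly on the other.
# The sentinel tokens below are placeholders; real models (deepseek-coder,
# codellama, ...) each define their own.

def build_chat_prompt(user_message: str) -> str:
    # Instruct/chat-style template: the model expects a conversation wrapper.
    return f"### Instruction:\n{user_message}\n### Response:\n"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    # Fill-in-the-middle template: the base/code model expects special
    # sentinel tokens around the code before and after the cursor.
    return f"<PRE>{prefix}<SUF>{suffix}<MID>"

print(build_chat_prompt("Explain this function."))
print(build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))"))
```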


a-rbts commented Apr 15, 2024

Great, thanks for the explanation. I had totally missed the providers menu; that is exactly what I was looking for.
That said, it doesn't seem to work well with llama.cpp. Chat doesn't work when the provider field is set to "llamacpp", but it works perfectly when selecting "oobabooga" (while still using the llama.cpp server as the backend). It seems to be related to the message format; I am not sure why the two providers behave differently in the configuration, but it looks incorrect.
I also could not get FIM to work with either llamacpp or oobabooga. llama.cpp receives the requests but never answers anything, as the queries seem to be malformed (using deepseek coder base). With oobabooga, the server doesn't seem to receive requests at all, even with the right provider and port selected.
I will be able to investigate this now that I am able to get something working, so I am closing the issue.
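
For reference, this is roughly how I checked that the server itself is fine; a minimal sketch against llama.cpp's native completion endpoint, where the host, port and prompt template are assumptions from my local setup rather than anything twinny sends:

```python
# Rough check that the llama.cpp server answers on its native endpoint.
# Unlike the OpenAI-compatible /v1/chat/completions path (which takes a
# structured "messages" list), /completion takes a single pre-formatted
# prompt string, which is the kind of difference I suspect between the
# "llamacpp" and "oobabooga" provider settings. Host/port and the prompt
# template are just my local assumptions.
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "### Instruction:\nSay hello.\n### Response:\n",
        "n_predict": 64,
        "temperature": 0.2,
    },
    timeout=60,
)
print(resp.json().get("content"))
```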

a-rbts closed this as completed Apr 15, 2024

a-rbts commented Apr 16, 2024

Adding more information here:

  • FIM is broken for some requests with deepseek coder, but this seems to be due to this bug in llama.cpp rather than a problem with twinny. I suspect most of the supported backends, like ollama, would fail too, since they rely on llama.cpp. A mitigation is to re-quantize the model with a recent version of llama.cpp. (See the request sketch after these bullets for what I was testing with.)

  • Chat with the llama.cpp backend does not work when selecting llama.cpp as the provider, but works when selecting (for example) oobabooga. Since everything relies on the OpenAI standard, I am not sure why providers should behave differently (apart from maybe setting different default API paths), but they do, and that seems incorrect for the llama.cpp backend.
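
For reference, this is the kind of FIM request I was testing with; a minimal sketch against the llama.cpp server's /infill endpoint, where the endpoint name and fields come from my reading of the llama.cpp server README, and the host, port and snippet are just my local test setup:

```python
# Minimal fill-in-the-middle request against the llama.cpp server.
# The /infill endpoint and its input_prefix/input_suffix fields are taken
# from the llama.cpp server documentation as I understand it; host and
# port are my local defaults, and the snippet is only a test payload.
import requests

resp = requests.post(
    "http://localhost:8080/infill",
    json={
        "input_prefix": "def add(a, b):\n    return ",
        "input_suffix": "\n\nprint(add(1, 2))\n",
        "n_predict": 32,
        "temperature": 0.1,
    },
    timeout=60,
)
print(resp.status_code, resp.json().get("content"))
```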

Hope it helps.
