feat: Vector search #1792

Open
richiejp opened this issue Mar 4, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@richiejp
Collaborator

richiejp commented Mar 4, 2024

Unless we have an Open Source LLM with a 1M+ token context length then we need vector search for the assistant API: #1273 (comment)

Even with a very large context length it is still far cheaper to use vector search with embeddings. It can all easily be done on CPU.

Implementation

I see three main options for adding vector search:

  1. Simple in-memory brute force search. We regenerate the embeddings instead of saving them to storage.
  2. Add one or more vector databases as a backend
  3. Connect to an external database
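
As a rough illustration of option 1, here is a minimal sketch of brute-force search: every stored embedding is scored against the query with cosine similarity and the best `k` are kept. The names `cosine`, `match`, and `bruteForceTopK` are made up for this example and are not taken from LocalAI; it also assumes all vectors have the same dimension.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors.
// It returns 0 if either vector has zero magnitude.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// match pairs the index of a stored vector with its similarity to the query.
type match struct {
	Index      int
	Similarity float64
}

// bruteForceTopK scores every stored vector against the query and
// returns the k best matches, most similar first.
func bruteForceTopK(query []float32, stored [][]float32, k int) []match {
	matches := make([]match, 0, len(stored))
	for i, v := range stored {
		matches = append(matches, match{Index: i, Similarity: cosine(query, v)})
	}
	sort.SliceStable(matches, func(i, j int) bool {
		return matches[i].Similarity > matches[j].Similarity
	})
	if k < 0 {
		k = 0
	}
	if k > len(matches) {
		k = len(matches)
	}
	return matches[:k]
}

func main() {
	stored := [][]float32{{1, 0}, {0, 1}, {1, 1}}
	for _, m := range bruteForceTopK([]float32{1, 0.1}, stored, 2) {
		fmt.Printf("index=%d similarity=%.3f\n", m.Index, m.Similarity)
	}
}
```

The cost is linear in the number of stored vectors per query, which is the trade-off that an index such as HNSW or Annoy would address.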

The first is easy to implement and doesn't have any upkeep because we flush everything after a restart. Changing the chunk size or any other hyperparameter then costs the same as doing a restart. There is plenty of prior art in Go:

Even implementing HNSW or Annoy would not be difficult. The main problems I see are the classic database issues. So I am in favor of doing 1 or 3, with no in-between. Saving embeddings to a flat file could be OK, just not in the first iteration.

I did make an experiment using BadgerDB, but talked myself out of it: https://github.com/richiejp/badger-cybertron-vector/blob/main/main.go. The problem is that it complicates comparing the vectors and then we also have to maintain state between restarts.

API

Obviously we will follow the OpenAI API as in #1273, but I think it would also make sense to have some API to do simple search without an LLM, just so people can do fuzzy search with LocalAI instead of reaching for another tool. Suggestions for how this API should look are welcome.

@richiejp richiejp added the enhancement New feature or request label Mar 4, 2024
@dave-gray101
Collaborator

dave-gray101 commented Mar 4, 2024

Personally, my thought is that we should aim for something like 2... in order to get both 1 and 3. I think we should set up an interface that we require from a vector search system first - and then allow the user to select their vector search backend via configuration. I'll definitely need to do some research to see if what I'm proposing even makes sense - but I assume that no matter the vector search backend, the interface we'll need to interact with should be fairly constant.

I'm assuming that in many production cases, people will want to use an external vector search database, as they will definitely have better performance than anything we make :D

However, for the sake of our tests and quick development cycles, I like the idea of a really quick "in memory" backend - the fewer external dependencies in that case the better.

Notably, I don't think these should be exactly the same as our gRPC generation backends - this might be better accomplished with a simple go interface.
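
To make the idea concrete, here is one possible shape for such an interface, with an in-memory backend chosen by a configuration string. Everything here (`VectorStore`, `Hit`, `memoryStore`, `newStore`, the `"memory"` backend name) is hypothetical and only meant as a sketch. The in-memory backend ranks by dot product, which matches cosine similarity only if the vectors are normalized.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Hit is one search result: the stored payload and its similarity score.
type Hit struct {
	Value      []byte
	Similarity float32
}

// VectorStore is a hypothetical minimal contract that every backend
// (in-memory, embedded database, external service) would satisfy.
type VectorStore interface {
	Set(key []float32, value []byte) error
	Find(query []float32, topK int) ([]Hit, error)
}

// memoryStore is a dependency-free backend intended for tests.
type memoryStore struct {
	keys   [][]float32
	values [][]byte
}

func (m *memoryStore) Set(key []float32, value []byte) error {
	m.keys = append(m.keys, key)
	m.values = append(m.values, value)
	return nil
}

// Find ranks all entries by dot product with the query, best first.
func (m *memoryStore) Find(query []float32, topK int) ([]Hit, error) {
	hits := make([]Hit, 0, len(m.keys))
	for i, key := range m.keys {
		if len(key) != len(query) {
			return nil, errors.New("dimension mismatch")
		}
		var dot float32
		for j := range key {
			dot += key[j] * query[j]
		}
		hits = append(hits, Hit{Value: m.values[i], Similarity: dot})
	}
	sort.SliceStable(hits, func(a, b int) bool {
		return hits[a].Similarity > hits[b].Similarity
	})
	if topK >= 0 && topK < len(hits) {
		hits = hits[:topK]
	}
	return hits, nil
}

// newStore picks a backend by its configured name.
func newStore(backend string) (VectorStore, error) {
	switch backend {
	case "memory":
		return &memoryStore{}, nil
	default:
		return nil, fmt.Errorf("unknown vector store backend %q", backend)
	}
}

func main() {
	store, err := newStore("memory")
	if err != nil {
		panic(err)
	}
	_ = store.Set([]float32{1, 0}, []byte("cats"))
	_ = store.Set([]float32{0, 1}, []byte("dogs"))
	hits, _ := store.Find([]float32{0.9, 0.1}, 1)
	fmt.Println(string(hits[0].Value)) // prints "cats"
}
```

An external database would then be one more case in `newStore`, without callers having to change.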

@richiejp
Collaborator Author

richiejp commented Mar 4, 2024

I went ahead and added to the gRPC backend before seeing your post (I'll create a WIP PR shortly). Possibly it's too much of a break from the existing backends and is overloading the interface. However, my feeling is also that a simple interface can cover most use cases, as you say. I created an interface that is similar to a basic key-value store and is column-oriented like most of the vector databases I have seen.

It won't cover hybrid (i.e. non vector) searches and creating indexes. I can see uses for that, but for now I hope it is enough just to split the entries into groups.

service Backend {
  ...
  rpc StoresSet(StoresSetOptions) returns (Result) {}
  rpc StoresDelete(StoresDeleteOptions) returns (Result) {}
  rpc StoresGet(StoresGetOptions) returns (StoresGetResult) {}
  rpc StoresFind(StoresFindOptions) returns (StoresFindResult) {}
}

message StoresKey {
  // TODO: Add shard/database/file ID to separate unrelated embeddings
  repeated float Floats = 1;
}

message StoresValue {
  bytes Bytes = 1;
}

message StoresSetOptions {
  repeated StoresKey Keys = 1;
  repeated StoresValue Values = 2;
}

message StoresDeleteOptions {
  repeated StoresKey Keys = 1;
}

message StoresGetOptions {
  repeated StoresKey Key = 1;
}

message StoresGetResult {
  repeated StoresKey Keys = 1;
  repeated StoresValue Values = 2;
}

message StoresFindOptions {
  StoresKey Key = 1;
  int32 TopK = 2;
}

message StoresFindResult {
  repeated StoresKey Keys = 1;
  repeated StoresValue Values = 2;
  repeated float Similarities = 3;
}
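
To spell out what column-oriented means for these messages: `Keys[i]` and `Values[i]` belong together, so a receiver has to check that the two columns line up before it can treat them as rows. The following is only a sketch with hypothetical Go types (`setOptions`, `row`) that imitate `StoresSetOptions`; it is not the generated protobuf code.

```go
package main

import (
	"errors"
	"fmt"
)

// setOptions imitates the column-oriented shape of StoresSetOptions:
// Keys[i] is the embedding that belongs to Values[i].
type setOptions struct {
	Keys   [][]float32
	Values [][]byte
}

// row is the row-oriented view of one key/value pair.
type row struct {
	Key   []float32
	Value []byte
}

// rows checks that the two columns have the same length and that every
// key has the same dimension, then zips the columns into rows.
func (o setOptions) rows() ([]row, error) {
	if len(o.Keys) != len(o.Values) {
		return nil, fmt.Errorf("got %d keys but %d values", len(o.Keys), len(o.Values))
	}
	out := make([]row, 0, len(o.Keys))
	for i, k := range o.Keys {
		if len(k) == 0 {
			return nil, errors.New("empty key")
		}
		if len(k) != len(o.Keys[0]) {
			return nil, fmt.Errorf("key %d has dimension %d, want %d", i, len(k), len(o.Keys[0]))
		}
		out = append(out, row{Key: k, Value: o.Values[i]})
	}
	return out, nil
}

func main() {
	opts := setOptions{
		Keys:   [][]float32{{0.1, 0.2}, {0.3, 0.4}},
		Values: [][]byte{[]byte("first"), []byte("second")},
	}
	rs, err := opts.rows()
	if err != nil {
		panic(err)
	}
	for _, r := range rs {
		fmt.Println(r.Key, string(r.Value))
	}
}
```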

@richiejp
Collaborator Author

richiejp commented Mar 8, 2024

The PR now implements an internal gRPC API with vector search. The next step is to create an HTTP API which mirrors the gRPC one in my current thinking. Then some e2e testing with an external script can be done or with the HTTP Go tests.

@richiejp
Collaborator Author

I added an HTTP API which mirrors the gRPC API and some very basic tests for that.
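
For readers who want a feel for what "mirrors the gRPC API" could look like, here is a sketch of a find endpoint whose JSON body follows `StoresFindOptions` and whose response follows `StoresFindResult`. The route `/stores/find`, the JSON field names, and the `finder` callback are assumptions made for this example, not necessarily what the PR uses; the backend is a stub.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// findRequest and findResponse are hypothetical JSON shapes that follow
// StoresFindOptions and StoresFindResult field for field.
type findRequest struct {
	Key  []float32 `json:"key"`
	TopK int       `json:"topk"`
}

type findResponse struct {
	Keys         [][]float32 `json:"keys"`
	Values       []string    `json:"values"`
	Similarities []float32   `json:"similarities"`
}

// finder stands in for the call to the gRPC backend.
type finder func(key []float32, topK int) findResponse

// findHandler decodes the JSON body, forwards it to the backend, and
// writes the backend's answer back as JSON.
func findHandler(find finder) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST required", http.StatusMethodNotAllowed)
			return
		}
		var req findRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON: "+err.Error(), http.StatusBadRequest)
			return
		}
		if len(req.Key) == 0 || req.TopK < 1 {
			http.Error(w, "key and topk are required", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(find(req.Key, req.TopK))
	}
}

// echoFinder is a stub backend that returns the query itself as the only hit.
func echoFinder(key []float32, topK int) findResponse {
	return findResponse{
		Keys:         [][]float32{key},
		Values:       []string{"stub"},
		Similarities: []float32{1},
	}
}

// doFind sends one POST to the handler in-process and returns the
// status code and response body.
func doFind(jsonBody string) (int, string) {
	req := httptest.NewRequest(http.MethodPost, "/stores/find", strings.NewReader(jsonBody))
	rec := httptest.NewRecorder()
	findHandler(echoFinder)(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	code, body := doFind(`{"key": [0.1, 0.2], "topk": 1}`)
	fmt.Println(code, body)
}
```

Using `httptest` keeps the example in-process, which is also roughly how basic handler tests can be written without starting a server.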

@richiejp
Collaborator Author

Probably at the very least an ID field is needed so that the embedding vector is not being used as an ID.
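
A small sketch of what an ID field buys: the embedding becomes plain data, so an entry can be updated or deleted even after its vector has changed. The `entry` and `idStore` types and the `"doc-1#0"` ID format are hypothetical.

```go
package main

import "fmt"

// entry is a hypothetical record that carries its own ID, so the
// embedding is data rather than the lookup key.
type entry struct {
	ID     string
	Vector []float32
	Value  []byte
}

// idStore indexes entries by ID.
type idStore struct {
	byID map[string]entry
}

func newIDStore() *idStore {
	return &idStore{byID: make(map[string]entry)}
}

// Put inserts the entry, or replaces an existing entry with the same ID.
func (s *idStore) Put(e entry) {
	s.byID[e.ID] = e
}

// Delete removes the entry with the given ID and reports whether it existed.
func (s *idStore) Delete(id string) bool {
	_, ok := s.byID[id]
	delete(s.byID, id)
	return ok
}

func main() {
	s := newIDStore()
	s.Put(entry{ID: "doc-1#0", Vector: []float32{0.12, 0.98}, Value: []byte("first chunk")})
	// Re-embedding the same chunk gives a slightly different vector,
	// but the ID still addresses the same entry.
	s.Put(entry{ID: "doc-1#0", Vector: []float32{0.13, 0.97}, Value: []byte("first chunk")})
	fmt.Println(len(s.byID), s.Delete("doc-1#0"), s.Delete("doc-1#0")) // prints "1 true false"
}
```

With the vector as the key, the second `Put` would have created a second entry, and deleting the first would require the caller to still hold the exact original floats.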

truecharts-admin added a commit to truecharts/charts that referenced this issue Mar 27, 2024
…1.0@8f708d1 by renovate (#19852)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.10.1` -> `v2.11.0` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>mudler/LocalAI (docker.io/localai/localai)</summary>

### [`v2.11.0`](https://togithub.com/mudler/LocalAI/releases/tag/v2.11.0)

[Compare
Source](https://togithub.com/mudler/LocalAI/compare/v2.10.1...v2.11.0)

### Introducing LocalAI v2.11.0: All-in-One Images!

Hey everyone! 🎉 I'm super excited to share what we've been working on at
LocalAI - the launch of v2.11.0. This isn't just any update; it's a
massive leap forward, making LocalAI easier to use, faster, and more
accessible for everyone.

#### 🌠 The Spotlight: All-in-One Images, OpenAI in a box

Imagine having a magic box that, once opened, gives you everything you
need to get your AI project off the ground with generative AI. A full
clone of OpenAI in a box. That's exactly what our AIO images are!
Designed for both CPU and GPU environments, these images come pre-packed
with a full suite of models and backends, ready to go right out of the
box.

Whether you're using Nvidia, AMD, or Intel, we've got an optimized image
for you. If you are using CPU-only you can enjoy even smaller and
lighter images.

To start LocalAI, pre-configured with function calling, llm, tts, speech
to text, and image generation, just run:

```bash
docker run -p 8080:8080 --name local-ai -ti localai/localai:latest-aio-cpu

# Do you have an Nvidia GPU? Use this instead
# CUDA 11
# docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-cuda-11
# CUDA 12
# docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-cuda-12
```

##### ❤️ Why You're Going to Love AIO Images:

- Ease of Use: Say goodbye to the setup blues. With AIO images,
everything is configured upfront, so you can dive straight into the fun
part - hacking!
- Flexibility: CPU, Nvidia, AMD, Intel? We support them all. These
images are made to adapt to your setup, not the other way around.
- Speed: Spend less time configuring and more time innovating. Our AIO
images are all about getting you across the starting line as fast as
possible.

##### 🌈 Jumping In Is a Breeze:

Getting started with AIO images is as simple as pulling from Docker Hub
or Quay and running it. We take care of the rest, downloading all
necessary models for you. For all the details, including how to
customize your setup with environment variables, our updated docs have
got you covered [here](https://localai.io/basics/getting_started/),
while you can get more details of the AIO images
[here](https://localai.io/docs/reference/aio-images/).

#### 🎈 Vector Store

Thanks to the great contribution from
[@richiejp](https://togithub.com/richiejp), LocalAI now has a new
backend type, "vector stores", which allows LocalAI to be used as an
in-memory vector DB (mudler/LocalAI#1792).
You can learn more about it [here](https://localai.io/stores/)!

#### 🐛 Bug fixes

This release contains major bugfixes to the watchdog component, and a
fix for a regression introduced in v2.10.x in which `--f16`,
`--threads` and `--context-size` were not applied as the model's
defaults.

#### 🎉 New Model defaults for llama.cpp

Model defaults have changed to automatically offload the maximum number
of GPU layers if a GPU is available, and saner defaults are now set for
the models to improve the LLM's output.

#### 🧠 New pre-configured models

You can now run `llava-1.6-vicuna`, `llava-1.6-mistral` and
`hermes-2-pro-mistral`, see [Run other
models](https://localai.io/docs/getting-started/run-other-models/) for a
list of all the pre-configured models available in the release.

### 📣 Spread the word!

First off, a massive thank you (again!) to each and every one of you
who've chipped in to squash bugs and suggest cool new features for
LocalAI. Your help, kind words, and brilliant ideas are truly
appreciated - more than words can say!

And to those of you who've been heroes, giving up your own time to help
out fellow users on Discord and in our repo, you're absolutely amazing.
We couldn't have asked for a better community.

Just so you know, LocalAI doesn't have the luxury of big corporate
sponsors behind it. It's all us, folks. So, if you've found value in
what we're building together and want to keep the momentum going,
consider showing your support. A little shoutout on your favorite social
platforms using @LocalAI_OSS and @mudler_it or joining our
sponsors can make a big difference.

Also, if you haven't yet joined our Discord, come on over! Here's the
link: https://discord.gg/uJAeKSAGDy

Every bit of support, every mention, and every star adds up and helps us
keep this ship sailing. Let's keep making LocalAI awesome together!

Thanks a ton, and here's to more exciting times ahead with LocalAI!

### 🔗 Links

- Quickstart docs (how to run with AIO images):
https://localai.io/basics/getting_started/
- More reference on AIO image:
https://localai.io/docs/reference/aio-images/
- List of embedded models that can be started:
https://localai.io/docs/getting-started/run-other-models/

### 🎁 What's More in v2.11.0?

##### Bug fixes 🐛

- fix(config): pass by config options, respect defaults by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1878
- fix(watchdog): use ShutdownModel instead of StopModel by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1882
- NVIDIA GPU detection support for WSL2 environments by [@enricoros](https://togithub.com/enricoros) in mudler/LocalAI#1891
- Fix NVIDIA VRAM detection on WSL2 environments by [@enricoros](https://togithub.com/enricoros) in mudler/LocalAI#1894

##### Exciting New Features 🎉

- feat(functions/aio): all-in-one images, function template enhancements by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1862
- feat(aio): entrypoint, update workflows by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1872
- feat(aio): add tests, update model definitions by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1880
- feat(stores): Vector store backend by [@richiejp](https://togithub.com/richiejp) in mudler/LocalAI#1795
- ci(aio): publish hipblas and Intel GPU images by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1883
- ci(aio): add latest tag images by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1884

##### 🧠 Models

- feat(models): add phi-2-chat, llava-1.6, bakllava, cerbero by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1879

##### 📖 Documentation and examples

- ⬆️ Update docs version mudler/LocalAI by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1856
- docs(mac): improve documentation for mac build by [@tauven](https://togithub.com/tauven) in mudler/LocalAI#1873
- docs(aio): Add All-in-One images docs by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1887
- fix(aio): make image-gen for GPU functional, update docs by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1895

##### 👒 Dependencies

- ⬆️ Update ggerganov/whisper.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1508
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1857
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1864
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1866
- ⬆️ Update ggerganov/whisper.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1867
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1874
- ⬆️ Update ggerganov/whisper.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1875
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1881
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1885
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1889

##### Other Changes

- ⬆️ Update ggerganov/whisper.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1896
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1897

#### New Contributors

- [@enricoros](https://togithub.com/enricoros) made their first contribution in mudler/LocalAI#1891

**Full Changelog**:
mudler/LocalAI@v2.10.1...v2.11.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Renovate
Bot](https://togithub.com/renovatebot/renovate).