Releases: meilisearch/meilisearch
v1.7.3
This new release doesn’t contain any fixes or features.
We are releasing it only because v1.7.2 had an issue and did not contain all the required assets (Linux, macOS, and Windows x86 binaries were missing).
What's Changed
- Update version for the next release (v1.7.3) in Cargo.toml by @meili-bot in #4519
Full Changelog: v1.7.2...v1.7.3
v1.7.2 🐇
v1.7.1 🐇
Indexing Speed Improvement 🏇
- Skip reindexing when modifying unknown faceted fields by @Kerollmops in #4479
v1.7.0 🐇
Meilisearch v1.7.0 focuses on improving v1.6.0 features, indexing speed and hybrid search.
🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 and 48 hours after a new version becomes available.
Some SDKs might not include all new features—consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).
New features and improvements 🔥
Improved AI-powered search — Experimental
To activate AI-powered search, set `vectorStore` to `true` in the `/experimental-features` route. Consult the Meilisearch documentation for more information.
🗣️ This is an experimental feature and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.
New OpenAI embedding models
When configuring OpenAI embedders, you can now specify two new models:
- `text-embedding-3-small`, with a default dimension of 1536
- `text-embedding-3-large`, with a default dimension of 3072
These new models are cheaper and improve search result relevancy.
Custom OpenAI model dimensions
You can configure `dimensions` for sources using the new OpenAI models, `text-embedding-3-small` and `text-embedding-3-large`. Dimensions must be bigger than 0 and no larger than the model size:
"embedders": {
"new_model": {
"source": "openAi",
"model": "text-embedding-3-large",
"dimensions": 512 // must be >0, must be <= 3072 for "text-embedding-3-large"
},
"legacy_model": {
"source": "openAi",
"model": "text-embedding-ada-002"
}
}
You cannot customize dimensions for older OpenAI models such as `text-embedding-ada-002`. Setting `dimensions` to any value except the default size of these models will result in an error.
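The rules above can be sketched as a small validation helper. This is illustrative Python, not Meilisearch code; the model names and default sizes are the ones listed in these notes.

```python
from typing import Optional

# Default dimensions as stated in the release notes above.
OPENAI_MODEL_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,  # legacy model: dimensions are fixed
}
CUSTOMIZABLE = {"text-embedding-3-small", "text-embedding-3-large"}

def check_dimensions(model: str, dimensions: Optional[int]) -> int:
    """Return the effective dimension, or raise if the settings are invalid."""
    default = OPENAI_MODEL_DIMENSIONS[model]
    if dimensions is None:
        return default  # omitted: use the model's default size
    if model not in CUSTOMIZABLE:
        if dimensions != default:
            raise ValueError(f"{model} only accepts its default size {default}")
        return default
    if not (0 < dimensions <= default):
        raise ValueError(f"dimensions must be > 0 and <= {default} for {model}")
    return dimensions
```

For example, `check_dimensions("text-embedding-3-large", 512)` is accepted, while any non-default value for `text-embedding-ada-002` raises an error.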
GPU support when computing Hugging Face embeddings
Activate CUDA to use Nvidia GPUs when computing Hugging Face embeddings. This can significantly improve embedding generation speeds.
To enable GPU support through CUDA for HuggingFace embedding generation:
- Install CUDA dependencies
- Clone and compile Meilisearch with the `cuda` feature: `cargo build --release --package meilisearch --features cuda`
- Launch your freshly compiled Meilisearch binary
- Activate vector search
- Add a Hugging Face embedder
Improved indexing speed and reduced memory crashes
- Auto-batch task deletion to reduce indexing time (#4316) @irevoire
- Improved indexing speed for vector store (Hybrid search experimental feature indexing time more than 10 times faster) (#4332) @Kerollmops @irevoire
- Capped the maximum memory of grenad sorters to reduce memory usage (#4388) @Kerollmops
- Added multiple technical and internal indexing improvements (#4350) @ManyTheFish
- Enhanced facet incremental indexing (#4433) @ManyTheFish
- Changed the threshold triggering incremental indexing (#4462) @ManyTheFish
Stabilized `showRankingScoreDetails`
The `showRankingScoreDetails` search parameter, first introduced as an experimental feature in Meilisearch v1.3.0, is now stable. Use it with the `/search` endpoint to view detailed scores per ranking rule for each returned document:
curl \
-X POST 'http://localhost:7700/indexes/movies/search' \
-H 'Content-Type: application/json' \
--data-binary '{ "q": "Batman Returns", "showRankingScoreDetails": true }'
When `showRankingScoreDetails` is set to `true`, returned documents include a `_rankingScoreDetails` field:
"_rankingScoreDetails": {
"words": {
"order": 0,
"matchingWords": 1,
"maxMatchingWords": 1,
"score": 1.0
},
"typo": {
"order": 1,
"typoCount": 0,
"maxTypoCount": 1,
"score": 1.0
},
"proximity": {
"order": 2,
"score": 1.0
},
"attribute": {
"order": 3,
"attributes_ranking_order": 0.8,
"attributes_query_word_order": 0.6363636363636364,
"score": 0.7272727272727273
},
"exactness": {
"order": 4,
"matchType": "noExactMatch",
"matchingWords": 0,
"maxMatchingWords": 1,
"score": 0.3333333333333333
}
}
Improved logging
Log output modified
Log messages now follow a different pattern:
# new format ✅
2024-02-06T14:54:11Z INFO actix_server::builder: 200: starting 10 workers
# old format ❌
[2024-02-06T14:54:11Z INFO actix_server::builder] starting 10 workers
Log output format — Experimental
You can now configure Meilisearch to output logs in JSON.
Relaunch your instance passing `json` to the `--experimental-logs-mode` command-line option:
./meilisearch --experimental-logs-mode json
`--experimental-logs-mode` accepts two values:
- `human`: default human-readable output
- `json`: JSON structured logs
🗣️ This feature is experimental and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.
New `/logs/stream` and `/logs/stderr` routes — Experimental
Meilisearch v1.7 introduces 2 new experimental API routes: `/logs/stream` and `/logs/stderr`.
Use the `/experimental-features` route to activate both routes during runtime:
curl \
-X PATCH 'http://localhost:7700/experimental-features/' \
-H 'Content-Type: application/json' \
--data-binary '{
"logsRoute": true
}'
🗣️ This feature is experimental, and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.
/logs/stream
Use the `POST` endpoint to output logs in a stream. The following example disables actix logging and keeps all other logs at the `DEBUG` level:
curl \
-X POST http://localhost:7700/logs/stream \
-H 'Content-Type: application/json' \
--data-binary '{
"mode": "human",
"target": "actix=off,debug"
}'
This endpoint requires two parameters:
- `target`: defines the log level and which part of the engine to apply it to. Must be a string formatted as `code_part=log_level`. Omit `code_part=` to set a single log level for the whole stream. Valid values for the log level are `trace`, `debug`, `info`, `warn`, `error`, or `off`
- `mode`: accepts `human` (basic output) or `profile` (verbose trace)
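As a sketch of how such a `target` string decomposes into directives (a hypothetical helper, not engine code):

```python
VALID_LEVELS = {"trace", "debug", "info", "warn", "error", "off"}

def parse_log_target(target: str):
    """Split a target string like "actix=off,debug" into
    (code_part, log_level) pairs; code_part is None for a global level."""
    directives = []
    for part in target.split(","):
        part = part.strip()
        # rpartition returns ("", "", part) when there is no "=",
        # i.e. a bare log level applied to the whole stream.
        code_part, _, level = part.rpartition("=")
        if level not in VALID_LEVELS:
            raise ValueError(f"invalid log level: {level!r}")
        directives.append((code_part or None, level))
    return directives
```

With the example above, `parse_log_target("actix=off,debug")` yields `[("actix", "off"), (None, "debug")]`: actix logs are silenced while everything else logs at `debug`.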
Use the `DELETE` endpoint of `/logs/stream` to interrupt a stream:
curl -X DELETE http://localhost:7700/logs/stream
You may only have one listener at a time. Meilisearch log streams are not compatible with `xh` or `httpie`.
/logs/stderr
Use the `POST` endpoint to configure the default log output for non-stream logs:
curl \
-X POST http://localhost:7700/logs/stderr \
-H 'Content-Type: application/json' \
--data-binary '{
"target": "debug"
}'
`/logs/stderr` accepts one parameter:
- `target`: defines the log level and which part of the engine to apply it to. Must be a string formatted as `code_part=log_level`. Omit `code_part=` to set a single log level for the whole stream. Valid values for the log level are `trace`, `debug`, `info`, `warn`, `error`, or `off`
Other improvements
- Prometheus experimental feature: add job variable to Grafana dashboard (#4330) @capJavert
- Multiple language support improvements, including expanded Vietnamese normalization (Ð and Đ into d). Now uses Charabia v0.8.7. (#4365) @agourlay, @choznerol, @ngdbao, @timvisee, @xshadowlegendx, and @ManyTheFish
- New experimental feature: change Meilisearch's behavior in a few ways so it can run in a cluster by externalizing the task queue.
- Add the content type to the webhook (#4450) @irevoire
Fixes 🐞
- Make update file deletion atomic (#4435) @irevoire
- Do not omit vectors when importing a dump (#4446) @dureuill
- Put a bound on OpenAI timeout (#4459) @dureuill
Misc
v1.7.0-rc.2 🐇
What's Changed since previous RC
Facet indexing
The facet incremental indexing has been optimized, and the threshold used to choose between bulk and incremental indexing has been changed to fit users' needs:
- Enhance facet incremental by @ManyTheFish in #4433
- Divide threshold by ten by @ManyTheFish in #4462
Semantic search
- Add GPU analytics by @dureuill in #4443
- Put a bound on OpenAI timeout by @dureuill in #4459
- Do not omit vectors when importing a dump by @dureuill in #4446
Benchmarks
- Add subcommand to run benchmarks by @dureuill in #4445
- Replace logging timer by spans by @dureuill in #4458
HA
Fixes
v1.7.0-rc.1 🐇
What's Changed since previous RC
- Make several indexing optimizations by @ManyTheFish in #4350
- Update charabia by @ManyTheFish in #4365
- Implement the experimental log mode cli flag and log level updates at runtime by @irevoire in #4410
- Output logs to stderr by @irevoire in #4418
v1.6.2 🦊
v1.7.0-rc.0 🐇
Meilisearch v1.7.0 mostly focuses on improving v1.6.0 features, indexing speed and hybrid search. GPU computing is now supported.
New features and improvements 🔥
Improve AI with Meilisearch (experimental feature)
🗣️ AI work is still experimental, and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.
To use it, you need to enable `vectorStore` through the `/experimental-features` route.
💡 More documentation about AI search with Meilisearch here.
Add new OpenAI embedding models & ability to override their model dimensions
When using `openAi` as the `source` in your `embedders` index settings (an example here), you can now specify two new models:
- `text-embedding-3-small`, with a default dimension of 1536
- `text-embedding-3-large`, with a default dimension of 3072
The new models:
- are cheaper
- produce more relevant results in standardized tests
- allow setting the dimensions of the embeddings to control the trade-off between accuracy and performance (including storage)
This means it is now possible to pass the `dimensions` field when using the `openAi` source. It was previously only available for the `userProvided` source.
There are some rules, though, which we detail with these examples:
"embedders": {
"large": {
"source": "openAi",
"model": "text-embedding-3-large",
"dimensions": 512 // must be >0, must be <= 3072 for "text-embedding-3-large"
},
"small": {
"source": "openAi",
"model": "text-embedding-3-small",
"dimensions": 1024 // must be >0, must be <= 1536 for "text-embedding-3-small"
},
"legacy": {
"source": "openAi",
"model": "text-embedding-ada-002",
"dimensions": 1536 // must =1536 for "text-embedding-ada-002"
},
"omitted_dimensions": { // uses the default dimension
"source": "openAi",
"model": "text-embedding-ada-002"
}
}
Add GPU support to compute embeddings
Enabling the CUDA feature allows using an available GPU to compute embeddings with a `huggingFace` embedder.
On an AWS Graviton 2, this yields a 3x to 5x improvement in indexing time.
👇 How to enable GPU support through CUDA for HuggingFace embedding generation:
Prerequisites
- Linux distribution with a compatible CUDA version
- NVidia GPU with CUDA support
- A recent Rust compiler to compile Meilisearch from source
Steps
- Follow the guide to install the CUDA dependencies
- Clone Meilisearch:
git clone https://github.com/meilisearch/meilisearch.git
- Compile Meilisearch with the `cuda` feature: `cargo build --release --package meilisearch --features cuda`
- In the freshly compiled Meilisearch, enable the vector store experimental feature:
curl \
-X PATCH 'http://localhost:7700/experimental-features/' \
-H 'Content-Type: application/json' \
--data-binary '{ "vectorStore": true }'
- Add a Hugging Face embedder to the settings:
curl \
-X PATCH 'http://localhost:7700/indexes/your_index/settings/embedders' \
-H 'Content-Type: application/json' --data-binary \
'{ "default": { "source": "huggingFace" } }'
Improve indexing speed & reduce memory crashes
- Auto-batch the task deletions to reduce indexing time (#4316) @irevoire
- Improve indexing speed for vector store (makes the Hybrid search experimental feature indexing time more than 10 times faster) (#4332) @Kerollmops @irevoire
- Reduce memory usage, and thus memory crashes, by capping the maximum memory of the grenad sorters (#4388) @Kerollmops
Stabilize the `scoreDetails` feature
In v1.3.0, we introduced the experimental feature `scoreDetails`. We got enough positive feedback on it, and we are now stabilizing it, making the feature enabled by default.
View detailed scores per ranking rule for each document with the `showRankingScoreDetails` search parameter:
curl \
-X POST 'http://localhost:7700/indexes/movies/search' \
-H 'Content-Type: application/json' \
--data-binary '{ "q": "Batman Returns", "showRankingScoreDetails": true }'
When `showRankingScoreDetails` is set to `true`, returned documents include a `_rankingScoreDetails` field. This field contains score values for each ranking rule.
"_rankingScoreDetails": {
"words": {
"order": 0,
"matchingWords": 1,
"maxMatchingWords": 1,
"score": 1.0
},
"typo": {
"order": 1,
"typoCount": 0,
"maxTypoCount": 1,
"score": 1.0
},
"proximity": {
"order": 2,
"score": 1.0
},
"attribute": {
"order": 3,
"attributes_ranking_order": 0.8,
"attributes_query_word_order": 0.6363636363636364,
"score": 0.7272727272727273
},
"exactness": {
"order": 4,
"matchType": "noExactMatch",
"matchingWords": 0,
"maxMatchingWords": 1,
"score": 0.3333333333333333
}
}
Logs improvements
We made some changes regarding our logs to help with debugging and bug reporting.
Log format change
The default log format evolved slightly from this:
[2024-02-06T14:54:11Z INFO actix_server::builder] starting 10 workers
To this:
2024-02-06T13:58:14.710803Z INFO actix_server::builder: 200: starting 10 workers
Experimental: new routes to manage logs
This new version of Meilisearch introduces 3 new experimental routes:
- `POST /logs/stream`: streams logs in real time. Requires two parameters:
  - `target`: selects which logs you're interested in. It takes the form of `code_part=log_level`, for example `index_scheduler=info`
  - `mode`: selects the log format. Two options are available: `human` (basic logs) or `profile` (a much more detailed trace)
- `DELETE /logs/stream`: stops the listener from the Meilisearch perspective. Does not require any parameters.
💡 More information in the New experimental routes section of this file.
Known limitations of the `POST /logs/stream` route:
- You can have only one listener at a time
- Listening to the route doesn't seem to work with `xh` or `httpie` for the moment
- When killing the listener, it may stay installed on Meilisearch for some time, and you will need to call the `DELETE /logs/stream` route to get rid of it.
🗣️ This feature is experimental, and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.
Other improvements
- Related to the Prometheus experimental feature: add job variable to Grafana dashboard (#4330) @capJavert
Misc
- Dependencies upgrade
- Bump rustls-webpki from 0.101.3 to 0.101.7 (#4263)
- Bump h2 from 0.3.20 to 0.3.24 (#4345)
- Update the dependencies (#4332) @Kerollmops
- CIs and tests
- Documentation
- Add Setting API reminder in issue template (#4325) @ManyTheFish
- Update README (#4319) @codesmith-emmy
- Misc
❤️ Thanks again to our external contributors:
v1.6.1 🦊
v1.6.0 🦊
Meilisearch v1.6 focuses on improving indexing performance. This new release also adds hybrid search and simplifies the process of generating embeddings for semantic search.
🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 and 48 hours after a new version becomes available.
Some SDKs might not include all new features—consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).
New features and improvements 🔥
Experimental: Automated embeddings generation for vector search
With v1.6, you can configure Meilisearch so it automatically generates embeddings using either OpenAI or HuggingFace. If neither of these third-party options suits your application, you may provide your own embeddings manually:
- `openAi`: Meilisearch uses the OpenAI API to auto-embed your documents. You must supply an OpenAI API key to use this embedder
- `huggingFace`: Meilisearch automatically downloads the specified `model` from HuggingFace and generates embeddings locally. This will use your CPU and may impact indexing performance
- `userProvided`: compute embeddings manually and supply document vectors to Meilisearch. You may be familiar with this approach if you have used vector search in a previous Meilisearch release. Read further for details on breaking changes for user-provided embeddings usage
Usage
Use the `embedders` index setting to configure embedders. You may set multiple embedders for an index. This example defines 3 embedders named `default`, `image`, and `translation`:
curl \
-X PATCH 'http://localhost:7700/indexes/movies/settings' \
-H 'Content-Type: application/json' \
--data-binary '{
"embedders": {
"default": {
"source": "openAi",
"apiKey": "<your-OpenAI-API-key>",
"model": "text-embedding-ada-002",
"documentTemplate": "A movie titled \'{{doc.title}}\' whose description starts with {{doc.overview|truncatewords: 20}}"
},
"image": {
"source": "userProvided",
"dimensions": 512
},
"translation": {
"source": "huggingFace",
"model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
"documentTemplate": "A movie titled \'{{doc.title}}\' whose description starts with {{doc.overview|truncatewords: 20}}"
}
}
}'
- `documentTemplate` is a view of your document that will serve as the base for computing the embedding. This field is a JSON string in the Liquid format
- `model` is the model OpenAI or HuggingFace should use when generating document embeddings
Refer to the documentation for more vector search usage instructions.
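To illustrate what `documentTemplate` does, here is a rough Python stand-in for the Liquid rendering (only the two constructs used in the example above; real Liquid supports far more, and Meilisearch's actual renderer may behave differently):

```python
import re

def render_template(template: str, doc: dict) -> str:
    """Very rough stand-in for Liquid rendering: supports only
    {{doc.field}} and the `truncatewords: N` filter."""
    def repl(match):
        field, filt, arg = match.group(1), match.group(2), match.group(3)
        value = str(doc.get(field, ""))
        if filt == "truncatewords":
            value = " ".join(value.split()[: int(arg)])
        return value
    pattern = r"\{\{\s*doc\.(\w+)(?:\s*\|\s*(\w+):\s*(\d+))?\s*\}\}"
    return re.sub(pattern, repl, template)

template = ("A movie titled '{{doc.title}}' whose description "
            "starts with {{doc.overview|truncatewords: 20}}")
doc = {"title": "Batman Returns",
       "overview": "The Penguin rises from the sewers of Gotham to seize power."}
print(render_template(template, doc))
```

The rendered string, not the raw document, is what gets sent to the embedding model, which is why trimming it with filters like `truncatewords` keeps embedding inputs short and focused.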
⚠️ Vector search breaking changes
If you have used vector search between v1.3.0 and v1.5.0, API usage has changed with v1.6:
- When providing both the `q` and `vector` parameters for a single query, you must provide the `hybrid` parameter
- Defining a model in your embedder settings is now mandatory:
"embedders": {
"default": {
"source": "userProvided",
"dimensions": 512
}
}
- Vectors should be JSON objects instead of arrays:
"_vectors": { "image2text": [0.0, 0.1, …] } # ✅
"_vectors": [ [0.0, 0.1] ] # ❌
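A hypothetical migration helper for documents that still carry the legacy array form (the function and default embedder name are illustrative, not part of Meilisearch):

```python
def migrate_vectors(doc: dict, embedder_name: str = "default") -> dict:
    """Wrap a legacy `_vectors` array in the named-embedder object
    form required by v1.6. Documents already in object form pass through."""
    vectors = doc.get("_vectors")
    if isinstance(vectors, list):  # legacy: bare array(s) of floats
        return {**doc, "_vectors": {embedder_name: vectors}}
    return doc
```

Running documents through such a step before re-importing them avoids rejections from the new object-shaped `_vectors` requirement.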
Done in #4226 by @dureuill, @irevoire, @Kerollmops and @ManyTheFish.
Experimental: Hybrid search
This release introduces hybrid search functionality. Hybrid search allows users to mix keyword and semantic search at search time.
Use the `hybrid` search parameter to perform a hybrid search:
curl \
-X POST 'http://localhost:7700/indexes/movies/search' \
-H 'Content-Type: application/json' \
--data-binary '{
"q": "Plumbers and dinosaurs",
"hybrid": {
"semanticRatio": 0.9,
"embedder": "default"
}
}'
- `embedder`: the embedder to use to perform the search, among the ones you defined in your settings
- `semanticRatio`: a number between `0` and `1`. The default value is `0.5`. `1` corresponds to a full semantic search and `0` corresponds to a full keyword search
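Meilisearch's actual score fusion is internal, but one simple way to think about `semanticRatio` is as a linear blend of the keyword and semantic scores (illustrative only, not the engine's formula):

```python
def hybrid_score(keyword_score: float, semantic_score: float,
                 semantic_ratio: float = 0.5) -> float:
    """Blend a keyword score and a semantic score: ratio 0 is pure
    keyword search, ratio 1 is pure semantic search."""
    assert 0.0 <= semantic_ratio <= 1.0
    return (1 - semantic_ratio) * keyword_score + semantic_ratio * semantic_score
```

Under this view, the `semanticRatio: 0.9` in the example above means the result ordering is driven mostly by semantic similarity, with a small keyword contribution.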
Tip
The new vector search functionality uses Arroy, a Rust library developed by the Meilisearch engine team. Check out @Kerollmops' blog post describing the whole process.
Done in #4226 by @dureuill, @irevoire, @Kerollmops and @ManyTheFish.
Improve indexing speed
This version introduces significant indexing performance improvements. Meilisearch v1.6 has been optimized to:
- store and pre-compute less data than in previous versions
- re-index and delete only the necessary data when updating a document. For example, when you update one document field, Meilisearch will no longer re-index the whole document
On an e-commerce dataset of 2.5GB of documents, these changes led to more than a 50% reduction in time when adding documents for the first time. When updating documents frequently and partially, the time reduction hovers between 50% and 75%.
Done in #4090 by @ManyTheFish, @dureuill and @Kerollmops.
Disk space usage reduction
Meilisearch now stores less internal data. This leads to smaller database disk sizes.
With a ~15MB dataset, the created database is between 40% and 50% smaller. Additionally, the database size has become more stable and will display more modest growth with new document additions.
Proximity precision and performance
You can now customize the accuracy of the proximity ranking rule.
Computing this ranking rule uses a significant amount of resources and may lead to increased indexing times. Lowering its precision may lead to significant performance gains. In a minority of use cases, lower proximity precision may also impact relevancy for queries using multiple search terms.
Usage
curl \
-X PATCH 'http://localhost:7700/indexes/books/settings/proximity-precision' \
-H 'Content-Type: application/json' \
--data-binary '{
"proximityPrecision": "byAttribute"
}'
`proximityPrecision` accepts either `byWord` or `byAttribute`:
- `byWord` calculates the exact distance between words. This is the default setting.
- `byAttribute` only determines whether words are present in the same attribute. It is less accurate, but provides better performance.
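The contrast between the two settings can be sketched like this (an illustration of the idea, not the engine's implementation):

```python
def proximity_by_word(attribute: str, w1: str, w2: str):
    """byWord: exact word distance between two query words in an attribute,
    or None if either word is missing."""
    positions = {word: i for i, word in enumerate(attribute.lower().split())}
    if w1 in positions and w2 in positions:
        return abs(positions[w1] - positions[w2])
    return None

def proximity_by_attribute(attribute: str, w1: str, w2: str) -> bool:
    """byAttribute: only checks that both words appear in the same attribute,
    skipping the (more expensive) position bookkeeping."""
    words = set(attribute.lower().split())
    return w1 in words and w2 in words
```

`byWord` must track every word position to compute distances, which is what costs indexing resources; `byAttribute` only needs per-attribute word presence.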
Done in #4225 by @ManyTheFish.
Experimental: Limit the number of batched tasks
Meilisearch may occasionally batch too many tasks together, which may lead to system instability. Relaunch Meilisearch with the `--experimental-max-number-of-batched-tasks` configuration option to address this issue:
./meilisearch --experimental-max-number-of-batched-tasks 100
You may also configure the limit as an environment variable or directly in the config file with `MEILI_EXPERIMENTAL_MAX_NUMBER_OF_BATCHED_TASKS`.
Done in #4249 by @Kerollmops
Task queue webhook
This release introduces a configurable webhook URL that will be called whenever Meilisearch finishes processing a task.
Relaunch Meilisearch using `--task-webhook-url` and `--task-webhook-authorization-header` to use the webhook:
./meilisearch \
--task-webhook-url=https://example.com/example-webhook?foo=bar&number=8 \
--task-webhook-authorization-header=Bearer aSampleAPISearchKey
You may also define the webhook URL and header with environment variables or in the configuration file with `MEILI_TASK_WEBHOOK_URL` and `MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER`.
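For illustration, a minimal receiver for such a webhook might look like the sketch below. The endpoint, the expected auth value, and the newline-delimited JSON payload handling are assumptions for the example, not documented guarantees.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed to match the value passed to --task-webhook-authorization-header.
EXPECTED_AUTH = "Bearer aSampleAPISearchKey"

def parse_webhook_payload(body: bytes, auth_header):
    """Check the Authorization header, then decode one JSON task per line."""
    if auth_header != EXPECTED_AUTH:
        raise PermissionError("bad webhook authorization header")
    return [json.loads(line) for line in body.splitlines() if line.strip()]

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            tasks = parse_webhook_payload(
                self.rfile.read(length), self.headers.get("Authorization")
            )
            print(f"{len(tasks)} finished task(s) received")
            self.send_response(200)
        except PermissionError:
            self.send_response(401)
        self.end_headers()

# To run: HTTPServer(("localhost", 8000), WebhookHandler).serve_forever()
```

Returning 401 on a header mismatch keeps the endpoint from accepting notifications that did not come from your Meilisearch instance.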
Fixes 🐞
- Fix document formatting performance during search (#4313) @ManyTheFish
- The dump tasks are now cancellable (#4208) @irevoire
- Fix: the payload size limit is now also applied to all routes, not only routes to add and update documents (#4231) @Karribalu
- Fix: typo tolerance was ineffective for attributes with similar content (related issue: #4256)
- Fix: the geosort is no longer ignored after the first bucket of a preceding `sort` ranking rule (#4226)
- Fix hang on `/indexes` and `/stats` routes (#4308) @dureuill
- Limit the number of values returned by the facet search based on the `maxValuePerFacet` setting (#4311) @Kerollmops
Misc
- Dependencies upgrade
- Documentation
- Misc
❤️ Thanks again to our external contributors: @Karribalu, and @vivek-26