Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: REST API routes for different corpus/model combinations - how do we name? #2502

Open
lintool opened this issue May 24, 2024 · 7 comments

Comments

@lintool
Copy link
Member

lintool commented May 24, 2024

We have different corpus/model combinations. For example, we have msmarco-v1-passage, msmarco-v1-passage.splade-pp-ed, and msmarco-v1-passage.cos-dpr-distil. They are all on MS MARCO V1 Passage, but different models - the first for BM25, SPLADE, and cosDPR.

For querying different corpus/model combinations, how do we want to configure the REST API route?

Options I see are:

(1) corpus in route, model as query parameter, e.g.,

  • /api/collection/msmarco-v1-passage/search?query={query}&model=bm25
  • /api/collection/msmarco-v1-passage/search?query={query}&model=splade-pp-ed
  • /api/collection/msmarco-v1-passage/search?query={query}&model=cos-dpr-distil

(2) corpus/model (=index) combination directly in route, e.g.,

  • /api/collection/msmarco-v1-passage/search?query={query}&model=bm25
  • /api/collection/msmarco-v1-passage.splade-pp-ed/search?query={query}
  • /api/collection/msmarco-v1-passage.cos-dpr-distil/search?query={query}

The first feels conceptually cleaner, but requires an extra layer of indirection to map into our final indexes.

If we go with option (2), then "collection" is misnamed, but we can change to "index"?

@16BitNarwhal
Copy link
Member

I initially named the route to match the PrebuiltIndexHandler since all the indexes were from there so in that case option (2) changed to "index" would be an improvement (I'm not sure why I named
it "collection" at the start). I also agree that (1) is cleaner, but I'm not sure if it works for BEIR (and correct me if I'm wrong) since I don't think .flat is a model

@ronakice
Copy link
Member

Using the first option, you can maintain clarity and separation between the corpus and the model, making it easier to add or change models in the future. It also simplifies API usage, as users will understand that the primary resource is the collection, and the model is a configurable parameter. So I side with it, but agreed too that it is a bit of work.

@lintool
Copy link
Member Author

lintool commented May 25, 2024

I also agree that (1) is cleaner, but I'm not sure if it works for BEIR (and correct me if I'm wrong) since I don't think .flat is a model

Yea, you'll need another layer of indirection.

I'll make an executive decision of going with (2) for now, with perhaps moving to (1) further down the road... it's probably too much work right now.

@16BitNarwhal can you rename /api/collection/ to /api/index/ and let's wrap this up for now?

@16BitNarwhal
Copy link
Member

https://restfulapi.net/resource-naming/
https://stackoverflow.blog/2020/03/02/best-practices-for-rest-api-design/
Based on the above resources, here are some considerations for renaming the api routes:

  • currently we use /index/{index_name} but we should change it to /indexes/{index_name}, since this implies there a collection/group of indexes and we are specifically picking one (this is also a breaking change so ragnarok which relies on the api would also have to change their call)
  • /indexes should be used to list out all the index information. it maintains consistency as these are the indexes we are querying from
  • /indexes/{index_name}/status should be use to gauge the status of the index (in this case, it would be a json with the property cached)
  • also a consideration is do we want to version the api route, eg. /api/v1/... , so breaking changes to the route or api could be versioned and whoever is using the route can continue to use the old ver. or upgrade at their own pace

@lintool
Copy link
Member Author

lintool commented Jun 17, 2024

Adding /api/v1/... is a breaking change, but I like the idea. Let's do v1.0 instead of just v1.

I'm okay with all the other aspects of your proposal. Let's do it!

@16BitNarwhal
Copy link
Member

Sounds good, will draft and work on #2525

@lintool
Copy link
Member Author

lintool commented Jun 18, 2024

BTW, I don't think we need to preserve the old API, so a breaking change here is fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants