Skip to content

API exposing algorithms on top of project2vec

License

Notifications You must be signed in to change notification settings

thoth-station/isis-api

Repository files navigation

Isis

A service exposing package tags and package categories to Thoth's recommendation engine.

project2vec

Isis API exposes functionality on top of project2vec - description of a package using a vector. The vector consists of features that the given project provides. These features are aggregated based on keywords found in the Python ecosystem and subsequently they are extracted from project descriptions and other free text descriptions of a project (README files on linked GitHub repos).

These vectors form space in which we can search for similar packages (by computing distance between these vectors) as well as performing feature based queries by for example masking the resulting vectors (specifing features we are interested in).

The figure bellow shows a visualization in TensorBoard after dimensionality space reduction using t-SNE. There can be seen formed clusters of similar packages and search of similar packages in the ecosystem.

TensorBoard project2vec visualization

Deployment

The service is built using OpenShift's s2i. On deployment, there is first run an init container that downloads model from Ceph/S3 (created by one of the flows defined by selinon-worker flows).