Getting Started

Getting Started - vitrivr-engine

This is a quick guide on how to set up vitrivr-engine for ingestion and retrieval.

ingestion: The process of adding multimedia content to the system, such that it can be retrieved later.
retrieval: The process of querying the system to find multimedia content.

Apart from these two terms, this guide does not delve into terminology. For more in-depth information, see the other parts of the wiki.

Prerequisites

The following prerequisites are essential for this guide:

JDK 21 or higher, e.g. OpenJDK
CottontailDB at least v0.16.5 OR PostgreSQL with pgVector
Multimedia content, such as videos and images.

Setup

Start CottontailDB on the default port 1865 OR PostgreSQL with pgVector on the default port 5432.
Build vitrivr-engine (from the root of the repository): Unix:

./gradlew distZip

Windows:

.\gradlew.bat distZip

Unzip the distribution, e.g. unzip -d ../instance/ vitrivr-engine-server/build/distributions/vitrivr-engine-server-0.0.1-SNAPSHOT.zip
Place the media data in a folder called sandbox/media

By now, you should have the following folder structure:

+ vitrivr-engine/
|
+ instance/
  |
  + vitrivr-engine-server-0.0.1-SNAPSHOT/
    |
    + bin/
    |
    + lib/
+ sandbox/
  |
  + media/
    |
    - my-img-1.png
    |
    - my-img-2.jpg
    |
    - video.mp4
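
The layout above can be checked before continuing. The following is a minimal sketch, assuming the directory names shown in the tree; the helper missing_dirs and the list EXPECTED_DIRS are hypothetical names introduced here and are not part of vitrivr-engine.

```python
from pathlib import Path
import tempfile

# Directories this guide expects, relative to the common parent folder.
EXPECTED_DIRS = [
    "instance/vitrivr-engine-server-0.0.1-SNAPSHOT/bin",
    "instance/vitrivr-engine-server-0.0.1-SNAPSHOT/lib",
    "sandbox/media",
]


def missing_dirs(root):
    """Return the expected directories that do not exist below root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


# Demonstration on a throwaway directory that only contains sandbox/media.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sandbox" / "media").mkdir(parents=True)
    for name in missing_dirs(tmp):
        print("missing:", name)
```

Called with the real parent folder, for example missing_dirs(".."), an empty list means the structure matches the one shown above.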

From now on, we work from within the instance folder.

Interlude - What do we do?

The goal of this guide is to give you a head start in using vitrivr-engine for ingestion and retrieval. Ultimately, we want to search a multimedia collection using content-based methods, such as querying the system with a colour. To do so, the system needs a representation of the files in the collection and must be aware of their colour. For the sake of this guide, we configure vitrivr-engine so that the average colour can be searched for; this is our feature of choice. Furthermore, we want vitrivr-engine to know the original file names.

Disclaimer: In a more practical setup, other features are desirable, which we take into consideration in the example

Schema

Similar to (relational) databases, vitrivr-engine works on the notion of a schema, which defines the representation of the multimedia content:

Create a file schema.json:

{
  "schemas": {
    "sandbox": {}
  }
}

For the sake of this guide we simply limit ourselves to a single schema.

Database Connection

vitrivr-engine requires a running database. Currently, we support CottontailDB or PostgreSQL with pgVector.

We define the database connection at the beginning of the schema. The first variant below is for CottontailDB, the second for PostgreSQL with pgVector:

{
  "schemas": {
    "sandbox": {
      "connection": {
        "database": "CottontailConnectionProvider",
        "parameters": {
          "host": "127.0.0.1",
          "port": "1865"
        }
      }
    }
  }
}

{
  "schemas": {
    "sandbox": {
      "connection": {
        "database": "PgVectorConnectionProvider",
        "parameters": {
          "host": "127.0.0.1",
          "port": "5432",
          "database": "postgres",
          "username": "postgres",
          "password": "<password>"
        }
      }
    }
  }
}
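
Before starting vitrivr-engine, it can help to confirm that something is listening on the host and port given in the connection parameters. The sketch below only tests TCP reachability with Python's standard library; it does not verify that the service is actually CottontailDB or PostgreSQL. The function name is_port_open is hypothetical, and the ports are the defaults mentioned in this guide.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and ports are the defaults used in this guide.
    for name, port in [("CottontailDB", 1865), ("PostgreSQL", 5432)]:
        state = "reachable" if is_port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```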

Fields

We have two goals: (i) we want to search by (average) colour and (ii) we want to search by file name. Hence, both pieces of information have to be represented in the system. In vitrivr-engine, such representations are called descriptors, or more generally, features. Features are defined as fields on the schema:

{
  "schemas": {
    "sandbox": {
      "connection": {
        "database": "CottontailConnectionProvider",
        "parameters": {
          "Host": "127.0.0.1",
          "port": "1865"
        }
      },
      "fields": {
        "averagecolor": {
          "factory": "AverageColor"
        },
        "file": {
          "factory": "FileSourceMetadata"
        }
      }
    }
  }
}

This defines two fields, averagecolor and file, on our sandbox schema.

Resolver and Exporter

In order to be able to export thumbnails during extraction (ingestion) and also have access to these thumbnails during query time (retrieval), we configure a disk resolver and a thumbnail exporter:

{
  "schemas": {
    "sandbox": {
      "connection": {
        "database": "CottontailConnectionProvider",
        "parameters": {
          "Host": "127.0.0.1",
          "port": "1865"
        }
      },
      "fields": {
        "averagecolor": {
          "factory": "AverageColor"
        },
        "file": {
          "factory": "FileSourceMetadata"
        }
      },
      "resolvers": {
        "disk": {
          "factory": "DiskResolver",
          "parameters": {
            "location": "../sandbox/thumbnails/"
          }
        }
      },
      "exporters": {
        "thumbnail": {
          "factory": "ThumbnailExporter",
          "resolverName": "disk",
          "parameters": {
            "maxSideResolution": "400",
            "mimeType": "JPG"
          }
        }
      }
    }
  }
}

We configure the resolver named disk such that its location is ../sandbox/thumbnails/ (remember: we operate from within the instance folder). Additionally, we set up the thumbnail exporter such that the MIME type is JPG and the longer side of each thumbnail measures 400px.
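
The exporter refers to its resolver by name through resolverName, so a typo in either place leaves the exporter without a target. The following is a small sketch of a consistency check on the schema structure shown above; dangling_exporters is a hypothetical helper, and how vitrivr-engine itself reacts to such a mismatch is not covered here.

```python
def dangling_exporters(schema):
    """Return names of exporters whose resolverName is not a configured resolver."""
    resolvers = schema.get("resolvers", {})
    return [
        name
        for name, exporter in schema.get("exporters", {}).items()
        if exporter.get("resolverName") not in resolvers
    ]


# The resolver and exporter part of the sandbox schema from this guide.
sandbox = {
    "resolvers": {
        "disk": {
            "factory": "DiskResolver",
            "parameters": {"location": "../sandbox/thumbnails/"},
        }
    },
    "exporters": {
        "thumbnail": {
            "factory": "ThumbnailExporter",
            "resolverName": "disk",
            "parameters": {"maxSideResolution": "400", "mimeType": "JPG"},
        }
    },
}

print(dangling_exporters(sandbox))  # prints []
```

To check the real file, load schema.json with json.load and pass the value found under schemas and then sandbox.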

Ingestion

While the schema defines the representation, we require an ingestion pipeline which defines how and in which order these representations are computed:

Create a file ingestion-image.json:

{
  "schema": "sandbox",
  "context": {
      "contentFactory": "InMemoryContentFactory",
      "resolverName": "disk",
      "local": {
          "enumerator": {
              "path": "../sandbox/media/",
              "depth": "1"
          },
          "thumbnail": {
              "path": "../sandbox/thumbnails/"
          },
          "filter": {
              "type": "SOURCE:IMAGE"
          }
      }
  },
  "operators": {
      "enumerator": { "type": "ENUMERATOR", "factory": "FileSystemEnumerator", "mediaTypes": ["IMAGE"]},
      "decoder": { "type": "DECODER", "factory": "ImageDecoder"  },
      "avgColor": { "type": "EXTRACTOR", "fieldName": "averagecolor"},
      "file_metadata": { "type": "EXTRACTOR", "fieldName": "file" },
      "thumbnail": { "type": "EXPORTER", "exporterName": "thumbnail" },
      "filter": { "type": "TRANSFORMER", "factory": "TypeFilterTransformer"}
  },
  "operations": {
      "enumerator": { "operator": "enumerator" },
      "decoder": { "operator": "decoder", "inputs": [ "enumerator" ] },
      "averagecolor": { "operator": "avgColor","inputs": ["decoder"]},
      "thumbnail": {  "operator": "thumbnail", "inputs": ["decoder"] },
      "filter": {  "operator": "filter", "inputs": ["averagecolor", "thumbnail"], "merge": "COMBINE" },
      "file_metadata": {  "operator": "file_metadata", "inputs": ["filter"] }
  },
  "output": ["file_metadata"]
}

First, in context.local we define the parameters of the operators, such as where the media files are (context.local.enumerator.path) and where to store the thumbnails (context.local.thumbnail.path). Second, in operators we define which operators form the pipeline. The names given here are also the names that must be used in context.local. Third, in operations we define the pipeline itself, which is described below. Fourth, with the output property, we define after which operation the representations are persisted.
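
The four parts refer to each other by name, which makes typos easy to miss. Below is a sketch of a reference check over an ingestion configuration, assuming the naming rules described in this section. The function pipeline_problems is a hypothetical helper, and the operator definitions are abbreviated to empty objects because only the names matter here.

```python
def pipeline_problems(config):
    """Collect dangling name references in an ingestion configuration."""
    operators = config.get("operators", {})
    operations = config.get("operations", {})
    local = config.get("context", {}).get("local", {})
    problems = []
    for name in local:
        if name not in operators:
            problems.append(f"context.local.{name} has no matching operator")
    for name, operation in operations.items():
        if operation.get("operator") not in operators:
            problems.append(f"operation {name} uses an unknown operator")
        for source in operation.get("inputs", []):
            if source not in operations:
                problems.append(f"operation {name} reads from unknown operation {source}")
    for name in config.get("output", []):
        if name not in operations:
            problems.append(f"output {name} is not an operation")
    return problems


# Names taken from ingestion-image.json; parameters omitted for brevity.
ingestion = {
    "context": {"local": {"enumerator": {}, "thumbnail": {}, "filter": {}}},
    "operators": {
        "enumerator": {}, "decoder": {}, "avgColor": {},
        "file_metadata": {}, "thumbnail": {}, "filter": {},
    },
    "operations": {
        "enumerator": {"operator": "enumerator"},
        "decoder": {"operator": "decoder", "inputs": ["enumerator"]},
        "averagecolor": {"operator": "avgColor", "inputs": ["decoder"]},
        "thumbnail": {"operator": "thumbnail", "inputs": ["decoder"]},
        "filter": {"operator": "filter", "inputs": ["averagecolor", "thumbnail"]},
        "file_metadata": {"operator": "file_metadata", "inputs": ["filter"]},
    },
    "output": ["file_metadata"],
}

print(pipeline_problems(ingestion))  # prints []
```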

Pipeline

The pipeline is defined as follows:

The enumerator reads the files and sends each IMAGE file to the decoder, which in turn decodes the image and sends its internal representation to the thumbnail and averagecolor operations. The filter operation merges the results from both the thumbnail and averagecolor operations, but only passes on those of type SOURCE:IMAGE to the last operation, file_metadata. The result is then persisted, as specified with the output property.
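
The data flow described above forms a directed acyclic graph. As an illustration only, the following sketch derives one valid processing order from the inputs of each operation using Python's standard graphlib module. This is not how vitrivr-engine schedules its operators; it merely makes the dependencies visible.

```python
from graphlib import TopologicalSorter

# Each operation mapped to the operations it takes as inputs,
# copied from the "operations" section of ingestion-image.json.
operations = {
    "enumerator": [],
    "decoder": ["enumerator"],
    "averagecolor": ["decoder"],
    "thumbnail": ["decoder"],
    "filter": ["averagecolor", "thumbnail"],
    "file_metadata": ["filter"],
}

# static_order() yields every operation after all of its inputs.
order = list(TopologicalSorter(operations).static_order())
print(" -> ".join(order))
```

The relative position of averagecolor and thumbnail in the printed order is arbitrary, since neither depends on the other.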

Run Extraction

We have now built vitrivr-engine, started a database instance (CottontailDB or PostgreSQL with pgVector), and created our schema.json and ingestion-image.json configuration files.

Let's start vitrivr-engine, which will result in the CLI running. We also pass our schema to vitrivr-engine:

./vitrivr-engine-server-0.0.1-SNAPSHOT/bin/vitrivr-engine-server schema.json

Within the CLI, we first initialise the database:

v> sandbox init

We use the command sandbox to address our defined schema and the subcommand init to initialise and prepare the database.

Then, we start the extraction, which prints something along the lines of:

Started extraction job with UUID <uuid>

The server has also been started on the (default) port 7070, so we can use the OpenAPI Swagger UI, or an equivalent HTTP request such as the following, to check the status of the extraction job (replace <uuid> with the UUID of your extraction job):

curl -X 'GET' \
  'http://localhost:7070/api/sandbox/index/<uuid>' \
  -H 'accept: application/json'

Depending on the size of your sandbox collection and on your hardware, the extraction might already be done or may take a while.
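
The same status request can be issued from a script, for instance to poll until the job has finished. The sketch below uses only Python's standard library and the URL pattern from the curl call above. The function names are hypothetical, and because the format of the response body is not specified in this guide, it is returned as raw text.

```python
import urllib.request


def status_url(schema, job_id, base="http://localhost:7070"):
    """Build the status URL for an extraction job."""
    return f"{base}/api/{schema}/index/{job_id}"


def fetch_status(schema, job_id, base="http://localhost:7070", timeout=10.0):
    """Return the raw response body of the status endpoint as text."""
    request = urllib.request.Request(
        status_url(schema, job_id, base),
        headers={"accept": "application/json"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (requires a running server; replace <uuid> with your job's UUID):
# print(fetch_status("sandbox", "<uuid>"))
```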

Once the extraction is complete, you can move on to the retrieval part:

Retrieval

For retrieval, we operate exclusively with vitrivr-engine's OpenAPI Swagger UI. In a real-world scenario, one would likely build an appropriate UI for vitrivr-engine. Currently, a branch of vitrivr-ng-min that supports vitrivr-engine for video is available for reference.

Queries

For querying, one must be aware of the available representations (descriptors) configured on the schema and effectively extracted. In this guide, there are two fields, averagecolor and file. The former represents the average colour of the media data as a three-component RGB vector; the latter is a table-like structure containing, among others, the file name as a textual value addressed as path.

In the case of a vector representation, nearest neighbour search (NNS) is performed.

Head over to the OpenAPI Swagger UI and locate the query endpoint. Using the Swagger UI, try out the query by specifying the schema as sandbox and pasting the following example NNS query for the field averagecolor:

{
    "context": {},
    "inputs": {
        "color": {
            "type": "VECTOR",
            "data": [
                0.5,
                0.5,
                0.5
            ]
        }
    },
    "operations": {
        "op_color": {
            "type": "RETRIEVER",
            "field": "averagecolor",
            "input": "color"
        }
    },
    "output": "op_color"
}


This will query vitrivr-engine for the average colour with values [0.5,0.5,0.5], which is a medium grey.
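
To make the idea behind this query concrete, the following sketch computes an average colour vector from a few pixels and ranks a toy collection by distance to the query vector. It is an illustration under stated assumptions: components are normalised to the range 0 to 1 and Euclidean distance is used. The actual extraction and distance function in vitrivr-engine may differ, and the file names and pixel values are invented for the example.

```python
import math


def average_color(pixels):
    """Average 8-bit RGB pixels into one vector with components in [0, 1]."""
    count = len(pixels)
    return [sum(p[c] for p in pixels) / (255 * count) for c in range(3)]


def nearest(query, collection, k=1):
    """Return the k names whose vectors are closest to query (Euclidean)."""
    return sorted(collection, key=lambda name: math.dist(query, collection[name]))[:k]


# Toy collection: each "image" consists of just two pixels.
collection = {
    "black-white.png": average_color([(0, 0, 0), (255, 255, 255)]),
    "red.png": average_color([(255, 0, 0), (255, 0, 0)]),
    "dark.png": average_color([(0, 0, 0), (51, 51, 51)]),
}

# Query with medium grey, as in the example query above.
print(nearest([0.5, 0.5, 0.5], collection, k=2))
# prints ['black-white.png', 'dark.png']
```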

To query a sub-field of a structured descriptor, such as the file field, which contains the sub-fields path (text) and size (number), a Boolean query on the sub-field is used.

In the following example, we search for files with a size larger than 1500 bytes:

{
    "context": {},
    "inputs": {
        "size": {
            "type": "NUMERIC",
            "value":"1500",
            "comparison":">"
        }
    },
    "operations": {
        "op1": {
            "type": "RETRIEVER",
            "field": "file.size",
            "input": "size"
        }
    },
    "output": "op1"
}
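
Queries of this shape differ only in the sub-field, the value, and the comparison, so the body can be generated instead of edited by hand. The following is a minimal sketch of such a helper; boolean_query is a hypothetical name, and only the NUMERIC type and the ">" comparison are taken from this guide. Other types or comparison operators would need to be checked against the API documentation.

```python
import json


def boolean_query(field, value, comparison, input_type="NUMERIC"):
    """Build a query body with one input and one retriever on a sub-field."""
    return {
        "context": {},
        "inputs": {
            "value": {
                "type": input_type,
                "value": str(value),  # the guide passes the value as a string
                "comparison": comparison,
            }
        },
        "operations": {
            "op1": {"type": "RETRIEVER", "field": field, "input": "value"}
        },
        "output": "op1",
    }


# Same query as above: files with a size larger than 1500.
print(json.dumps(boolean_query("file.size", 1500, ">"), indent=4))
```

The printed JSON can be pasted into the query endpoint of the Swagger UI in the same way as the examples above.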

⚠️ This wiki is work-in-progress and targets the dev branch / Release Candidate 1 to be released by the end of August 2024 ⚠️

Found an issue in the wiki? Post it!

Have a question? Ask it

Disclaimer: Please keep in mind, vitrivr and vitrivr-engine are predominantly research prototypes.
