Multiple Embedding Models

Your text is in English. Your videos have visual content. Your queries are sometimes “find me the contract about renewal terms” and sometimes “find me the goal celebration.” A single embedding model is good at one of these. Compass lets you ingest the same chunk into multiple embedding spaces and pick which spaces to search at query time.

The problem a single model creates

A text model like Harrier scores English-language queries well against transcript text. Ask it to find a visual moment in a video, and it’s working from the transcript description alone: it never saw the frames. A multimodal model like Qwen3-VL embeds images and video frames directly, but it’s weaker on dense English prose. If you choose one model, you give up accuracy on half your query types.

Compass runs both. At ingest time, each chunk is stored in the vector spaces you target. At query time, you pick one space, or both, and Compass handles the merge.

Example: a collection with two spaces

The examples below send Authorization: Bearer $COMPASS_API_KEY. Set COMPASS_BASE_URL and COMPASS_API_KEY in your shell first (see the quickstart).

Create a collection with harrier (text, 768 dims) and qwen3-vl (multimodal, 896 dims):

$ curl -X POST $COMPASS_BASE_URL/collections \
>   -H "Authorization: Bearer $COMPASS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{
>     "name": "media",
>     "vector_spaces": {
>       "harrier": {
>         "dims": 768,
>         "model": "microsoft/harrier-oss-v1-0.6b",
>         "status": "active"
>       },
>       "qwen3-vl": {
>         "dims": 896,
>         "model": "Qwen/Qwen3-VL-Embedding-2B",
>         "status": "active"
>       }
>     }
>   }'

Searching one space

When the query is text against text content, search harrier only. This is faster and avoids the visual model adding noise:

$ curl -X POST $COMPASS_BASE_URL/collections/media/search \
>   -H "Authorization: Bearer $COMPASS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{
>     "query": "analyst report on Q3 earnings",
>     "mode": "hybrid",
>     "vector_space": "harrier",
>     "top_k": 10
>   }'

When the query is visual, search qwen3-vl only. The text query is embedded by the multimodal model, which is trained for cross-modal retrieval:

$ curl -X POST $COMPASS_BASE_URL/collections/media/search \
>   -H "Authorization: Bearer $COMPASS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{
>     "query": "goal celebration slow motion",
>     "mode": "semantic",
>     "vector_space": "qwen3-vl",
>     "top_k": 10
>   }'

Searching both spaces with RRF merge

Pass vector_space as an array. Compass runs HNSW against each named space, runs BM25 against the full-text index, and merges all three result sets using Reciprocal Rank Fusion:

$ curl -X POST $COMPASS_BASE_URL/collections/media/search \
>   -H "Authorization: Bearer $COMPASS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{
>     "query": "goal celebration slow motion",
>     "mode": "hybrid",
>     "vector_space": ["harrier", "qwen3-vl"],
>     "top_k": 10
>   }'
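
If you call the API from code rather than the shell, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: build_search_request is a hypothetical helper, the localhost fallback URL is a placeholder assumption, and the function only builds the request so you can inspect it before sending.

```python
import json
import os
import urllib.request


def build_search_request(collection, query, vector_space,
                         mode="hybrid", top_k=10, rerank=False):
    """Build (but do not send) a POST request for the search endpoint.

    vector_space may be a single space name or a list of names,
    mirroring the string and array forms shown in the curl examples.
    """
    # Placeholder fallback, used only when COMPASS_BASE_URL is unset (assumption).
    base_url = os.environ.get("COMPASS_BASE_URL", "http://localhost:8080")
    api_key = os.environ.get("COMPASS_API_KEY", "")
    body = {
        "query": query,
        "mode": mode,
        "vector_space": vector_space,
        "top_k": top_k,
    }
    if rerank:
        body["rerank"] = True
    return urllib.request.Request(
        url=f"{base_url}/collections/{collection}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request(
        "media", "goal celebration slow motion", ["harrier", "qwen3-vl"]
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To send it: urllib.request.urlopen(req), then json.load() the response.
```

Passing a single string such as "harrier" instead of the list produces the one-space request from the previous section.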

The pipeline looks like this:

Query
  |
  +-- Tantivy BM25 -----------> FTS candidates
  +-- Harrier HNSW -----------> text-space candidates
  +-- Qwen3-VL HNSW ----------> multimodal candidates
  |
  v
  RRF merge (k=60)
  |
  v
  Optional reranker
  |
  v
  Return top_k

How RRF merges scores

For every result list a candidate appears in, RRF adds 1 / (k + rank) to its score, where k defaults to 60 and rank is the candidate’s position in that list. Candidates that appear in multiple lists accumulate score from each. A chunk that ranks 2nd in the BM25 list and 4th in the Qwen3-VL list scores 1/62 + 1/64 ≈ 0.0318, which beats any chunk that appears in only one list: even a 1st-place hit earns just 1/61 ≈ 0.0164.
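
To make the accumulation concrete, here is a minimal Python sketch of Reciprocal Rank Fusion. It illustrates the formula only and is not Compass’s internal implementation: rrf_merge, the chunk ids, and the list contents are hypothetical, and ranks are assumed to start at 1.

```python
from collections import defaultdict


def rrf_merge(ranked_lists, k=60):
    """Fuse ranked lists of chunk ids with Reciprocal Rank Fusion.

    ranked_lists maps a retriever name to its chunk ids, best first.
    Returns (chunk_id, score) pairs sorted by descending fused score.
    """
    scores = defaultdict(float)
    for chunk_ids in ranked_lists.values():
        # Assumption: ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, chunk_id in enumerate(chunk_ids, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ids: chunk 42 is 2nd in BM25 and 4th in Qwen3-VL,
# while chunk 7 is 1st in Harrier but appears in no other list.
lists = {
    "bm25": [11, 42, 12, 13],
    "harrier": [7, 14, 15, 16],
    "qwen3-vl": [17, 18, 19, 42],
}
merged = rrf_merge(lists)
best_id, best_score = merged[0]
print(best_id, best_score)  # chunk 42, with score 1/62 + 1/64
```

Chunk 42 comes out on top even though it never ranks first in any single list, because its two contributions together exceed the 1/61 that a single first-place hit earns.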

The response shows the result with its merged score and the retriever source:

{
  "results": [
    {
      "chunk": {
        "id": 42,
        "doc_type": "segment",
        "text": "Goal celebration. Striker runs toward the corner flag, arms wide.",
        "metadata": { "shot_type": "wide", "scene": "celebration" }
      },
      "score": 0.94,
      "source": "both"
    }
  ]
}

source indicates which retriever found the hit: "fts" (BM25 only), "semantic" (HNSW only), or "both" (the hit appeared in the BM25 list and an HNSW list and got merged). score is the final RRF-merged value after any scoring boosts.

Adding a reranker

After RRF, a cross-encoder reranker re-scores the merged candidates from scratch. It doesn’t know which retriever found each candidate. It just scores (query, text) relevance independently:

$ curl -X POST $COMPASS_BASE_URL/collections/media/search \
>   -H "Authorization: Bearer $COMPASS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{
>     "query": "goal celebration slow motion",
>     "mode": "hybrid",
>     "vector_space": ["harrier", "qwen3-vl"],
>     "rerank": true,
>     "top_k": 10
>   }'

Reranking is recommended when mixing spaces. The cross-encoder sees the full (query, document) pair and catches cases where RRF ranked a weak match near the top.
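
The shape of the rerank stage can be sketched in Python. This is an illustration under stated assumptions, not the reranker Compass ships: rerank and token_overlap are hypothetical names, the candidates are made-up, and the token-overlap scorer is a toy stand-in for a real cross-encoder. The point it shows is that the stage looks only at (query, text) and discards the retriever source and RRF score.

```python
def rerank(query, candidates, score_pair, top_k=10):
    """Re-score merged candidates with score_pair(query, text) and keep top_k.

    The incoming "score" and "source" fields play no part in the new ordering.
    Returns new dicts; the input candidates are left unmodified.
    """
    rescored = [(score_pair(query, c["text"]), c) for c in candidates]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [dict(c, score=s) for s, c in rescored[:top_k]]


def token_overlap(query, text):
    """Toy stand-in for a cross-encoder: fraction of query tokens found in text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().replace(".", " ").replace(",", " ").split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


# Hypothetical RRF output: the weaker match currently ranks first.
candidates = [
    {"id": 7, "text": "Post-match interview with the coach.",
     "score": 0.031, "source": "semantic"},
    {"id": 42, "text": "Goal celebration. Striker runs toward the corner flag, arms wide.",
     "score": 0.016, "source": "fts"},
]
reranked = rerank("goal celebration slow motion", candidates, token_overlap)
for hit in reranked:
    print(hit["id"], hit["score"])
```

In this toy run chunk 42 moves ahead of chunk 7 because its text matches the query, regardless of where RRF placed it.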

Choosing the right path

Query type	Content type	Recommended path
Text (“find the contract about X”)	Text documents	`harrier` only, `mode: hybrid`
Text (“find the goal celebration”)	Video frames / images	`qwen3-vl` only, `mode: semantic`
Text query, mixed collection	Text + video + images	Both spaces, `mode: hybrid`, `rerank: true`
Visual query (image embedding)	Any	`qwen3-vl` only

When in doubt, search both spaces with the reranker enabled. The added latency is typically under 20 ms at p99, and recall on mixed collections improves enough to justify it.
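
The decision table can be encoded as a small helper if you route queries programmatically. This is a sketch, assuming you already know what kinds of content a collection holds: choose_search_params and the "text" / "video" / "image" labels are hypothetical, and the returned keys simply mirror the request fields used in the examples above.

```python
def choose_search_params(content_types, query_is_image=False):
    """Map the recommendations in the table to search request fields.

    content_types: set drawn from {"text", "video", "image"} describing
    what the collection holds (hypothetical labels, not an API field).
    """
    visual = {"video", "image"}
    if query_is_image:
        # Visual query: the table recommends the multimodal space only
        # and does not name a mode, so none is set here.
        return {"vector_space": "qwen3-vl"}
    if content_types == {"text"}:
        return {"vector_space": "harrier", "mode": "hybrid"}
    if content_types and content_types <= visual:
        return {"vector_space": "qwen3-vl", "mode": "semantic"}
    # Mixed or unknown content: search both spaces and rerank.
    return {
        "vector_space": ["harrier", "qwen3-vl"],
        "mode": "hybrid",
        "rerank": True,
    }


print(choose_search_params({"text", "video", "image"}))
```

An empty or unrecognized content set falls through to the last branch, which matches the "when in doubt" advice.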