Your text is in English. Your videos have visual content. Your queries are sometimes “find me the contract about renewal terms” and sometimes “find me the goal celebration.” A single embedding model is typically good at only one of these. Compass lets you ingest the same chunk into multiple embedding spaces and pick which spaces to search at query time.
A text model like Harrier scores English-language queries well against transcript text. Ask it to find a visual moment in a video, and it’s working from the transcript description alone: it never saw the frames. A multimodal model like Qwen3-VL embeds images and video frames directly, but it’s weaker on dense English prose. If you choose one model, you give up accuracy on half your query types.
Compass runs both. At ingest time, each chunk is stored in the vector spaces you target. At query time, you pick one space, or both, and Compass handles the merge.
Create a collection with harrier (text, 768 dims) and qwen3-vl (multimodal, 896 dims):
When the query is text against text content, search harrier only. This is faster and avoids the visual model adding noise:
When the query is visual, search qwen3-vl only. The text query gets embedded into the multimodal space, which is trained for cross-modal retrieval:
To search both spaces, pass vector_space as an array. Compass runs HNSW against each named space, runs BM25 against the full-text index, and merges all three result sets using Reciprocal Rank Fusion:
The pipeline looks like this:
RRF assigns each candidate a score of 1 / (k + rank), where k defaults to 60 and rank is the candidate's position in an individual result list. Candidates that appear in multiple lists accumulate score from each. Taking ranks as 1-based, a chunk that ranks 2nd in the BM25 list and 4th in the Qwen3-VL list scores 1/62 + 1/64 ≈ 0.0318, which is higher than any chunk that appears in only one list, even at rank 1 (1/61 ≈ 0.0164).
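The fusion step described above can be sketched in a few lines of Python. This is a minimal illustration of generic Reciprocal Rank Fusion, not Compass's internal implementation: the function name rrf_merge, the chunk IDs, and the assumption that ranks start at 1 are all hypothetical choices made for the example.

```python
from collections import defaultdict


def rrf_merge(ranked_lists, k=60):
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per chunk, with ranks assumed
    to start at 1. Chunks found by several retrievers accumulate score.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists: one from BM25 and one per vector space.
bm25 = ["a", "x", "b"]            # "x" ranks 2nd here
harrier = ["f", "g"]
qwen3_vl = ["c", "d", "e", "x"]   # "x" ranks 4th here

merged = rrf_merge([bm25, harrier, qwen3_vl])
for chunk_id, score in merged[:3]:
    print(chunk_id, round(score, 4))
```

In this toy input, "x" never tops an individual list, yet it comes out first after fusion because it collects score from two lists, while "a", "c", and "f" each collect only a single rank-1 contribution.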
The response shows the result with its merged score and the retriever source:
source indicates which retriever found the hit: "fts" (BM25 only), "semantic" (HNSW only), or "both" (the hit appeared in both lists and got merged). score is the final RRF-merged value after any scoring boosts.
After RRF, a cross-encoder reranker re-scores the merged candidates from scratch. It doesn’t know which retriever found each candidate. It just scores (query, text) relevance independently:
Reranking is recommended when mixing spaces. The cross-encoder sees the full (query, document) pair and catches cases where RRF ranked a weak match near the top.
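To make the "re-scores from scratch" behavior concrete, here is a small Python sketch of a reranking step. It assumes candidates are dictionaries with hypothetical id and text fields alongside the source and score fields described earlier, and it uses a toy word-overlap function as a stand-in for a real cross-encoder model. None of this reflects Compass's actual reranker.

```python
def toy_overlap_score(query, text):
    """Stand-in for a cross-encoder: fraction of query words found in text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(query, candidates, score_pair, top_n=None):
    """Re-score each candidate from (query, text) alone.

    The fused score and the retriever source are deliberately ignored,
    so a weak match that fusion ranked highly can drop down the list.
    """
    rescored = [(score_pair(query, c["text"]), c) for c in candidates]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [dict(c, rerank_score=s) for s, c in rescored[:top_n]]


# Hypothetical fused candidates, ordered by their RRF score.
candidates = [
    {"id": "c1", "text": "crowd noise before kickoff", "source": "semantic", "score": 0.0328},
    {"id": "c2", "text": "players celebrate the goal", "source": "fts", "score": 0.0164},
]

reranked = rerank("goal celebration", candidates, toy_overlap_score)
print([c["id"] for c in reranked])
```

Passing the scorer in as a function keeps the sketch independent of any particular model; a real cross-encoder would replace toy_overlap_score with a call that scores the (query, text) pair jointly.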
When in doubt, search both spaces with the reranker enabled. The added latency is typically under 20 ms at p99, and recall on mixed collections improves enough to justify it.