Upgrading Models

Compass lets you swap embedding models without taking the collection offline. The pattern is blue-green deployment for vector spaces: build the new space alongside the existing one, re-embed in the background, test it, then swap the default. The collection serves queries from the old space the entire time, the swap itself is atomic, and the old space stays in place for rollback.

Common reasons to do this: a newer model scores better on your eval set, you’re adding a language your current model wasn’t trained on, you’re going from text-only to multimodal and need a second space, or your current model is being deprecated.

Step 1. Add the new vector space

curl -X POST "$COMPASS_BASE_URL/collections/media/vector-spaces" \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "qwen3",
    "dims": 1024,
    "model": "Qwen/Qwen3-Embedding-8B"
  }'

The new space is created with status: "building". The collection continues serving queries from the existing default space.
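If you drive Compass from Python rather than the shell, the same call can be sketched with the standard library. This is a minimal illustration, not an official client: the helper name build_create_space_request and the localhost fallback URL are assumptions, and the request is only built here, not sent.

```python
import json
import os
import urllib.request

# Assumption: COMPASS_BASE_URL is set as in the curl examples.
# The localhost fallback is a placeholder, not a documented default.
BASE_URL = os.environ.get("COMPASS_BASE_URL", "http://localhost:8000").rstrip("/")


def build_create_space_request(
    collection: str, name: str, dims: int, model: str
) -> urllib.request.Request:
    """Build (but do not send) the POST that adds a vector space (hypothetical helper)."""
    body = json.dumps({"name": name, "dims": dims, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/collections/{collection}/vector-spaces",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_space_request("media", "qwen3", 1024, "Qwen/Qwen3-Embedding-8B")
# To send it: urllib.request.urlopen(req, timeout=10)
```

Building the request separately from sending it makes the payload easy to inspect or log before anything touches the collection.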

Step 2. Trigger the rebuild

Point the rebuild at your GPU embedding endpoint. Compass reads all chunks from the collection and posts them to embed_endpoint in batches. The endpoint must implement the Hugging Face TEI (Text Embeddings Inference) /embed interface:

curl -X POST "$COMPASS_BASE_URL/collections/media/vector-spaces/qwen3/rebuild" \
  -H 'Content-Type: application/json' \
  -d '{
    "embed_endpoint": "http://gpu-server:8080/embed"
  }'

To spin up a TEI instance against the target model:

docker run -p 8080:80 --gpus all \
  ghcr.io/huggingface/text-embeddings-inference \
  --model-id Qwen/Qwen3-Embedding-8B

A single A10G running Qwen3-Embedding-8B processes around 1,500 documents per second.
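That throughput figure gives a rough way to budget rebuild time. The sketch below is back-of-the-envelope arithmetic only. It assumes the quoted rate holds for your chunk sizes and that throughput scales linearly with the number of GPUs, neither of which is guaranteed. The function name is hypothetical, and the 142,000 total is taken from the status example in Step 3.

```python
def estimate_rebuild_seconds(
    total_chunks: int, docs_per_second: float, gpus: int = 1
) -> float:
    """Rough rebuild time, assuming linear scaling across GPUs (an assumption)."""
    if total_chunks < 0 or docs_per_second <= 0 or gpus < 1:
        raise ValueError("total_chunks must be >= 0, rate > 0, gpus >= 1")
    return total_chunks / (docs_per_second * gpus)


# 142,000 chunks at about 1,500 per second on one GPU
seconds = estimate_rebuild_seconds(142_000, 1_500)
print(f"about {seconds:.0f} seconds")  # about 95 seconds
```

Treat the result as a lower bound. Network transfer and batching overhead are not modeled.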

Step 3. Poll rebuild status

curl "$COMPASS_BASE_URL/collections/media/vector-spaces/qwen3/status"

Response:

{
  "name": "qwen3",
  "status": "building",
  "progress": 0.61,
  "total": 142000,
  "completed": 86620
}

When status is "active" and progress is 1.0, the space is ready. Run your evaluation queries against it by passing "vector_space": "qwen3" to the search endpoint before you swap the default.
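The readiness check can be automated with a small polling loop. The sketch below assumes only the response fields shown above. The function names, the 5-second interval, and the one-hour timeout are illustrative choices, not Compass defaults.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_status(url: str) -> dict:
    """GET the status endpoint and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def is_ready(status: dict) -> bool:
    """Ready means status is "active" and progress has reached 1.0."""
    return status.get("status") == "active" and status.get("progress", 0.0) >= 1.0


def wait_until_ready(
    url: str,
    fetch: Callable[[str], dict] = fetch_status,
    interval_s: float = 5.0,   # illustrative, not a documented value
    timeout_s: float = 3600.0,  # illustrative, not a documented value
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the space is ready; raise TimeoutError if it takes too long."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch(url)
        if is_ready(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"space not ready after {timeout_s} seconds: {status}")
        sleep(interval_s)


# Usage (performs real HTTP requests):
# wait_until_ready(f"{base_url}/collections/media/vector-spaces/qwen3/status")
```

The fetch and sleep parameters are injectable so the loop can be exercised in tests without a live server.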

Step 4. Swap the default

The swap is atomic. Queries in flight against the old space finish normally. Queries issued after the swap use the new space:

curl -X PUT "$COMPASS_BASE_URL/collections/media/default-vector-space" \
  -H 'Content-Type: application/json' \
  -d '{"name": "qwen3"}'

The old space stays in place. If you find a regression, swap back by repeating this call with the old space's name (default in the examples on this page).
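One way to make rollback routine is to wrap the swap in a helper that runs a smoke check and reverts if it fails. This is a sketch of that pattern, not a Compass feature. The function names are hypothetical, and smoke_check stands in for whatever evaluation you run against live traffic.

```python
import json
import urllib.request
from typing import Callable


def put_default_space(base_url: str, collection: str, name: str) -> None:
    """Issue the PUT shown above to set the collection's default space."""
    req = urllib.request.Request(
        url=f"{base_url}/collections/{collection}/default-vector-space",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=10):
        pass


def swap_with_rollback(
    set_default: Callable[[str], None],
    smoke_check: Callable[[], bool],
    new_space: str,
    old_space: str,
) -> bool:
    """Swap to new_space; revert to old_space if smoke_check fails.

    Returns True if the new space was kept, False if the swap was rolled back.
    """
    set_default(new_space)
    if smoke_check():
        return True
    set_default(old_space)
    return False


# Usage (performs real HTTP requests; my_smoke_check is your own function):
# swap_with_rollback(
#     lambda name: put_default_space(base_url, "media", name),
#     my_smoke_check, "qwen3", "default",
# )
```

Because the old space is left untouched by the swap, the rollback path is the same PUT with a different name.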

Step 5. Delete the old space (optional)

Once you’re confident in the new space, delete the old one to free its disk space:

curl -X DELETE \
  "$COMPASS_BASE_URL/collections/media/vector-spaces/default"

This is irreversible. The vectors for the old space are deleted from disk. Only do this after you have validated recall on the new space across your real query distribution.

Using your own GPU endpoint

The embed_endpoint parameter accepts the URL of any server that implements the TEI POST /embed interface. That includes vLLM, custom FastAPI servers, or any other inference server that accepts:

{ "inputs": ["text one", "text two"] }

and returns:

[[0.12, -0.34], [0.56, 0.78]]

This means the rebuild can run against your own VPC-internal GPU fleet. The vectors never leave your network.
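Before pointing a rebuild at a custom server, it can help to probe the endpoint and check the response shape. The sketch below checks the contract described above: one vector per input, each a list of finite numbers. It also assumes that each vector's length should equal the dims value of the target space, which this page does not state explicitly. Both function names are hypothetical.

```python
import json
import math
import urllib.request


def validate_embed_response(inputs: list, vectors: object, dims: int) -> None:
    """Raise ValueError unless vectors is one list of dims finite numbers per input."""
    if not isinstance(vectors, list) or len(vectors) != len(inputs):
        raise ValueError("expected one vector per input")
    for i, vec in enumerate(vectors):
        if not isinstance(vec, list) or len(vec) != dims:
            raise ValueError(f"vector {i} does not have {dims} dimensions")
        if not all(isinstance(x, (int, float)) and math.isfinite(x) for x in vec):
            raise ValueError(f"vector {i} contains a non-finite or non-numeric value")


def probe_endpoint(embed_endpoint: str, dims: int) -> None:
    """POST two sample inputs to the endpoint and validate what comes back."""
    inputs = ["text one", "text two"]
    req = urllib.request.Request(
        embed_endpoint,
        data=json.dumps({"inputs": inputs}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        validate_embed_response(inputs, json.load(resp), dims)


# The two-dimensional example response above passes with dims=2:
validate_embed_response(
    ["text one", "text two"], [[0.12, -0.34], [0.56, 0.78]], dims=2
)
# Usage against a live server (performs a real HTTP request):
# probe_endpoint("http://gpu-server:8080/embed", dims=1024)
```

A mismatch caught here is cheaper than one discovered partway through a rebuild.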