Upgrading Models

Compass lets you swap embedding models without taking the collection offline. The pattern is blue-green deployment for vector spaces: build the new space alongside the existing one, re-embed in the background, test it, then swap the default. The collection serves queries from the old space the entire time, the swap itself is atomic, and the old space stays in place for rollback.

Common reasons to do this: a newer model scores better on your eval set, you’re adding a language your current model wasn’t trained on, you’re going from text-only to multimodal and need a second space, or your current model is being deprecated.

Every example below authenticates with an Authorization: Bearer $COMPASS_API_KEY header. Set COMPASS_BASE_URL and COMPASS_API_KEY in your shell first (see the quickstart).

Step 1. Add the new vector space

$ curl -X POST $COMPASS_BASE_URL/collections/media/vector-spaces \
>   -H "Authorization: Bearer $COMPASS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{
>     "name": "qwen3",
>     "dims": 1024,
>     "model": "Qwen/Qwen3-Embedding-8B"
>   }'

The new space is created with status: "building". The collection continues serving queries from the existing default space.
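If you drive the upgrade from a script rather than the shell, the same call can be built with the Python standard library. This is a minimal sketch, not an official client: the function names build_create_space_request and send are made up for illustration, and the path, headers, and body fields are copied from the curl command above.

```python
import json
import urllib.request


def build_create_space_request(base_url, api_key, collection, name, dims, model):
    """Build (but do not send) the POST that creates a new vector space.

    Hypothetical helper: it mirrors the curl example, nothing more.
    """
    body = json.dumps({"name": name, "dims": dims, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/collections/{collection}/vector-spaces",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# Usage (needs network access and real credentials):
#   import os
#   req = build_create_space_request(
#       os.environ["COMPASS_BASE_URL"], os.environ["COMPASS_API_KEY"],
#       "media", "qwen3", 1024, "Qwen/Qwen3-Embedding-8B")
#   print(send(req))
```

Keeping the request construction separate from the network call makes it possible to inspect or log the exact URL and payload before anything is sent.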

Step 2. Trigger the rebuild

Point the rebuild at your GPU embedding endpoint. Compass reads all chunks from the collection and posts them to embed_endpoint in batches. The endpoint must implement the HuggingFace TEI /embed interface:

$ curl -X POST $COMPASS_BASE_URL/collections/media/vector-spaces/qwen3/rebuild \
>   -H "Authorization: Bearer $COMPASS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{
>     "embed_endpoint": "http://gpu-server:8080/embed"
>   }'

To spin up a TEI instance against the target model:

$ docker run -p 8080:80 --gpus all \
>   ghcr.io/huggingface/text-embeddings-inference \
>   --model-id Qwen/Qwen3-Embedding-8B

A single A10G running Qwen3-Embedding-8B processes around 1,500 documents per second.
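To turn that throughput figure into a rough time budget, divide the number of items to embed by the rate. The sketch below uses the 142,000 total from the status example in Step 3 and assumes that each unit counted there corresponds to one document at the quoted rate, which may not hold for your data. The function names are illustrative.

```python
def estimate_rebuild_seconds(total_docs, docs_per_second):
    """Rough wall-clock estimate for a full re-embed at a steady rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return total_docs / docs_per_second


def estimate_remaining_seconds(total_docs, completed_docs, docs_per_second):
    """Same estimate, but only for the part that is not embedded yet."""
    remaining = max(total_docs - completed_docs, 0)
    return estimate_rebuild_seconds(remaining, docs_per_second)


# 142,000 documents at 1,500 documents per second: about 95 seconds.
full_run = estimate_rebuild_seconds(142_000, 1_500)

# With 86,620 already completed, about 37 seconds remain at the same rate.
rest = estimate_remaining_seconds(142_000, 86_620, 1_500)
```

Treat the result as a lower bound: it ignores network transfer, batching overhead, and any slowdown from long documents.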

Step 3. Poll rebuild status

$ curl $COMPASS_BASE_URL/collections/media/vector-spaces/qwen3/status \
>   -H "Authorization: Bearer $COMPASS_API_KEY"

Response:

{
  "name": "qwen3",
  "status": "building",
  "progress": 0.61,
  "total": 142000,
  "completed": 86620
}

When status is "active" and progress is 1.0, the space is ready. Run your evaluation queries against it by passing "vector_space": "qwen3" to the search endpoint before you swap the default.
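The readiness check above can be automated with a small polling loop. This is a sketch under stated assumptions: the helper names, the 10-second interval, and the one-hour timeout are arbitrary choices, and only the "building" and "active" status values shown on this page are handled. If the API reports other states (for example a failure state), the loop would need to treat them explicitly.

```python
import json
import time
import urllib.request


def fetch_status(base_url, api_key, collection, space):
    """GET the status document for one vector space (same call as the curl above)."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/collections/{collection}/vector-spaces/{space}/status",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def is_ready(status):
    """Ready means status is "active" AND progress has reached 1.0."""
    return status.get("status") == "active" and status.get("progress", 0.0) >= 1.0


def wait_until_ready(get_status, interval_s=10.0, timeout_s=3600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call get_status() repeatedly until the space is ready or the timeout passes.

    get_status is any zero-argument callable returning the status dict, so the
    loop can be exercised without a network connection.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if is_ready(status):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"space not ready after {timeout_s} seconds: {status}")
        sleep(interval_s)


# Usage (needs network access and real credentials):
#   import os
#   final = wait_until_ready(lambda: fetch_status(
#       os.environ["COMPASS_BASE_URL"], os.environ["COMPASS_API_KEY"],
#       "media", "qwen3"))
```

Passing the status fetcher, sleep function, and clock as parameters keeps the loop testable and lets you swap in your own HTTP client.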

Step 4. Swap the default

The swap is atomic. Queries in flight against the old space finish normally. New queries after the swap use the new space:

$ curl -X PUT $COMPASS_BASE_URL/collections/media/default-vector-space \
>   -H "Authorization: Bearer $COMPASS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{"name": "qwen3"}'

The old space stays in place. If you find a regression, swap back by repeating this call with the old space name.
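Because promotion and rollback are the same request with a different name, a script can express both through one helper. A minimal sketch, assuming only the PUT call shown above; build_swap_request is a made-up name and OLD_SPACE_NAME stands for whatever your previous default was called.

```python
import json
import urllib.request


def build_swap_request(base_url, api_key, collection, space_name):
    """Build (but do not send) the PUT that makes space_name the default."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/collections/{collection}/default-vector-space",
        data=json.dumps({"name": space_name}).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Promote the new space, then (only if a regression shows up) roll back.
# Send either request with urllib.request.urlopen(...).
#   promote  = build_swap_request(base_url, api_key, "media", "qwen3")
#   rollback = build_swap_request(base_url, api_key, "media", "OLD_SPACE_NAME")
```

Recording the old space name before promoting is worth doing in the script itself, since the rollback depends on it.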

Step 5. Delete the old space (optional)

Once you’re confident in the new space, delete the old one to free its disk space:

$ curl -X DELETE \
>   -H "Authorization: Bearer $COMPASS_API_KEY" \
>   $COMPASS_BASE_URL/collections/media/vector-spaces/OLD_SPACE_NAME

Replace OLD_SPACE_NAME with the name of the space you swapped away from (the previous default), not the new qwen3 space you just promoted.

This is irreversible. The vectors for the old space are deleted from disk. Only do this after you have validated recall on the new space across your real query distribution.
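One way to make "validated recall" concrete is to compare mean recall@k for the two spaces over a labelled query set before deleting anything. The sketch below is illustrative only: it assumes you have already collected ranked result IDs per query from each space (by passing "vector_space" to the search endpoint, as in Step 3) and that you have relevance judgments for those queries. The search response format is not covered on this page, so the functions take plain Python lists.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant IDs that appear in the top k retrieved IDs."""
    if not relevant_ids:
        raise ValueError("each query needs at least one relevant ID")
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def mean_recall_at_k(results_by_query, relevant_by_query, k):
    """Average recall@k over every query that has relevance judgments."""
    scores = [
        recall_at_k(results_by_query.get(query, []), relevant, k)
        for query, relevant in relevant_by_query.items()
    ]
    return sum(scores) / len(scores)


# Toy data: two queries, ranked result IDs from each space.
relevant = {"q1": ["a", "b"], "q2": ["c"]}
old_space = {"q1": ["a", "x", "y"], "q2": ["z", "c"]}
new_space = {"q1": ["a", "b", "x"], "q2": ["c", "z"]}

old_score = mean_recall_at_k(old_space, relevant, k=2)  # 0.75
new_score = mean_recall_at_k(new_space, relevant, k=2)  # 1.0
```

A simple guard is to delete the old space only when the new score is at least as high as the old one on a query sample that reflects production traffic.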

Using your own GPU endpoint

The embed_endpoint parameter accepts any URL that implements the TEI /embed POST interface. That includes vLLM, custom FastAPI servers, and any other inference server that accepts a request body of this form:

{ "inputs": ["text one", "text two"] }

and returns:

[[0.12, -0.34], [0.56, 0.78]]

This means the rebuild can run against your own VPC-internal GPU fleet. The vectors never leave your network.
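To show how small that contract is, here is a sketch of a compatible endpoint using only the Python standard library. It is not a production server: toy_embed is a placeholder that returns deterministic numbers rather than real embeddings, the 400 error body is an assumption of this sketch, and a real deployment would call your model and return vectors whose length matches the dims of the vector space.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def toy_embed(text, dims=4):
    """Placeholder, NOT a real model: replace with a call to your inference code."""
    return [float((len(text) + i) % 7) for i in range(dims)]


def handle_embed(raw_body, embed_fn):
    """Turn one POST /embed body into (status_code, response_bytes)."""
    try:
        inputs = json.loads(raw_body)["inputs"]
        if not isinstance(inputs, list) or not all(isinstance(t, str) for t in inputs):
            raise ValueError("inputs must be a list of strings")
    except (ValueError, KeyError, TypeError) as exc:
        # Error format is an assumption of this sketch.
        return 400, json.dumps({"error": str(exc)}).encode("utf-8")
    # One vector per input, in the same order as the request.
    vectors = [embed_fn(text) for text in inputs]
    return 200, json.dumps(vectors).encode("utf-8")


class EmbedHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/embed":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_embed(self.rfile.read(length), toy_embed)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Blocks forever; call this from your own entry point."""
    HTTPServer(("0.0.0.0", port), EmbedHandler).serve_forever()
```

With serve() running on a host reachable from Compass, the rebuild call in Step 2 would point embed_endpoint at that host's /embed path.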