TAMS and Time-Range Search

What TAMS is

TAMS (Time-Addressable Media Store) is BBC R&D’s open spec for media archives. The core idea: media is addressed by time, not by file path. Content is organized as a three-level hierarchy. A Source is a logical piece of content (a match, a show, an asset). A Flow is one specific rendition of that source (720p H.264 video, or 48kHz stereo audio). A Segment is a time-bounded chunk of a flow, with a timerange_start and timerange_end in seconds.

Compass models this hierarchy using doc_type, parent_ref, and group_id. You control the structure at ingest time.

Why it matters for agents

Your agent asks “what’s at second 47” more often than it runs open-ended similarity search. When it retrieves bounding box positions, active speakers, or shot types, it needs to pin results to an exact time window before ranking by relevance. Without time-range filters, every query scans the whole collection.

TAMS search in Compass works in two steps: filter to the segments in the time window, then rank those segments by query relevance. The filter step uses Compass’s precomputed bitset facets and runs in microseconds.
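
The two-step flow can be illustrated with a toy in-memory sketch. This is not Compass's implementation (the real filter step uses precomputed bitset facets, and ranking uses the collection's vector space). The `score` function is a hypothetical word-overlap stand-in, and the segment dicts are illustrative only.

```python
def in_window(segment, start, end):
    """Step 1: keep segments that lie entirely inside [start, end] seconds."""
    return segment["timerange_start"] >= start and segment["timerange_end"] <= end


def score(query, text):
    """Hypothetical relevance score: fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().replace(",", " ").split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def search(segments, query, start, end, top_k=5):
    """Filter first, then rank only the survivors."""
    candidates = [s for s in segments if in_window(s, start, end)]
    ranked = sorted(candidates, key=lambda s: score(query, s["text"]), reverse=True)
    return ranked[:top_k]


segments = [
    {"text": "Goal celebration, minute 34", "timerange_start": 2040.0, "timerange_end": 2055.0},
    {"text": "Penalty kick setup, minute 38", "timerange_start": 2280.0, "timerange_end": 2295.0},
]
hits = search(segments, "goal celebration", 2040.0, 2100.0)
print([h["text"] for h in hits])
```

Only the first segment survives the filter, so the ranker never sees the penalty segment at all. That is the point of filtering before scoring.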

Ingesting a TAMS hierarchy

Use client_id and parent_ref to link chunks within a single batch. group_id groups all segments that belong to the same source:

$ curl -X POST $COMPASS_BASE_URL/collections/media/ingest \
>   -H 'Content-Type: application/json' \
>   -d '{
>     "chunks": [
>       {
>         "client_id": "src-001",
>         "file_id": "video-001",
>         "chunk_index": 0,
>         "doc_type": "source",
>         "text": "Premier League: Arsenal vs Chelsea",
>         "metadata": {
>           "asset_type": "video",
>           "created_at": "2026-03-15T15:00:00Z"
>         }
>       },
>       {
>         "client_id": "seg-001",
>         "file_id": "segment-001",
>         "chunk_index": 0,
>         "doc_type": "segment",
>         "parent_ref": "src-001",
>         "group_id": "src-001",
>         "text": "Goal celebration, minute 34",
>         "metadata": {
>           "timerange_start": 2040.0,
>           "timerange_end": 2055.0,
>           "scene_type": "goal",
>           "active_speaker": null,
>           "shot_type": "wide"
>         }
>       },
>       {
>         "client_id": "seg-002",
>         "file_id": "segment-002",
>         "chunk_index": 1,
>         "doc_type": "segment",
>         "parent_ref": "src-001",
>         "group_id": "src-001",
>         "text": "Penalty kick setup, minute 38",
>         "metadata": {
>           "timerange_start": 2280.0,
>           "timerange_end": 2295.0,
>           "scene_type": "penalty",
>           "shot_type": "close-up"
>         }
>       }
>     ]
>   }'

timerange_start and timerange_end are numeric seconds as float64 values. Do not pass ISO date strings. The filter engine treats them as numeric range fields. Passing "2026-03-15T00:34:00Z" instead of 2040.0 will produce no results.
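
One way to avoid the string-timestamp mistake is to build segment chunks through a small helper that rejects non-numeric time ranges before anything is sent. The sketch below mirrors the field names from the ingest example above. The helper names (`match_clock`, `segment_chunk`) and the `end >= start` sanity check are assumptions added for illustration, not part of Compass.

```python
import numbers


def match_clock(minutes, seconds=0.0):
    """Convert a minutes/seconds offset into float seconds (minute 34 -> 2040.0)."""
    return float(minutes) * 60.0 + float(seconds)


def segment_chunk(client_id, file_id, source_id, chunk_index, text, start, end, **metadata):
    """Build one segment chunk in the shape used by the ingest example (hypothetical helper)."""
    for name, value in (("timerange_start", start), ("timerange_end", end)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, numbers.Real):
            raise TypeError(f"{name} must be numeric seconds, got {value!r}")
    if end < start:
        # Assumed sanity check; the source does not state how Compass treats this case.
        raise ValueError("timerange_end must not precede timerange_start")
    return {
        "client_id": client_id,
        "file_id": file_id,
        "chunk_index": chunk_index,
        "doc_type": "segment",
        "parent_ref": source_id,
        "group_id": source_id,
        "text": text,
        "metadata": {"timerange_start": float(start), "timerange_end": float(end), **metadata},
    }


chunk = segment_chunk(
    "seg-001", "segment-001", "src-001", 0,
    "Goal celebration, minute 34",
    match_clock(34), match_clock(34, 15),
    scene_type="goal", shot_type="wide",
)
```

Passing `"2026-03-15T00:34:00Z"` as `start` raises a `TypeError` at build time instead of silently producing a segment that no time filter will ever match.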

Searching by content and time window

Filter to segments in a specific time range, then rank by query:

$ curl -X POST $COMPASS_BASE_URL/collections/media/search \
>   -H 'Content-Type: application/json' \
>   -d '{
>     "query": "goal celebration",
>     "filters": {
>       "doc_type": { "in": ["segment"] },
>       "timerange_start": { "gte": 2040.0 },
>       "timerange_end": { "lte": 2100.0 }
>     },
>     "relationship_boost": {
>       "parent_weight": 0.3,
>       "sibling_weight": 0.1,
>       "mode": "max"
>     },
>     "top_k": 5
>   }'

This returns segments matching “goal celebration” that lie entirely within the 2040-2100 second window: each must start at or after 2040 and end at or before 2100. relationship_boost gives a score bump to segments whose parent source also matches, and to sibling segments in the same group.
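
If your agent issues many of these requests, a small builder keeps the filter shape consistent. The sketch below only assembles the JSON body from the curl example; `time_window_search` is a hypothetical helper name, and sending the body is left to whatever HTTP client you use.

```python
import json


def time_window_search(query, start, end, top_k=5, parent_weight=0.3, sibling_weight=0.1, mode="max"):
    """Assemble the search body from the example above (hypothetical helper)."""
    return {
        "query": query,
        "filters": {
            "doc_type": {"in": ["segment"]},
            # float() raises ValueError on ISO date strings, which catches the
            # string-timestamp mistake before the request is sent.
            "timerange_start": {"gte": float(start)},
            "timerange_end": {"lte": float(end)},
        },
        "relationship_boost": {
            "parent_weight": parent_weight,
            "sibling_weight": sibling_weight,
            "mode": mode,
        },
        "top_k": top_k,
    }


body = time_window_search("goal celebration", 2040, 2100)
print(json.dumps(body, indent=2))
```

Note that these two filters express containment. A segment that starts before 2040 or ends after 2100 is excluded even if it overlaps the window.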

Relationship boost

relationship_boost surfaces context around the matching segment. Fields:

Field	Description
`parent_weight`	Fractional boost (0.0-1.0) applied to a segment's score when its parent source also matches the query.
`sibling_weight`	Fractional boost (0.0-1.0) applied to a segment's score when a sibling segment in the same `group_id` also matches.
`mode`	`"max"`: take the higher of parent and sibling boost. `"sum"`: add them.

A parent_weight of 0.3 means: if the segment’s parent source also scored well, add 30% to this segment’s score. This surfaces segments that are doubly relevant: the specific moment matches, and the broader context matches too.
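
The arithmetic can be sketched as follows. This is one reading of the description above, not Compass's confirmed formula: it assumes the boost is a fraction of the segment's own score, and that "matched" is a simple yes/no signal. The function name `boosted_score` is hypothetical.

```python
def boosted_score(base, parent_matched, sibling_matched,
                  parent_weight=0.3, sibling_weight=0.1, mode="max"):
    """Apply parent/sibling boosts to a base score (assumed semantics, see lead-in)."""
    parent_boost = parent_weight if parent_matched else 0.0
    sibling_boost = sibling_weight if sibling_matched else 0.0
    if mode == "max":
        boost = max(parent_boost, sibling_boost)   # take the higher of the two
    elif mode == "sum":
        boost = parent_boost + sibling_boost       # add them
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return base * (1.0 + boost)


print(boosted_score(0.5, True, True, mode="max"))
print(boosted_score(0.5, True, True, mode="sum"))
```

Under these assumptions, a segment with base score 0.5 whose parent and sibling both matched gets a 30% boost in "max" mode and a 40% boost in "sum" mode.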

Point-in-time lookup

Available in v0.3.

To retrieve segments that contain a specific timestamp on a known asset, use the /segments/at endpoint:

# Point lookup: all segments on src-001 that cover second 47
$ curl "$COMPASS_BASE_URL/collections/media/segments/at?asset=src-001&time=47.0"

# Range lookup: all segments on src-001 overlapping [2040, 2100]
$ curl "$COMPASS_BASE_URL/collections/media/segments/at?asset=src-001&time_start=2040&time_end=2100"

# Enumeration: all segments on src-001 (no time filter)
$ curl "$COMPASS_BASE_URL/collections/media/segments/at?asset=src-001"

asset is required. It matches the segment’s group_id (which in the TAMS model equals the source’s client_id). time, time_start, and time_end are all optional. If time is set it takes precedence and returns segments where timerange_start <= time <= timerange_end. Otherwise the range parameters define an overlap window: a segment matches if timerange_start <= time_end and timerange_end >= time_start.
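
The matching rules above can be written as a single predicate, which is handy for unit-testing an agent against fixture data. This is a client-side sketch of the stated rules, not the server code. Treating a missing `time_start` or `time_end` as open-ended is an assumption; the text only says both are optional.

```python
def segment_matches(segment, time=None, time_start=None, time_end=None):
    """Mirror the /segments/at matching rules described above."""
    start = segment["timerange_start"]
    end = segment["timerange_end"]
    if time is not None:
        # Point lookup takes precedence over the range parameters.
        return start <= time <= end
    # Assumption: a missing bound leaves that side of the window open.
    window_start = float("-inf") if time_start is None else time_start
    window_end = float("inf") if time_end is None else time_end
    # Overlap test: the segment starts before the window ends
    # and ends after the window starts.
    return start <= window_end and end >= window_start


goal = {"timerange_start": 2040.0, "timerange_end": 2055.0}
print(segment_matches(goal, time=2047.0))
print(segment_matches(goal, time_start=2050.0, time_end=2100.0))
print(segment_matches(goal, time_start=2060.0, time_end=2100.0))
```

Unlike the `gte`/`lte` search filters, the range form here is an overlap test, so a segment that straddles the edge of the window still matches.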

The response includes the full chunks sorted ascending by timerange_start, plus a took_ms field:

{
  "results": [
    {
      "id": 42,
      "doc_type": "segment",
      "group_id": "src-001",
      "text": "Goal celebration, minute 34",
      "metadata": {
        "timerange_start": 2040.0,
        "timerange_end": 2055.0,
        "scene_type": "goal",
        "shot_type": "wide"
      }
    }
  ],
  "took_ms": 2.4
}

No query string is required, and no vector scoring runs. This is useful when your agent has an exact timestamp on an asset and needs the metadata for that moment (bounding boxes, active speaker, shot type) without running a similarity search.

timerange_start, timerange_end, time, time_start, and time_end are all numeric seconds as float64 values. Do not use ISO date strings. Passing a string in any of these will return no results.
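
A URL builder that validates the time parameters is one way to enforce this on the client. The sketch below uses only the standard library; `segments_at_url` is a hypothetical helper, and the base URL in the usage line is a placeholder for your own `$COMPASS_BASE_URL`.

```python
from urllib.parse import quote, urlencode


def segments_at_url(base_url, collection, asset, time=None, time_start=None, time_end=None):
    """Build a /segments/at URL, rejecting non-numeric time values (hypothetical helper)."""
    params = {"asset": asset}
    for name, value in (("time", time), ("time_start", time_start), ("time_end", time_end)):
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be numeric seconds, got {value!r}")
        params[name] = float(value)
    path = f"/collections/{quote(collection, safe='')}/segments/at"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


# Placeholder base URL for illustration only.
url = segments_at_url("http://localhost:8080", "media", "src-001", time=47)
print(url)
```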

Filter reference

These filter operators are available. The range operators (gte and lte) work on timerange_start and timerange_end, and on any numeric metadata field:

Operator	Syntax	Example
Exact match	`"field": value`	`"scene_type": "goal"`
Greater than or equal	`"field": { "gte": n }`	`"timerange_start": { "gte": 2040.0 }`
Less than or equal	`"field": { "lte": n }`	`"timerange_end": { "lte": 2100.0 }`
Array contains	`"field": { "contains": value }`	`"tags": { "contains": "sports" }`
Set membership	`"field": { "in": [...] }`	`"doc_type": { "in": ["segment"] }`

Multiple filters combine as AND. A segment must match every filter to be scored.
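
To reason about which segments a filter set will admit, it can help to evaluate the same operators locally. The sketch below is a client-side illustration of the table above with AND semantics. It is not the Compass filter engine, and it assumes a flat record in which top-level fields such as doc_type sit alongside the metadata fields.

```python
def matches_filters(record, filters):
    """Return True only if the record satisfies every filter (AND semantics)."""
    for field, condition in filters.items():
        value = record.get(field)
        if not isinstance(condition, dict):
            # Exact match: "field": value
            if value != condition:
                return False
            continue
        for op, operand in condition.items():
            if op == "gte":
                ok = value is not None and value >= operand
            elif op == "lte":
                ok = value is not None and value <= operand
            elif op == "contains":
                ok = isinstance(value, (list, tuple)) and operand in value
            elif op == "in":
                ok = value in operand
            else:
                raise ValueError(f"unsupported operator: {op!r}")
            if not ok:
                return False
    return True


record = {
    "doc_type": "segment",
    "timerange_start": 2040.0,
    "timerange_end": 2055.0,
    "scene_type": "goal",
    "tags": ["sports", "football"],
}
filters = {
    "doc_type": {"in": ["segment"]},
    "timerange_start": {"gte": 2040.0},
    "timerange_end": {"lte": 2100.0},
    "scene_type": "goal",
    "tags": {"contains": "sports"},
}
print(matches_filters(record, filters))
```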

Linking to source assets (GCS, S3, etc.)

Compass doesn’t fetch from cloud storage, sign URIs, or hold storage credentials. It treats URIs as opaque strings on the chunk’s metadata block. Put your gs://, s3://, or https:// URIs in metadata at ingest time, and they come back on every search hit. Your application layer pre-signs them with your own SDK before handing them to a user or downstream agent.

$ curl -X POST $COMPASS_BASE_URL/collections/media/ingest \
>   -H 'Content-Type: application/json' \
>   -d '{
>     "chunks": [
>       {
>         "client_id": "src-001",
>         "doc_type": "source",
>         "text": "Premier League: Arsenal vs Chelsea",
>         "metadata": {
>           "asset_type": "video",
>           "gcs_video_uri": "gs://customer-bucket/match-001.mp4",
>           "gcs_sidecar_uri": "gs://customer-bucket/match-001.json"
>         }
>       },
>       {
>         "client_id": "seg-001",
>         "doc_type": "segment",
>         "parent_ref": "src-001",
>         "group_id": "src-001",
>         "text": "Goal celebration, minute 34",
>         "metadata": {
>           "timerange_start": 2040.0,
>           "timerange_end": 2055.0,
>           "thumbnail_uri": "gs://customer-bucket/match-001/thumbs/2040.jpg"
>         }
>       }
>     ]
>   }'

Search responses include these URIs in chunk.metadata. Pair them with parent_metadata (on segment hits) to resolve back to the source asset’s URIs without a second round trip.
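
On the application side, resolving a hit to its URIs might look like the sketch below. The hit shape is an assumption: it supposes each hit is a dict with a "metadata" block and, on segment hits, a "parent_metadata" block. `collect_uris` is a hypothetical helper, and pre-signing is deliberately left out because it depends on your storage SDK.

```python
URI_SCHEMES = ("gs://", "s3://", "https://")


def collect_uris(hit):
    """Gather storage URIs from a hit's own metadata and its parent's (assumed hit shape)."""
    uris = {}
    for scope, key in (("source", "parent_metadata"), ("segment", "metadata")):
        block = hit.get(key) or {}
        for name, value in block.items():
            if isinstance(value, str) and value.startswith(URI_SCHEMES):
                uris[f"{scope}.{name}"] = value
    return uris


# Illustrative hit, assembled from the ingest example above.
hit = {
    "metadata": {
        "timerange_start": 2040.0,
        "timerange_end": 2055.0,
        "thumbnail_uri": "gs://customer-bucket/match-001/thumbs/2040.jpg",
    },
    "parent_metadata": {
        "asset_type": "video",
        "gcs_video_uri": "gs://customer-bucket/match-001.mp4",
        "gcs_sidecar_uri": "gs://customer-bucket/match-001.json",
    },
}
uris = collect_uris(hit)
for name, uri in uris.items():
    print(name, uri)
```

Each collected URI would then be passed to your own signing code before it reaches a user or downstream agent.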

Embedding JSON sidecars

Whatever you put in a chunk’s text field gets embedded by the active vector space and indexed for full-text search. For sidecar-driven workflows there are two patterns.

Quick: stringify the sidecar and pass it as text. Compass embeds the literal JSON. Recall is decent for short sidecars but the embedding model spends tokens on {, ", and , syntax instead of actual content.

{
  "text": "{\"shot_type\":\"wide\",\"active_speaker\":\"commentator\",\"scene\":\"goal celebration\"}",
  "metadata": { "shot_type": "wide", "active_speaker": "commentator" }
}

Better: flatten to prose in your ingest pipeline, then embed. Walk the sidecar JSON, produce a short natural-language description, and pass that as text. Keep the structured fields in metadata so you can still filter on them. This takes about thirty lines of code in any language, and recall improves materially because the embedding model sees content instead of syntax.

{
  "text": "Scene: goal celebration. Active speaker: commentator. Shot type: wide. Lighting: harsh sunlight.",
  "metadata": {
    "shot_type": "wide",
    "active_speaker": "commentator",
    "scene_type": "goal",
    "lighting": "harsh sunlight"
  }
}
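
A minimal sketch of the flattening step is below. The function names are hypothetical and the sidecar keys are taken from the example above. Nested objects and lists are handled in one plausible way (prefixing nested keys, comma-joining list items); adapt this to your own sidecar schema.

```python
def flatten_sidecar(sidecar, prefix=""):
    """Walk the sidecar and return (label, value) pairs in document order."""
    pairs = []
    for key, value in sidecar.items():
        label = f"{prefix} {key}".strip().replace("_", " ")
        if value is None:
            continue  # skip empty fields rather than embedding "None"
        if isinstance(value, dict):
            pairs.extend(flatten_sidecar(value, label))
        elif isinstance(value, (list, tuple)):
            joined = ", ".join(str(item) for item in value)
            if joined:
                pairs.append((label, joined))
        else:
            pairs.append((label, str(value)))
    return pairs


def sidecar_to_text(sidecar):
    """Render the sidecar as short prose sentences suitable for the text field."""
    return " ".join(f"{label.capitalize()}: {value}." for label, value in flatten_sidecar(sidecar))


sidecar = {
    "scene": "goal celebration",
    "active_speaker": "commentator",
    "shot_type": "wide",
    "lighting": "harsh sunlight",
}
text = sidecar_to_text(sidecar)
print(text)
```

The resulting string goes in the chunk's text field, while the original structured values stay in metadata for filtering.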

Both patterns ship text to whatever embedding model your collection’s vector space uses (BGE-small by default, or your embed_endpoint for larger models). If you want full control over the embedding step, skip both and ship pre-computed vectors directly via the vector_space_embeddings field on the chunk.