TAMS and Time-Range Search
TAMS (Time-Addressable Media Store) is BBC R&D’s open spec for media archives. The core idea: media is addressed by time, not by file path. Content is organized as a three-level hierarchy. A Source is a logical piece of content (a match, a show, an asset). A Flow is one specific rendition of that source (720p H.264 video, or 48kHz stereo audio). A Segment is a time-bounded chunk of a flow, with a timerange_start_ms and timerange_end_ms in integer milliseconds.
Compass models this hierarchy using doc_type, parent_ref, and group_id. You control the structure at ingest time.
Time in Compass is always integer milliseconds. Milliseconds serve as the lingua franca across sidecar producers (Gemini Media Understanding, Deepgram, ffmpeg scene-detection), so normalize every timestamp to ms in your ingest helper before sending. If your data is in seconds, multiply by 1000 and round to an integer.
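A minimal sketch of the seconds-to-milliseconds conversion described above. The helper name seconds_to_ms is hypothetical and not part of Compass; it only shows one way an ingest helper might normalize units.

```python
def seconds_to_ms(seconds: float) -> int:
    """Convert a timestamp in seconds to integer milliseconds."""
    # round() before int() guards against float representation error,
    # where a product can land a hair below the intended whole number.
    return int(round(seconds * 1000))


# Example: second 47 and minute 34 of an asset.
second_47 = seconds_to_ms(47)
minute_34 = seconds_to_ms(34 * 60.0)
```

Rounding, rather than truncating, is a design choice here: it keeps values like 5.2 s from drifting by one millisecond.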
Agents ask “what’s at second 47?” more often than they run open-ended similarity search. An agent retrieving bounding-box positions, active speakers, or shot types needs to pin results to an exact time window before ranking by relevance. Without time-range filters, every query scans the whole collection.
TAMS search in Compass works in two steps: filter to the segments in the time window, then rank those segments by query relevance. The filter step uses Compass’s precomputed bitset facets and runs in microseconds.
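The two-step shape (filter, then rank) can be illustrated in memory. This is only a toy sketch: Compass uses precomputed bitset facets and its own relevance scoring, whereas the function below uses a list comprehension and a naive term-overlap score. The name search_in_window and the overlap-style window test are assumptions for illustration.

```python
def search_in_window(segments, query, start_ms, end_ms):
    """Filter segments to a time window, then rank the survivors by a toy score."""
    # Step 1: keep only segments that overlap [start_ms, end_ms].
    in_window = [
        s for s in segments
        if s["timerange_start_ms"] <= end_ms and s["timerange_end_ms"] >= start_ms
    ]

    # Step 2: rank by how many query terms appear in the segment text.
    # (A stand-in for real relevance scoring.)
    terms = set(query.lower().split())

    def score(segment):
        return len(terms & set(segment["text"].lower().split()))

    return sorted(in_window, key=score, reverse=True)
```

Only segments that survive step 1 are ever scored, which is why the filter keeps ranking cost proportional to the window rather than the collection.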
The examples below send Authorization: Bearer $COMPASS_API_KEY. Set COMPASS_BASE_URL and COMPASS_API_KEY in your shell first (see the quickstart).
Use client_id and parent_ref to link chunks within a single batch. group_id groups all segments that belong to the same source:
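As a rough sketch of how a batch might be linked, the helper below builds one source chunk and its segment chunks. The field names client_id, parent_ref, group_id, doc_type, text, and metadata come from this page; the doc_type values ("source", "segment"), the placement of the timerange fields inside metadata, and the helper name build_batch are assumptions, so check them against the ingest reference.

```python
def build_batch(source_id, source_text, segments):
    """Build a source chunk plus segment chunks that point back at it.

    segments is an iterable of (start_ms, end_ms, text) tuples.
    """
    chunks = [{
        "client_id": source_id,
        "doc_type": "source",       # assumed value
        "group_id": source_id,      # group_id equals the source's client_id
        "text": source_text,
    }]
    for index, (start_ms, end_ms, text) in enumerate(segments):
        chunks.append({
            "client_id": f"{source_id}-seg-{index}",
            "doc_type": "segment",  # assumed value
            "parent_ref": source_id,  # links to the source within this batch
            "group_id": source_id,
            "text": text,
            "metadata": {           # assumed location of the timerange fields
                "timerange_start_ms": start_ms,
                "timerange_end_ms": end_ms,
            },
        })
    return chunks
```

A Flow level could sit between the source and its segments in the same way, with each segment's parent_ref pointing at the flow's client_id.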
timerange_start_ms and timerange_end_ms are integer milliseconds. Do not pass seconds (e.g. 2040.0) or ISO date strings (e.g. "2026-03-15T00:34:00Z"). The filter engine treats them as plain numeric fields. Passing the wrong unit will produce no results when you later filter or run a /segments/at lookup.
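Because a wrong unit fails silently (no results rather than an error), it can help to validate client-side before ingest. The check below is a hypothetical guard, not part of Compass; the rule that the end must not precede the start is an added assumption.

```python
def check_timerange(start_ms, end_ms):
    """Raise if the timerange values are not integer milliseconds."""
    for name, value in (("timerange_start_ms", start_ms),
                        ("timerange_end_ms", end_ms)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(
                f"{name} must be an int in milliseconds, got {value!r}"
            )
    # Assumption: a segment never ends before it starts.
    if end_ms < start_ms:
        raise ValueError("timerange_end_ms must be >= timerange_start_ms")
```

Floats such as 2040.0 and ISO strings such as "2026-03-15T00:34:00Z" both raise TypeError here, which surfaces the mistake at ingest time instead of at query time.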
Filter to segments in a specific time range, then rank by query:
This returns segments matching “goal celebration” within the 2,040,000-2,100,000 ms window (about minute 34 to minute 35 of the asset). relationship_boost gives a score bump to segments whose parent source also matches, and to sibling segments in the same group.
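For orientation, here is a sketch of what a request body for such a search might look like, plus the arithmetic behind the window. The operator-style filter syntax ({"field": {"lte": value}}) and the relationship_boost and parent_weight names appear on this page; the top-level keys "query" and "filters", the "gte" operator, and the helper names are assumptions.

```python
import json


def ms_to_clock(ms: int) -> str:
    """Format integer milliseconds as M:SS for human-readable checks."""
    minutes, remainder = divmod(ms, 60_000)
    return f"{minutes}:{remainder // 1000:02d}"


def build_search_body(query, start_ms, end_ms, parent_weight=0.3):
    """Assemble a hypothetical time-windowed search body."""
    return {
        "query": query,                                  # assumed key
        "filters": {                                     # assumed key
            "timerange_start_ms": {"gte": start_ms},     # "gte" is assumed
            "timerange_end_ms": {"lte": end_ms},
        },
        "relationship_boost": {"parent_weight": parent_weight},
    }


body = build_search_body("goal celebration", 2_040_000, 2_100_000)
payload = json.dumps(body)  # JSON string ready to send as a request body
window = (ms_to_clock(2_040_000), ms_to_clock(2_100_000))
```

The two filters above select segments fully contained in the window; to include segments that merely overlap it, the comparison would instead be start <= window end and end >= window start.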
relationship_boost surfaces context around the matching segment. Fields:
A parent_weight of 0.3 means: if the segment’s parent source also scored well, add 30% to this segment’s score. This surfaces segments that are doubly relevant: the specific moment matches, and the broader context matches too.
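One possible reading of that rule, written out as arithmetic. This is an illustrative assumption only: the page does not state the exact formula, what counts as "scored well", or whether the boost is multiplicative. The threshold parameter and function name are hypothetical.

```python
def boosted_score(segment_score, parent_score,
                  parent_weight=0.3, parent_threshold=0.5):
    """Add parent_weight (as a fraction) to the segment score
    when the parent source also scored above a threshold."""
    if parent_score >= parent_threshold:   # assumed meaning of "scored well"
        return segment_score * (1 + parent_weight)
    return segment_score


# A segment scoring 0.8 whose parent scored 0.9 is lifted by 30%;
# the same segment under a weak parent (0.1) is left unchanged.
strong_parent = boosted_score(0.8, 0.9)
weak_parent = boosted_score(0.8, 0.1)
```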
Available in v0.3.
To retrieve segments that contain a specific timestamp on a known asset, use the /segments/at endpoint:
asset is required. It matches the segment’s group_id (which in the TAMS model equals the source’s client_id). time_ms, time_start_ms, and time_end_ms are all optional. If time_ms is set it takes precedence and returns segments where timerange_start_ms <= time_ms <= timerange_end_ms. Otherwise the range parameters define an overlap window: a segment matches if timerange_start_ms <= time_end_ms and timerange_end_ms >= time_start_ms.
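The matching rules above can be expressed as a small predicate. This is a local sketch of the semantics, not the server's implementation; treating an omitted range bound as unbounded is an assumption, as is the function name.

```python
def segment_matches(seg_start_ms, seg_end_ms,
                    time_ms=None, time_start_ms=None, time_end_ms=None):
    """Return True if a segment matches a /segments/at style lookup."""
    # A point lookup takes precedence over any range parameters.
    if time_ms is not None:
        return seg_start_ms <= time_ms <= seg_end_ms

    # Assumption: a missing bound leaves that side of the window open.
    window_start = float("-inf") if time_start_ms is None else time_start_ms
    window_end = float("inf") if time_end_ms is None else time_end_ms

    # Overlap test: the segment starts before the window ends
    # and ends after the window starts.
    return seg_start_ms <= window_end and seg_end_ms >= window_start
```

Both comparisons are inclusive, so a segment that ends exactly where the window starts still counts as overlapping.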
The response includes the full chunks sorted ascending by timerange_start_ms, plus a took_ms field:
No query string is required, no vector scoring runs. Useful when your agent has an exact timestamp on an asset and needs the metadata for that moment (bounding boxes, active speaker, shot type) without running a similarity search.
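A sketch of how a client might build such a lookup with the standard library. It assumes the endpoint is a GET with query parameters directly under the base URL; the exact path prefix and HTTP method are not confirmed here, and build_segments_at_request is a hypothetical helper. Nothing is sent until the request is passed to urllib.request.urlopen.

```python
import urllib.parse
import urllib.request


def build_segments_at_request(base_url, api_key, asset,
                              time_ms=None, time_start_ms=None,
                              time_end_ms=None):
    """Build (but do not send) a /segments/at lookup request."""
    params = {"asset": asset}
    # Only include the optional parameters the caller actually set.
    for name, value in (("time_ms", time_ms),
                        ("time_start_ms", time_start_ms),
                        ("time_end_ms", time_end_ms)):
        if value is not None:
            params[name] = value

    url = f"{base_url.rstrip('/')}/segments/at?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
```

In practice base_url and api_key would come from os.environ["COMPASS_BASE_URL"] and os.environ["COMPASS_API_KEY"].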
timerange_start_ms, timerange_end_ms, time_ms, time_start_ms, and time_end_ms are all integer milliseconds. Do not use seconds or ISO date strings. Passing the wrong unit will return no results.
Sidecar fields like gemini.response.standout_timestamps[] or per-frame events carry a single timestamp with no end. Convention: store them as segments where timerange_start_ms and timerange_end_ms are the same value. A point lookup ?time_ms=T matches the instant when T equals that timestamp; range queries that overlap that millisecond also match.
This lets the same /segments/at endpoint serve both interval queries (“what was happening between minute 34 and 35?”) and exact-frame queries (“what was tagged at ms 5200 exactly?”) without a separate API.
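A minimal sketch of the zero-length convention: each point timestamp becomes a segment whose start and end are equal. The chunk layout (timerange fields inside metadata, doc_type "segment") and both function names are assumptions for illustration; the matching logic mirrors the point and overlap rules described for /segments/at.

```python
def point_events_to_segments(asset, timestamps_ms, text):
    """Turn single timestamps into zero-length segment chunks."""
    return [
        {
            "doc_type": "segment",   # assumed value
            "parent_ref": asset,
            "group_id": asset,
            "text": text,
            "metadata": {            # assumed location of timerange fields
                "timerange_start_ms": t,
                "timerange_end_ms": t,  # same value: an instant
            },
        }
        for t in timestamps_ms
    ]


def hits(segments, start_ms, end_ms):
    """Local overlap check; a point lookup is the case start_ms == end_ms."""
    return [
        s for s in segments
        if s["metadata"]["timerange_start_ms"] <= end_ms
        and s["metadata"]["timerange_end_ms"] >= start_ms
    ]


events = point_events_to_segments("match-42", [5200, 9100], "standout moment")
exact = hits(events, 5200, 5200)      # point lookup at ms 5200
in_range = hits(events, 5000, 6000)   # range that covers ms 5200
```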
These filter operators work on timerange_start_ms and timerange_end_ms, and on any numeric metadata field:
Multiple filters combine as AND. A segment must match every filter to be scored.
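The AND semantics can be sketched as a local evaluator. Only "lte" and "in" appear in examples on this page; the other operator names in the table below ("eq", "gt", "gte", "lt") are assumed, and matches_all is a hypothetical helper rather than Compass's filter engine.

```python
import operator

# Operator names mapped to comparisons. Names other than "lte" and "in"
# are assumptions for illustration.
OPERATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda value, allowed: value in allowed,
}


def matches_all(metadata, filters):
    """Return True only if every condition on every field holds (AND)."""
    for field, conditions in filters.items():
        if field not in metadata:
            return False  # a missing field cannot satisfy a filter
        for op_name, operand in conditions.items():
            if not OPERATORS[op_name](metadata[field], operand):
                return False
    return True
```

A segment failing any single condition is dropped before scoring, which matches the "must match every filter" rule.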
Compass doesn’t fetch from cloud storage, sign URIs, or hold storage credentials. It treats URIs as opaque strings on the chunk’s metadata block. Put your gs://, s3://, or https:// URIs in metadata at ingest time, and they come back on every search hit. Your application layer pre-signs them with your own SDK before handing them to a user or downstream agent.
Search responses include these URIs in chunk.metadata. Pair them with parent_metadata (on segment hits) to resolve back to the source asset’s URIs without a second round trip.
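A sketch of the application-layer step: walk a hit's metadata and pass anything that looks like a storage URI through your own signer. The signer is injected as a plain callable, so the example stays independent of any cloud SDK; presign_metadata and the scheme-prefix detection are assumptions, not Compass behavior.

```python
from typing import Any, Callable, Dict

URI_SCHEMES = ("gs://", "s3://", "https://")


def presign_metadata(metadata: Dict[str, Any],
                     signer: Callable[[str], str]) -> Dict[str, Any]:
    """Return a copy of metadata with URI-like string values signed."""
    return {
        key: signer(value)
        if isinstance(value, str) and value.startswith(URI_SCHEMES)
        else value
        for key, value in metadata.items()
    }


def fake_signer(uri: str) -> str:
    """Stand-in for a real SDK call that returns a time-limited URL."""
    return uri + "?signed=1"
```

In a real deployment the signer would wrap your storage SDK's pre-signing call, and the same function could be applied to parent_metadata on segment hits.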
Compass stores arbitrary metadata, but a small set of field names recurs across every sidecar producer we’ve integrated. Adopt these names so customer-side transforms converge and so future tooling (the planned POST /ingest/sidecar endpoint, the compass-ingest-recipes reference package) can recognize them by name.
Spatial annotations live in chunk metadata as four numeric fields plus a bbox_format discriminator on the source chunk so consumers know the coordinate convention.
The source chunk carries the format once for the whole asset:
Compass’s range filters work on each coordinate as a plain numeric field, so a query like “which segments have the active speaker on the left half of the frame” reduces to {"bbox_x2": {"lte": 960}}, assuming pixel coordinates on a 1920-pixel-wide frame.
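To make the arithmetic explicit, the sketch below derives that filter from a frame width. It assumes pixel coordinates and that bbox_x2 is the right edge of the box; only bbox_x2 is named on this page, so the helper name and the interpretation of the coordinate are assumptions.

```python
def left_half_filter(frame_width_px: int) -> dict:
    """Filter for boxes whose right edge lies in the left half of the frame."""
    # If the right edge (bbox_x2) is at or left of the midline,
    # the whole box sits in the left half.
    return {"bbox_x2": {"lte": frame_width_px // 2}}


full_hd = left_half_filter(1920)
uhd = left_half_filter(3840)
```

With normalized (0 to 1) coordinates the same query would compare against 0.5 instead, which is why the bbox_format discriminator on the source chunk matters.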
AI-generated annotations should carry their model and prompt identity. This lets agents filter by trust (“only show me annotations from gemini-3.5-flash produced after 2026-05-01”) and makes evaluation runs comparable across prompt iterations.
These fields are advisory, not required. Compass treats them like any other string metadata, which means standard filter operators apply: {"source_model": {"in": ["gemini-3.5-flash", "gemini-4-pro"]}} is a valid filter.
Whatever you put in a chunk’s text field gets embedded by the active vector space and indexed for full-text search. For sidecar-driven workflows there are two patterns.
Quick: stringify the sidecar and pass it as text. Compass embeds the literal JSON. Recall is decent for short sidecars but the embedding model spends tokens on {, ", and , syntax instead of actual content.
Better: flatten to prose in your ingest pipeline, then embed. Walk the sidecar JSON, produce a short natural-language description, and pass that as text. Keep the structured fields in metadata so you can still filter on them. This takes about thirty lines of code in any language, and recall improves materially because the embedding model sees content instead of syntax.
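A minimal sketch contrasting the two patterns. The sidecar shape, field names, and both helper names are hypothetical; a real flattener would likely be tailored to each producer's schema rather than this generic walk.

```python
import json


def flatten(value, label=""):
    """Walk nested JSON and return a list of 'label: value' phrases."""
    if isinstance(value, dict):
        parts = []
        for key, child in value.items():
            name = f"{label} {key}".strip().replace("_", " ")
            parts.extend(flatten(child, name))
        return parts
    if isinstance(value, list):
        # A list of scalars becomes one comma-separated phrase.
        if all(not isinstance(v, (dict, list)) for v in value):
            joined = ", ".join(str(v) for v in value)
            return [f"{label}: {joined}"] if value else []
        parts = []
        for child in value:
            parts.extend(flatten(child, label))
        return parts
    return [f"{label}: {value}"]


def sidecar_to_text(sidecar):
    """Produce a short prose description suitable for the text field."""
    parts = flatten(sidecar)
    return ". ".join(parts) + "." if parts else ""


sidecar = {"shot_type": "close-up", "objects": ["ball", "goalkeeper"]}
quick_text = json.dumps(sidecar)        # pattern 1: literal JSON
better_text = sidecar_to_text(sidecar)  # pattern 2: flattened prose
```

The structured values stay available for filtering if the original sidecar fields are also copied into metadata at ingest.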
Both patterns ship text to whatever embedding model your collection’s vector space uses (BGE-small by default, or your embed_endpoint for larger models). If you want full control over the embedding step, skip both and ship pre-computed vectors directly via the embeddings field on the chunk: a map of vector-space name to the vector for that space (for example "embeddings": { "qwen3-vl": [0.12, -0.34, ...] }).
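A small sketch of attaching pre-computed vectors to a chunk. The embeddings field and its map-of-space-name-to-vector shape come from the description above; the helper name, the validation rules, and the rest of the chunk layout are assumptions.

```python
def attach_embeddings(chunk, vectors):
    """Return a copy of chunk with an embeddings map added.

    vectors maps a vector-space name to a list of numbers.
    """
    embeddings = {}
    for space, vector in vectors.items():
        # Reject empty vectors and non-numeric entries (bool is excluded
        # explicitly because it is a subclass of int in Python).
        if not vector or not all(
            isinstance(x, (int, float)) and not isinstance(x, bool)
            for x in vector
        ):
            raise ValueError(f"vector for {space!r} must be a non-empty list of numbers")
        embeddings[space] = [float(x) for x in vector]
    return {**chunk, "embeddings": embeddings}
```

The vector length must match the dimensionality of the named vector space, which this sketch does not check because the expected dimension is not known client-side.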