TAMS and Time-Range Search

What TAMS is

TAMS (Time-Addressable Media Store) is BBC R&D’s open spec for media archives. The core idea: media is addressed by time, not by file path. Content is organized as a three-level hierarchy. A Source is a logical piece of content (a match, a show, an asset). A Flow is one specific rendition of that source (720p H.264 video, or 48kHz stereo audio). A Segment is a time-bounded chunk of a flow, with a timerange_start_ms and timerange_end_ms in integer milliseconds.

Compass models this hierarchy using doc_type, parent_ref, and group_id. You control the structure at ingest time.

Time in Compass is always integer milliseconds. Every modern sidecar producer (Gemini Media Understanding, Deepgram, ffmpeg scene-detection) emits ms, so this is the lingua franca. If your data is in seconds, multiply by 1000 in your ingest helper before sending.
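
A minimal sketch of that conversion, assuming your upstream tool reports times as floating-point seconds. The helper names `seconds_to_ms` and `segment_times_to_ms` are hypothetical and not part of Compass. The sketch rounds rather than truncates, because scaling a float such as 2.3 by 1000 can land just under the intended integer:

```python
def seconds_to_ms(seconds: float) -> int:
    """Convert seconds to integer milliseconds, rounding to the nearest ms."""
    return round(seconds * 1000)


def segment_times_to_ms(start_s: float, end_s: float) -> dict:
    """Build the two timerange metadata fields from a seconds-based range."""
    start_ms, end_ms = seconds_to_ms(start_s), seconds_to_ms(end_s)
    if end_ms < start_ms:
        raise ValueError("segment ends before it starts")
    return {"timerange_start_ms": start_ms, "timerange_end_ms": end_ms}


print(segment_times_to_ms(2040.0, 2055.0))
# {'timerange_start_ms': 2040000, 'timerange_end_ms': 2055000}
```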

Why it matters for agents

Your agent asks “what’s at second 47” more often than it runs open-ended similarity search. An agent that retrieves bounding box positions, active speakers, or shot types needs to pin results to exact time windows before ranking by relevance. Without time-range filters, every query scans the whole collection.

TAMS search in Compass works in two steps: filter to the segments in the time window, then rank those segments by query relevance. The filter step uses Compass’s precomputed bitset facets and runs in microseconds.

The examples below send Authorization: Bearer $COMPASS_API_KEY. Set COMPASS_BASE_URL and COMPASS_API_KEY in your shell first (see the quickstart).

Ingesting a TAMS hierarchy

Use client_id and parent_ref to link chunks within a single batch. group_id groups all segments that belong to the same source:

curl -X POST $COMPASS_BASE_URL/collections/media/ingest \
  -H "Authorization: Bearer $COMPASS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "chunks": [
      {
        "client_id": "src-001",
        "file_id": "video-001",
        "chunk_index": 0,
        "doc_type": "source",
        "text": "Premier League: Arsenal vs Chelsea",
        "metadata": {
          "asset_type": "video",
          "created_at": "2026-03-15T15:00:00Z"
        }
      },
      {
        "client_id": "seg-001",
        "file_id": "segment-001",
        "chunk_index": 0,
        "doc_type": "segment",
        "parent_ref": "src-001",
        "group_id": "src-001",
        "text": "Goal celebration, minute 34",
        "metadata": {
          "timerange_start_ms": 2040000,
          "timerange_end_ms": 2055000,
          "scene_type": "goal",
          "active_speaker": null,
          "shot_type": "wide"
        }
      },
      {
        "client_id": "seg-002",
        "file_id": "segment-002",
        "chunk_index": 1,
        "doc_type": "segment",
        "parent_ref": "src-001",
        "group_id": "src-001",
        "text": "Penalty kick setup, minute 38",
        "metadata": {
          "timerange_start_ms": 2280000,
          "timerange_end_ms": 2295000,
          "scene_type": "penalty",
          "shot_type": "close-up"
        }
      }
    ]
  }'

timerange_start_ms and timerange_end_ms are integer milliseconds. Do not pass seconds (e.g. 2040.0) or ISO date strings (e.g. "2026-03-15T00:34:00Z"). The filter engine treats them as plain numeric fields. Passing the wrong unit will produce no results when you later filter or run a /segments/at lookup.
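
The same batch can be assembled programmatically. The sketch below is one way to do it in Python, assuming the field names from the curl example. `build_tams_batch` and the shape of its `segments` argument are hypothetical, and the generated `client_id` and `file_id` values are illustrative naming choices rather than Compass requirements. It rejects non-integer time values before anything is sent:

```python
import json


def build_tams_batch(source_id: str, file_id: str, title: str, segments: list) -> dict:
    """Return an ingest body with one source chunk and its child segments.

    `segments` is a list of dicts with "text", "start_ms", "end_ms" and an
    optional "metadata" dict (a hypothetical input shape for this sketch).
    """
    chunks = [{
        "client_id": source_id,
        "file_id": file_id,
        "chunk_index": 0,
        "doc_type": "source",
        "text": title,
        "metadata": {"asset_type": "video"},
    }]
    for i, seg in enumerate(segments):
        start, end = seg["start_ms"], seg["end_ms"]
        for value in (start, end):
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"time values must be integer ms, got {value!r}")
        if end < start:
            raise ValueError("timerange_end_ms is before timerange_start_ms")
        chunks.append({
            "client_id": f"seg-{i + 1:03d}",
            "file_id": f"segment-{i + 1:03d}",
            "chunk_index": i,
            "doc_type": "segment",
            "parent_ref": source_id,  # links the segment to its source chunk
            "group_id": source_id,    # groups all segments of the same source
            "text": seg["text"],
            "metadata": {
                **seg.get("metadata", {}),
                "timerange_start_ms": start,
                "timerange_end_ms": end,
            },
        })
    return {"chunks": chunks}


body = build_tams_batch("src-001", "video-001", "Premier League: Arsenal vs Chelsea", [
    {"text": "Goal celebration, minute 34", "start_ms": 2040000, "end_ms": 2055000,
     "metadata": {"scene_type": "goal", "shot_type": "wide"}},
    {"text": "Penalty kick setup, minute 38", "start_ms": 2280000, "end_ms": 2295000,
     "metadata": {"scene_type": "penalty", "shot_type": "close-up"}},
])
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the `-d` payload of the ingest request.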

Searching by content and time window

Filter to segments in a specific time range, then rank by query:

curl -X POST $COMPASS_BASE_URL/collections/media/search \
  -H "Authorization: Bearer $COMPASS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "goal celebration",
    "filters": {
      "doc_type": { "in": ["segment"] },
      "timerange_start_ms": { "gte": 2040000 },
      "timerange_end_ms": { "lte": 2100000 }
    },
    "relationship_boost": {
      "parent_weight": 0.3,
      "sibling_weight": 0.1,
      "mode": "max"
    },
    "top_k": 5
  }'

This returns segments matching “goal celebration” within the 2,040,000-2,100,000 ms window (about minute 34 to minute 35 of the asset). relationship_boost gives a score bump to segments whose parent source also matches, and to sibling segments in the same group.
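
The window arithmetic is easy to get wrong by a factor of 1000, so it can help to derive the filter from minutes in code. A small sketch, with `minute_window_filters` as a hypothetical helper name; the filter keys and operators are the ones used in the request above:

```python
MS_PER_MINUTE = 60_000


def minute_window_filters(start_minute: int, end_minute: int) -> dict:
    """Filters for segments that lie entirely inside [start_minute, end_minute]."""
    if end_minute < start_minute:
        raise ValueError("window ends before it starts")
    return {
        "doc_type": {"in": ["segment"]},
        "timerange_start_ms": {"gte": start_minute * MS_PER_MINUTE},
        "timerange_end_ms": {"lte": end_minute * MS_PER_MINUTE},
    }


filters = minute_window_filters(34, 35)
print(filters["timerange_start_ms"], filters["timerange_end_ms"])
# {'gte': 2040000} {'lte': 2100000}
```

Because the start bound uses gte and the end bound uses lte, a segment that begins before minute 34 or ends after minute 35 is excluded even if it overlaps the window.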

Relationship boost

relationship_boost surfaces context around the matching segment. Fields:

| Field | Description |
| --- | --- |
| parent_weight | Score multiplier added when the parent source matches the query (0.0-1.0). |
| sibling_weight | Score multiplier added when a sibling segment in the same group_id also matched (0.0-1.0). |
| mode | "max": take the higher of parent and sibling boost. "sum": add them. |

A parent_weight of 0.3 means: if the segment’s parent source also scored well, add 30% to this segment’s score. This surfaces segments that are doubly relevant: the specific moment matches, and the broader context matches too.
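
To make the arithmetic concrete, here is a sketch of one way to read the description above. It is an interpretation, not Compass's actual scoring code: the function `boosted_score`, the boolean match flags, and the choice to apply the boost as `score * (1 + boost)` are all assumptions made for illustration.

```python
def boosted_score(score: float, parent_matched: bool, sibling_matched: bool,
                  parent_weight: float = 0.3, sibling_weight: float = 0.1,
                  mode: str = "max") -> float:
    """Apply a relationship boost to a base relevance score (illustrative only)."""
    parent_boost = parent_weight if parent_matched else 0.0
    sibling_boost = sibling_weight if sibling_matched else 0.0
    if mode == "max":
        boost = max(parent_boost, sibling_boost)  # take the higher boost
    elif mode == "sum":
        boost = parent_boost + sibling_boost      # add both boosts
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return score * (1.0 + boost)


# Base score 0.5, with both the parent and a sibling matching the query.
print(round(boosted_score(0.5, True, True, mode="max"), 3))  # 0.65
print(round(boosted_score(0.5, True, True, mode="sum"), 3))  # 0.7
```

Under this reading, "max" keeps a segment from being boosted twice for the same surrounding context, while "sum" rewards segments whose parent and siblings both match.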

Point-in-time lookup

Available in v0.3.

To retrieve segments that contain a specific timestamp on a known asset, use the /segments/at endpoint:

# Point lookup: all segments on src-001 that cover ms 47000 (about second 47)
curl "$COMPASS_BASE_URL/collections/media/segments/at?asset=src-001&time_ms=47000" \
  -H "Authorization: Bearer $COMPASS_API_KEY"

# Range lookup: all segments on src-001 overlapping [2,040,000 ms, 2,100,000 ms]
curl "$COMPASS_BASE_URL/collections/media/segments/at?asset=src-001&time_start_ms=2040000&time_end_ms=2100000" \
  -H "Authorization: Bearer $COMPASS_API_KEY"

# Enumeration: all segments on src-001 (no time filter)
curl "$COMPASS_BASE_URL/collections/media/segments/at?asset=src-001" \
  -H "Authorization: Bearer $COMPASS_API_KEY"

asset is required. It matches the segment’s group_id (which in the TAMS model equals the source’s client_id). time_ms, time_start_ms, and time_end_ms are all optional. If time_ms is set it takes precedence and returns segments where timerange_start_ms <= time_ms <= timerange_end_ms. Otherwise the range parameters define an overlap window: a segment matches if timerange_start_ms <= time_end_ms and timerange_end_ms >= time_start_ms.
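
These matching rules can be mirrored client-side, for example to unit-test an ingest pipeline without calling the API. The sketch assumes segments are dicts carrying the two timerange fields at the top level. `segments_at` here is a hypothetical local function, not a Compass client call, and treating a range with only one bound as open-ended is an assumption the page does not confirm:

```python
from typing import Optional


def segments_at(segments: list, time_ms: Optional[int] = None,
                time_start_ms: Optional[int] = None,
                time_end_ms: Optional[int] = None) -> list:
    """Local model of the /segments/at matching rules for a single asset."""
    def matches(seg: dict) -> bool:
        start, end = seg["timerange_start_ms"], seg["timerange_end_ms"]
        if time_ms is not None:
            # A point lookup takes precedence over any range parameters.
            return start <= time_ms <= end
        # Range lookup: overlap test. A missing bound is treated as
        # open-ended (an assumption of this sketch).
        if time_end_ms is not None and start > time_end_ms:
            return False
        if time_start_ms is not None and end < time_start_ms:
            return False
        return True

    return sorted((s for s in segments if matches(s)),
                  key=lambda s: s["timerange_start_ms"])


segs = [
    {"id": "penalty", "timerange_start_ms": 2280000, "timerange_end_ms": 2295000},
    {"id": "goal", "timerange_start_ms": 2040000, "timerange_end_ms": 2055000},
    {"id": "standout", "timerange_start_ms": 5200, "timerange_end_ms": 5200},
]
print([s["id"] for s in segments_at(segs, time_ms=2050000)])
# ['goal']
print([s["id"] for s in segments_at(segs, time_start_ms=2050000, time_end_ms=2290000)])
# ['goal', 'penalty']
print([s["id"] for s in segments_at(segs)])
# ['standout', 'goal', 'penalty']
```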

The response includes the full chunks sorted ascending by timerange_start_ms, plus a took_ms field:

{
  "results": [
    {
      "id": 42,
      "doc_type": "segment",
      "group_id": "src-001",
      "text": "Goal celebration, minute 34",
      "metadata": {
        "timerange_start_ms": 2040000,
        "timerange_end_ms": 2055000,
        "scene_type": "goal",
        "shot_type": "wide"
      }
    }
  ],
  "took_ms": 2.4
}

No query string is required and no vector scoring runs. This is useful when your agent has an exact timestamp on an asset and needs the metadata for that moment (bounding boxes, active speaker, shot type) without running a similarity search.

timerange_start_ms, timerange_end_ms, time_ms, time_start_ms, and time_end_ms are all integer milliseconds. Do not use seconds or ISO date strings. Passing the wrong unit will return no results.

Instants (zero-duration events)

Sidecar fields like gemini.response.standout_timestamps[] or per-frame events carry a single timestamp with no end. Convention: store them as segments where timerange_start_ms and timerange_end_ms are the same value. A point lookup ?time_ms=T matches the instant when T equals that timestamp; range queries that overlap that millisecond also match.

{
  "client_id": "standout-001",
  "doc_type": "segment",
  "parent_ref": "src-001",
  "group_id": "src-001",
  "text": "First clean shot of the AR lenses illuminating with blue light.",
  "metadata": {
    "timerange_start_ms": 5200,
    "timerange_end_ms": 5200,
    "event_type": "standout_timestamp"
  }
}

This lets the same /segments/at endpoint serve both interval queries (“what was happening between minute 34 and 35?”) and exact-frame queries (“what was tagged at ms 5200 exactly?”) without a separate API.
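
A sketch of the instant convention in an ingest helper. The input shape (a list of dicts with `timestamp_ms` and `description` keys) and the function name `instants_to_segments` are hypothetical. Real sidecars such as gemini.response.standout_timestamps[] may use different key names, so adapt the lookups to your producer:

```python
def instants_to_segments(source_id: str, events: list) -> list:
    """Turn single-timestamp events into zero-duration segment chunks."""
    chunks = []
    for i, event in enumerate(events, start=1):
        t = event["timestamp_ms"]
        if isinstance(t, bool) or not isinstance(t, int):
            raise TypeError(f"timestamp must be integer ms, got {t!r}")
        chunks.append({
            "client_id": f"standout-{i:03d}",
            "doc_type": "segment",
            "parent_ref": source_id,
            "group_id": source_id,
            "text": event["description"],
            "metadata": {
                "timerange_start_ms": t,
                "timerange_end_ms": t,  # equal start and end mark an instant
                "event_type": "standout_timestamp",
            },
        })
    return chunks


chunks = instants_to_segments("src-001", [
    {"timestamp_ms": 5200,
     "description": "First clean shot of the AR lenses illuminating with blue light."},
])
print(chunks[0]["client_id"], chunks[0]["metadata"]["timerange_start_ms"])
# standout-001 5200
```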

Filter reference

These filter operators work on timerange_start_ms and timerange_end_ms, and on any numeric metadata field:

| Operator | Syntax | Example |
| --- | --- | --- |
| Exact match | "field": value | "scene_type": "goal" |
| Greater than or equal | "field": { "gte": n } | "timerange_start_ms": { "gte": 2040000 } |
| Less than or equal | "field": { "lte": n } | "timerange_end_ms": { "lte": 2100000 } |
| Array contains | "field": { "contains": value } | "tags": { "contains": "sports" } |
| Set membership | "field": { "in": [...] } | "doc_type": { "in": ["segment"] } |

Multiple filters combine as AND. A segment must match every filter to be scored.
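
As a mental model for how these operators combine, the sketch below evaluates a filter dict against one chunk's fields in plain Python. It is not Compass's filter engine, which this page describes as using precomputed bitset facets. `matches_filters` is a hypothetical function, the flat `fields` dict is a simplification, and treating a missing field as a non-match is an assumption:

```python
def matches_filters(fields: dict, filters: dict) -> bool:
    """Return True only if `fields` satisfies every filter (AND semantics)."""
    for name, condition in filters.items():
        if name not in fields:
            return False  # assumption: a missing field never matches
        value = fields[name]
        if not isinstance(condition, dict):
            # Exact match: "field": value
            if value != condition:
                return False
            continue
        for op, operand in condition.items():
            if op == "gte":
                ok = value >= operand
            elif op == "lte":
                ok = value <= operand
            elif op == "contains":
                ok = isinstance(value, list) and operand in value
            elif op == "in":
                ok = value in operand
            else:
                raise ValueError(f"unsupported operator: {op!r}")
            if not ok:
                return False
    return True


segment = {
    "doc_type": "segment",
    "scene_type": "goal",
    "tags": ["sports", "football"],
    "timerange_start_ms": 2040000,
    "timerange_end_ms": 2055000,
}
window = {
    "doc_type": {"in": ["segment"]},
    "timerange_start_ms": {"gte": 2040000},
    "timerange_end_ms": {"lte": 2100000},
    "tags": {"contains": "sports"},
}
print(matches_filters(segment, window))                               # True
print(matches_filters(segment, {**window, "scene_type": "penalty"}))  # False
```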

Linking to source assets (GCS, S3, etc.)

Compass doesn’t fetch from cloud storage, sign URIs, or hold storage credentials. It treats URIs as opaque strings on the chunk’s metadata block. Put your gs://, s3://, or https:// URIs in metadata at ingest time, and they come back on every search hit. Your application layer pre-signs them with your own SDK before handing them to a user or downstream agent.

curl -X POST $COMPASS_BASE_URL/collections/media/ingest \
  -H "Authorization: Bearer $COMPASS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "chunks": [
      {
        "client_id": "src-001",
        "doc_type": "source",
        "text": "Premier League: Arsenal vs Chelsea",
        "metadata": {
          "asset_type": "video",
          "gcs_video_uri": "gs://customer-bucket/match-001.mp4",
          "gcs_sidecar_uri": "gs://customer-bucket/match-001.json"
        }
      },
      {
        "client_id": "seg-001",
        "doc_type": "segment",
        "parent_ref": "src-001",
        "group_id": "src-001",
        "text": "Goal celebration, minute 34",
        "metadata": {
          "timerange_start_ms": 2040000,
          "timerange_end_ms": 2055000,
          "thumbnail_uri": "gs://customer-bucket/match-001/thumbs/2040.jpg"
        }
      }
    ]
  }'

Search responses include these URIs in chunk.metadata. Pair them with parent_metadata (on segment hits) to resolve back to the source asset’s URIs without a second round trip.
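
On the application side, resolving a hit to its URIs might look like the sketch below. The hit shape used here (a `chunk` object with `metadata`, plus a sibling `parent_metadata` object) is an assumption based on the field names mentioned above, so check it against a real response. `asset_uris` and `split_storage_uri` are hypothetical helpers, and the signing step itself is left to your storage SDK:

```python
from urllib.parse import urlsplit


def split_storage_uri(uri: str) -> tuple:
    """Split gs://bucket/key or s3://bucket/key into (scheme, bucket, key)."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path.lstrip("/")


def asset_uris(hit: dict) -> dict:
    """Collect every *_uri field from a hit; segment values win over parent values."""
    merged = {**hit.get("parent_metadata", {}), **hit["chunk"]["metadata"]}
    return {k: v for k, v in merged.items() if k.endswith("_uri")}


# Assumed hit shape, for illustration only.
hit = {
    "chunk": {"metadata": {
        "timerange_start_ms": 2040000,
        "thumbnail_uri": "gs://customer-bucket/match-001/thumbs/2040.jpg",
    }},
    "parent_metadata": {"gcs_video_uri": "gs://customer-bucket/match-001.mp4"},
}
for name, uri in asset_uris(hit).items():
    print(name, split_storage_uri(uri))
# gcs_video_uri ('gs', 'customer-bucket', 'match-001.mp4')
# thumbnail_uri ('gs', 'customer-bucket', 'match-001/thumbs/2040.jpg')
```

The bucket and key pair is what most storage SDKs expect as input when generating a signed URL.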

Standard sidecar metadata fields

Compass stores arbitrary metadata, but a small set of field names recurs across every sidecar producer we’ve integrated. Adopt these names so customer-side transforms converge and so future tooling (the planned POST /ingest/sidecar endpoint, the compass-ingest-recipes reference package) can recognize them by name.

| Field | Type | Where it lives | Meaning |
| --- | --- | --- | --- |
| asset_uri | string | source chunk | Primary URI of the asset (gs://, s3://, https://) |
| source_asset_uri | string | source chunk | Raw / pre-processed URI when the asset was transformed |
| mime_type | string | source chunk | video/mp4, audio/mpeg, etc. |
| has_audio, has_video | bool | source chunk | Modality flags |
| duration_ms | int | source chunk | Asset duration in milliseconds |
| width, height | int | source chunk | Frame dimensions for video / image |
| fps | float | source chunk | Frame rate when applicable |
| timerange_start_ms, timerange_end_ms | int | segment chunks | Integer milliseconds. Equal values denote an instant. |
| bbox_format | string | source chunk or per-segment | pixel_xyxy, normalized_xyxy, pixel_xywh |
| bbox_x1, bbox_y1, bbox_x2, bbox_y2 | float | segment chunks | Bounding box corners, interpreted per bbox_format |
| source_model | string | any AI-generated chunk | Model that produced the annotation (gemini-3.5-flash, etc.) |
| source_generated_at | string | any AI-generated chunk | ISO 8601 timestamp of generation |
| source_prompt_version | string | any AI-generated chunk | Prompt template version for reproducibility |

Bounding boxes

Spatial annotations live in chunk metadata as four numeric fields plus a bbox_format discriminator on the source chunk so consumers know the coordinate convention.

{
  "client_id": "seg-speaker-3",
  "doc_type": "segment",
  "parent_ref": "src-001",
  "group_id": "src-001",
  "text": "Speaker 3 active, mid-shot, frame-left.",
  "metadata": {
    "timerange_start_ms": 23120,
    "timerange_end_ms": 28960,
    "track_id": 0,
    "active_speaker": true,
    "bbox_x1": 541,
    "bbox_y1": 101,
    "bbox_x2": 643,
    "bbox_y2": 229
  }
}

The source chunk carries the format once for the whole asset:

{
  "client_id": "src-001",
  "doc_type": "source",
  "metadata": {
    "bbox_format": "pixel_xyxy",
    "width": 1920,
    "height": 1080
  }
}

Compass’s range filters work on each coordinate as a plain numeric field, so queries like “which segments have the active speaker on the left half of the frame” reduce to {"bbox_x2": {"lte": 960}}.
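
A sketch of two client-side helpers that follow from this. Both function names are hypothetical. The coordinate conventions are the usual readings of the three bbox_format names (xywh as top-left corner plus width and height, normalized as fractions of the frame size), which this page does not spell out, so treat them as assumptions:

```python
def to_pixel_xyxy(box: tuple, bbox_format: str, width: int, height: int) -> tuple:
    """Convert a 4-tuple in any of the three formats to pixel (x1, y1, x2, y2)."""
    a, b, c, d = box
    if bbox_format == "pixel_xyxy":
        return (a, b, c, d)
    if bbox_format == "pixel_xywh":
        # Assumed convention: (x, y) is the top-left corner, then width and height.
        return (a, b, a + c, b + d)
    if bbox_format == "normalized_xyxy":
        # Assumed convention: coordinates are fractions of the frame size.
        return (a * width, b * height, c * width, d * height)
    raise ValueError(f"unknown bbox_format: {bbox_format!r}")


def left_half_filter(width: int) -> dict:
    """Filter for boxes whose right edge is at or left of the frame's midline."""
    return {"bbox_x2": {"lte": width // 2}}


print(to_pixel_xyxy((541, 101, 102, 128), "pixel_xywh", 1920, 1080))
# (541, 101, 643, 229)
print(left_half_filter(1920))
# {'bbox_x2': {'lte': 960}}
```

Normalizing every box to one format at ingest time keeps a single numeric filter meaningful across assets with different conventions.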

Provenance metadata

AI-generated annotations should carry their model and prompt identity. This lets agents filter by trust (“only show me annotations from gemini-3.5-flash produced after 2026-05-01”) and makes evaluation runs comparable across prompt iterations.

{
  "metadata": {
    "source_model": "gemini-3.5-flash",
    "source_generated_at": "2026-05-20T17:41:41Z",
    "source_prompt_version": "media_understanding_v2"
  }
}

These fields are advisory, not required. Compass treats them like any other string metadata, which means standard filter operators apply: {"source_model": {"in": ["gemini-3.5-flash", "gemini-4-pro"]}} is a valid filter.
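
One way to stamp these fields consistently is a small helper in the ingest pipeline. The sketch below is illustrative: `provenance` is a hypothetical function, and it assumes you pass a timezone-aware datetime, since a naive one would be interpreted as local time by `astimezone`:

```python
from datetime import datetime, timezone
from typing import Optional


def provenance(model: str, prompt_version: str,
               generated_at: Optional[datetime] = None) -> dict:
    """Build the three provenance fields with a UTC ISO 8601 timestamp."""
    ts = generated_at or datetime.now(timezone.utc)
    return {
        "source_model": model,
        "source_generated_at": ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "source_prompt_version": prompt_version,
    }


stamp = provenance(
    "gemini-3.5-flash",
    "media_understanding_v2",
    datetime(2026, 5, 20, 17, 41, 41, tzinfo=timezone.utc),
)
print(stamp["source_generated_at"])
# 2026-05-20T17:41:41Z
```

Merge the returned dict into each AI-generated chunk's metadata before ingest.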

Embedding JSON sidecars

Whatever you put in a chunk’s text field gets embedded by the active vector space and indexed for full-text search. For sidecar-driven workflows there are two patterns.

Quick: stringify the sidecar and pass it as text. Compass embeds the literal JSON. Recall is decent for short sidecars but the embedding model spends tokens on {, ", and , syntax instead of actual content.

{
  "text": "{\"shot_type\":\"wide\",\"active_speaker\":\"commentator\",\"scene\":\"goal celebration\"}",
  "metadata": { "shot_type": "wide", "active_speaker": "commentator" }
}

Better: flatten to prose in your ingest pipeline, then embed. Walk the sidecar JSON, produce a short natural-language description, and pass that as text. Keep the structured fields in metadata so you can still filter on them. This takes about thirty lines of code in any language, and recall improves materially because the embedding model sees content instead of syntax.

{
  "text": "Scene: goal celebration. Active speaker: commentator. Shot type: wide. Lighting: harsh sunlight.",
  "metadata": {
    "shot_type": "wide",
    "active_speaker": "commentator",
    "scene_type": "goal",
    "lighting": "harsh sunlight"
  }
}
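
A minimal sketch of the flatten-to-prose step, assuming the sidecar is a JSON object with mostly scalar leaves. The function names and the "Label: value." sentence template are illustrative choices, not a Compass API, and a real pipeline would likely tune the wording per producer:

```python
def humanize(label: str) -> str:
    """Turn a key path like 'active_speaker' into 'Active speaker'."""
    return label.replace("_", " ").strip().capitalize()


def sidecar_to_prose(sidecar: dict) -> str:
    """Walk a sidecar dict and emit one short sentence per non-empty leaf."""
    sentences = []

    def walk(value, label: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{label} {key}" if label else key)
        elif isinstance(value, list):
            # Join scalar items into one sentence; recurse into nested items.
            scalars = [str(v) for v in value if not isinstance(v, (dict, list))]
            if scalars:
                sentences.append(f"{humanize(label)}: {', '.join(scalars)}.")
            for item in value:
                if isinstance(item, (dict, list)):
                    walk(item, label)
        elif value is not None and value != "":
            sentences.append(f"{humanize(label)}: {value}.")

    walk(sidecar, "")
    return " ".join(sentences)


sidecar = {
    "scene": "goal celebration",
    "active_speaker": "commentator",
    "shot_type": "wide",
    "lighting": "harsh sunlight",
}
print(sidecar_to_prose(sidecar))
# Scene: goal celebration. Active speaker: commentator. Shot type: wide. Lighting: harsh sunlight.
```

Pass the returned string as text and keep the original structured fields in metadata for filtering.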

Both patterns ship text to whatever embedding model your collection’s vector space uses (BGE-small by default, or your embed_endpoint for larger models). If you want full control over the embedding step, skip both and ship pre-computed vectors directly via the embeddings field on the chunk: a map of vector-space name to the vector for that space (for example "embeddings": { "qwen3-vl": [0.12, -0.34, ...] }).