Query - v3

POST

https://host.com/v3/collections/:collection_name/query

import requests

# Replace collection_name with the name of the collection to query.
url = "https://host.com/v3/collections/collection_name/query"

payload = {
    "query": "payment authorization requirements",
    "limit": 3,
    # Filter on Captain-generated metadata and on application-supplied custom metadata.
    "filter": {
        "metadata": { "pageStart": { "gte": 1 } },
        "custom_metadata": { "policy_area": "payments" }
    },
    # Reranking is still applied to multimodal collections; see "warnings" in the response.
    "rerank": False,
    # Optional context to return with each result.
    "include": {
        "document": True,
        "metadata": True,
        "regions": True,
        "relations": True,
        "related_chunks": True
    },
    "relation_types": ["supporting_detail"],
    "relation_direction": "both"
}
headers = {"Content-Type": "application/json"}

response = requests.post(url, json=payload, headers=headers)

print(response.json())

{
  "query": "payment authorization requirements",
  "results": [
    {
      "chunk_id": "f4a91c7e8b2d4a0c9e6f13b5a8d02744:0",
      "score": 0.09523809523809523,
      "text": "# Payment authorization form\n\nI authorize the merchant to charge the card on file for approved purchases.\n\nAmount authorized: <empty>\nCardholder email: <empty>\nProduct or service: <empty>\n\n## Card information\n\nCard type:\n[ ] Visa\n[ ] Mastercard\n[ ] American Express\n[ ] Other <empty>\n\n## Recurring payments\n\nCharge frequency: <empty>\nPayment amount: <empty>\nCancellation terms: <empty>",
      "modality": "pdf",
      "match_sources": [
        "content_embedding",
        "keyword",
        "ocr"
      ],
      "document": {
        "id": "f4a91c7e8b2d4a0c9e6f13b5a8d02744",
        "filename": "payment-authorization-form.pdf",
        "source": {
          "type": "file",
          "uri": "captain://org_example/users/user_example/collections/customer_forms/payment-authorization-form.pdf"
        }
      },
      "location": {
        "page_start": 1,
        "page_end": 1,
        "chunk_index": 0,
        "parent_chunk_index": 0
      },
      "regions": [
        {
          "type": "title",
          "text": "Payment authorization form",
          "page": 1,
          "bbox": {
            "top": 0.0619,
            "left": 0.0662,
            "width": 0.3382,
            "height": 0.0183
          },
          "confidence": "high"
        },
        {
          "type": "key_value",
          "text": "Amount authorized: <empty>\nCardholder email: <empty>\nProduct or service: <empty>",
          "page": 1,
          "bbox": {
            "top": 0.2134,
            "left": 0.0621,
            "width": 0.8676,
            "height": 0.0385
          },
          "confidence": "high"
        },
        {
          "type": "section_header",
          "text": "Card information",
          "page": 1,
          "bbox": {
            "top": 0.3283,
            "left": 0.098,
            "width": 0.1536,
            "height": 0.012
          },
          "confidence": "high"
        }
      ],
      "metadata": {
        "vectorScore": 0.42747492,
        "bm25Score": 2.2132137,
        "rrfScore": 0.09523809523809523,
        "pageStart": 1,
        "pageEnd": 1
      },
      "custom_metadata": {
        "document_type": "authorization_form",
        "review_status": "approved"
      },
      "relations": [
        {
          "relation_id": "019ef651-2c1b-74d2-b2c9-example001",
          "source_chunk_id": "f4a91c7e8b2d4a0c9e6f13b5a8d02744:0",
          "target_chunk_id": "f4a91c7e8b2d4a0c9e6f13b5a8d02744:1",
          "relation_type": "supporting_detail",
          "target_status": "found",
          "metadata": {
            "note": "Pair the authorization language with the signature block."
          },
          "created_at": "2026-06-23T21:09:35.388273Z",
          "updated_at": "2026-06-23T21:09:35.388273Z"
        }
      ],
      "related_chunks": [
        {
          "chunk_id": "f4a91c7e8b2d4a0c9e6f13b5a8d02744:1",
          "document_id": "f4a91c7e8b2d4a0c9e6f13b5a8d02744",
          "text": "Customer signature: <empty>\nDate: <empty>",
          "location": {
            "page_start": 1,
            "page_end": 1,
            "chunk_index": 1,
            "parent_chunk_index": 0
          },
          "metadata": {
            "source": "vector",
            "modality": "text",
            "vectorScore": 0.50538486,
            "rrfScore": 0.045454545454545456,
            "pageStart": 1,
            "pageEnd": 1,
            "category": "signature_block"
          }
        }
      ]
    },
    {
      "chunk_id": "a1c3e5f7b9d24f6082ab46ce9d135790:0",
      "score": 0.5,
      "text": "[Video: product-demo.mp4, 0s-25s]\nThe clip shows a presenter explaining how payment authorization works. Visual elements include an approval checklist, a diagram of stored payment credentials, and a short walkthrough of recurring billing controls.",
      "modality": "video",
      "match_sources": [
        "keyword"
      ],
      "document": {
        "id": "a1c3e5f7b9d24f6082ab46ce9d135790",
        "filename": "product-demo.mp4",
        "source": {
          "type": "file",
          "uri": "s3://example-video-eval/payment/product-demo.mp4",
          "mime_type": "video/mp4"
        }
      },
      "location": {
        "chunk_index": 0
      },
      "metadata": {
        "bm25Score": 9.5263195,
        "rrfScore": 0.047619047619047616,
        "rerankScore": 0.75390625,
        "crossModalRrfScore": 0.5,
        "rerankOrigin": "text"
      },
      "rerank_score": 0.75390625,
      "relations": [
        {
          "relation_id": "relation_id",
          "source_chunk_id": "source_chunk_id",
          "target_chunk_id": "target_chunk_id",
          "relation_type": "relation_type"
        }
      ],
      "related_chunks": [
        {
          "chunk_id": "chunk_id"
        }
      ]
    },
    {
      "chunk_id": "c9e8a7b6d5f4432190acbe1782345678:3",
      "score": 0.333333,
      "text": "The spreadsheet row lists authorization category, required approval, renewal cadence, and current policy owner for recurring payments.",
      "modality": "spreadsheet",
      "match_sources": [
        "table",
        "metadata"
      ],
      "document": {
        "id": "c9e8a7b6d5f4432190acbe1782345678",
        "filename": "billing-policy-controls.xlsx",
        "source": {
          "type": "file",
          "uri": "captain://org_example/users/user_example/collections/policies/billing-policy-controls.xlsx",
          "mime_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
        }
      },
      "location": {
        "chunk_index": 3,
        "sheet_name": "authorization_controls",
        "row_start": 18,
        "row_end": 22,
        "col_start": 1,
        "col_end": 6
      },
      "metadata": {
        "vectorScore": 0.38133752,
        "bm25Score": 4.10291,
        "rrfScore": 0.333333,
        "sheetName": "authorization_controls",
        "rowStart": 18,
        "rowEnd": 22,
        "colStart": 1,
        "colEnd": 6,
        "columns": [
          "policy",
          "approval_required",
          "renewal_cadence",
          "owner"
        ]
      },
      "custom_metadata": {
        "policy_area": "payments"
      },
      "rerank_score": 0.7109375,
      "relations": [
        {
          "relation_id": "relation_id",
          "source_chunk_id": "source_chunk_id",
          "target_chunk_id": "target_chunk_id",
          "relation_type": "relation_type"
        }
      ],
      "related_chunks": [
        {
          "chunk_id": "chunk_id"
        }
      ]
    }
  ],
  "total_results": 3,
  "limit": 3,
  "rerank": {
    "used": true,
    "reason": "required_for_multimodal"
  },
  "warnings": [
    "Reranking cannot be disabled for collections that contain multimodal content. Reranking was applied."
  ],
  "execution_time_ms": 2427,
  "request_id": "req_20260623_example001"
}

Search indexed files and return source chunks with optional document, metadata, region (bounding boxes), relation, and related chunk context.

Compared with v2: this response uses results[].text, supports include controls for document, chunk, region, and relation context, and returns structured rerank details.

Result identity: results[].chunk_id is the stable identifier for a retrieved chunk. Use it when reading chunk details, updating chunk metadata, creating chunk relations, or storing a reference to a search result. Chunk IDs include the parent document identifier and the chunk index.
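If an application needs the document identifier or chunk index separately, the chunk ID can be split. The sketch below assumes the `<document_id>:<chunk_index>` shape seen in the sample response; the helper name `split_chunk_id` is hypothetical and not part of the API. Where possible, treat the chunk ID as an opaque string and prefer `document.id` and `location.chunk_index` from the result.

```python
def split_chunk_id(chunk_id: str) -> tuple:
    """Split a chunk ID into (document_id, chunk_index).

    Assumes the "<document_id>:<chunk_index>" shape shown in the sample
    response; raises ValueError for anything else.
    """
    # Split on the last colon so a document ID containing ":" stays intact.
    document_id, sep, index = chunk_id.rpartition(":")
    if not sep or not document_id or not index.isdigit():
        raise ValueError(f"unexpected chunk ID format: {chunk_id!r}")
    return document_id, int(index)


doc_id, chunk_index = split_chunk_id("f4a91c7e8b2d4a0c9e6f13b5a8d02744:0")
print(doc_id, chunk_index)
```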

Scores: score is the final retrieval score for the result. metadata.vectorScore, metadata.bm25Score, metadata.rrfScore, and metadata.crossModalRrfScore are ranking signals used to produce the final result order. These values are useful for debugging retrieval behavior and should not be compared across unrelated collections.
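When debugging result order, it can help to view the final score next to whichever ranking signals a result carries. The following is a minimal sketch; `ranking_signals` is a hypothetical helper, and the sample dictionary copies values from the video result in the example response. Not every result includes every signal, so missing keys are skipped.

```python
SIGNAL_KEYS = ("vectorScore", "bm25Score", "rrfScore", "crossModalRrfScore")


def ranking_signals(result: dict) -> dict:
    """Return the final score plus the ranking signals present in metadata."""
    metadata = result.get("metadata") or {}
    signals = {"score": result.get("score")}
    for key in SIGNAL_KEYS:
        # Only report signals that this result actually carries.
        if key in metadata:
            signals[key] = metadata[key]
    return signals


sample = {
    "chunk_id": "a1c3e5f7b9d24f6082ab46ce9d135790:0",
    "score": 0.5,
    "metadata": {
        "bm25Score": 9.5263195,
        "rrfScore": 0.047619047619047616,
        "crossModalRrfScore": 0.5,
    },
}
print(ranking_signals(sample))
```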

Modality and match sources: modality describes the indexed content type that matched the query, such as pdf, document, image, video, spreadsheet, or text. match_sources lists the retrieval signals that contributed to the match, such as content_embedding, keyword, ocr, table, transcript, metadata, or summary.

Document and source: document identifies the parent file for the chunk. document.source.type describes how the file was ingested, and document.source.uri identifies the original file location or Captain-managed file URI.

Location: location identifies where the chunk appears inside the source file. PDF and document results use page fields, media results use time fields, and spreadsheet results use sheet, row, and column fields. Fields that do not apply are returned as null.
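A client often wants a short, human-readable label for a result's location. The sketch below handles only the page and spreadsheet field names that appear in the sample response; the time field names for media results are not shown on this page, so those results fall back to the chunk index. `describe_location` is a hypothetical helper name.

```python
def describe_location(location: dict) -> str:
    """Build a short label from a result's location object."""
    # Drop fields that are null so page, sheet, and fallback cases are easy to tell apart.
    loc = {key: value for key, value in location.items() if value is not None}
    if "sheet_name" in loc:
        return (
            f"sheet {loc['sheet_name']}, "
            f"rows {loc.get('row_start')}-{loc.get('row_end')}, "
            f"columns {loc.get('col_start')}-{loc.get('col_end')}"
        )
    if "page_start" in loc:
        start = loc["page_start"]
        end = loc.get("page_end", start)
        return f"page {start}" if start == end else f"pages {start}-{end}"
    # Media time fields are not named on this page, so fall back to the chunk index.
    return f"chunk {loc.get('chunk_index')}"


print(describe_location({"page_start": 1, "page_end": 1, "chunk_index": 0}))
```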

Regions: regions contains extracted layout regions when requested. Regions are used for OCR text, form fields, headings, tables, charts, and image areas. Bounding boxes are relative to the rendered page or image. The origin is the top-left corner. To draw a region, set x = left * renderedWidth, y = top * renderedHeight, width = width * renderedWidth, and height = height * renderedHeight.
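The drawing formulas above translate directly into code. This sketch assumes `bbox` values are fractions between 0 and 1, as in the sample response; the function name `region_to_pixels` and the 1000 x 1400 rendered size are illustrative assumptions.

```python
def region_to_pixels(bbox: dict, rendered_width: float, rendered_height: float) -> dict:
    """Scale a relative bounding box to pixel coordinates (origin at top-left)."""
    return {
        "x": bbox["left"] * rendered_width,
        "y": bbox["top"] * rendered_height,
        "width": bbox["width"] * rendered_width,
        "height": bbox["height"] * rendered_height,
    }


# Bounding box of the "title" region in the sample response.
title_bbox = {"top": 0.0619, "left": 0.0662, "width": 0.3382, "height": 0.0183}
print(region_to_pixels(title_bbox, rendered_width=1000, rendered_height=1400))
```

Because the box is relative, the same region can be redrawn at any zoom level by passing the current rendered size.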

Media: media contains media-specific context for audio and video results, such as transcript data for the retrieved segment. If no media-specific context is returned, the field is null.

Metadata: metadata contains Captain-generated fields about retrieval, ranking, source location, and indexed content. custom_metadata contains metadata supplied by your application during indexing or through metadata update endpoints.

Reranking: rerank_score is the score assigned by the reranker for this result. The top-level rerank object reports whether reranking was applied and why. Multimodal collections require reranking to combine text and non-text results into one ranked list.
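The example request sets `rerank` to false, yet the response reports that reranking was applied. A client can detect that case from the top-level `rerank` object, as in the sketch below. `rerank_was_forced` is a hypothetical helper, and the sample dictionary reuses the `rerank` and `warnings` values from the example response.

```python
def rerank_was_forced(requested_rerank: bool, response: dict) -> bool:
    """Return True when reranking was applied although the request disabled it."""
    rerank = response.get("rerank") or {}
    return (not requested_rerank) and bool(rerank.get("used"))


sample_response = {
    "rerank": {"used": True, "reason": "required_for_multimodal"},
    "warnings": [
        "Reranking cannot be disabled for collections that contain multimodal content. Reranking was applied."
    ],
}

if rerank_was_forced(False, sample_response):
    print("reranking applied:", sample_response["rerank"].get("reason"))
    for warning in sample_response.get("warnings") or []:
        print("warning:", warning)
```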

Relations: relations contains graph edges connected to the retrieved chunk. related_chunks contains the linked chunks when relation context is requested, including their text, location, and metadata.
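To show an edge together with the chunk it links to, `relations` can be joined with `related_chunks`. The sketch below assumes the two lists are matched by chunk ID, as in the first result of the sample response, and that for an incoming edge the linked chunk is the source; both are assumptions rather than documented behavior. `pair_relations` is a hypothetical helper name.

```python
def pair_relations(result: dict) -> list:
    """Pair each relation with the related chunk on its other end, or None."""
    by_id = {chunk["chunk_id"]: chunk for chunk in result.get("related_chunks") or []}
    pairs = []
    for relation in result.get("relations") or []:
        # Outgoing edge: the linked chunk is the target.
        other_id = relation["target_chunk_id"]
        # Incoming edge (assumed): the retrieved chunk is the target, so use the source.
        if other_id == result["chunk_id"]:
            other_id = relation["source_chunk_id"]
        pairs.append((relation, by_id.get(other_id)))
    return pairs


sample_result = {
    "chunk_id": "f4a91c7e8b2d4a0c9e6f13b5a8d02744:0",
    "relations": [
        {
            "relation_id": "019ef651-2c1b-74d2-b2c9-example001",
            "source_chunk_id": "f4a91c7e8b2d4a0c9e6f13b5a8d02744:0",
            "target_chunk_id": "f4a91c7e8b2d4a0c9e6f13b5a8d02744:1",
            "relation_type": "supporting_detail",
        }
    ],
    "related_chunks": [
        {
            "chunk_id": "f4a91c7e8b2d4a0c9e6f13b5a8d02744:1",
            "text": "Customer signature: <empty>\nDate: <empty>",
        }
    ],
}

for relation, chunk in pair_relations(sample_result):
    print(relation["relation_type"], "->", chunk["chunk_id"] if chunk else None)
```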

Path parameters

collection_name: string, required

Request

This endpoint expects an object.

query: string, required

limit: integer, optional, range 1-100, defaults to 10

filter: map from strings to any, optional

rerank: boolean, optional, defaults to false

include: object, optional

relation_types: list of strings, optional

relation_direction: enum, optional, defaults to outgoing

Allowed values:

Response

Query response.

query: string

results: list of objects

total_results: integer

limit: integer

rerank: object

warnings: list of strings

execution_time_ms: integer

request_id: string

Errors

400

Query Collection V3 Request Bad Request Error

401

Query Collection V3 Request Unauthorized Error

403

Query Collection V3 Request Forbidden Error

404

Query Collection V3 Request Not Found Error
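A client can map these status codes to exceptions before reading the response body. The sketch below is one possible approach: `QueryError` and `check_status` are hypothetical names, and the shape of the error body is not documented on this page, so only the status code is used.

```python
# Status codes listed in the Errors section of this page.
DOCUMENTED_ERRORS = {
    400: "Bad Request",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not Found",
}


class QueryError(Exception):
    """Raised when the query endpoint returns an error status (hypothetical)."""

    def __init__(self, status_code: int, name: str):
        super().__init__(f"{status_code} {name}")
        self.status_code = status_code
        self.name = name


def check_status(status_code: int) -> None:
    """Raise QueryError for error statuses; return None otherwise."""
    if status_code < 400:
        return
    name = DOCUMENTED_ERRORS.get(status_code, "Undocumented error")
    raise QueryError(status_code, name)


try:
    check_status(404)
except QueryError as error:
    print("query failed:", error)
```

With the `requests` example at the top of this page, the call would be `check_status(response.status_code)` before `response.json()`.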
