Get Document

GET https://api.runcaptain.com/v2/collections/:collection_name/documents/:document_id
import requests

BASE_URL = "https://api.runcaptain.com"
API_KEY = "your_api_key"
COLLECTION_NAME = "my_documents"
DOCUMENT_ID = "doc_abc123"

headers = {
    "Authorization": f"Bearer {API_KEY}",
}

# Set include_bbox=true to also return normalized element layout (bounding boxes)
response = requests.get(
    f"{BASE_URL}/v2/collections/{COLLECTION_NAME}/documents/{DOCUMENT_ID}",
    params={"include_bbox": True},
    headers=headers,
    timeout=30.0,
)

if response.status_code == 200:
    data = response.json()
    print(f"Document: {data['metadata']['filename']}")
    print(f"Chunks: {data['chunk_count']}")
    # Print the first 80 characters of each chunk next to its index
    for chunk in data.get("chunks", []):
        print(f"  [{chunk['chunk_index']}] {chunk['content'][:80]}")
else:
    print(f"Error: {response.status_code}")
200 Retrieved
{
  "document_id": "doc_abc123",
  "collection_name": "my_documents",
  "chunk_count": 2,
  "metadata": {
    "filename": "contract.pdf",
    "file_type": "pdf",
    "file_size": 248173,
    "mime_type": "application/pdf",
    "uri": "captain://your-org/collections/my_documents/contract.pdf",
    "summary": "A master services agreement covering payment terms, liability, and termination.",
    "tags": [
      "contract",
      "msa"
    ],
    "indexing_status": "completed",
    "created_at": "2026-05-27T21:14:35.818123+00:00",
    "indexed_at": "2026-05-27T21:15:55.294107+00:00"
  },
  "chunks": [
    {
      "content": "This Master Services Agreement governs the terms between the parties...",
      "chunk_index": 0,
      "parent_chunk_index": null,
      "page_start": 1,
      "page_end": 1,
      "tokens": 151,
      "category": null,
      "metadata": {
        "file_tags": [
          "contract",
          "msa"
        ]
      }
    }
  ],
  "ocr_data": {
    "usage": {
      "num_pages": 3,
      "non_empty_cell_count": null
    }
  },
  "bounding_boxes": [
    {
      "type": "section_header",
      "content": "1. Payment Terms",
      "page": 1,
      "bbox": {
        "top": 0.0812,
        "left": 0.0901,
        "width": 0.7204,
        "height": 0.0223
      },
      "confidence": "high",
      "image_url": null
    }
  ]
}

Retrieve a single document by ID.

Returns the document’s metadata, all chunks from the Captain vector store, and OCR usage. Set include_bbox to true to also include bounding_boxes, the document’s normalized element layout with each element’s type, page, and bounding box coordinates.
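The normalized coordinates can be mapped onto a rendered page image. The sketch below assumes that `top`, `left`, `width`, and `height` are fractions of the page dimensions measured from the top-left corner; the page only says the layout is "normalized", so treat that interpretation as an assumption. The helper name `bbox_to_pixels` is hypothetical.

```python
def bbox_to_pixels(bbox: dict, page_width: int, page_height: int) -> tuple:
    """Convert a normalized bbox to (left, top, right, bottom) in pixels.

    Assumes values are fractions of the page size with a top-left origin.
    """
    left = bbox["left"] * page_width
    top = bbox["top"] * page_height
    right = left + bbox["width"] * page_width
    bottom = top + bbox["height"] * page_height
    # Round to whole pixels for use with image cropping or drawing tools.
    return tuple(round(value) for value in (left, top, right, bottom))


# bbox taken from the sample response; the page size is an arbitrary example.
sample_bbox = {"top": 0.0812, "left": 0.0901, "width": 0.7204, "height": 0.0223}
print(bbox_to_pixels(sample_bbox, page_width=1700, page_height=2200))
```

The returned tuple follows the (left, top, right, bottom) order that many imaging libraries use for crop boxes.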

Authentication

Authorization (Bearer)
Bearer token authentication using an API key.

Path parameters

collection_name (string, Required)
document_id (string, Required)
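Both path parameters are substituted directly into the URL, so it can help to percent-encode them first. This is a minimal sketch using only the standard library; the helper name `document_url` is hypothetical, and whether the API accepts names containing reserved characters is not stated on this page.

```python
from urllib.parse import quote

BASE_URL = "https://api.runcaptain.com"


def document_url(collection_name: str, document_id: str, base_url: str = BASE_URL) -> str:
    """Build the Get Document URL, percent-encoding both path parameters."""
    # safe="" also encodes "/", so a value cannot introduce extra path segments.
    collection = quote(collection_name, safe="")
    document = quote(document_id, safe="")
    return f"{base_url}/v2/collections/{collection}/documents/{document}"


print(document_url("my_documents", "doc_abc123"))
```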

Headers

X-Organization-ID (string, Optional)
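A request needs the bearer token and may also carry the optional organization header. The sketch below builds both; the helper name `build_headers` is hypothetical, and this page does not describe when `X-Organization-ID` is needed, so the header is only sent when a value is supplied.

```python
from typing import Optional


def build_headers(api_key: str, organization_id: Optional[str] = None) -> dict:
    """Return request headers for the Get Document endpoint."""
    headers = {"Authorization": f"Bearer {api_key}"}
    # Only include the optional header when the caller provides a value.
    if organization_id is not None:
        headers["X-Organization-ID"] = organization_id
    return headers


print(build_headers("your_api_key"))
```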

Query parameters

include_bbox (boolean, Optional, defaults to false)

When true, include normalized element layout (bounding boxes) under bounding_boxes. Omitted by default to keep the response lean.
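One way to build the query string is to send the lowercase literal `true` and to omit the parameter when the default applies. This is a sketch, not a requirement of the API: which other spellings of the boolean the server accepts is not stated on this page, and `get_document_params` is a hypothetical helper.

```python
from urllib.parse import urlencode


def get_document_params(include_bbox: bool = False) -> dict:
    """Query parameters for Get Document; empty when the default applies."""
    # Send the literal "true" so the value matches the documented spelling
    # instead of depending on how a client serializes Python's True.
    return {"include_bbox": "true"} if include_bbox else {}


print(urlencode(get_document_params(include_bbox=True)))  # prints include_bbox=true
```

The resulting dictionary can be passed as the `params` argument of an HTTP client call.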

Response

Successful Response
document_id (string)
Unique identifier for the document
collection_name (string)
Name of the collection the document belongs to
chunk_count (integer)
Number of chunks returned for the document
metadata (object)
chunks (list of objects)

All chunks for the document, ordered by chunk_index
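Because each chunk in the sample response carries `page_start` and `page_end`, the chunk list can be filtered by page on the client. The sketch below is an illustration with a hypothetical helper name, `chunks_for_page`; it sorts by `chunk_index` defensively even though the API already returns chunks in that order, and it assumes every chunk has integer page fields as in the sample.

```python
def chunks_for_page(document: dict, page: int) -> list:
    """Return chunks whose page range covers `page`, ordered by chunk_index."""
    chunks = document.get("chunks") or []
    matching = [
        chunk for chunk in chunks
        if chunk["page_start"] <= page <= chunk["page_end"]
    ]
    return sorted(matching, key=lambda chunk: chunk["chunk_index"])


# Trimmed-down document shaped like the sample response.
document = {
    "chunks": [
        {"chunk_index": 1, "content": "Payment is due...", "page_start": 1, "page_end": 2},
        {"chunk_index": 0, "content": "This Master Services...", "page_start": 1, "page_end": 1},
    ]
}
for chunk in chunks_for_page(document, page=1):
    print(f"[{chunk['chunk_index']}] {chunk['content'][:80]}")
```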

ocr_data (map from strings to any, or null)

Optional. OCR usage information (e.g. num_pages) when available; null otherwise. Internal provider/billing fields are not included.
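Since `ocr_data` may be null, client code should not index into it directly. A minimal sketch, assuming the `usage.num_pages` nesting shown in the sample response (the field description itself only mentions `num_pages`); `ocr_page_count` is a hypothetical helper name.

```python
from typing import Optional


def ocr_page_count(document: dict) -> Optional[int]:
    """Return the OCR page count, or None when OCR usage is unavailable."""
    # `or {}` covers both a missing key and an explicit null value.
    ocr_data = document.get("ocr_data") or {}
    usage = ocr_data.get("usage") or {}
    return usage.get("num_pages")


print(ocr_page_count({"ocr_data": {"usage": {"num_pages": 3}}}))  # prints 3
print(ocr_page_count({"ocr_data": None}))  # prints None
```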

bounding_boxes (list of objects)

Normalized element layout. Present only when include_bbox=true; null when the document’s parse was stored as a URL reference rather than inline blocks.
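Client code therefore has to handle three cases: the key is absent, the value is null, or the value is a list of elements. The sketch below treats the first two the same way and groups the elements by page; `elements_by_page` is a hypothetical helper, and the element fields it reads (`page`, `type`) are taken from the sample response.

```python
from collections import defaultdict


def elements_by_page(document: dict) -> dict:
    """Group layout elements by page number; empty when no layout is returned."""
    grouped = defaultdict(list)
    # A missing key and an explicit null both fall back to an empty list.
    for element in document.get("bounding_boxes") or []:
        grouped[element["page"]].append(element)
    return dict(grouped)


document = {
    "bounding_boxes": [
        {"type": "section_header", "content": "1. Payment Terms", "page": 1},
    ]
}
for page, elements in elements_by_page(document).items():
    print(page, [element["type"] for element in elements])
```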