Index R2 Directory | Captain Docs

Index all files from a specific directory (prefix) in a Cloudflare R2 bucket into a collection. Uses prefix-based filtering to index only objects within the specified path. Returns a job_id for tracking progress via GET /v2/jobs/{job_id}.

Authentication

Authorization Bearer

Bearer token authentication using your API key.

Path parameters

collection_name string Required

Request

This endpoint expects an object.

bucket_name string Required

Name of the R2 bucket

directory_path string Required

Path to the directory (prefix) within the bucket. Accepts either a relative path (e.g., ‘reports/2024/january’) or a full R2 URI (e.g., ‘r2://my-bucket/reports/2024/january’). All objects within this prefix will be indexed.
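Because directory_path accepts two forms, it can help to separate them on the client before building a request. The sketch below is only an illustration: split_directory_path is a hypothetical helper, not part of the Captain API, and the service performs its own parsing.

```python
from typing import Optional, Tuple

R2_SCHEME = "r2://"


def split_directory_path(directory_path: str) -> Tuple[Optional[str], str]:
    """Split a directory_path into (bucket, prefix).

    Hypothetical client-side helper. For a full R2 URI the bucket is taken
    from the URI; for a relative path the bucket is None.
    """
    if directory_path.startswith(R2_SCHEME):
        remainder = directory_path[len(R2_SCHEME):]
        # Everything before the first "/" is the bucket, the rest is the prefix.
        bucket, _, prefix = remainder.partition("/")
        return bucket, prefix
    return None, directory_path


if __name__ == "__main__":
    print(split_directory_path("r2://my-bucket/reports/2024/january"))
    print(split_directory_path("reports/2024/january"))
```

One possible use is checking that the bucket embedded in a full URI matches bucket_name before sending the request. Whether the API itself enforces that match is not stated here.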

account_id string Required

Cloudflare account ID (found in your R2 dashboard URL)

access_key_id string Required

R2 S3 API token Access Key ID

secret_access_key string Required

R2 S3 API token Secret Access Key

processing_type enum Required

Document processing type. ‘advanced’ uses agentic OCR with AI-enhanced extraction for complex layouts, tables, figures, charts, and documents containing images. ‘basic’ provides reliable OCR optimized for general document indexing and high-volume processing.

Allowed values: advanced, basic

jurisdiction enum Optional Defaults to default

R2 jurisdiction. ‘default’ for global, ‘eu’ for EU-only storage, ‘fedramp’ for FedRAMP-compliant storage.

Allowed values: default, eu, fedramp

max_files integer Optional

Maximum number of files to index.

skip_existing boolean Optional Defaults to true

Skip files that are already indexed in the collection. When true, only new files will be indexed. Set to false to re-index all files.

custom_metadata map from strings to any Optional

Custom metadata to attach to all indexed chunks. Keys must be strings. Each value must be a string, integer, float, boolean, or array of strings.
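The value rules for custom_metadata can be checked locally before a request is sent. This is a minimal sketch under the constraints listed above. validate_custom_metadata is a hypothetical helper, and the API's own validation may differ in detail.

```python
from typing import Any, Dict


def validate_custom_metadata(metadata: Dict[Any, Any]) -> None:
    """Raise TypeError if metadata breaks the documented key/value rules.

    Hypothetical client-side check: keys must be strings; values must be
    str, int, float, bool, or a list containing only strings.
    """
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key {key!r} is not a string")
        if isinstance(value, (str, int, float, bool)):
            continue
        if isinstance(value, list) and all(isinstance(item, str) for item in value):
            continue
        raise TypeError(f"metadata value for {key!r} has an unsupported type")


if __name__ == "__main__":
    # Passes silently: every value is one of the allowed types.
    validate_custom_metadata({"team": "finance", "year": 2025, "tags": ["q1", "audit"]})
```

A nested object or a list with non-string items would be rejected by this check, which mirrors the types named in the field description.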

Response

Indexing Job Started

job_id string

status enum

Allowed values:

import requests

BASE_URL = "https://api.runcaptain.com"
API_KEY = "your_api_key"
ORG_ID = "your_organization_id"

# Every request carries the API key and the organization ID.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Organization-ID": ORG_ID,
    "Content-Type": "application/json",
}

# Start indexing every object under the given prefix into "my_documents".
response = requests.post(
    f"{BASE_URL}/v2/collections/my_documents/index/r2/directory",
    headers=headers,
    json={
        "bucket_name": "my-r2-bucket",
        "directory_path": "reports/2025/",
        "account_id": "your_cloudflare_account_id",
        "access_key_id": "your_r2_access_key_id",
        "secret_access_key": "your_r2_secret_access_key",
        "processing_type": "advanced",
    },
    timeout=60.0,
)

if response.status_code in (200, 201):
    data = response.json()
    print(f"Job started! ID: {data['job_id']}")
else:
    # Include the response body so the failure reason is visible.
    print(f"Error: {response.status_code} {response.text}")

{
    "job_id": "job_r2dir_abc123",
    "status": "pending"
}
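The returned job_id can be used to track progress via GET /v2/jobs/{job_id}. The sketch below shows one way to poll until the job finishes, using only the standard library. Several details are assumptions rather than documented behavior: that the job record contains a status field like the response above, and that "completed" and "failed" are the terminal status names (this page only shows "pending"). The function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.runcaptain.com"

# Assumption: terminal status names are not listed on this page.
TERMINAL_STATUSES = {"completed", "failed"}


def http_get_job(job_id: str, api_key: str, org_id: str) -> dict:
    """Fetch the job record from GET /v2/jobs/{job_id}."""
    request = urllib.request.Request(
        f"{BASE_URL}/v2/jobs/{job_id}",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Organization-ID": org_id,
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def wait_for_job(
    get_job: Callable[[], dict],
    interval: float = 5.0,
    max_polls: int = 120,
) -> dict:
    """Call get_job until its status is terminal or max_polls is reached."""
    for _ in range(max_polls):
        job = get_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job still not finished after {max_polls} polls")
```

A call might look like wait_for_job(lambda: http_get_job("job_r2dir_abc123", API_KEY, ORG_ID)). Passing the fetch step in as a callable keeps the polling loop independent of the HTTP client, so it can be exercised without network access.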
