Index S3 Bucket
Index all files from an S3 bucket into a collection. Returns a job_id for tracking progress via GET /v2/jobs/{job_id}.
Path parameters
collection_name
Name of the collection to index into
Headers
Authorization
Captain API key for authentication
X-Organization-ID
Organization UUID
Idempotency-Key
UUID for request deduplication
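As a minimal sketch, the three headers above could be assembled as follows. The helper name build_headers is hypothetical, and the "Bearer" prefix on the Authorization value is an assumption, since this page does not specify the scheme. The sketch generates a fresh UUID for Idempotency-Key when none is supplied; reuse the same key when retrying a request you want deduplicated.

```python
import uuid
from typing import Dict, Optional


def build_headers(
    api_key: str,
    organization_id: str,
    idempotency_key: Optional[str] = None,
) -> Dict[str, str]:
    """Build the request headers described above (hypothetical helper)."""
    # Both X-Organization-ID and Idempotency-Key are documented as UUIDs;
    # uuid.UUID raises ValueError if a value is not a valid UUID string.
    uuid.UUID(organization_id)
    key = idempotency_key if idempotency_key is not None else str(uuid.uuid4())
    uuid.UUID(key)
    return {
        # ASSUMPTION: the "Bearer" scheme is not confirmed by this page.
        "Authorization": f"Bearer {api_key}",
        "X-Organization-ID": organization_id,
        "Idempotency-Key": key,
    }
```

Passing the key from a first attempt back in as idempotency_key on a retry keeps both requests under the same deduplication key.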
Request
This endpoint expects an object.
bucket_name
Name of the S3 bucket
aws_access_key_id
AWS access key ID with read access to the bucket
aws_secret_access_key
AWS secret access key
processing_type
Document processing type. ‘advanced’ uses agentic OCR with AI-enhanced extraction for complex layouts, tables, figures, charts, and documents containing images. ‘basic’ provides reliable OCR optimized for general document indexing and high-volume processing.
Allowed values: advanced, basic
bucket_region
AWS region where the bucket is located
max_files
Maximum number of files to index (optional)
skip_existing
Skip files that are already indexed in the collection. When true, only new files will be indexed. Set to false to re-index all files.
custom_metadata
Custom metadata to attach to all indexed chunks. Keys must be strings. Each value must be a string, integer, float, boolean, or array of strings.
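The request object described above can be sketched as a small builder that checks processing_type and custom_metadata before sending. The function name build_index_request is hypothetical. Which fields are required is also an assumption: this page marks only max_files as optional, so the sketch requires the bucket name and credentials and omits any other field left unset.

```python
from typing import Any, Dict, Optional

# The two processing types named in the field description above.
ALLOWED_PROCESSING_TYPES = {"advanced", "basic"}


def _is_valid_metadata_value(value: Any) -> bool:
    """True for str, int, float, bool, or a list containing only strings."""
    if isinstance(value, (str, int, float, bool)):
        return True
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def build_index_request(
    bucket_name: str,
    aws_access_key_id: str,
    aws_secret_access_key: str,
    bucket_region: Optional[str] = None,
    processing_type: Optional[str] = None,
    max_files: Optional[int] = None,
    skip_existing: Optional[bool] = None,
    custom_metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the JSON body for the index request (hypothetical helper)."""
    if processing_type is not None and processing_type not in ALLOWED_PROCESSING_TYPES:
        raise ValueError(f"processing_type must be one of {sorted(ALLOWED_PROCESSING_TYPES)}")
    if custom_metadata is not None:
        for key, value in custom_metadata.items():
            if not isinstance(key, str):
                raise ValueError(f"metadata key {key!r} must be a string")
            if not _is_valid_metadata_value(value):
                raise ValueError(f"metadata value for {key!r} has an unsupported type")

    body: Dict[str, Any] = {
        "bucket_name": bucket_name,
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
    }
    optional_fields = {
        "bucket_region": bucket_region,
        "processing_type": processing_type,
        "max_files": max_files,
        "skip_existing": skip_existing,
        "custom_metadata": custom_metadata,
    }
    # ASSUMPTION: unset optional fields are omitted rather than sent as null.
    body.update({k: v for k, v in optional_fields.items() if v is not None})
    return body
```

The resulting dictionary can be serialized with json.dumps and sent as the request body. Validating locally catches a bad custom_metadata value before any credentials leave the client.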
Response
Indexing Job Started
job_id
status
Allowed values:
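Since the response only confirms that the job has started, a client would typically poll GET /v2/jobs/{job_id} until the job finishes. The sketch below shows one way to structure that loop. The names job_status_url and wait_for_job are hypothetical, the base URL is a placeholder, and the terminal status values are passed in by the caller because this page does not list them. The HTTP call itself is injected as fetch_status so the loop stays independent of any particular HTTP client.

```python
import time
from typing import Callable, Iterable


def job_status_url(base_url: str, job_id: str) -> str:
    """Build the job-tracking URL from the documented path /v2/jobs/{job_id}."""
    return f"{base_url.rstrip('/')}/v2/jobs/{job_id}"


def wait_for_job(
    fetch_status: Callable[[str], str],
    job_id: str,
    terminal_statuses: Iterable[str],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
) -> str:
    """Poll until the job reaches one of the caller-supplied terminal statuses.

    fetch_status takes a job_id and returns the current status string,
    for example by calling GET /v2/jobs/{job_id} and reading "status".
    """
    terminal = set(terminal_statuses)
    for attempt in range(max_attempts):
        status = fetch_status(job_id)
        if status in terminal:
            return status
        # Sleep between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} attempts")
```

A fixed polling interval keeps the sketch short; a production client might prefer exponential backoff to reduce load on long-running jobs.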