Index S3 File

Index a single file from an S3 bucket into a collection.

Headers:
- Authorization: Bearer {api_key} (Captain API key for authentication)
- X-Organization-ID: Organization UUID

Args:
- collection_name: Name of the collection (path parameter)
- body: S3 file configuration with file_uri

Returns: { job_id, status: "pending" }

Path parameters

collection_name string Required

Headers

authorization string or null Optional

Request

This endpoint expects an object.
bucket_name string Required
file_uri string Required
processing_type enum Required

Document processing type. ‘advanced’ uses agentic OCR with AI-enhanced extraction for complex layouts, tables, figures, charts, and documents containing images. ‘basic’ provides reliable OCR optimized for general document indexing and high-volume processing.

auth object or null Optional

Cross-account role-assumption auth. When provided, Captain calls sts:AssumeRole on the supplied role_arn (with the supplied external_id) instead of using static IAM-user keys. Mutually exclusive with aws_access_key_id/aws_secret_access_key.

aws_access_key_id string or null Optional

AWS access key ID with read access to the bucket. Use this for long-lived IAM-user credentials. Omit when using the role-based ‘auth’ block.

aws_secret_access_key string or null Optional

AWS secret access key. Use this for long-lived IAM-user credentials. Omit when using the role-based ‘auth’ block.
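The two credential styles above are mutually exclusive, so a client may want to check that before sending a request. The following is a minimal sketch, not part of the Captain API: `build_credentials` is a hypothetical helper, and the nested shape of the `auth` object (keys `role_arn` and `external_id`) is an assumption inferred from the field description.

```python
from typing import Any, Dict, Optional


def build_credentials(
    role_arn: Optional[str] = None,
    external_id: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Return the credential fields for the request body (hypothetical helper)."""
    uses_role = role_arn is not None or external_id is not None
    uses_keys = aws_access_key_id is not None or aws_secret_access_key is not None

    # The docs state that 'auth' cannot be combined with static IAM-user keys.
    if uses_role and uses_keys:
        raise ValueError(
            "'auth' is mutually exclusive with "
            "aws_access_key_id/aws_secret_access_key"
        )

    if uses_role:
        if role_arn is None:
            raise ValueError("role_arn is required for role-based auth")
        # Assumption: the auth object is keyed by role_arn and external_id.
        auth: Dict[str, str] = {"role_arn": role_arn}
        if external_id is not None:
            auth["external_id"] = external_id
        return {"auth": auth}

    if uses_keys:
        if aws_access_key_id is None or aws_secret_access_key is None:
            raise ValueError("both the access key ID and the secret key are required")
        return {
            "aws_access_key_id": aws_access_key_id,
            "aws_secret_access_key": aws_secret_access_key,
        }

    # All credential fields are optional in the schema, so nothing is added here.
    return {}
```

The returned dictionary can be merged into the rest of the request body, for example `body.update(build_credentials(role_arn=..., external_id=...))`.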

bucket_region string Optional Defaults to us-east-1
custom_metadata map from strings to strings or integers or doubles or booleans or lists of strings, or null Optional

Custom metadata to attach to all chunks from this file. Keys must be strings. Values: str, int, float, bool, or List[str].
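The allowed value types can be checked on the client before a request is sent. This is a minimal sketch under the rules stated above; `validate_custom_metadata` is a hypothetical helper, and the server may apply further limits that this section does not describe.

```python
from typing import Any, Dict


def validate_custom_metadata(metadata: Dict[Any, Any]) -> Dict[str, Any]:
    """Check keys and values against the documented types (hypothetical helper)."""
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key {key!r} must be a string")
        # Scalars: str, int, float, bool (bool is also an int in Python).
        if isinstance(value, (str, int, float, bool)):
            continue
        # Lists are allowed only when every item is a string.
        if isinstance(value, list) and all(isinstance(item, str) for item in value):
            continue
        raise TypeError(
            f"metadata value for {key!r} must be str, int, float, bool, or List[str]"
        )
    return metadata
```

Nested objects and mixed-type lists are rejected here because the description lists only scalar values and lists of strings.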

overwrite_existing boolean Optional Defaults to false

When true, files that already exist in the collection are deleted and re-indexed with the latest changes. Requires skip_existing=false; because skip_existing defaults to true, set it to false explicitly. Setting both to true returns a 400 error.

parsing_script string or null Optional

Relative path to a JS parsing script for JSON files (e.g. ‘research/paper-parser’). When provided, .json files are processed through a sandboxed V8 isolate. Without this, .json files are indexed as raw text.

skip_existing boolean Optional Defaults to true

When true, files already indexed in the collection are skipped and will not be re-indexed with incoming changes. When false, all incoming files are indexed regardless of whether they already exist.
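The interaction between skip_existing and overwrite_existing can be summarized as three valid combinations and one invalid one. The sketch below mirrors that logic on the client side; `existing_file_mode` and its return labels are hypothetical names used only for illustration, and the defaults are the ones documented for the two fields.

```python
def existing_file_mode(skip_existing: bool = True, overwrite_existing: bool = False) -> str:
    """Name the behavior a flag combination selects (hypothetical helper)."""
    # The docs state that setting both flags to true returns a 400 error.
    if skip_existing and overwrite_existing:
        raise ValueError("overwrite_existing=true requires skip_existing=false")
    if overwrite_existing:
        # Existing files are deleted and re-indexed with the latest changes.
        return "overwrite"
    if skip_existing:
        # Files already indexed in the collection are left untouched.
        return "skip"
    # Both false: incoming files are indexed whether or not they already exist.
    return "index_all"
```

With no arguments the function reports the skip behavior, which matches the documented defaults; passing only `overwrite_existing=True` raises, which is the case the API rejects.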

Response

Successful Response
job_id string
status string Defaults to pending
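Putting the pieces together, a request can be assembled and its response read roughly as follows. This is a sketch under several assumptions: the endpoint URL is not given in this section, so the caller supplies it; the POST method and the JSON content type are assumed from the fact that the endpoint expects an object; and `build_index_request` and `parse_index_response` are hypothetical helper names. The request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Any, Dict

REQUIRED_FIELDS = ("bucket_name", "file_uri", "processing_type")
PROCESSING_TYPES = ("advanced", "basic")


def build_index_request(
    url: str, api_key: str, organization_id: str, body: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) the indexing request (hypothetical helper)."""
    missing = [name for name in REQUIRED_FIELDS if name not in body]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    if body["processing_type"] not in PROCESSING_TYPES:
        raise ValueError("processing_type must be 'advanced' or 'basic'")
    return urllib.request.Request(
        url,  # Assumption: the full endpoint URL, including collection_name.
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Organization-ID": organization_id,
            "Content-Type": "application/json",  # Assumed content type.
        },
        method="POST",  # Assumed HTTP method.
    )


def parse_index_response(raw: bytes) -> Dict[str, str]:
    """Extract job_id and status from a successful response body."""
    payload = json.loads(raw)
    # status defaults to "pending" according to the response schema.
    return {"job_id": payload["job_id"], "status": payload.get("status", "pending")}
```

Passing the returned object to `urllib.request.urlopen` would send it; the response bytes can then be handed to `parse_index_response` to obtain the job_id for later reference.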

Errors

400
Bad Request Error