Index S3 Bucket
Path parameters
Headers
Request
Document processing type. ‘advanced’ uses agentic OCR with AI-enhanced extraction for complex layouts, tables, figures, charts, and documents containing images. ‘basic’ provides reliable OCR optimized for general document indexing and high-volume processing.
Cross-account role-assumption auth. When provided, Captain calls sts:AssumeRole on the supplied role_arn (with the supplied external_id) instead of using static IAM-user keys. No long-lived secrets cross the account boundary, so this is the recommended option for production. Mutually exclusive with aws_access_key_id/aws_secret_access_key.
AWS access key ID with read access to the bucket. Use this for long-lived IAM-user credentials. Omit when using the role-based ‘auth’ block.
AWS secret access key. Use this for long-lived IAM-user credentials. Omit when using the role-based ‘auth’ block.
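The two credential styles described above are mutually exclusive, which is easy to check on the client before sending a request. The following is a minimal sketch, not part of the Captain API: the helper name build_credentials is hypothetical, and it assumes that role_arn and external_id are nested inside the 'auth' block and that both are required together.

```python
from typing import Any, Dict, Optional


def build_credentials(
    role_arn: Optional[str] = None,
    external_id: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Return the credential fields of a request body (hypothetical helper)."""
    using_role = role_arn is not None or external_id is not None
    using_keys = aws_access_key_id is not None or aws_secret_access_key is not None

    # The documented rule: the 'auth' block and static keys cannot be combined.
    if using_role and using_keys:
        raise ValueError("'auth' is mutually exclusive with static access keys")

    if using_role:
        # Assumption: both values are needed for the sts:AssumeRole call.
        if not role_arn or not external_id:
            raise ValueError("role-based auth needs both role_arn and external_id")
        # Assumption: the two values are nested under the 'auth' key.
        return {"auth": {"role_arn": role_arn, "external_id": external_id}}

    if using_keys:
        if not aws_access_key_id or not aws_secret_access_key:
            raise ValueError("static auth needs both the key ID and the secret key")
        return {
            "aws_access_key_id": aws_access_key_id,
            "aws_secret_access_key": aws_secret_access_key,
        }

    raise ValueError("no credentials supplied")
```

The returned dictionary can be merged into the rest of the request body. Rejecting a mixed configuration locally avoids a round trip that the server would refuse anyway.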
Custom metadata to attach to all indexed chunks. Keys must be strings. Each value must be a str, int, float, bool, or List[str].
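The metadata type rules can be checked locally before a request is sent. The sketch below is illustrative only: validate_metadata is a hypothetical helper, not something the API provides, and the sample keys are made up.

```python
from typing import Any, Dict


def validate_metadata(metadata: Dict[Any, Any]) -> None:
    """Raise TypeError if metadata breaks the documented key/value rules."""
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key {key!r} is not a string")
        # Scalars: str, int, float, bool (bool is a subclass of int in Python).
        if isinstance(value, (str, int, float, bool)):
            continue
        # Lists are allowed only when every element is a string.
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue
        raise TypeError(f"metadata value for {key!r} has unsupported type")


# Example with made-up keys: every value uses one of the allowed types.
validate_metadata({"team": "research", "year": 2024, "tags": ["ocr", "s3"]})
```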
When true, files that already exist in the collection will be deleted and re-indexed with the latest changes. Requires skip_existing=false. Setting both to true returns a 400 error.
Relative path to a JS parsing script for JSON files (e.g. ‘research/paper-parser’). When provided, .json files are processed through a sandboxed V8 isolate. Without this, .json files are indexed as raw text.
When true, files already indexed in the collection are skipped and will not be re-indexed with incoming changes. When false, all incoming files are indexed regardless of whether they already exist.
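The interaction between skip_existing and the delete-and-re-index flag can be summarized in a small decision function. This is a sketch under assumptions: the name of the re-index flag is not shown in this section, so reindex_flag is a placeholder, and the returned labels are invented for illustration.

```python
def resolve_existing_file_policy(skip_existing: bool, reindex_flag: bool) -> str:
    """Describe how files already in the collection would be handled.

    reindex_flag is a placeholder for the delete-and-re-index option,
    whose real parameter name is not given here.
    """
    # Documented rule: setting both to true is rejected with a 400 error.
    if skip_existing and reindex_flag:
        raise ValueError("skip_existing and the re-index flag cannot both be true")
    if reindex_flag:
        return "delete_and_reindex"  # existing files are deleted, then re-indexed
    if skip_existing:
        return "skip"  # existing files are left untouched
    return "index_all"  # every incoming file is indexed


print(resolve_existing_file_policy(skip_existing=False, reindex_flag=True))
```

Running the same check client-side catches the invalid combination before the request is sent.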