Index S3 File
Index a single file from an S3 bucket into a collection. Returns a job_id for tracking progress.
Authentication
Path parameters
Headers
Request
S3 URI format: s3://bucket-name/path/to/file.pdf
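As a rough illustration of the URI format above, the following sketch splits an S3 URI into its bucket and object key before a request is sent. The helper name `parse_s3_uri` is hypothetical and not part of the API; this is client-side validation only, and the service may apply additional rules.

```python
def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket-name/path/to/file.pdf' into (bucket, key).

    Hypothetical client-side helper; not part of the documented API.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected an s3:// URI, got: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the object key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"URI must include both a bucket and a file path: {uri!r}")
    return bucket, key


bucket, key = parse_s3_uri("s3://bucket-name/path/to/file.pdf")
```

Since this endpoint indexes a single file, the sketch rejects URIs that name only a bucket.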
Document processing type. ‘advanced’ uses agentic OCR with AI-enhanced extraction for complex layouts, tables, figures, charts, and documents containing images. ‘basic’ provides reliable OCR optimized for general document indexing and high-volume processing.
AWS access key ID with read access to the bucket. Use this for long-lived IAM-user credentials. Omit when using the role-based ‘auth’ block.
AWS secret access key. Use this for long-lived IAM-user credentials. Omit when using the role-based ‘auth’ block.
Cross-account role-assumption auth. When provided, Captain calls sts:AssumeRole on the supplied role_arn instead of using static IAM-user keys. Mutually exclusive with aws_access_key_id/aws_secret_access_key.
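The mutual-exclusivity rule above can be checked on the client before the request is sent. The sketch below assumes the request body is a plain dict, that the 'auth' block is an object carrying a role_arn field, and that one of the two credential styles must be present; none of these details are confirmed here, and the function name `credential_mode` is hypothetical.

```python
def credential_mode(body: dict) -> str:
    """Return 'role' or 'static' depending on which credential style is used.

    Hypothetical pre-flight check; the exact shape of 'auth' is an assumption.
    """
    has_static = "aws_access_key_id" in body or "aws_secret_access_key" in body
    has_role = "auth" in body

    if has_static and has_role:
        # The two credential styles are documented as mutually exclusive.
        raise ValueError("use either static keys or the 'auth' block, not both")
    if has_role:
        if not body["auth"].get("role_arn"):
            raise ValueError("'auth' block is missing role_arn")
        return "role"
    if "aws_access_key_id" in body and "aws_secret_access_key" in body:
        return "static"
    # Assumption: some form of credentials is required.
    raise ValueError("provide both static keys or an 'auth' block")
```

For example, a body containing only `{"auth": {"role_arn": "arn:aws:iam::123456789012:role/example"}}` would be classified as role-based, where the ARN shown is a placeholder.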
Custom metadata to attach to all chunks from this file. Keys must be strings. Each value must be a string, integer, float, boolean, or array of strings.
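A minimal sketch of the metadata constraints above, useful for catching invalid values before submitting a request. The function name `validate_metadata` is hypothetical, and the server may enforce further limits (such as size or key naming) that are not described here.

```python
def validate_metadata(metadata: dict) -> None:
    """Raise TypeError if metadata breaks the documented key/value rules.

    Hypothetical client-side check; not part of the documented API.
    """
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key {key!r} must be a string")
        # Scalars: str, int, float, bool (bool is a subclass of int in Python).
        if isinstance(value, (str, int, float)):
            continue
        # The only allowed container is a list whose items are all strings.
        if isinstance(value, list) and all(isinstance(item, str) for item in value):
            continue
        raise TypeError(f"metadata value for {key!r} has unsupported type")


validate_metadata({"department": "research", "year": 2024, "reviewed": True, "tags": ["pdf", "draft"]})
```

Nested objects and mixed-type arrays are rejected by this sketch because the description lists only scalar values and arrays of strings.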
Relative path to a JavaScript parsing script for JSON files (e.g. ‘research/paper-parser’). When provided, .json files are processed through a sandboxed V8 isolate that executes the script to extract text and metadata. Without this parameter, .json files are indexed as raw text. Scripts are org-scoped and managed in the Parser Studio.
When true, files already indexed in the collection are skipped and are not re-indexed, even if the incoming file has changed. When false, every incoming file is indexed whether or not it already exists in the collection.
When true, files that already exist in the collection are deleted and re-indexed with the latest changes. Requires skip_existing=false; setting both to true returns a 400 error.
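The interaction between skip_existing and the re-index flag can be summarized as a small decision function. This is only a sketch of the behavior described above for a file that already exists in the collection: the parameter name `reindex_flag` is a placeholder, since the name of the second field is not shown here, and the ValueError stands in for the 400 response.

```python
def action_for_existing_file(skip_existing: bool, reindex_flag: bool) -> str:
    """Describe what happens to a file that is already in the collection.

    'reindex_flag' is a placeholder name for the delete-and-re-index option.
    """
    if skip_existing and reindex_flag:
        # The API is documented to reject this combination with a 400 error.
        raise ValueError("skip_existing and the re-index flag cannot both be true")
    if skip_existing:
        return "skip"
    if reindex_flag:
        return "delete_and_reindex"
    # Both false: the incoming file is indexed regardless of existing copies.
    return "index"
```

Checking the combination client-side avoids a round trip that is known to fail.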