Index R2 File
Index a single file from a Cloudflare R2 bucket into a collection. Returns a job_id for tracking progress.
Authentication
Path parameters
Headers
Request
R2 URI format: r2://bucket-name/path/to/file.pdf
Cloudflare account ID (found in your R2 dashboard URL)
Document processing type. ‘advanced’ uses agentic OCR with AI-enhanced extraction for complex layouts, tables, figures, charts, and documents containing images. ‘basic’ provides reliable OCR optimized for general document indexing and high-volume processing.
R2 jurisdiction. ‘default’ for global, ‘eu’ for EU-only storage, ‘fedramp’ for FedRAMP-compliant storage.
Custom metadata to attach to all chunks from this file. Keys must be strings. Values must be a string, integer, float, boolean, or array of strings.
Relative path to a JavaScript parsing script for JSON files (e.g. ‘research/paper-parser’). When provided, .json files are processed through a sandboxed V8 isolate that executes the script to extract text and metadata. Without this parameter, .json files are indexed as raw text. Scripts are org-scoped and managed in the Parser Studio.
When true, files already indexed in the collection are skipped, so any incoming changes to those files are not re-indexed. When false, all incoming files are indexed regardless of whether they already exist.
When true, files that already exist in the collection are deleted and re-indexed with the latest changes. Requires skip_existing=false. Setting both this option and skip_existing to true returns a 400 error.
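The constraints described above (the r2:// URI shape, the allowed metadata value types, and the conflict between skip_existing and the re-index option) can be checked on the client before sending a request. The following is a minimal sketch, not part of the API: the helper names are hypothetical, and overwrite_existing is a placeholder name for the re-index option, whose actual field name is not given in this section.

```python
from typing import Any, Dict, Tuple

R2_PREFIX = "r2://"


def parse_r2_uri(uri: str) -> Tuple[str, str]:
    """Split 'r2://bucket-name/path/to/file.pdf' into (bucket, key)."""
    if not uri.startswith(R2_PREFIX):
        raise ValueError(f"expected an r2:// URI, got {uri!r}")
    bucket, _, key = uri[len(R2_PREFIX):].partition("/")
    if not bucket or not key:
        raise ValueError(f"URI must include a bucket and a file path: {uri!r}")
    return bucket, key


def validate_metadata(metadata: Dict[Any, Any]) -> None:
    """Check keys are strings and values are str, int, float, bool, or list of str."""
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key {key!r} is not a string")
        if isinstance(value, (str, int, float, bool)):
            continue
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue
        raise TypeError(f"metadata value for {key!r} has an unsupported type")


def check_reindex_flags(skip_existing: bool, overwrite_existing: bool) -> None:
    """Reject the combination the API documents as a 400 error.

    'overwrite_existing' is a hypothetical name for the delete-and-re-index option.
    """
    if skip_existing and overwrite_existing:
        raise ValueError("skip_existing and the re-index option cannot both be true")


# Example usage with the URI format shown in this section.
bucket, key = parse_r2_uri("r2://bucket-name/path/to/file.pdf")
validate_metadata({"department": "research", "year": 2024, "tags": ["draft", "q3"]})
check_reindex_flags(skip_existing=False, overwrite_existing=True)
```

Validating locally only mirrors the rules stated here; the server remains the source of truth and may apply further checks.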