Index GCS Directory
Index all files from a specific directory in a GCS bucket into a collection. Uses prefix-based filtering to index only files within the specified directory path. Returns a job_id for tracking progress via GET /v2/jobs/{job_id}.
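As a rough sketch of tracking progress with the returned job_id, a client could poll the documented GET /v2/jobs/{job_id} path until the job reaches a terminal state. The response schema is not described here, so the "status" field and the terminal values below are assumptions, and `fetch` is a hypothetical caller-supplied function that performs the authenticated GET and returns the parsed JSON body.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states; the real job schema is not documented in this section.
TERMINAL_STATUSES = {"completed", "failed"}


def job_status_path(job_id: str) -> str:
    """Build the tracking path documented for this endpoint."""
    return f"/v2/jobs/{job_id}"


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], Dict[str, Any]],
    interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll the job path until an assumed terminal status is seen.

    `fetch` is hypothetical: it takes a request path and returns the
    decoded JSON response as a dict.
    """
    for _ in range(max_polls):
        body = fetch(job_status_path(job_id))
        if body.get("status") in TERMINAL_STATUSES:
            return body
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing `fetch` and `sleep` in as arguments keeps the sketch independent of any particular HTTP client and makes it easy to exercise with a stub.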
Authentication
Path parameters
Headers
Request
Path to the directory within the bucket. Accepts either a relative path (e.g., ‘reports/2024/january’) or a full GCS URI (e.g., ‘gs://my-bucket/reports/2024/january’). All files within this directory and its subdirectories will be indexed.
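To illustrate the two accepted path forms and what prefix-based filtering means, the sketch below splits either form into a bucket and an object-name prefix, then checks whether an object falls under that prefix. This is not the service's implementation: `split_directory_path`, `in_directory`, the `default_bucket` argument, and the trailing-slash handling are all assumptions made for illustration.

```python
from typing import Tuple


def split_directory_path(path: str, default_bucket: str) -> Tuple[str, str]:
    """Split a relative path or a gs:// URI into (bucket, prefix).

    `default_bucket` is a hypothetical stand-in for the bucket the
    request targets when only a relative path is given.
    """
    scheme = "gs://"
    if path.startswith(scheme):
        bucket, _, directory = path[len(scheme):].partition("/")
    else:
        bucket, directory = default_bucket, path
    directory = directory.strip("/")
    # Assumption: a trailing slash keeps sibling names such as
    # 'january-archive' from matching the 'january' directory.
    prefix = directory + "/" if directory else ""
    return bucket, prefix


def in_directory(object_name: str, prefix: str) -> bool:
    """Return True if the object lies in the directory or a subdirectory."""
    return object_name.startswith(prefix)
```

With this sketch, both 'reports/2024/january' and 'gs://my-bucket/reports/2024/january' resolve to the prefix 'reports/2024/january/', so objects in nested subdirectories match while similarly named sibling directories do not.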
Document processing type. ‘advanced’ uses agentic OCR with AI-enhanced extraction for complex layouts, tables, figures, charts, and documents containing images. ‘basic’ provides reliable OCR optimized for general document indexing and high-volume processing.
Maximum number of files to index (optional)
When true, files already indexed in the collection are skipped, so incoming changes to those files are not picked up. When false, all incoming files are indexed regardless of whether they already exist.
Custom metadata to attach to all indexed chunks. Keys must be strings. Values must be str, int, float, bool, or an array of strings.
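The metadata constraints can be checked client-side before sending a request. The following is a minimal sketch of such a check; `validate_metadata` is a hypothetical helper, not part of the API, and the server's own validation may differ in detail.

```python
from typing import Any, Dict, List


def validate_metadata(metadata: Dict[Any, Any]) -> List[str]:
    """Return a list of problems; an empty list means the metadata looks valid."""
    errors = []
    for key, value in metadata.items():
        if not isinstance(key, str):
            errors.append(f"key {key!r} is not a string")
            continue
        # Scalars: str, int, float, bool (bool is also an int in Python).
        if isinstance(value, (str, int, float, bool)):
            continue
        # Arrays are allowed only when every element is a string.
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue
        errors.append(f"value for {key!r} has unsupported type {type(value).__name__}")
    return errors
```

For example, `{"year": 2024, "tags": ["finance", "q1"]}` passes, while a nested object or a mixed-type array is reported as a problem.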
Relative path to a JavaScript parsing script for JSON files (e.g. ‘research/paper-parser’). When provided, .json files are processed through a sandboxed V8 isolate that executes the script to extract text and metadata. Without this parameter, .json files are indexed as raw text. Scripts are org-scoped and managed in the Parser Studio.
When true, files that already exist in the collection are deleted and re-indexed with the latest changes. Requires skip_existing=false; setting both to true returns a 400 error.
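The interaction between skip_existing and this flag can be summarized in a small client-side check that catches the conflicting combination before a request is sent. This is a sketch only: the name of this parameter does not appear in this section, so `replace_existing` is a hypothetical placeholder, and the returned labels are illustrative rather than API values.

```python
def existing_file_behavior(skip_existing: bool, replace_existing: bool) -> str:
    """Describe how already-indexed files are handled for a flag combination.

    `replace_existing` is a placeholder name for the delete-and-re-index flag.
    """
    if skip_existing and replace_existing:
        # The API is documented to reject this combination with a 400 error.
        raise ValueError("skip_existing and replace_existing cannot both be true")
    if skip_existing:
        return "skip"      # already-indexed files are left untouched
    if replace_existing:
        return "replace"   # existing files are deleted and re-indexed
    return "index"         # all incoming files are indexed regardless
```

Raising locally mirrors the documented 400 response and avoids a round trip for a request that would be rejected anyway.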