Index GCS Directory | Captain Docs

Index all files from a specific directory in a GCS bucket into a collection. Uses prefix-based filtering to index only files within the specified directory path. Returns a job_id for tracking progress via GET /v2/jobs/{job_id}.

Path parameters

collection_name string Required

Name of the collection to index into

Request

This endpoint expects an object.

bucket_name string Required

Name of the GCS bucket

directory_path string Required

Path to the directory within the bucket. Accepts either a relative path (e.g., ‘reports/2024/january’) or a full GCS URI (e.g., ‘gs://my-bucket/reports/2024/january’). All files within this directory and its subdirectories will be indexed.
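As a client-side illustration of the two accepted forms, the sketch below splits a directory_path value into a bucket and a prefix. The helper name split_directory_path is hypothetical, and this is not a description of how the API parses the value on the server. It only shows one way to check, before sending a request, that a gs:// URI refers to the bucket you intend.

```python
from typing import Optional, Tuple

GCS_SCHEME = "gs://"


def split_directory_path(directory_path: str) -> Tuple[Optional[str], str]:
    """Return (bucket, prefix) for a relative path or a gs:// URI.

    bucket is None when the input is a relative path.
    Hypothetical helper for illustration; not part of the API.
    """
    if directory_path.startswith(GCS_SCHEME):
        remainder = directory_path[len(GCS_SCHEME):]
        # Everything before the first "/" is the bucket, the rest is the prefix.
        bucket, _, prefix = remainder.partition("/")
        if not bucket:
            raise ValueError("GCS URI is missing a bucket name")
        return bucket, prefix.strip("/")
    # Relative path: no bucket information is carried in the value itself.
    return None, directory_path.strip("/")
```

With a full URI, the returned bucket can be compared against bucket_name before the request is built. Whether the API rejects a mismatch is not stated in this page.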

service_account_json string Required

GCP service account JSON key with read access to the bucket
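Because service_account_json is typed as a string, the key has to be passed as serialized JSON rather than as a nested object. A minimal sketch of that serialization follows, assuming the key has already been loaded into a dict; the function name service_account_json_string is hypothetical.

```python
import json


def service_account_json_string(key: dict) -> str:
    """Serialize a parsed service account key into a compact JSON string.

    Hypothetical helper for illustration; not part of the API.
    """
    return json.dumps(key, separators=(",", ":"))


def embed_in_payload(key: dict) -> str:
    """Return a request-body fragment with the key embedded as a string field."""
    # Serializing the outer object escapes the inner quotes automatically.
    return json.dumps({"service_account_json": service_account_json_string(key)})
```

Serializing twice like this produces the escaped quotes (\"type\":\"service_account\") seen in hand-written request bodies, without escaping them manually.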

processing_type enum Required

Document processing type. ‘advanced’ uses agentic OCR with AI-enhanced extraction for complex layouts, tables, figures, charts, and documents containing images. ‘basic’ provides reliable OCR optimized for general document indexing and high-volume processing.

Allowed values: advanced, basic

max_files integer Optional

Maximum number of files to index.

skip_existing boolean Optional Defaults to true

Skip files that are already indexed in the collection. When true, only new files will be indexed. Set to false to re-index all files.

custom_metadata map from strings to any Optional

Custom metadata to attach to all indexed chunks. Keys must be strings. Each value must be a string, integer, float, boolean, or array of strings.
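The value rules above can be checked on the client before a request is sent. The following is a minimal sketch of such a check; validate_custom_metadata is a hypothetical helper, and the API's own validation behavior and error format are not described in this page.

```python
from typing import Any, Mapping

# Scalar types allowed as metadata values, per the description above.
SCALAR_TYPES = (str, int, float, bool)


def validate_custom_metadata(metadata: Mapping[Any, Any]) -> None:
    """Raise TypeError if a key or value falls outside the documented types.

    Hypothetical client-side check; not part of the API.
    """
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key {key!r} is not a string")
        if isinstance(value, SCALAR_TYPES):
            continue
        # The only allowed container is a list whose items are all strings.
        if isinstance(value, list) and all(isinstance(item, str) for item in value):
            continue
        raise TypeError(f"metadata value for {key!r} has unsupported type")
```

For example, {"team": "finance", "year": 2024, "tags": ["q1", "audit"]} passes, while a nested object or a list of integers raises TypeError.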

Response

Indexing Job Started

job_id string

status enum

Allowed values:
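Tracking the returned job_id through GET /v2/jobs/{job_id} could look roughly like the polling sketch below. Several parts are assumptions: the base URL is taken from the curl example on this page, the status values that mean "finished" are not listed here and so are supplied by the caller, and fetch_status is a hypothetical callable that performs the GET and returns the job's status string.

```python
import time
from typing import Callable, Collection

# Assumption: same host as the indexing endpoint in the curl example.
BASE_URL = "https://api.runcaptain.com"


def job_url(job_id: str) -> str:
    """Build the job-tracking URL for a given job_id."""
    return f"{BASE_URL}/v2/jobs/{job_id}"


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], str],
    terminal_statuses: Collection[str],
    interval_seconds: float = 5.0,
    max_polls: int = 60,
) -> str:
    """Poll until the job reports one of the caller-supplied terminal statuses.

    fetch_status and terminal_statuses are placeholders: this page does not
    list the status values or the shape of the jobs response.
    """
    for _ in range(max_polls):
        status = fetch_status(job_url(job_id))
        if status in terminal_statuses:
            return status
        time.sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Injecting fetch_status keeps the loop independent of any particular HTTP client and makes it easy to exercise with a stub.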

curl -X POST https://api.runcaptain.com/v2/collections/my_documents/index/gcs/directory \
  -H "Content-Type: application/json" \
  -d '{
    "bucket_name": "my-gcs-bucket",
    "directory_path": "reports/2024/january",
    "service_account_json": "{\"type\":\"service_account\",\"project_id\":\"my-project\",...}",
    "processing_type": "advanced"
  }'
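The curl request above can be mirrored in Python with the standard library. The sketch below only builds the request object; sending it is left to the caller. The function name build_index_request is hypothetical, and the curl example shows no headers beyond Content-Type, so any authentication header the API may require is not included here.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.runcaptain.com"


def build_index_request(
    collection_name: str,
    bucket_name: str,
    directory_path: str,
    service_account_json: str,
    processing_type: str,
    **optional_fields,
) -> urllib.request.Request:
    """Build (but do not send) the POST request for indexing a GCS directory.

    optional_fields may carry max_files, skip_existing, or custom_metadata;
    fields set to None are left out of the body.
    """
    payload = {
        "bucket_name": bucket_name,
        "directory_path": directory_path,
        "service_account_json": service_account_json,
        "processing_type": processing_type,
    }
    payload.update({k: v for k, v in optional_fields.items() if v is not None})
    # Percent-encode the collection name so it is safe as a path segment.
    url = f"{BASE_URL}/v2/collections/{quote(collection_name, safe='')}/index/gcs/directory"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The resulting object can be passed to urllib.request.urlopen once any headers your deployment needs have been added with add_header.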
