Index Url | Captain Docs

Index documents from public URL(s) into a collection.

Accepts either a single url string or a urls array of strings. Documents are downloaded from the provided URLs and processed through the same pipeline as cloud storage indexing.

Supported formats: PDF, TXT, DOCX, CSV, XLSX, and other document types.

Headers:

Authorization: Bearer {api_key} - Captain API key for authentication
X-Organization-ID: Organization UUID
Idempotency-Key: UUID for request deduplication (optional)

Args: collection_name: Name of the collection (path parameter) body: URL configuration with url or urls

Returns: { job_id, status: “pending” }

Index documents from public URL(s) into a collection. Accepts either a single `url` string or a `urls` array of strings. Documents are downloaded from the provided URLs and processed through the same pipeline as cloud storage indexing. Supported formats: PDF, TXT, DOCX, CSV, XLSX, and other document types. Headers: - Authorization: Bearer {api_key} - Captain API key for authentication - X-Organization-ID: Organization UUID - Idempotency-Key: UUID for request deduplication (optional) Args: collection_name: Name of the collection (path parameter) body: URL configuration with url or urls Returns: { job_id, status: "pending" }

Path parameters

collection_namestringRequired

Request

This endpoint expects an object.

processing_typeenumRequired

Document processing type. ‘advanced’ uses agentic OCR with AI-enhanced extraction for complex layouts, tables, figures, charts, and documents containing images. ‘basic’ provides reliable OCR optimized for general document indexing and high-volume processing.

Allowed values:

custom_metadatamap from strings to optional strings or integers or doubles or booleans or lists of stringsOptional

Custom metadata to attach to all indexed chunks. Keys must be strings. Values: str, int, float, bool, or List[str].

parsing_scriptstringOptional

Relative path to a JS parsing script for JSON files (e.g. ‘research/paper-parser’). When provided, .json files are processed through a sandboxed V8 isolate. Without this, .json files are indexed as raw text.

urlstringOptional

A single public URL to a hosted document. Supported types: PDF, DOCX, DOC, XLSX, XLS, CSV, TSV, TXT, MD, JSON, YAML, YML, PNG, JPG, JPEG, GIF, BMP, TIFF. Provide either ‘url’ or ‘urls’, not both.

urlslist of stringsOptional

A list of public URLs to hosted documents. Provide either 'url' or 'urls', not both.

Response

Successful Response

job_idstring

statusstringDefaults to pending

1	import requests
2
3	url = "https://host.com/v2/collections/collection_name/index/url"
4
5	payload = { "processing_type": "advanced" }
6	headers = {"Content-Type": "application/json"}
7
8	response = requests.post(url, json=payload, headers=headers)
9
10	print(response.json())