Index URL | Captain Docs

import requests
BASE_URL = "https://api.runcaptain.com"
API_KEY = "your_api_key"
ORG_ID = "your_organization_id"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Organization-ID": ORG_ID,
    "Content-Type": "application/json"
}
# Single URL
response = requests.post(
    f"{BASE_URL}/v2/collections/my_documents/index/url",
    headers=headers,
    json={
        "url": "https://example.com/documents/report.pdf",
        "processing_type": "advanced"
    },
    timeout=60.0
)
# Multiple URLs
# response = requests.post(
#     f"{BASE_URL}/v2/collections/my_documents/index/url",
#     headers=headers,
#     json={
#         "urls": [
#             "https://example.com/report.pdf",
#             "https://example.com/memo.txt"
#         ],
#         "processing_type": "basic"
#     },
#     timeout=60.0
# )
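# A 200 or 201 status means the indexing job was accepted
if response.status_code in [200, 201]:
    data = response.json()
    print(f"Job started! ID: {data['job_id']}")
else:
    # Print the response body as well, since it usually explains the failure
    print(f"Error: {response.status_code} {response.text}")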

{
  "job_id": "job_url_abc123",
  "status": "pending"
}

Index documents from public URL(s) into a collection.

Accepts either a single url string or a urls array of strings pointing to hosted documents (PDF, TXT, DOCX, CSV, XLSX, etc.).

Documents are downloaded and processed through the same pipeline as cloud storage indexing.

Returns a job_id for tracking progress via GET /v2/jobs/{job_id}.
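As a rough illustration of tracking a job, the sketch below polls a status lookup until the job reaches a terminal state. The source only confirms the "pending" status and the GET /v2/jobs/{job_id} path; the terminal status names ("completed", "failed"), the shape of the job response, and the helper names wait_for_job and fetch_status are assumptions made for this example.

```python
import time
from typing import Any, Callable, Dict

# ASSUMPTION: these terminal status names are hypothetical; the source
# only documents "pending" as a status value.
ASSUMED_TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status(job_id) until an assumed terminal status appears."""
    for attempt in range(max_attempts):
        job = fetch_status(job_id)
        if job.get("status") in ASSUMED_TERMINAL_STATES:
            return job
        # Do not sleep after the final attempt
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"Job {job_id} did not finish after {max_attempts} checks")


# A fetch_status built on the request setup from the example above might be:
#
#     def fetch_status(job_id):
#         r = requests.get(f"{BASE_URL}/v2/jobs/{job_id}",
#                          headers=headers, timeout=30.0)
#         r.raise_for_status()
#         return r.json()
```

Passing the lookup in as a callable keeps the polling loop independent of the HTTP client, so it can be exercised without network access.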

Authentication

Authorization Bearer

Bearer authentication of the form Bearer <token>, where token is your auth token.

X-Organization-ID string

API Key authentication via header

Path parameters

collection_name string Required

Request

This endpoint expects an object.

processing_type enum Required

Document processing type. ‘advanced’ uses agentic OCR with AI-enhanced extraction for complex layouts, tables, figures, charts, and documents containing images. ‘basic’ provides reliable OCR optimized for general document indexing and high-volume processing.

Allowed values: advanced, basic

url string Optional

A single public URL to a hosted document (PDF, TXT, DOCX, etc.). Provide either ‘url’ or ‘urls’, not both.

urls list of strings Optional

An array of public URLs to hosted documents. Provide either 'url' or 'urls', not both.
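The "either url or urls, not both" rule can be checked client-side before sending the request. The following is a minimal sketch; build_index_payload is a hypothetical helper, not part of any Captain SDK, and it assumes the API also rejects a request that supplies neither field.

```python
from typing import Any, Dict, List, Optional

# Values taken from the processing_type description above
ALLOWED_PROCESSING_TYPES = {"advanced", "basic"}


def build_index_payload(
    processing_type: str,
    url: Optional[str] = None,
    urls: Optional[List[str]] = None,
    custom_metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the JSON body, enforcing exactly one of url / urls."""
    if processing_type not in ALLOWED_PROCESSING_TYPES:
        raise ValueError(f"Unsupported processing_type: {processing_type!r}")
    # True when both are given or both are missing
    if (url is None) == (urls is None):
        raise ValueError("Provide exactly one of 'url' or 'urls'")

    payload: Dict[str, Any] = {"processing_type": processing_type}
    if url is not None:
        payload["url"] = url
    else:
        payload["urls"] = list(urls)
    if custom_metadata is not None:
        payload["custom_metadata"] = custom_metadata
    return payload
```

The returned dictionary can be passed as the json argument of requests.post, as in the request example at the top of this page.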

custom_metadata map from strings to any Optional

Custom metadata to attach to all indexed chunks. Keys must be strings. Values: str, int, float, bool, or array of strings.
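The value rules for custom_metadata can be expressed as a small pre-flight check. This is a sketch under the constraints stated above (string keys; str, int, float, bool, or array-of-strings values); validate_custom_metadata is a hypothetical helper, and the server may apply further rules that are not documented here.

```python
from typing import Any, Dict, List


def is_valid_metadata_value(value: Any) -> bool:
    """Accept str, int, float, bool, or a list made up only of strings."""
    if isinstance(value, (str, int, float, bool)):
        return True
    if isinstance(value, list):
        return all(isinstance(item, str) for item in value)
    return False


def validate_custom_metadata(metadata: Dict[Any, Any]) -> List[str]:
    """Return a list of problems; an empty list means the metadata looks valid."""
    errors = []
    for key, value in metadata.items():
        if not isinstance(key, str):
            errors.append(f"Key {key!r} is not a string")
        elif not is_valid_metadata_value(value):
            errors.append(f"Value for {key!r} has unsupported type {type(value).__name__}")
    return errors
```

For example, {"team": "finance", "tags": ["q3", "draft"]} passes, while a nested dictionary or a list that mixes strings and numbers is reported as an error.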

Response

Indexing job started

job_id string

status enum

Allowed values: