Index URLs
Authentication
Path parameters
Headers
Request
Processing mode. For hosted documents: ‘advanced’ enables AI-enhanced extraction for complex layouts, tables, figures, and charts; ‘basic’ provides standard document processing. For web pages: ‘advanced’ extracts both text content and page images; ‘basic’ extracts text content only (faster, lower cost).
A single public URL to a document or web page. Hosted files (PDF, DOCX, etc.) are indexed directly. Web pages (HTML) are automatically scraped: text is always extracted, and page images are extracted as well when the processing mode is ‘advanced’. Provide either ‘url’ or ‘urls’, not both.
An array of public URLs to documents or web pages. The type of each URL is detected automatically: hosted files are indexed directly, and web pages are scraped. Provide either ‘url’ or ‘urls’, not both.
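The ‘url’ / ‘urls’ rule above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: the helper name `build_index_body` is hypothetical, and it assumes that exactly one of the two fields must be present (the text only states that both may not be given together).

```python
from typing import Optional, Sequence


def build_index_body(url: Optional[str] = None,
                     urls: Optional[Sequence[str]] = None) -> dict:
    """Return a request body holding exactly one of 'url' or 'urls'.

    Hypothetical client-side helper; only the field names 'url' and
    'urls' come from the documentation above.
    """
    if url is not None and urls is not None:
        raise ValueError("provide either 'url' or 'urls', not both")
    if url is None and urls is None:
        # Assumption: the API requires one of the two fields.
        raise ValueError("one of 'url' or 'urls' is required")
    if url is not None:
        return {"url": url}
    return {"urls": list(urls)}
```

Failing early in the client avoids a round trip for a request that would presumably be rejected anyway; the resulting dictionary can then be merged with the other request fields before serialization.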
Custom metadata to attach to all indexed chunks. Keys must be strings. Each value must be a string, integer, float, boolean, or array of strings.
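The metadata constraints can likewise be verified locally. This is a sketch under stated assumptions: `validate_metadata` is a hypothetical helper name, and the mapping of the documented value types onto Python types (`str`, `int`, `float`, `bool`, and `list` of `str`) is an assumption about how a Python client would represent them.

```python
from typing import Any, Mapping


def validate_metadata(metadata: Mapping[Any, Any]) -> None:
    """Raise TypeError if metadata breaks the documented key/value rules.

    Hypothetical client-side check; nested objects and mixed-type
    arrays are rejected because the documentation does not list them.
    """
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key {key!r} is not a string")
        # Scalars: string, integer, float, or boolean.
        if isinstance(value, (str, int, float, bool)):
            continue
        # Arrays are allowed only when every element is a string.
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue
        raise TypeError(f"metadata value for {key!r} has unsupported type "
                        f"{type(value).__name__}")
```

For example, `validate_metadata({"source": "crawl", "year": 2024, "tags": ["a", "b"]})` passes silently, while a nested dictionary or an array containing a number raises `TypeError`.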
Relative path to a JavaScript parsing script for JSON files (e.g. ‘research/paper-parser’). When provided, .json files are processed through a sandboxed V8 isolate that executes the script to extract text and metadata. Without this parameter, .json files are indexed as raw text. Scripts are org-scoped and managed in the Parser Studio.