Query Collection
Authentication
Path parameters
Headers
Request
Enable LLM-generated answers based on the relevant sections retrieved. When false, returns raw search results.
Enable real-time streaming of the response.
Number of results to return. Only valid when inference=false; when inference=true, the agent controls its own search strategy and this setting is not supported.
Enable reranking for improved relevance ordering. Uses Gemini Flash 2.5 by default, with Voyage AI rerank-2.5 as a fallback. Adds roughly 100-300 ms of latency.
Filter expression for vector search. Supports: ne, gte, lte, nin, or
Custom system prompt to override the default RAG prompt when inference=true. Allows customizing how the LLM processes and responds to the query with the retrieved context.
Include normalized bounding box layout data for each search result. Returns element-level positions (titles, paragraphs, tables, figures, form fields) with page coordinates for PDF and DOCX files. Only supported with inference=false.
When inference=true, include the raw search result chunks that were used as context for the LLM response. Defaults to false. Always true when inference=false.
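The constraints among the request fields above can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions, not an official client: only `inference` and `stream` are names that appear on this page, while `query`, `top_k`, `include_layout`, `include_sources`, and `system_prompt` are hypothetical placeholders for the fields described above. The helper's default values are choices made for the sketch, not documented defaults.

```python
from typing import Any, Optional


def build_query_body(
    query: str,
    *,
    inference: bool,
    stream: bool = False,
    top_k: Optional[int] = None,
    include_layout: bool = False,
    include_sources: bool = False,
    system_prompt: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a request body, rejecting combinations this page rules out.

    All key names except `inference` and `stream` are hypothetical.
    """
    # The result count is only valid when inference=false.
    if inference and top_k is not None:
        raise ValueError("top_k is only valid when inference=false")
    # Layout data is only supported with inference=false.
    if inference and include_layout:
        raise ValueError("layout data is only supported with inference=false")

    body: dict[str, Any] = {
        "query": query,
        "inference": inference,
        "stream": stream,
    }
    if top_k is not None:
        body["top_k"] = top_k
    if include_layout:
        body["include_layout"] = True
    if inference:
        # With inference=false the chunks are always returned, so this flag
        # is only worth sending when inference=true.
        body["include_sources"] = include_sources
        # The custom prompt replaces the default RAG prompt, which only
        # applies when inference=true.
        if system_prompt is not None:
            body["system_prompt"] = system_prompt
    return body


raw_search = build_query_body("refund policy", inference=False, top_k=5)
answer = build_query_body("refund policy", inference=True, include_sources=True)
```

Validating locally keeps an unsupported combination, such as a result count with inference=true, from reaching the server at all.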
Response
Successful response. Returns JSON when stream=false, or an SSE event stream when stream=true.
AI-generated summary/response (when inference=true)
Alias for summary (v1 compatibility)
Search results with chunk content, scores, and source URIs. Returned for both inference=true (the chunks used as context) and inference=false (raw search results).
Unique request identifier (used for streaming)
Streaming configuration (when stream=true)
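When stream=true the response arrives as an SSE event stream instead of a single JSON document. This page does not describe the event names or payload shape, so the sketch below handles only the generic SSE framing (`data:` lines grouped into events by blank lines) and yields each payload as a raw string. The function name `iter_sse_data` and the sample lines are illustrative assumptions, not part of the API.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from decoded text lines."""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        if line.startswith(":"):
            # Comment line, often used as a keep-alive.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            # The SSE format strips a single leading space from the value.
            value = value[1:]
        if field == "data":
            data_parts.append(value)
    # An event left unterminated at the end of the stream is discarded,
    # following the SSE processing rules.


# Hypothetical stream contents, for illustration only.
sample = ["data: hello\n", "\n", ": ping\n", "data: a\n", "data: b\n", "\n"]
events = list(iter_sse_data(sample))
```

In practice the lines would come from an HTTP client that exposes the response body line by line; each yielded payload can then be decoded according to whatever format the service actually emits.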