Query Collection

Execute a natural language query against a collection. When `inference=true`, the endpoint returns an AI-generated response along with the relevant documents. When `inference=false`, it returns raw search results with content and metadata.

## Streaming (SSE)

When `stream: true` and `inference: true`, the response is a Server-Sent Events stream. Every `data:` field contains a JSON object with a `type` discriminator.

### SSE Event Types

| `type` value | Schema | Description |
|---|---|---|
| `text.delta` | `QueryStreamTextEvent` | Incremental text chunk of the AI response. |
| `tool.start` | `QueryStreamToolStartEvent` | The agent is performing a knowledge-base search. |
| `tool.end` | `QueryStreamToolEndEvent` | A tool call completed. Its `tool_call_id` matches the one on the preceding `tool.start`. |
| `stream_complete` | `QueryStreamCompleteEvent` | The stream finished successfully. Close the connection. |
| `stream_error` | `QueryStreamErrorEvent` | An error occurred. Close the connection. |

### Example SSE Stream

```
data: {"type":"tool.start","seq":1,"run_id":"run_abc","tool_call_id":"tc_1","name":"searchKnowledgeBase","args":{"query":"revenue projections Q4"}}

data: {"type":"tool.end","seq":2,"run_id":"run_abc","tool_call_id":"tc_1","name":"searchKnowledgeBase","ok":true,"result_summary":{"resultCount":12}}

data: {"type":"text.delta","seq":3,"run_id":"run_abc","data":"Based on the documents"}

data: {"type":"text.delta","seq":4,"run_id":"run_abc","data":" provided, the revenue"}

data: {"type":"text.delta","seq":5,"run_id":"run_abc","data":" projections for Q4 show"}

data: {"type":"text.delta","seq":6,"run_id":"run_abc","data":" a 15% increase over Q3."}

data: {"type":"stream_complete","metadata":{"totalResults":12,"totalSearches":1},"stats":{"totalTokens":150}}
```

### Notes

- The agent may perform multiple searches per query. Each search produces a `tool.start` / `tool.end` pair.
- Text chunks are interleaved with tool events: text arrives after the agent has gathered results from a search.
- Connect with `Accept: text/event-stream` and set a generous read timeout (120 s or more) for long responses.

## Bounding Box Data

Set `include_bbox: true` (supported only with `inference=false`) to receive element-level layout coordinates for each search result. For PDF and DOCX files, each result includes a `layout` object containing normalized bounding box blocks. Each block contains:

- `type`: element type (`text`, `title`, `section_header`, `list_item`, `table`, `figure`, `key_value`, `header`, `footer`)
- `content`: the text content
- `page`: page number
- `bbox`: normalized 0-1 coordinates `{ top, left, width, height }` relative to the page dimensions
- `confidence`: extraction confidence (`high`/`low`), when available
- `image_url`: presigned URL for figure/chart images, when available

Files without OCR data (TXT, CSV, images) have `layout: null`.
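The event handling described above can be sketched in Python as a small fold over the raw SSE lines. This is a minimal sketch, not an official client: it assumes each event arrives as a single `data:` line (as in the example stream), ignores other SSE fields such as `event:` or `id:`, and reads only the event fields shown in the example. `StreamResult` and `parse_sse` are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional


@dataclass
class StreamResult:
    text: str = ""
    # tool_call_id -> {"name", "args", "ok", "result_summary"}
    tool_calls: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    complete: Optional[Dict[str, Any]] = None
    error: Optional[Dict[str, Any]] = None


def parse_sse(lines: Iterable[str]) -> StreamResult:
    """Fold raw SSE lines into the answer text, tool calls, and final status."""
    result = StreamResult()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skip blank event separators and non-data fields
        event = json.loads(line[len("data:"):].strip())
        kind = event.get("type")
        if kind == "text.delta":
            result.text += event["data"]
        elif kind == "tool.start":
            result.tool_calls[event["tool_call_id"]] = {
                "name": event.get("name"),
                "args": event.get("args"),
            }
        elif kind == "tool.end":
            # tool_call_id correlates this event with its tool.start
            call = result.tool_calls.setdefault(event["tool_call_id"], {"name": event.get("name")})
            call["ok"] = event.get("ok")
            call["result_summary"] = event.get("result_summary")
        elif kind == "stream_complete":
            result.complete = event
            break  # the stream is finished; the caller should close the connection
        elif kind == "stream_error":
            result.error = event
            break
    return result


sample = """\
data: {"type":"tool.start","seq":1,"run_id":"run_abc","tool_call_id":"tc_1","name":"searchKnowledgeBase","args":{"query":"revenue projections Q4"}}

data: {"type":"tool.end","seq":2,"run_id":"run_abc","tool_call_id":"tc_1","name":"searchKnowledgeBase","ok":true,"result_summary":{"resultCount":12}}

data: {"type":"text.delta","seq":3,"run_id":"run_abc","data":"Based on the documents"}

data: {"type":"text.delta","seq":4,"run_id":"run_abc","data":" provided, the revenue"}

data: {"type":"stream_complete","metadata":{"totalResults":12,"totalSearches":1},"stats":{"totalTokens":150}}
"""
result = parse_sse(sample.splitlines())
print(result.text)  # Based on the documents provided, the revenue
```

Against a live connection you could pass the decoded lines of the response body instead of a string, for example `Response.iter_lines(decode_unicode=True)` from the `requests` library when the request is made with `stream=True`. Breaking out of the loop on `stream_complete` or `stream_error` mirrors the instruction to close the connection on those events.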

Authentication

`Authorization` Bearer
Bearer token authentication using your API key (`Authorization: Bearer <API_KEY>`)

Path parameters

`collection_name` string Required
The name of the collection to query

Headers

`X-Organization-ID` string Required
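A minimal sketch of assembling the authentication and organization headers in Python. `build_headers` is a hypothetical helper, and the `Content-Type: application/json` and non-streaming `Accept: application/json` values are assumptions: this page only states that the endpoint expects an object and that streaming clients should send `Accept: text/event-stream`.

```python
from typing import Dict


def build_headers(api_key: str, organization_id: str, stream: bool = False) -> Dict[str, str]:
    """Assemble request headers for the Query Collection endpoint (sketch)."""
    headers = {
        "Authorization": f"Bearer {api_key}",   # API key sent as a Bearer token
        "X-Organization-ID": organization_id,   # required header
        "Content-Type": "application/json",     # ASSUMPTION: JSON-encoded request object
    }
    # SSE is requested via Accept; application/json for plain responses is an ASSUMPTION.
    headers["Accept"] = "text/event-stream" if stream else "application/json"
    return headers


# Placeholder credentials, not real values.
print(build_headers("sk_test", "org_123", stream=True))
```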

Request

This endpoint expects an object.
`query` string Required
The natural language query to search for
`stream` `false` Required

Enable real-time streaming of the response

`inference` boolean Optional Defaults to false

Enable LLM-generated answers based on the relevant sections retrieved. When false, returns raw search results.

`top_k` integer Optional Defaults to 10

Number of results to return. Only supported when inference=false; when inference=true, the agent controls its own search strategy.

`rerank` boolean Optional Defaults to false

Enable Voyage AI rerank-2.5 reranking for improved relevance ordering. Adds ~100-300ms latency.

`metadata_filter` map from strings to any Optional
Filter expression for vector search. Supported operators: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$and`, `$or`
`custom_prompt` string Optional

Custom system prompt to override the default RAG prompt when inference=true. Allows customizing how the LLM processes and responds to the query with the retrieved context.

`include_bbox` boolean Optional Defaults to false

Include normalized bounding box layout data for each search result. Returns element-level positions (titles, paragraphs, tables, figures, form fields) with page coordinates for PDF and DOCX files. Only supported with inference=false.
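The request constraints above (no `top_k` and no `include_bbox` when `inference=true`) can be checked client-side before sending. This is a sketch, not an official SDK: `build_query_body` is a hypothetical helper, and the `metadata_filter` in the usage example assumes a MongoDB-style `{field: {operator: value}}` shape with hypothetical metadata fields `year` and `department`, because this page lists the operators but not the exact expression syntax.

```python
import json
from typing import Any, Dict, Optional


def build_query_body(
    query: str,
    *,
    stream: bool = False,
    inference: bool = False,
    top_k: Optional[int] = None,
    rerank: Optional[bool] = None,
    metadata_filter: Optional[Dict[str, Any]] = None,
    custom_prompt: Optional[str] = None,
    include_bbox: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build a JSON-serializable request body, rejecting documented invalid combinations."""
    if inference and top_k is not None:
        raise ValueError("top_k is not supported when inference=true")
    if inference and include_bbox:
        raise ValueError("include_bbox is only supported with inference=false")

    body: Dict[str, Any] = {"query": query, "stream": stream, "inference": inference}
    # Only send optional fields that were set, so the server-side defaults apply otherwise.
    optional = {
        "top_k": top_k,
        "rerank": rerank,
        "metadata_filter": metadata_filter,
        "custom_prompt": custom_prompt,
        "include_bbox": include_bbox,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


# Raw search with layout data and a hypothetical metadata filter.
body = build_query_body(
    "revenue projections Q4",
    top_k=5,
    include_bbox=True,
    metadata_filter={
        "$and": [
            {"year": {"$gte": 2023}},
            {"department": {"$in": ["finance", "sales"]}},
        ]
    },
)
print(json.dumps(body))
```

Omitting unset optional fields, rather than sending explicit defaults, keeps the body small and leaves the documented defaults (for example `top_k` of 10) to the server.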

Response

`success` boolean or null
Whether the query was successful
`summary` string or null

AI-generated summary/response (when inference=true)

`response` string or null

Alias for summary (v1 compatibility)

`relevant_documents` list of objects or null

List of relevant documents (when inference=true)

`inference` boolean or null
Whether inference mode was used
`search_results` list of objects or null

Raw search results with content (when inference=false)

`total_results` integer or null
Total number of search results found
`top_k` integer or null
Number of results returned
`query` string or null
The original query
`tokens_used` map from strings to integers or null
Token usage breakdown by category
`execution_time_ms` integer or null
Query execution time in milliseconds
`request_id` string or null

Unique request identifier (used for streaming)

`streaming` object or null

Streaming configuration (when stream=true)

`token_balance` object or null
Current token balance after this request
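A sketch of reading a response in Python: take the answer text in inference mode and, for raw results requested with `include_bbox: true`, scale a normalized `bbox` to pixel coordinates. Assumptions not confirmed by this page: the blocks sit in a list under `layout["blocks"]`, `top`/`left` are measured from the page's top-left corner, and the caller knows the rendered page size. All function names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def answer_text(resp: Dict[str, Any]) -> Optional[str]:
    """Return the inference-mode answer; `response` is the v1 alias of `summary`."""
    return resp.get("summary") or resp.get("response")


def bbox_to_pixels(
    bbox: Dict[str, float], page_width: float, page_height: float
) -> Tuple[float, float, float, float]:
    """Scale a normalized {top, left, width, height} box to (x0, y0, x1, y1).

    ASSUMPTION: top-left origin, with y growing downward.
    """
    x0 = bbox["left"] * page_width
    y0 = bbox["top"] * page_height
    return (x0, y0, x0 + bbox["width"] * page_width, y0 + bbox["height"] * page_height)


def blocks_on_page(result: Dict[str, Any], page: int) -> List[Dict[str, Any]]:
    """Layout blocks of one search result on a given page ([] when layout is null)."""
    layout = result.get("layout")
    if layout is None:  # e.g. TXT, CSV, or image files without OCR data
        return []
    # ASSUMPTION: the blocks are stored under a "blocks" key.
    return [b for b in layout.get("blocks", []) if b.get("page") == page]


# Hand-written sample result, not real API output.
sample = {
    "layout": {
        "blocks": [
            {"type": "title", "page": 1, "bbox": {"top": 0.5, "left": 0.25, "width": 0.5, "height": 0.25}}
        ]
    }
}
for block in blocks_on_page(sample, page=1):
    print(block["type"], bbox_to_pixels(block["bbox"], 800, 1000))  # title (200.0, 500.0, 600.0, 750.0)
```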