Query Collection

Execute a natural language query against a collection.

When inference=true, returns an AI-generated response with relevant documents. When inference=false, returns raw search results with content and metadata.

Path parameters

collection_name (string, required)
Name of the collection to query.

Headers

Authorization (string, required)
Captain API key for authentication.
X-Organization-ID (string, required)
Organization UUID.
Idempotency-Key (string, optional)
UUID for request deduplication.
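A minimal Python sketch of assembling these headers. It assumes the Captain API key is sent with a Bearer prefix and that the request body is JSON; neither detail is stated in this reference, so check both against your account's setup. The helper name build_headers is hypothetical.

```python
import uuid
from typing import Dict, Optional


def build_headers(api_key: str, org_id: str, idempotency_key: Optional[str] = None) -> Dict[str, str]:
    """Assemble Query Collection headers (hypothetical helper)."""
    headers = {
        # Assumption: the Captain API key is sent as a Bearer token.
        "Authorization": f"Bearer {api_key}",
        "X-Organization-ID": org_id,
        # Assumption: the request body is JSON.
        "Content-Type": "application/json",
    }
    # Idempotency-Key is optional. Generate one per logical request and reuse it
    # on retries so the server can recognize the duplicate.
    headers["Idempotency-Key"] = idempotency_key or str(uuid.uuid4())
    return headers
```

Keep the returned Idempotency-Key if you plan to retry. Calling build_headers again without passing it produces a new key, which would defeat deduplication.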

Request

The request body is an object with the following fields.
query (string, required)
The natural language query to search for.
inference (boolean, optional, defaults to false)

When true, enables LLM-generated answers based on the relevant sections retrieved. When false, returns raw search results.

stream (boolean, optional, defaults to false)

Enable real-time streaming of the response.

top_k (integer, optional, defaults to 80)
Number of results to return.
model_id (enum, optional)

Model to use for inference. If not specified, gpt-oss-120b is used, with claude-sonnet-4.5 as a fallback.

Allowed values: gpt-oss-120b, claude-sonnet-4.5
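Because model_id is an enum, a client can reject typos before the request is sent. This hypothetical helper mirrors the allowed values listed above and omits the field entirely when no model is chosen, so the server-side default applies.

```python
from typing import Dict, Optional

# Mirrors the documented allowed values for model_id.
ALLOWED_MODEL_IDS = ("gpt-oss-120b", "claude-sonnet-4.5")


def model_id_field(model_id: Optional[str]) -> Dict[str, str]:
    """Return the model_id entry for the request body (hypothetical helper)."""
    if model_id is None:
        # Omit the field so the server default applies
        # (gpt-oss-120b with claude-sonnet-4.5 as fallback).
        return {}
    if model_id not in ALLOWED_MODEL_IDS:
        raise ValueError(f"unsupported model_id {model_id!r}; expected one of {ALLOWED_MODEL_IDS}")
    return {"model_id": model_id}
```

Usage: body = {"query": "...", "inference": True, **model_id_field("claude-sonnet-4.5")}.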
metadata_filter (map from strings to any, optional)
Filter expression for vector search. Supported operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or.
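The operator names above resemble MongoDB-style query operators. The sketch below assumes the filter nests as {field: {operator: value}}, with $and and $or taking a list of sub-filters; that shape is an assumption this reference does not confirm. The metadata fields department, year and status are hypothetical, and unsupported_operators is an illustrative client-side check that flags any $-prefixed key outside the documented set.

```python
from typing import Any, Set

# The operators documented for metadata_filter.
SUPPORTED_OPS = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin", "$and", "$or"}


def unsupported_operators(expr: Any) -> Set[str]:
    """Recursively collect $-prefixed keys that are not documented operators."""
    found: Set[str] = set()
    if isinstance(expr, dict):
        for key, value in expr.items():
            if key.startswith("$") and key not in SUPPORTED_OPS:
                found.add(key)
            found |= unsupported_operators(value)
    elif isinstance(expr, list):
        for item in expr:
            found |= unsupported_operators(item)
    return found


# Hypothetical filter: legal documents from 2023 onward that are final or approved.
example_filter = {
    "$and": [
        {"department": {"$eq": "legal"}},
        {"year": {"$gte": 2023}},
        {"status": {"$in": ["final", "approved"]}},
    ]
}
```

Running unsupported_operators(example_filter) before sending returns an empty set; a filter using an undocumented operator such as $regex would be flagged instead.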
custom_prompt (string, optional)

Custom system prompt that overrides the default RAG prompt when inference=true. Use it to customize how the LLM processes the retrieved context and responds to the query.
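Putting the request fields together, here is a sketch that uses only the Python standard library. The base URL, the /collections/{collection_name}/query path and the POST method are assumptions, since this reference does not show them; build_query_request is a hypothetical helper. The collection name is percent-encoded because it is a path parameter, and optional fields are sent only when set, so the server defaults (inference=false, stream=false, top_k=80) apply otherwise.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

# Hypothetical base URL; replace with the real API host.
BASE_URL = "https://api.example.com"


def build_query_request(
    collection_name: str,
    query: str,
    headers: Dict[str, str],
    *,
    inference: bool = False,
    stream: bool = False,
    top_k: Optional[int] = None,
    model_id: Optional[str] = None,
    metadata_filter: Optional[Dict[str, Any]] = None,
    custom_prompt: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a Query Collection request."""
    body: Dict[str, Any] = {"query": query, "inference": inference}
    if stream:
        body["stream"] = True
    if top_k is not None:
        body["top_k"] = top_k  # server default is 80
    if model_id is not None:
        body["model_id"] = model_id
    if metadata_filter is not None:
        body["metadata_filter"] = metadata_filter
    if custom_prompt is not None:
        # Documented to take effect only when inference=true.
        body["custom_prompt"] = custom_prompt

    # collection_name is a path parameter, so percent-encode it.
    # The path layout itself is an assumption.
    path = "/collections/" + urllib.parse.quote(collection_name, safe="") + "/query"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",  # assumed, since the endpoint takes a request body
    )
```

To send it, pass the request to urllib.request.urlopen and decode the body, for example json.load(urllib.request.urlopen(req)).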

Response

Successful Response
success (boolean or null)
Whether the query was successful.
summary (string or null)

AI-generated summary/response (when inference=true).

response (string or null)

Alias for summary (v1 compatibility).

relevant_documents (list of objects or null)

List of relevant documents (when inference=true).

inference (boolean or null)
Whether inference mode was used.
search_results (list of objects or null)

Raw search results with content (when inference=false).

total_results (integer or null)
Total number of search results found.
top_k (integer or null)
Number of results returned.
query (string or null)
The original query.
tokens_used (map from strings to integers or null)
Token usage breakdown by category.
execution_time_ms (integer or null)
Query execution time in milliseconds.
request_id (string or null)

Unique request identifier (used for streaming).

streaming (object or null)

Streaming configuration (when stream=true).

token_balance (object or null)
Current token balance after this request.
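Every response field may be null, and which fields are populated depends on inference, so a client typically branches on the inference flag. A minimal sketch, assuming the response has already been decoded into a dict; extract_answer is a hypothetical helper, and treating only an explicit success=false (not null) as a failure is a judgment call.

```python
from typing import Any, Dict, List, Optional


def extract_answer(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a decoded Query Collection response (hypothetical helper)."""
    # success may be null; only an explicit false is treated as a failure here.
    if resp.get("success") is False:
        raise RuntimeError(f"query failed, request_id={resp.get('request_id')}")

    answer: Optional[str] = None
    documents: List[Any]
    if resp.get("inference"):
        # response is documented as an alias for summary (v1 compatibility).
        answer = resp.get("summary") or resp.get("response")
        documents = resp.get("relevant_documents") or []
    else:
        # Raw mode: no generated text, only search results with content.
        documents = resp.get("search_results") or []

    return {
        "answer": answer,
        "documents": documents,
        "execution_time_ms": resp.get("execution_time_ms"),
    }
```

Falling back from summary to response keeps the helper working against clients or servers that only populate the v1 field.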