Query Collection
Execute a natural language query against a collection.
When inference=true, returns an AI-generated response with relevant documents.
When inference=false, returns raw search results with content and metadata.
Path parameters
Headers
Request
Enable LLM-generated answers based on the relevant sections retrieved. When false, returns raw search results.
Enable real-time streaming of the response.
Model to use for inference. Options: "gpt-oss-120b" or "claude-sonnet-4.5". If not specified, defaults to "gpt-oss-120b", falling back to "claude-sonnet-4.5".
Custom system prompt that overrides the default RAG prompt when inference=true, customizing how the LLM processes and responds to the query given the retrieved context.
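The request parameters above can be sketched as a JSON body. This is a minimal sketch, not the official client: the field names (query, inference, stream, model, system_prompt) are assumptions inferred from the parameter descriptions, and the endpoint URL and auth header are placeholders.

```python
import json

def build_query_payload(query, inference=True, stream=False,
                        model=None, system_prompt=None):
    """Assemble a request body for the query-collection endpoint.

    Field names are assumptions based on the parameters documented above.
    """
    payload = {"query": query, "inference": inference, "stream": stream}
    if model is not None:
        # Assumed options: "gpt-oss-120b" or "claude-sonnet-4.5"
        payload["model"] = model
    if system_prompt is not None:
        # Only meaningful when inference=True
        payload["system_prompt"] = system_prompt
    return payload

payload = build_query_payload("What is our refund policy?",
                              model="gpt-oss-120b")
print(json.dumps(payload, indent=2))
```

Omitting optional fields keeps the body minimal; a POST with this JSON body and the collection ID in the path would follow the shape described above.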
Response
AI-generated summary/response (when inference=true)
Alias for summary (v1 compatibility)
List of relevant documents (when inference=true)
Raw search results with content (when inference=false)
Unique request identifier (used for streaming)
Streaming configuration (when stream=true)
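Since the response shape differs depending on whether inference was enabled, a client can branch on which fields are present. A minimal sketch, assuming the JSON keys mirror the field names listed above (summary, documents, results); the exact key names are assumptions.

```python
def extract_answer(response):
    """Return (answer, documents) from a query response.

    When inference=true the response carries an AI-generated summary plus
    relevant documents; when inference=false it carries raw search results
    and there is no generated answer. Key names are assumptions based on
    the response fields documented above.
    """
    if response.get("summary") is not None:
        # inference=true: generated answer + supporting documents
        return response["summary"], response.get("documents", [])
    # inference=false: no generated answer, raw search results only
    return None, response.get("results", [])

answer, docs = extract_answer({
    "summary": "Refunds are issued within 14 days.",
    "documents": [{"title": "Refund policy", "score": 0.92}],
    "request_id": "req_123",
})
```

The request_id field would be the handle to pass to the streaming channel when stream=true; the sketch above only covers the non-streaming case.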