Query Collection
Execute a natural language query against a collection.
When inference=true, returns an AI-generated response with relevant documents.
When inference=false, returns raw search results with content and metadata.
Path parameters
Headers
Request
Enable LLM-generated answers based on the relevant sections retrieved. When false, returns raw search results.
Enable real-time streaming of the response.
Model to use for inference. Options: "gpt-oss-120b" or "claude-sonnet-4.5". If not specified, defaults to "gpt-oss-120b", falling back to "claude-sonnet-4.5".
Custom system prompt that overrides the default RAG prompt when inference=true, customizing how the LLM processes and responds to the query given the retrieved context.
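The request parameters above can be sketched as a JSON body. This is a minimal sketch, not the official client: the field names (query, inference, stream, model, system_prompt) are assumptions inferred from the parameter descriptions, and the endpoint URL and auth header are placeholders.

```python
import json

def build_query_payload(query, inference=True, stream=False,
                        model=None, system_prompt=None):
    """Assemble a request body for the query-collection endpoint.

    Field names are assumptions based on the parameters documented above.
    """
    payload = {"query": query, "inference": inference, "stream": stream}
    if model is not None:
        # Assumed options: "gpt-oss-120b" or "claude-sonnet-4.5"
        payload["model"] = model
    if system_prompt is not None:
        # Only meaningful when inference=True
        payload["system_prompt"] = system_prompt
    return payload

payload = build_query_payload("What is our refund policy?",
                              model="gpt-oss-120b")
print(json.dumps(payload, indent=2))
```

Omitting optional fields keeps the body minimal; a POST with this JSON body and the collection ID in the path would follow the shape described above.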
Response
AI-generated summary/response (when inference=true)
Alias for summary (v1 compatibility)
List of relevant documents (when inference=true)
Raw search results with content (when inference=false)
Unique request identifier (used for streaming)
Streaming configuration (when stream=true)
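Since the response shape differs depending on whether inference was enabled, a client can branch on which fields are present. A minimal sketch, assuming the JSON keys mirror the field names listed above (summary, documents, results); the exact key names are assumptions.

```python
def extract_answer(response):
    """Return (answer, documents) from a query response.

    When inference=true the response carries an AI-generated summary plus
    relevant documents; when inference=false it carries raw search results
    and there is no generated answer. Key names are assumptions based on
    the response fields documented above.
    """
    if response.get("summary") is not None:
        # inference=true: generated answer + supporting documents
        return response["summary"], response.get("documents", [])
    # inference=false: no generated answer, raw search results only
    return None, response.get("results", [])

answer, docs = extract_answer({
    "summary": "Refunds are issued within 14 days.",
    "documents": [{"title": "Refund policy", "score": 0.92}],
    "request_id": "req_123",
})
```

The request_id field would be the handle to pass to the streaming channel when stream=true; the sketch above only covers the non-streaming case.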