Captain API
The Captain API lets you process massive contexts (millions of tokens) and ask questions about them using the familiar OpenAI SDK. Unlike traditional LLMs with fixed context windows, Captain handles unlimited context through intelligent chunking, parallel processing, and generative merging.
Key Features
- Unlimited Context: Process millions of tokens in a single request
- Multiple Input Methods: Inline text or file upload
- Real-time Streaming: Get responses as they're generated via Server-Sent Events
- OpenAI SDK Compatible: Drop-in replacement for OpenAI API
- Tool Calling Support: Enable LLM to call functions and tools during processing
- Automatic Optimization: Large inputs handled automatically behind the scenes
- Intelligent Processing: 30-40% chunk overlap preserves accuracy across chunk boundaries during parallel LLM processing
Authentication
All requests require authentication:
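Pass your API key as a bearer token together with your organization ID on every request, for example:
headers = {
    "Authorization": "Bearer cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",  # your Captain API key
    "X-Organization-ID": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"           # your organization ID
}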
Streaming Responses
Enable real-time streaming to receive responses as they're generated using Server-Sent Events (SSE).
Enable Streaming
Set stream=true in your request:
import requests
# Configuration
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
headers = {
"Authorization": f"Bearer {API_KEY}",
"X-Organization-ID": ORG_ID
}
response = requests.post(
"https://api.runcaptain.com/v1/responses",
headers=headers,
data={
'input': 'Large text content...',
'query': 'What are the main themes?',
'stream': 'true'
},
stream=True # Important: Enable streaming in requests library
)
# Process streamed chunks
for line in response.iter_lines():
if line:
line_text = line.decode('utf-8')
if line_text.startswith('data: '):
data = line_text[6:] # Remove 'data: ' prefix
print(data, end='', flush=True)
Stream Response Format
Streaming responses use Server-Sent Events (SSE) format:
data: {"type": "chunk", "data": "The document explores"}
data: {"type": "chunk", "data": " three main themes:"}
data: {"type": "chunk", "data": " 1) Cloud computing evolution..."}
event: complete
data: {"status": "success", "request_id": "resp_1729876543_a1b2c3d4"}
Stream Events
| Event Type | Description |
|---|---|
| `chunk` | Content chunk (streamed response text) |
| `complete` | Stream finished successfully |
| `error` | Error occurred during processing |
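The raw-line loop above just prints payloads; here is a sketch that dispatches on these event types, assuming every chunk payload is JSON with the type and data fields shown in the format example:
import json

for line in response.iter_lines():
    if not line:
        continue
    text = line.decode('utf-8')
    if text.startswith('event: complete'):
        break  # terminal event; the following data line carries status and request_id
    if text.startswith('data: '):
        payload = json.loads(text[6:])  # strip the 'data: ' prefix
        if payload.get('type') == 'chunk':
            print(payload['data'], end='', flush=True)
        elif payload.get('type') == 'error':
            raise RuntimeError(f"Stream error: {payload}")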
POST /v1/responses
Direct HTTP endpoint for infinite context processing. Use this endpoint when making direct HTTP requests without the OpenAI SDK.
Authentication
All requests require the same authentication headers shown above: a bearer token in Authorization plus your X-Organization-ID.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `input` | string | Yes | Your context/document text (unlimited size) |
| `query` | string | Yes | The question to ask about the context |
| `stream` | string | No | Enable streaming: "true" or "false" (default: "false") |
Request Example (Python)
import requests
# Configuration
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
BASE_URL = "https://api.runcaptain.com"
headers = {
"Authorization": f"Bearer {API_KEY}",
"X-Organization-ID": ORG_ID
}
# Non-streaming request
response = requests.post(
f"{BASE_URL}/v1/responses",
headers=headers,
data={
'input': 'Your large document text here...',
'query': 'What are the main themes?'
}
)
result = response.json()
print(result['response'])
Request Example (JavaScript)
const API_KEY = 'cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
const ORG_ID = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx';
const BASE_URL = 'https://api.runcaptain.com';
const response = await fetch(`${BASE_URL}/v1/responses`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${API_KEY}`,
'X-Organization-ID': ORG_ID,
'Content-Type': 'application/x-www-form-urlencoded'
},
body: new URLSearchParams({
'input': 'Your large document text here...',
'query': 'What are the main themes?'
})
});
const result = await response.json();
console.log(result.response);
Request Example (cURL)
curl -X POST https://api.runcaptain.com/v1/responses \
-H "Authorization: Bearer cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
-H "X-Organization-ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
-d "input=Your large document text here..." \
-d "query=What are the main themes?"
Streaming Example
response = requests.post(
f"{BASE_URL}/v1/responses",
headers=headers,
data={
'input': 'Your large document text here...',
'query': 'What are the main themes?',
'stream': 'true'
},
stream=True # Important: Enable streaming in requests
)
for line in response.iter_lines():
if line:
line_text = line.decode('utf-8')
if line_text.startswith('data: '):
data = line_text[6:] # Remove 'data: ' prefix
print(data, end='', flush=True)
Response Format
Non-Streaming Response:
{
"status": "success",
"response": "The document explores three main themes: cloud computing evolution, security best practices, and cost optimization strategies.",
"request_id": "resp_1729876543_a1b2c3d4"
}
Streaming Response (SSE):
data: {"type": "chunk", "data": "The document explores"}
data: {"type": "chunk", "data": " three main themes:"}
data: {"type": "chunk", "data": " cloud computing evolution,"}
event: complete
data: {"status": "success", "request_id": "resp_1729876543_a1b2c3d4"}
SDK Integration
Captain supports multiple SDK approaches for different use cases:
Supported SDKs
- Python SDK → Official OpenAI Python SDK with extra_body
- JavaScript SDK → Official OpenAI JavaScript SDK with extra_body (⭐ Recommended for JS/TS)
- Vercel AI SDK → Vercel's AI SDK with a custom header approach
- Direct HTTP → Maximum control with fetch/requests
Python SDK
Use Captain with the official OpenAI Python SDK for a familiar developer experience.
Installation
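Install the official OpenAI Python SDK:
pip install openai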
Using Captain with OpenAI SDK ⭐ Recommended
Captain separates your instructions from your context using the OpenAI SDK's standard message format combined with the extra_body parameter.
Key Concepts:
- System messages: Provide instructions to the AI (e.g., "You are a helpful assistant...")
- User messages: Contain your query or question
- extra_body.captain.context: Provides your large context/documents (can be millions of tokens)
#!/usr/bin/env python3
"""
Captain OpenAI-Compatible Client - Recommended Approach
"""
from openai import OpenAI
# Configuration
BASE_URL = "https://api.runcaptain.com/v1"
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
# Initialize OpenAI client pointing to Captain
client = OpenAI(
base_url=BASE_URL,
api_key=API_KEY,
default_headers={
"X-Organization-ID": ORG_ID
}
)
# Load your large document
print("Loading text file...")
with open("large_document.txt", "r") as f:
context = f.read()
print(f"Loaded {len(context):,} characters")
print("-" * 50)
# Captain approach: instructions in messages, context in extra_body
response = client.chat.completions.create(
model="captain-voyager-latest",
messages=[
{"role": "system", "content": "You are a helpful assistant specialized in document analysis."},
{"role": "user", "content": "What are the main themes in this document?"}
],
stream=True,
temperature=0.7,
extra_body={
"captain": {
"context": context # Large context goes here
}
}
)
# Stream the response
print("Response: ", end="", flush=True)
for chunk in response:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
print("
" + "-" * 50)
print("Done!")
Why use this approach:
- ✅ Clean separation: instructions vs. context
- ✅ Your system prompts pass directly to the AI (not replaced)
- ✅ Context is properly chunked and processed for unlimited size
- ✅ Compatible with OpenAI SDK patterns
Alternative: File Upload Endpoint (for very large files)
For files so large that loading them into memory is impractical, use the dedicated multipart upload endpoint:
#!/usr/bin/env python3
"""
Captain File Upload for Very Large Contexts (>10MB)
"""
import json
import requests
# Configuration
BASE_URL = "https://api.runcaptain.com/v1"
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
# Load large file
with open("very_large_file.txt", "r", encoding="utf-8") as f:
context = f.read()
context_size = len(context.encode('utf-8'))
print(f"Uploading {context_size:,} bytes via multipart...")
# Prepare multipart form data
url = f"{BASE_URL}/chat/completions/upload"
headers = {
"Authorization": f"Bearer {API_KEY}",
"X-Organization-ID": ORG_ID
}
files = {
'file': ('context.txt', context.encode('utf-8'), 'text/plain')
}
data = {
'messages': json.dumps([
{"role": "user", "content": "What are the main themes?"}
]),
'model': 'captain-voyager-latest',
'stream': 'true',
'temperature': '0.7'
}
# Make streaming request
response = requests.post(url, headers=headers, files=files, data=data, stream=True)
# Parse SSE stream
print("Response: ", end="", flush=True)
for line in response.iter_lines():
if line:
line = line.decode('utf-8')
if line.startswith('data: '):
data_str = line[6:]
if data_str == '[DONE]':
break
try:
chunk_data = json.loads(data_str)
if 'choices' in chunk_data:
delta = chunk_data['choices'][0].get('delta', {})
if 'content' in delta and delta['content']:
print(delta['content'], end="", flush=True)
except json.JSONDecodeError:
pass
print("\n" + "-" * 50)
print("Done!")
When to use this endpoint:
- ✅ Very large files where loading into memory first is impractical
- ✅ When you want to upload files directly via multipart form data
- ✅ For workflows where you're already using requests library instead of OpenAI SDK
Note: Captain handles files of any size automatically - you only need this endpoint if you want to use multipart uploads instead of the standard OpenAI SDK approach.
Message Roles Explained
System Role ({"role": "system", ...}):
- Provides instructions and guidance to the AI (OPTIONAL)
- Sets the AI's behavior and personality
- Examples: "You are a legal expert", "Be concise", "Focus on security aspects"
- Passes directly to the AI model (not replaced by Captain)
- Completely optional - omit to use Captain's default helpful persona
- When provided, your instructions take priority over Captain's defaults
User Role ({"role": "user", ...}):
- Contains your query or question
- What you want to know about the context
- Examples: "What are the main themes?", "List all vulnerabilities"
Context (extra_body.captain.context):
- Your large documents, text, or data to analyze
- Can be millions of tokens (unlimited size)
- Automatically chunked and processed by Captain
- Examples: Full contracts, codebases, research papers
Example 1: Custom System Prompt (Define Your Own Role)
response = client.chat.completions.create(
model="captain-voyager-latest",
messages=[
{"role": "system", "content": "You are Dr. Watson, a medical expert specializing in research analysis"},
{"role": "user", "content": "What are the key findings?"}
],
extra_body={
"captain": {
"context": medical_research_papers # Large document(s)
}
}
)
# AI responds as Dr. Watson with your custom medical expertise
Example 2: Captain's Default Persona (No System Prompt)
response = client.chat.completions.create(
model="captain-voyager-latest",
messages=[
{"role": "user", "content": "What are the key findings?"}
],
extra_body={
"captain": {
"context": medical_research_papers # Large document(s)
}
}
)
# AI responds with Captain's default helpful, informative persona
Important: Context must be provided via extra_body. Do not place large documents in system or user messages.
JavaScript SDK
Use the official OpenAI JavaScript SDK with Captain - recommended for most TypeScript/JavaScript projects.
Installation
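Install the official OpenAI JavaScript SDK:
npm install openai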
Basic Example
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.CAPTAIN_API_KEY,
baseURL: 'https://api.runcaptain.com/v1',
defaultHeaders: {
'X-Organization-ID': process.env.CAPTAIN_ORG_ID,
},
});
const context = `
Company Policies:
- Remote work: Allowed 3 days/week
- Vacation: 20 days per year
`;
const response = await client.chat.completions.create({
model: 'captain-voyager-latest',
messages: [
{ role: 'user', content: "What's the remote work policy?" }
],
extra_body: {
captain: {
context: context
}
},
});
console.log(response.choices[0].message.content);
Streaming
const stream = await client.chat.completions.create({
model: 'captain-voyager-latest',
messages: [
{ role: 'user', content: "Summarize this document" }
],
stream: true,
extra_body: {
captain: {
context: largeDocument
}
},
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Vercel AI SDK
Use Vercel's AI SDK with Captain via custom header approach or upload endpoint.
Note:
- For small contexts (<4KB): Use base64-encoded X-Captain-Context header
- For large contexts (>4KB): Use /v1/chat/completions/upload endpoint with FormData
Installation
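Install the Vercel AI SDK together with its OpenAI provider (the two packages imported in the examples below):
npm install ai @ai-sdk/openai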
Small Context Example
import { createOpenAI } from '@ai-sdk/openai';
import { streamText } from 'ai';
const context = `
Company Policies:
- Vacation: 20 days per year
- Remote work: 3 days per week
`;
// Base64 encode the context for header transmission
const contextBase64 = Buffer.from(context).toString('base64');
const captain = createOpenAI({
apiKey: process.env.CAPTAIN_API_KEY,
baseURL: 'https://api.runcaptain.com/v1',
headers: {
'X-Organization-ID': process.env.CAPTAIN_ORG_ID,
'X-Captain-Context': contextBase64,
},
});
const { textStream } = await streamText({
model: captain.chat('captain-voyager-latest'),
messages: [
{ role: 'user', content: 'What is the vacation policy?' }
],
});
for await (const chunk of textStream) {
process.stdout.write(chunk);
}
Large Context Example (Upload Endpoint)
For contexts larger than ~4KB, use the upload endpoint:
const largeContext = `...your large document...`;
const formData = new FormData();
const blob = new Blob([largeContext], { type: 'text/plain' });
formData.append('file', blob, 'context.txt');
formData.append('messages', JSON.stringify([
{ role: 'user', content: 'Summarize this document' }
]));
formData.append('model', 'captain-voyager-latest');
formData.append('stream', 'true');
const response = await fetch('https://api.runcaptain.com/v1/chat/completions/upload', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.CAPTAIN_API_KEY}`,
'X-Organization-ID': process.env.CAPTAIN_ORG_ID,
},
body: formData
});
// Parse SSE stream...
For complete Vercel AI SDK documentation, see Vercel AI SDK Guide.
Direct HTTP Fetch
For maximum control, use direct HTTP requests with the captain parameter.
Basic Example
const API_KEY = 'cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
const ORG_ID = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx';
const response = await fetch('https://api.runcaptain.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${API_KEY}`,
'X-Organization-ID': ORG_ID,
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'captain-voyager-latest',
messages: [
{ role: 'user', content: 'What is the vacation policy?' }
],
captain: {
context: context
}
})
});
const result = await response.json();
console.log(result.choices[0].message.content);
SDK Parameters
Common parameters across all SDKs:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | "captain-voyager-latest" | Model to use (currently only captain-voyager-latest) |
| `messages` | array | Required | Array of message objects with role and content |
| `temperature` | float | 0.7 | Randomness (0.0-2.0) |
| `max_tokens` | integer | 16000 | Maximum tokens in response |
| `stream` | boolean | false | Enable streaming responses |
| `top_p` | float | 0.95 | Nucleus sampling parameter |
| `tools` | array | null | Array of tool definitions for function calling |
| `tool_choice` | string | "auto" | Control tool usage: "auto", "none", or a tool name |
POST /v1/chat/completions
Standard OpenAI-compatible chat completions endpoint. Use this for contexts that fit in memory (< 1 MB) or when using the OpenAI Python SDK.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | No | Model name (default: "captain-voyager-latest") |
| `messages` | array | Yes | Array of message objects with role and content |
| `temperature` | float | No | Sampling temperature 0.0-2.0 (default: 0.7) |
| `max_tokens` | integer | No | Maximum response tokens (default: 16000) |
| `stream` | boolean | No | Enable streaming responses (default: false) |
| `top_p` | float | No | Nucleus sampling 0.0-1.0 (default: 0.95) |
| `tools` | array | No | Tool definitions for function calling (default: null) |
| `tool_choice` | string | No | Control tool usage: "auto", "none" (default: "auto") |
Captain-specific extensions (in extra_body):
| Parameter | Type | Description |
|---|---|---|
| `captain.context` | string | Large text context (alternative to system messages) |
Request Example (Using OpenAI SDK)
See the "OpenAI SDK Integration" section below for complete examples.
Response Format
Streaming response (SSE format):
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"captain-voyager-latest","choices":[{"index":0,"delta":{"content":"Hello"}}]}
data: [DONE]
Non-streaming response:
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1234567890,
"model": "captain-voyager-latest",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The document explores three main themes..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 150000,
"completion_tokens": 500,
"total_tokens": 150500
}
}
POST /v1/chat/completions/upload
Note: The standard /v1/chat/completions endpoint now automatically handles large contexts (>250KB). This upload endpoint is only required for .txt files too large for that path.
Upload extremely large files (1MB+) directly with your chat completion request using multipart form data.
Use this endpoint for:
- Explicit multipart file uploads
- Legacy integrations requiring direct file upload
- Manual control over the file upload process
Parameters (Multipart Form Data)
| Parameter | Type | Required | Description |
|---|---|---|---|
| `file` | file | Yes | Text file to process (supports .txt, can be 100MB+) |
| `messages` | string | Yes | JSON-encoded array of chat messages |
| `model` | string | No | Model name (default: "captain-voyager-latest") |
| `stream` | string | No | Enable streaming: "true" or "false" (default: "true") |
| `temperature` | string | No | Sampling temperature (default: "0.7") |
| `max_tokens` | string | No | Maximum response tokens (default: 16000) |
Note: Form data values must be strings. Boolean values like stream should be sent as "true" or "false".
Request Example (HTTP Multipart Upload)
Note: This uses direct HTTP requests, not the OpenAI SDK. For OpenAI SDK usage, use the standard /v1/chat/completions endpoint which now handles large contexts automatically.
#!/usr/bin/env python3
"""
Captain HTTP API - Multipart File Upload for Large Contexts
"""
import json
import requests
# Configuration
BASE_URL = "https://api.runcaptain.com/v1"
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
# Load large text file (can be 100MB+)
print("Loading text file...")
file_path = "large_document.txt"
with open(file_path, "r", encoding="utf-8") as f:
context = f.read()
context_size = len(context.encode('utf-8'))
print(f"Loaded {len(context):,} characters ({context_size:,} bytes)")
print("-" * 50)
# Prepare request
url = f"{BASE_URL}/chat/completions/upload"
headers = {
"Authorization": f"Bearer {API_KEY}",
"X-Organization-ID": ORG_ID
}
messages = [
{
"role": "user",
"content": "What are the main themes in this corpus?",
}
]
# Prepare multipart form data
files = {
'file': ('context.txt', context.encode('utf-8'), 'text/plain')
}
data = {
'messages': json.dumps(messages),
'model': 'captain-voyager-latest',
'stream': 'true', # Enable streaming
'temperature': '0.7'
}
print(f"Uploading {context_size:,} bytes via multipart...")
# Make streaming request
response = requests.post(url, headers=headers, files=files, data=data, stream=True)
if response.status_code != 200:
print(f"Error: {response.status_code}")
print(response.text)
exit(1)
print("Response: ", end="", flush=True)
# Parse SSE stream (OpenAI format)
for line in response.iter_lines():
if line:
line = line.decode('utf-8')
if line.startswith('data: '):
data_str = line[6:] # Remove 'data: ' prefix
if data_str == '[DONE]':
break
try:
chunk_data = json.loads(data_str)
if 'choices' in chunk_data and len(chunk_data['choices']) > 0:
delta = chunk_data['choices'][0].get('delta', {})
if 'content' in delta and delta['content']:
print(delta['content'], end="", flush=True)
except json.JSONDecodeError:
pass
print("\n" + "-" * 50)
print("Done!")
How It Works
- File Upload: Your large file is processed automatically
- Parallel Processing: The file is split into chunks (80k tokens each)
- Worker Execution: 15+ parallel workers process chunks simultaneously
- Compression: Each worker compresses its chunk to ~8-10k tokens
- Response Streaming: The reducer streams the final response in real-time
For a 4.6MB Bible text:
- Creates 15 chunks of ~80k tokens each
- Runs 15 parallel workers
- Each worker processes ~20k tokens
- Total processing time: ~10-15 seconds
- Streams the response as it's generated
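As a rough illustration of that arithmetic (the 80k-token chunk size comes from the steps above; the 4-characters-per-token ratio is an assumption for English text):
import math

CHUNK_TOKENS = 80_000      # per the processing steps above
CHARS_PER_TOKEN = 4        # assumption: rough average for English text

file_bytes = 4_600_000     # e.g. the 4.6MB Bible text
est_tokens = file_bytes / CHARS_PER_TOKEN
chunks = math.ceil(est_tokens / CHUNK_TOKENS)
print(f"~{est_tokens:,.0f} tokens -> {chunks} chunks / parallel workers")
# ~1,150,000 tokens -> 15 chunks, matching the worked example above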
Response Format
Returns streaming response in OpenAI format with [DONE] marker:
data: {"id":"chatcmpl-resp_1729876543_a1b2c3d4","object":"chat.completion.chunk","created":1729876543,"model":"captain-voyager-latest","choices":[{"index":0,"delta":{"content":"The document"}}]}
data: {"id":"chatcmpl-resp_1729876543_a1b2c3d4","object":"chat.completion.chunk","created":1729876543,"model":"captain-voyager-latest","choices":[{"index":0,"delta":{"content":" explores"}}]}
data: {"id":"chatcmpl-resp_1729876543_a1b2c3d4","object":"chat.completion.chunk","created":1729876543,"model":"captain-voyager-latest","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Important Notes
- Streaming Recommended: Always use stream=true for the best experience
- Wait Time: The first chunk may take 10-30 seconds for very large files (worker startup time)
- File Size: Supports files up to 100MB+
- Encoding: Files must be UTF-8 encoded text
- Timeout: The stream will wait up to 60 seconds for processing to start
Error Responses
400 Bad Request
401 Unauthorized
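No example bodies are documented for 400 and 401; they presumably share the shape of the 413 and 500 examples below. The message and error_code values here are illustrative assumptions only:
{
  "status": "error",
  "message": "Missing required parameter: query",
  "error_code": "BAD_REQUEST"
}
{
  "status": "error",
  "message": "Invalid API key or organization ID",
  "error_code": "UNAUTHORIZED"
}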
413 Payload Too Large
{
"status": "error",
"message": "Input exceeds maximum size. Please use /v1/responses/upload endpoint for files >100MB",
"error_code": "PAYLOAD_TOO_LARGE"
}
500 Internal Server Error
{
"status": "error",
"message": "Failed to process request",
"error_code": "PROCESSING_ERROR",
"request_id": "resp_1729876543_a1b2c3d4"
}
Tool Calling
Enable the LLM to call functions and tools while processing your documents. Tool calling allows the model to request external operations (calculations, API calls, data lookups) during response generation.
Overview
Tool calling works with both streaming and non-streaming modes. When tools are provided, the LLM autonomously decides whether to use them based on your query and the available context.
Key Requirements:
- All tools must have "strict": true in the function definition (Cerebras requirement)
- Tools work with any context size (small or infinite)
- Compatible with both streaming (stream=true) and non-streaming modes
Basic Example
from openai import OpenAI
client = OpenAI(
base_url="https://api.runcaptain.com/v1",
api_key="YOUR_API_KEY",
default_headers={"X-Organization-ID": "YOUR_ORG_ID"}
)
# Define tools
tools = [
{
"type": "function",
"function": {
"name": "calculate",
"description": "Perform basic arithmetic operations",
"parameters": {
"type": "object",
"properties": {
"operation": {
"type": "string",
"enum": ["add", "subtract", "multiply", "divide"]
},
"a": {"type": "number"},
"b": {"type": "number"}
},
"required": ["operation", "a", "b"]
},
"strict": True # Required for Captain/Cerebras
}
}
]
# Make request with tools
response = client.chat.completions.create(
model="captain-voyager-latest",
messages=[
{"role": "system", "content": "Q1 Revenue: $125,000\nQ2 Revenue: $150,000"},
{"role": "user", "content": "What's the total revenue for Q1 and Q2?"}
],
tools=tools,
stream=True
)
for chunk in response:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
Tool Definition Format
Each tool must follow OpenAI's function calling schema:
{
"type": "function",
"function": {
"name": "tool_name", # Function name (alphanumeric + underscores)
"description": "What it does", # Clear description for the LLM
"parameters": { # JSON Schema for parameters
"type": "object",
"properties": {
"param_name": {
"type": "string|number|boolean|array|object",
"description": "Parameter description"
}
},
"required": ["param_name"]
},
"strict": True # REQUIRED
  }
}
Complete Example: Multiple Tools
from openai import OpenAI
client = OpenAI(
base_url="https://api.runcaptain.com/v1",
api_key="YOUR_API_KEY",
default_headers={"X-Organization-ID": "YOUR_ORG_ID"}
)
# Define multiple tools
tools = [
{
"type": "function",
"function": {
"name": "calculate",
"description": "Perform arithmetic calculations",
"parameters": {
"type": "object",
"properties": {
"operation": {
"type": "string",
"enum": ["add", "subtract", "multiply", "divide"]
},
"a": {"type": "number"},
"b": {"type": "number"}
},
"required": ["operation", "a", "b"]
},
"strict": True
}
},
{
"type": "function",
"function": {
"name": "analyze_sentiment",
"description": "Analyze sentiment of text (positive, negative, neutral)",
"parameters": {
"type": "object",
"properties": {
"text": {
"type": "string",
"description": "Text to analyze"
}
},
"required": ["text"]
},
"strict": True
}
}
]
# Load large document
with open("quarterly_report.txt") as f:
context = f.read()
# Request with tools
response = client.chat.completions.create(
model="captain-voyager-latest",
messages=[
{"role": "system", "content": "You are a financial analysis assistant."},
{"role": "user", "content": "Calculate total Q1-Q4 revenue and analyze overall sentiment"}
],
tools=tools,
stream=True,
extra_body={
"captain": {
"context": context # Large quarterly report context
}
}
)
for chunk in response:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
Tool Choice Parameter
Control when the LLM should use tools:
response = client.chat.completions.create(
model="captain-voyager-latest",
messages=[...],
tools=tools,
tool_choice="auto", # Let model decide (default)
# tool_choice="none", # Never use tools
stream=True
)
Streaming vs Non-Streaming with Tools
Streaming Mode:
response = client.chat.completions.create(
model="captain-voyager-latest",
messages=[...],
tools=tools,
stream=True # Recommended for real-time feedback
)
for chunk in response:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
Non-Streaming Mode:
response = client.chat.completions.create(
model="captain-voyager-latest",
messages=[...],
tools=tools,
stream=False # Get complete response at once
)
print(response.choices[0].message.content)
Tool Calling with Large Context
Tool calling works seamlessly with Captain's infinite context processing. Simply provide your large document and tools - Captain handles everything automatically:
from openai import OpenAI
client = OpenAI(
base_url="https://api.runcaptain.com/v1",
api_key="YOUR_API_KEY",
default_headers={"X-Organization-ID": "YOUR_ORG_ID"}
)
# Load very large document (Bible, entire codebase, etc.)
with open("very_large_document.txt") as f:
context = f.read() # Captain handles any size automatically
# Define tools
tools = [{
"type": "function",
"function": {
"name": "calculate",
"description": "Perform arithmetic calculations on values in the document",
"parameters": {
"type": "object",
"properties": {
"operation": {"type": "string", "enum": ["add", "subtract", "multiply", "divide"]},
"a": {"type": "number"},
"b": {"type": "number"}
},
"required": ["operation", "a", "b"]
},
"strict": True
}
}]
# Use standard OpenAI SDK - Captain handles the large context + tool calling
response = client.chat.completions.create(
model="captain-voyager-latest",
messages=[
{"role": "system", "content": "You are a financial analysis assistant."},
{"role": "user", "content": "Calculate the total revenue across all quarters"}
],
tools=tools,
stream=True,
extra_body={
"captain": {
"context": context # Any size context
}
}
)
for chunk in response:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
Key Points:
- Captain automatically handles large contexts (100M+ tokens)
- No special upload needed: pass context via extra_body as shown above
- Tools + infinite context work together seamlessly
- The LLM can use tools while processing massive documents
How Tool Calling Works
- LLM Analysis: Model analyzes your query and available tools
- Tool Invocation: Model requests tool execution with specific parameters
- Result Processing: Tool results are fed back to the model
- Final Response: Model generates final answer using tool results
Important Notes:
- Tools are logged but not executed by default (you see what the LLM would call)
- The LLM autonomously decides which tools to use and when
- Multiple tool calls may occur for complex queries
- Works seamlessly with infinite context processing
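Because tool calls are surfaced rather than executed, you can inspect what the model wanted to call. Here is a sketch that collects tool-call requests from a streaming response, assuming Captain populates the standard OpenAI delta.tool_calls field:
# Accumulate tool-call requests while streaming content
tool_calls = {}
for chunk in response:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
    for tc in delta.tool_calls or []:
        call = tool_calls.setdefault(tc.index, {"name": "", "arguments": ""})
        if tc.function.name:
            call["name"] = tc.function.name
        if tc.function.arguments:
            call["arguments"] += tc.function.arguments  # argument JSON arrives in fragments
print("\nRequested tool calls:", tool_calls)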
Supported Models
Tool calling is supported on:
- captain-voyager-latest (default)
Best Practices for Tool Calling
Do:
- ✅ Provide clear, descriptive tool names and descriptions
- ✅ Use specific parameter descriptions
- ✅ Set "strict": True in all function definitions
- ✅ Test with streaming mode for better UX
- ✅ Keep tool parameters simple and well-defined
Don't:
- ❌ Forget "strict": True
- ❌ Use vague tool descriptions
- ❌ Define too many tools (keep it focused)
- ❌ Expect tool auto-execution (currently logged only)
Troubleshooting
Tool not being called:
- Ensure "strict": True is set
- Make description more specific to your use case
- Try being more explicit in your query
Invalid tool schema error:
- Verify JSON schema format in parameters
- Check all required fields are present
- Ensure type values are valid JSON Schema types
Best Practices
Input Size Optimization
Captain handles files of any size automatically. You have two options:
- Standard approach: Use the OpenAI SDK with system messages or captain.context - works for any size
- Multipart upload: Use /v1/chat/completions/upload for very large files if you prefer multipart form data
Streaming vs Non-Streaming
Use Streaming When:
- You want real-time responses
- Processing very large documents (reduces perceived latency)
- Building chat interfaces
- You need to show progress to users
Use Non-Streaming When:
- You need the complete response at once
- Processing in batch jobs
- Storing responses in databases
- A simpler implementation is preferred
Query Optimization
Good Queries:
- "What are the main conclusions in section 3?"
- "Summarize the methodology described in the paper"
- "List all security vulnerabilities mentioned"
Avoid:
- Vague queries: "Tell me about this"
- Multiple questions in one: "What are the themes, findings, and recommendations?"
- Yes/no questions without context: "Is this good?"
Better Approach:
- Break complex queries into separate requests
- Be specific about which sections or topics to focus on
- Ask for structured output: "List the top 5..."
Use Cases
Legal Document Analysis
Process entire legal contracts, case files, or regulatory documents to extract key clauses, risks, and obligations.
Scientific Paper Review
Analyze research papers, extract methodologies, findings, and compare multiple papers simultaneously.
Code Repository Analysis
Upload entire codebases to understand architecture, identify patterns, or find specific implementations.
Financial Report Processing
Process annual reports, 10-Ks, earnings transcripts to extract financial metrics and strategic insights.
Customer Support Ticket Analysis
Analyze thousands of support tickets to identify common issues, trends, and resolution patterns.
Rate Limits
| Tier | Requests per Minute | Max Input Size |
|---|---|---|
| Standard | 10 | 50MB |
| Premium | 60 | Unlimited |
Contact support@runcaptain.com to upgrade to Premium tier.
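If you hit your tier's limit, simple client-side backoff helps. This sketch assumes the API signals rate limiting with HTTP 429, which is not documented above:
import time
import requests

def post_with_backoff(url, max_retries=5, **kwargs):
    """Retry a POST with exponential backoff on (assumed) HTTP 429 responses."""
    for attempt in range(max_retries):
        response = requests.post(url, **kwargs)
        if response.status_code != 429:
            return response
        time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    return response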
Related APIs
- Integrate Captain with Datalake - Index cloud storage buckets for persistent querying
- API Reference - Complete API documentation
- Getting Started - Quick start guide
Choosing the Right Endpoint
Use this guide to select the best endpoint for your use case:
/v1/chat/completions (OpenAI SDK)
Best for:
- Standard contexts (< 1 MB)
- Contexts that fit in memory
- Using the official OpenAI Python SDK
- A familiar OpenAI-compatible interface
Limits:
- No size limits - Captain handles any file size automatically
Example use case:
from openai import OpenAI
client = OpenAI(base_url="https://api.runcaptain.com/v1", api_key="...", default_headers={"X-Organization-ID": "..."})
response = client.chat.completions.create(
model="captain-voyager-latest",
messages=[{"role": "user", "content": "Summarize this"}],
extra_body={"captain": {"context": "Your text here..."}}
)
/v1/chat/completions/upload (Multipart Upload)
Best for:
- Very large files (1 MB - 100 MB+)
- Files already on disk
- Maximum performance for huge contexts
- Parallel processing of massive texts
Advantages:
- Direct file upload (no encoding overhead)
- Optimized for 10 MB+ files
- Automatic parallel processing
- Streaming starts immediately after upload
Example use case:
import requests
with open('bible.txt', 'rb') as f:
response = requests.post(
"https://api.runcaptain.com/v1/chat/completions/upload",
headers={"Authorization": f"Bearer {API_KEY}", "X-Organization-ID": ORG_ID},
files={'file': f},
data={'messages': '[{"role":"user","content":"Summarize"}]', 'stream': 'true'},
stream=True
)