Captain API

The Captain API enables you to process large contexts (millions of tokens) and ask questions about them using the familiar OpenAI SDK. Unlike traditional LLMs with limited context windows, Captain can handle unlimited context through intelligent chunking, parallel processing, and generative merging.


Key Features

  • Unlimited Context: Process millions of tokens in a single request
  • Multiple Input Methods: Inline text or file upload
  • Real-time Streaming: Get responses as they're generated via Server-Sent Events
  • OpenAI SDK Compatible: Drop-in replacement for OpenAI API
  • Tool Calling Support: Enable LLM to call functions and tools during processing
  • Automatic Optimization: Large inputs handled automatically behind the scenes
  • Intelligent Processing: 30-40% chunk overlap for accuracy with parallel LLM processing

Authentication

All requests require authentication:

Authorization: Bearer YOUR_API_KEY
X-Organization-ID: YOUR_ORG_UUID


Streaming Responses

Enable real-time streaming to receive responses as they're generated using Server-Sent Events (SSE).

Enable Streaming

Set stream=true in your request:

import requests

# Configuration
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Organization-ID": ORG_ID
}

response = requests.post(
    "https://api.runcaptain.com/v1/responses",
    headers=headers,
    data={
        'input': 'Large text content...',
        'query': 'What are the main themes?',
        'stream': 'true'
    },
    stream=True  # Important: Enable streaming in requests library
)

# Process streamed chunks
for line in response.iter_lines():
    if line:
        line_text = line.decode('utf-8')
        if line_text.startswith('data: '):
            data = line_text[6:]  # Remove 'data: ' prefix
            print(data, end='', flush=True)

Stream Response Format

Streaming responses use Server-Sent Events (SSE) format:

data: {"type": "chunk", "data": "The document explores"}

data: {"type": "chunk", "data": " three main themes:"}

data: {"type": "chunk", "data": " 1) Cloud computing evolution..."}

event: complete
data: {"status": "success", "request_id": "resp_1729876543_a1b2c3d4"}

Stream Events

Event Type   Description
chunk        Content chunk (streamed response text)
complete     Stream finished successfully
error        Error occurred during processing
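
A minimal sketch of dispatching on these event types in Python, assuming the payload shapes shown above (chunk payloads carry type and data; the complete event carries status and request_id; the message field on error payloads is an assumption based on the error format later in this document):

import json
import requests

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "X-Organization-ID": "YOUR_ORG_UUID"
}

response = requests.post(
    "https://api.runcaptain.com/v1/responses",
    headers=headers,
    data={'input': 'Large text content...', 'query': 'What are the main themes?', 'stream': 'true'},
    stream=True
)

event_type = None
for line in response.iter_lines():
    if not line:
        event_type = None  # a blank line ends an SSE message
        continue
    text = line.decode('utf-8')
    if text.startswith('event: '):
        event_type = text[7:].strip()  # e.g. "complete" or "error"
    elif text.startswith('data: '):
        payload = json.loads(text[6:])
        if event_type == 'complete':
            print(f"\n[done] request_id={payload.get('request_id')}")
        elif event_type == 'error':
            raise RuntimeError(payload.get('message', payload))
        elif payload.get('type') == 'chunk':
            print(payload['data'], end='', flush=True)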

POST /v1/responses

Direct HTTP endpoint for infinite context processing. Use this endpoint when making direct HTTP requests without the OpenAI SDK.

Authentication

All requests require authentication via headers:

Authorization: Bearer YOUR_API_KEY
X-Organization-ID: YOUR_ORG_UUID

Parameters

Parameter  Type    Required  Description
input      string  Yes       Your context/document text (unlimited size)
query      string  Yes       The question to ask about the context
stream     string  No        Enable streaming: "true" or "false" (default: "false")

Request Example (Python)

import requests

# Configuration
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
BASE_URL = "https://api.runcaptain.com"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Organization-ID": ORG_ID
}

# Non-streaming request
response = requests.post(
    f"{BASE_URL}/v1/responses",
    headers=headers,
    data={
        'input': 'Your large document text here...',
        'query': 'What are the main themes?'
    }
)

result = response.json()
print(result['response'])

Request Example (JavaScript)

const API_KEY = 'cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
const ORG_ID = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx';
const BASE_URL = 'https://api.runcaptain.com';

const response = await fetch(`${BASE_URL}/v1/responses`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'X-Organization-ID': ORG_ID,
    'Content-Type': 'application/x-www-form-urlencoded'
  },
  body: new URLSearchParams({
    'input': 'Your large document text here...',
    'query': 'What are the main themes?'
  })
});

const result = await response.json();
console.log(result.response);

Request Example (cURL)

curl -X POST https://api.runcaptain.com/v1/responses \
  -H "Authorization: Bearer cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
  -H "X-Organization-ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
  -d "input=Your large document text here..." \
  -d "query=What are the main themes?"

Streaming Example

response = requests.post(
    f"{BASE_URL}/v1/responses",
    headers=headers,
    data={
        'input': 'Your large document text here...',
        'query': 'What are the main themes?',
        'stream': 'true'
    },
    stream=True  # Important: Enable streaming in requests
)

for line in response.iter_lines():
    if line:
        line_text = line.decode('utf-8')
        if line_text.startswith('data: '):
            data = line_text[6:]  # Remove 'data: ' prefix
            print(data, end='', flush=True)

Response Format

Non-Streaming Response:

{
  "status": "success",
  "response": "The document explores three main themes: cloud computing evolution, security best practices, and cost optimization strategies.",
  "request_id": "resp_1729876543_a1b2c3d4"
}

Streaming Response (SSE):

data: {"type": "chunk", "data": "The document explores"}

data: {"type": "chunk", "data": " three main themes:"}

data: {"type": "chunk", "data": " cloud computing evolution,"}

event: complete
data: {"status": "success", "request_id": "resp_1729876543_a1b2c3d4"}


SDK Integration

Captain supports multiple SDK approaches for different use cases:

Supported SDKs

  1. Python SDK - Official OpenAI Python SDK with extra_body
  2. JavaScript SDK - Official OpenAI JavaScript SDK with extra_body (⭐ Recommended for JS/TS)
  3. Vercel AI SDK - Vercel's AI SDK with a custom header approach
  4. Direct HTTP - Maximum control with fetch/requests

Python SDK

Use Captain with the official OpenAI Python SDK for a familiar developer experience.

Installation

pip install openai

Captain separates your instructions from your context using the OpenAI SDK's standard message format combined with the extra_body parameter.

Key Concepts:

  • System messages: Provide instructions to the AI (e.g., "You are a helpful assistant...")
  • User messages: Contain your query or question
  • extra_body.captain.context: Provides your large context/documents (can be millions of tokens)

#!/usr/bin/env python3
"""
Captain OpenAI-Compatible Client - Recommended Approach
"""
from openai import OpenAI

# Configuration
BASE_URL = "https://api.runcaptain.com/v1"
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Initialize OpenAI client pointing to Captain
client = OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY,
    default_headers={
        "X-Organization-ID": ORG_ID
    }
)

# Load your large document
print("Loading text file...")
with open("large_document.txt", "r") as f:
    context = f.read()

print(f"Loaded {len(context):,} characters")
print("-" * 50)

# Captain approach: instructions in messages, context in extra_body
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant specialized in document analysis."},
        {"role": "user", "content": "What are the main themes in this document?"}
    ],
    stream=True,
    temperature=0.7,
    extra_body={
        "captain": {
            "context": context  # Large context goes here
        }
    }
)

# Stream the response
print("Response: ", end="", flush=True)
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

print("
" + "-" * 50)
print("Done!")

Why use this approach:

  • ✅ Clean separation: instructions vs. context
  • ✅ Your system prompts pass directly to the AI (not replaced)
  • ✅ Context properly chunked and processed for unlimited size
  • ✅ Compatible with OpenAI SDK patterns

Alternative: File Upload Endpoint (for very large files)

For very large files (>10MB), use the dedicated multipart upload endpoint:

#!/usr/bin/env python3
"""
Captain File Upload for Very Large Contexts (>10MB)
"""
import json
import requests

# Configuration
BASE_URL = "https://api.runcaptain.com/v1"
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Load large file
with open("very_large_file.txt", "r", encoding="utf-8") as f:
    context = f.read()

context_size = len(context.encode('utf-8'))
print(f"Uploading {context_size:,} bytes via multipart...")

# Prepare multipart form data
url = f"{BASE_URL}/chat/completions/upload"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Organization-ID": ORG_ID
}

files = {
    'file': ('context.txt', context.encode('utf-8'), 'text/plain')
}

data = {
    'messages': json.dumps([
        {"role": "user", "content": "What are the main themes?"}
    ]),
    'model': 'captain-voyager-latest',
    'stream': 'true',
    'temperature': '0.7'
}

# Make streaming request
response = requests.post(url, headers=headers, files=files, data=data, stream=True)

# Parse SSE stream
print("Response: ", end="", flush=True)
for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: '):
            data_str = line[6:]
            if data_str == '[DONE]':
                break
            try:
                chunk_data = json.loads(data_str)
                if 'choices' in chunk_data:
                    delta = chunk_data['choices'][0].get('delta', {})
                    if 'content' in delta and delta['content']:
                        print(delta['content'], end="", flush=True)
            except json.JSONDecodeError:
                pass

print("\n" + "-" * 50)
print("Done!")

When to use this endpoint:

  • ✅ Very large files where loading into memory first is impractical
  • ✅ When you want to upload files directly via multipart form data
  • ✅ For workflows already using the requests library instead of the OpenAI SDK

Note: Captain handles files of any size automatically - you only need this endpoint if you want to use multipart uploads instead of the standard OpenAI SDK approach.

Message Roles Explained

System Role ({"role": "system", ...}):

  • Provides instructions and guidance to the AI (optional)
  • Sets the AI's behavior and personality
  • Examples: "You are a legal expert", "Be concise", "Focus on security aspects"
  • Passes directly to the AI model (not replaced by Captain)
  • Completely optional - omit to use Captain's default helpful persona
  • When provided, your instructions take priority over Captain's defaults

User Role ({"role": "user", ...}):

  • Contains your query or question - what you want to know about the context
  • Examples: "What are the main themes?", "List all vulnerabilities"

Context (extra_body.captain.context):

  • Your large documents, text, or data to analyze
  • Can be millions of tokens (unlimited size)
  • Automatically chunked and processed by Captain
  • Examples: Full contracts, codebases, research papers

Example 1: Custom System Prompt (Define Your Own Role)

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        {"role": "system", "content": "You are Dr. Watson, a medical expert specializing in research analysis"},
        {"role": "user", "content": "What are the key findings?"}
    ],
    extra_body={
        "captain": {
            "context": medical_research_papers  # Large document(s)
        }
    }
)
# AI responds as Dr. Watson with your custom medical expertise

Example 2: Captain's Default Persona (No System Prompt)

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        {"role": "user", "content": "What are the key findings?"}
    ],
    extra_body={
        "captain": {
            "context": medical_research_papers  # Large document(s)
        }
    }
)
# AI responds with Captain's default helpful, informative persona

Important: Context must be provided via extra_body. Do not place large documents in system or user messages.


JavaScript SDK

Use the official OpenAI JavaScript SDK with Captain - recommended for most TypeScript/JavaScript projects.

Installation

npm install openai

Basic Example

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.CAPTAIN_API_KEY,
  baseURL: 'https://api.runcaptain.com/v1',
  defaultHeaders: {
    'X-Organization-ID': process.env.CAPTAIN_ORG_ID,
  },
});

const context = `
Company Policies:
- Remote work: Allowed 3 days/week
- Vacation: 20 days per year
`;

const response = await client.chat.completions.create({
  model: 'captain-voyager-latest',
  messages: [
    { role: 'user', content: "What's the remote work policy?" }
  ],
  extra_body: {
    captain: {
      context: context
    }
  },
});

console.log(response.choices[0].message.content);

Streaming

const stream = await client.chat.completions.create({
  model: 'captain-voyager-latest',
  messages: [
    { role: 'user', content: "Summarize this document" }
  ],
  stream: true,
  extra_body: {
    captain: {
      context: largeDocument
    }
  },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Vercel AI SDK

Use Vercel's AI SDK with Captain via custom header approach or upload endpoint.

Note:

  • For small contexts (<4KB): Use the base64-encoded X-Captain-Context header
  • For large contexts (>4KB): Use the /v1/chat/completions/upload endpoint with FormData

Installation

npm install @ai-sdk/openai ai

Small Context Example

import { createOpenAI } from '@ai-sdk/openai';
import { streamText } from 'ai';

const context = `
Company Policies:
- Vacation: 20 days per year
- Remote work: 3 days per week
`;

// Base64 encode the context for header transmission
const contextBase64 = Buffer.from(context).toString('base64');

const captain = createOpenAI({
  apiKey: process.env.CAPTAIN_API_KEY,
  baseURL: 'https://api.runcaptain.com/v1',
  headers: {
    'X-Organization-ID': process.env.CAPTAIN_ORG_ID,
    'X-Captain-Context': contextBase64,
  },
});

const { textStream } = await streamText({
  model: captain.chat('captain-voyager-latest'),
  messages: [
    { role: 'user', content: 'What is the vacation policy?' }
  ],
});

for await (const chunk of textStream) {
  process.stdout.write(chunk);
}

Large Context Example (Upload Endpoint)

For contexts larger than ~4KB, use the upload endpoint:

const largeContext = `...your large document...`;

const formData = new FormData();
const blob = new Blob([largeContext], { type: 'text/plain' });
formData.append('file', blob, 'context.txt');
formData.append('messages', JSON.stringify([
  { role: 'user', content: 'Summarize this document' }
]));
formData.append('model', 'captain-voyager-latest');
formData.append('stream', 'true');

const response = await fetch('https://api.runcaptain.com/v1/chat/completions/upload', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.CAPTAIN_API_KEY}`,
    'X-Organization-ID': process.env.CAPTAIN_ORG_ID,
  },
  body: formData
});

// Parse the SSE stream (OpenAI chunk format - see Response Format below)
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep any incomplete line for the next read
  for (const line of lines) {
    if (!line.startsWith('data: ') || line.slice(6) === '[DONE]') continue;
    const chunk = JSON.parse(line.slice(6));
    process.stdout.write(chunk.choices?.[0]?.delta?.content || '');
  }
}

For complete Vercel AI SDK documentation, see Vercel AI SDK Guide.


Direct HTTP Fetch

For maximum control, use direct HTTP requests with the captain parameter.

Basic Example

const API_KEY = 'cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
const ORG_ID = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx';

const response = await fetch('https://api.runcaptain.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'X-Organization-ID': ORG_ID,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'captain-voyager-latest',
    messages: [
      { role: 'user', content: 'What is the vacation policy?' }
    ],
    captain: {
      context: context
    }
  })
});

const result = await response.json();
console.log(result.choices[0].message.content);

SDK Parameters

Common parameters across all SDKs:

Parameter    Type     Default                   Description
model        string   "captain-voyager-latest"  Model to use (currently only captain-voyager-latest)
messages     array    Required                  Array of message objects with role and content
temperature  float    0.7                       Randomness (0.0-2.0)
max_tokens   integer  16000                     Maximum tokens in response
stream       boolean  false                     Enable streaming responses
top_p        float    0.95                      Nucleus sampling parameter
tools        array    null                      Array of tool definitions for function calling
tool_choice  string   "auto"                    Control tool usage: "auto", "none", or tool name
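
For example, a single request that sets several of these parameters explicitly (illustrative values; client and context as configured in the Python SDK section above):

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "Summarize the key risks."}],
    temperature=0.2,   # low randomness for factual summaries
    max_tokens=2000,   # cap the response length
    top_p=0.95,
    stream=False,
    extra_body={"captain": {"context": context}}
)
print(response.choices[0].message.content)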

POST /v1/chat/completions

Standard OpenAI-compatible chat completions endpoint. Use this for contexts that fit in memory (< 1 MB) or when using the OpenAI Python SDK.

Parameters

Parameter    Type     Required  Description
model        string   No        Model name (default: "captain-voyager-latest")
messages     array    Yes       Array of message objects with role and content
temperature  float    No        Sampling temperature 0.0-2.0 (default: 0.7)
max_tokens   integer  No        Maximum response tokens (default: 16000)
stream       boolean  No        Enable streaming responses (default: false)
top_p        float    No        Nucleus sampling 0.0-1.0 (default: 0.95)
tools        array    No        Tool definitions for function calling (default: null)
tool_choice  string   No        Control tool usage: "auto", "none" (default: "auto")

Captain-specific extensions (in extra_body):

Parameter        Type    Description
captain.context  string  Large text context (alternative to system messages)

Request Example (Using OpenAI SDK)

See the "OpenAI SDK Integration" section below for complete examples.

Response Format

Streaming response (SSE format):

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"captain-voyager-latest","choices":[{"index":0,"delta":{"content":"Hello"}}]}

data: [DONE]

Non-streaming response:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "captain-voyager-latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The document explores three main themes..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 150000,
    "completion_tokens": 500,
    "total_tokens": 150500
  }
}


POST /v1/chat/completions/upload

Note: As of the latest update, the standard /v1/chat/completions endpoint automatically handles large contexts (>250KB). This upload endpoint is only required for even larger text files.

Upload extremely large files (1MB+) directly with your chat completion request using multipart form data.

Use this endpoint for:

  • Explicit multipart file uploads
  • Legacy integrations requiring direct file upload
  • Manual control over the file upload process

Parameters (Multipart Form Data)

Parameter    Type    Required  Description
file         file    Yes       Text file to process (supports .txt, can be 100MB+)
messages     string  Yes       JSON-encoded array of chat messages
model        string  No        Model name (default: "captain-voyager-latest")
stream       string  No        Enable streaming: "true" or "false" (default: "true")
temperature  string  No        Sampling temperature (default: "0.7")
max_tokens   string  No        Maximum response tokens (default: 16000)

Note: Form data values must be strings. Boolean values like stream should be sent as "true" or "false".

Request Example (HTTP Multipart Upload)

Note: This uses direct HTTP requests, not the OpenAI SDK. For OpenAI SDK usage, use the standard /v1/chat/completions endpoint which now handles large contexts automatically.

#!/usr/bin/env python3
"""
Captain HTTP API - Multipart File Upload for Large Contexts
"""
import json
import requests

# Configuration
BASE_URL = "https://api.runcaptain.com/v1"
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Load large text file (can be 100MB+)
print("Loading text file...")
file_path = "large_document.txt"
with open(file_path, "r", encoding="utf-8") as f:
    context = f.read()

context_size = len(context.encode('utf-8'))
print(f"Loaded {len(context):,} characters ({context_size:,} bytes)")
print("-" * 50)

# Prepare request
url = f"{BASE_URL}/chat/completions/upload"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Organization-ID": ORG_ID
}

messages = [
    {
        "role": "user",
        "content": "What are the main themes in this corpus?",
    }
]

# Prepare multipart form data
files = {
    'file': ('context.txt', context.encode('utf-8'), 'text/plain')
}

data = {
    'messages': json.dumps(messages),
    'model': 'captain-voyager-latest',
    'stream': 'true',  # Enable streaming
    'temperature': '0.7'
}

print(f"Uploading {context_size:,} bytes via multipart...")

# Make streaming request
response = requests.post(url, headers=headers, files=files, data=data, stream=True)

if response.status_code != 200:
    print(f"Error: {response.status_code}")
    print(response.text)
    exit(1)

print("Response: ", end="", flush=True)

# Parse SSE stream (OpenAI format)
for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: '):
            data_str = line[6:]  # Remove 'data: ' prefix
            if data_str == '[DONE]':
                break
            try:
                chunk_data = json.loads(data_str)
                if 'choices' in chunk_data and len(chunk_data['choices']) > 0:
                    delta = chunk_data['choices'][0].get('delta', {})
                    if 'content' in delta and delta['content']:
                        print(delta['content'], end="", flush=True)
            except json.JSONDecodeError:
                pass

print("\n" + "-" * 50)
print("Done!")

How It Works

  1. File Upload: Your large file is processed automatically
  2. Parallel Processing: The file is split into chunks (80k tokens each)
  3. Worker Execution: 15+ parallel workers process chunks simultaneously
  4. Compression: Each worker compresses its chunk to ~8-10k tokens
  5. Response Streaming: The reducer streams the final response in real-time

For a 4.6MB Bible text (~1.2M tokens at roughly 4 characters per token):

  • Creates 15 chunks of ~80k tokens each
  • Runs 15 parallel workers
  • Each worker processes ~20k tokens
  • Total processing time: ~10-15 seconds
  • Streams the response as it's generated

Response Format

Returns streaming response in OpenAI format with [DONE] marker:

data: {"id":"chatcmpl-resp_1729876543_a1b2c3d4","object":"chat.completion.chunk","created":1729876543,"model":"captain-voyager-latest","choices":[{"index":0,"delta":{"content":"The document"}}]}

data: {"id":"chatcmpl-resp_1729876543_a1b2c3d4","object":"chat.completion.chunk","created":1729876543,"model":"captain-voyager-latest","choices":[{"index":0,"delta":{"content":" explores"}}]}

data: {"id":"chatcmpl-resp_1729876543_a1b2c3d4","object":"chat.completion.chunk","created":1729876543,"model":"captain-voyager-latest","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Important Notes

  • Streaming Recommended: Always use stream=true for best experience
  • Wait Time: First chunk may take 10-30 seconds for very large files (worker startup time) - see the timeout sketch below
  • File Size: Supports files up to 100MB+
  • Encoding: Files must be UTF-8 encoded text
  • Timeout: The stream will wait up to 60 seconds for processing to start
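
Because the first chunk can take 10-30 seconds, set a generous read timeout when calling this endpoint with requests. A sketch with illustrative values (url, headers, files, and data as prepared in the example above):

# (connect timeout, read timeout) in seconds - illustrative values
response = requests.post(
    url,
    headers=headers,
    files=files,
    data=data,
    stream=True,
    timeout=(10, 120)
)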

Error Responses

400 Bad Request

{
  "status": "error",
  "message": "Input text is required",
  "error_code": "MISSING_INPUT"
}

401 Unauthorized

{
  "status": "error",
  "message": "Invalid API key",
  "error_code": "INVALID_API_KEY"
}

413 Payload Too Large

{
  "status": "error",
  "message": "Input exceeds maximum size. Please use /v1/responses/upload endpoint for files >100MB",
  "error_code": "PAYLOAD_TOO_LARGE"
}

500 Internal Server Error

{
  "status": "error",
  "message": "Failed to process request",
  "error_code": "PROCESSING_ERROR",
  "request_id": "resp_1729876543_a1b2c3d4"
}
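
A sketch of branching on these errors with the requests library, using only the status codes and error_code values documented above:

import requests

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "X-Organization-ID": "YOUR_ORG_UUID"
}

response = requests.post(
    "https://api.runcaptain.com/v1/responses",
    headers=headers,
    data={'input': 'Your document text...', 'query': 'What are the main themes?'}
)

if response.ok:
    print(response.json()['response'])
else:
    error = response.json()
    code = error.get('error_code')
    if code == 'INVALID_API_KEY':
        raise SystemExit("Check your API key and X-Organization-ID header")
    if code == 'PAYLOAD_TOO_LARGE':
        raise SystemExit("Switch to the multipart upload endpoint for very large inputs")
    # MISSING_INPUT, PROCESSING_ERROR, etc. - include request_id when contacting support
    raise SystemExit(f"{code}: {error.get('message')} (request_id: {error.get('request_id')})")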

Tool Calling

Enable the LLM to call functions and tools while processing your documents. Tool calling allows the model to request external operations (calculations, API calls, data lookups) during response generation.

Overview

Tool calling works with both streaming and non-streaming modes. When tools are provided, the LLM autonomously decides whether to use them based on your query and the available context.

Key Requirements:

  • All tools must have "strict": true in the function definition (Cerebras requirement)
  • Tools work with any context size (small or infinite)
  • Compatible with both streaming (stream=true) and non-streaming modes

Basic Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcaptain.com/v1",
    api_key="YOUR_API_KEY",
    default_headers={"X-Organization-ID": "YOUR_ORG_ID"}
)

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform basic arithmetic operations",
            "parameters": {
                "type": "object",
                "properties": {
                    "operation": {
                        "type": "string",
                        "enum": ["add", "subtract", "multiply", "divide"]
                    },
                    "a": {"type": "number"},
                    "b": {"type": "number"}
                },
                "required": ["operation", "a", "b"]
            },
            "strict": True  # Required for Captain/Cerebras
        }
    }
]

# Make request with tools
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        {"role": "system", "content": "Q1 Revenue: $125,000\nQ2 Revenue: $150,000"},
        {"role": "user", "content": "What's the total revenue for Q1 and Q2?"}
    ],
    tools=tools,
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Tool Definition Format

Each tool must follow OpenAI's function calling schema:

{
    "type": "function",
    "function": {
        "name": "tool_name",           # Function name (alphanumeric + underscores)
        "description": "What it does",  # Clear description for the LLM
        "parameters": {                 # JSON Schema for parameters
            "type": "object",
            "properties": {
                "param_name": {
                    "type": "string|number|boolean|array|object",
                    "description": "Parameter description"
                }
            },
            "required": ["param_name"]
        },
        "strict": True  # REQUIRED
}

Complete Example: Multiple Tools

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcaptain.com/v1",
    api_key="YOUR_API_KEY",
    default_headers={"X-Organization-ID": "YOUR_ORG_ID"}
)

# Define multiple tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform arithmetic calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "operation": {
                        "type": "string",
                        "enum": ["add", "subtract", "multiply", "divide"]
                    },
                    "a": {"type": "number"},
                    "b": {"type": "number"}
                },
                "required": ["operation", "a", "b"]
            },
            "strict": True
        }
    },
    {
        "type": "function",
        "function": {
            "name": "analyze_sentiment",
            "description": "Analyze sentiment of text (positive, negative, neutral)",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "Text to analyze"
                    }
                },
                "required": ["text"]
            },
            "strict": True
        }
    }
]

# Load large document
with open("quarterly_report.txt") as f:
    context = f.read()

# Request with tools
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        {"role": "system", "content": "You are a financial analysis assistant."},
        {"role": "user", "content": "Calculate total Q1-Q4 revenue and analyze overall sentiment"}
    ],
    tools=tools,
    stream=True,
    extra_body={
        "captain": {
            "context": context  # Large quarterly report context
        }
    }
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Tool Choice Parameter

Control when the LLM should use tools:

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[...],
    tools=tools,
    tool_choice="auto",  # Let model decide (default)
    # tool_choice="none",  # Never use tools
    stream=True
)

Streaming vs Non-Streaming with Tools

Streaming Mode:

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[...],
    tools=tools,
    stream=True  # Recommended for real-time feedback
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Non-Streaming Mode:

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[...],
    tools=tools,
    stream=False  # Get complete response at once
)

print(response.choices[0].message.content)

Tool Calling with Large Context

Tool calling works seamlessly with Captain's infinite context processing. Simply provide your large document and tools - Captain handles everything automatically:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcaptain.com/v1",
    api_key="YOUR_API_KEY",
    default_headers={"X-Organization-ID": "YOUR_ORG_ID"}
)

# Load very large document (Bible, entire codebase, etc.)
with open("very_large_document.txt") as f:
    context = f.read()  # Captain handles any size automatically

# Define tools
tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Perform arithmetic calculations on values in the document",
        "parameters": {
            "type": "object",
            "properties": {
                "operation": {"type": "string", "enum": ["add", "subtract", "multiply", "divide"]},
                "a": {"type": "number"},
                "b": {"type": "number"}
            },
            "required": ["operation", "a", "b"]
        },
        "strict": True
    }
}]

# Use standard OpenAI SDK - Captain handles the large context + tool calling
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        {"role": "system", "content": "You are a financial analysis assistant."},
        {"role": "user", "content": "Calculate the total revenue across all quarters"}
    ],
    tools=tools,
    stream=True,
    extra_body={
        "captain": {
            "context": context  # Any size context
        }
    }
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Key Points:

  • Captain automatically handles large contexts (100M+ tokens)
  • No special upload needed - just pass your context via extra_body.captain.context
  • Tools + infinite context work together seamlessly
  • The LLM can use tools while processing massive documents

How Tool Calling Works

  1. LLM Analysis: Model analyzes your query and available tools
  2. Tool Invocation: Model requests tool execution with specific parameters
  3. Result Processing: Tool results are fed back to the model
  4. Final Response: Model generates final answer using tool results

Important Notes:

  • Tools are logged but not executed by default - you see what the LLM would call (see the sketch below)
  • The LLM autonomously decides which tools to use and when
  • Multiple tool calls may occur for complex queries
  • Works seamlessly with infinite context processing
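
Because tool calls are surfaced rather than executed, you can inspect what the model wanted to run. A minimal sketch, assuming Captain mirrors the OpenAI SDK's message.tool_calls shape for non-streaming responses (that field is an OpenAI convention; this page does not show the exact response shape):

import json

# client and tools as defined in the Basic Example above
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "What's the total revenue for Q1 and Q2?"}],
    tools=tools,
    stream=False
)

message = response.choices[0].message
for call in (message.tool_calls or []):
    # Each entry shows the function the model would have invoked (assumed OpenAI shape)
    print(f"Tool: {call.function.name}")
    print(f"Args: {json.loads(call.function.arguments)}")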

Supported Models

Tool calling is supported on:

  • captain-voyager-latest (default)

Best Practices for Tool Calling

Do:

  • ✅ Provide clear, descriptive tool names and descriptions
  • ✅ Use specific parameter descriptions
  • ✅ Set "strict": True in all function definitions
  • ✅ Test with streaming mode for better UX
  • ✅ Keep tool parameters simple and well-defined

Don't:

  • ❌ Forget "strict": True
  • ❌ Use vague tool descriptions
  • ❌ Define too many tools (keep it focused)
  • ❌ Expect tool auto-execution (currently logged only)

Troubleshooting

Tool not being called:

  • Ensure "strict": True is set
  • Make the description more specific to your use case
  • Be more explicit in your query

Invalid tool schema error:

  • Verify the JSON Schema format in parameters
  • Check that all required fields are present
  • Ensure type values are valid JSON Schema types


Best Practices

Input Size Optimization

Captain handles files of any size automatically. You have two options:

  • Standard approach: Use OpenAI SDK with system messages or captain.context - works for any size
  • Multipart upload: Use /v1/chat/completions/upload for very large files if you prefer multipart form data

Streaming vs Non-Streaming

Use Streaming When:

  • You want real-time responses
  • Processing very large documents (reduces perceived latency)
  • Building chat interfaces
  • You need to show progress to users

Use Non-Streaming When:

  • You need the complete response at once
  • Processing in batch jobs
  • Storing responses in databases
  • A simpler implementation is preferred

Query Optimization

Good Queries:

  • "What are the main conclusions in section 3?"
  • "Summarize the methodology described in the paper"
  • "List all security vulnerabilities mentioned"

Avoid:

  • Vague queries: "Tell me about this"
  • Multiple questions: "What are the themes, findings, and recommendations?"
  • Yes/no questions without context: "Is this good?"

Better Approach:

  • Break complex queries into separate requests (see the sketch below)
  • Be specific about which sections or topics to focus on
  • Ask for structured output: "List the top 5..."
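
For example, the compound question above becomes three focused calls against the /v1/responses endpoint (BASE_URL, headers, and document_text stand in for your own configuration):

import requests

queries = [
    "What are the main themes?",
    "What are the key findings?",
    "List the top 5 recommendations."
]

for query in queries:
    response = requests.post(
        f"{BASE_URL}/v1/responses",
        headers=headers,
        data={'input': document_text, 'query': query}
    )
    print(f"\n== {query} ==\n{response.json()['response']}")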


Use Cases

Legal Document Analysis

Process entire legal contracts, case files, or regulatory documents to extract key clauses, risks, and obligations.

Scientific Paper Review

Analyze research papers, extract methodologies, findings, and compare multiple papers simultaneously.

Code Repository Analysis

Upload entire codebases to understand architecture, identify patterns, or find specific implementations.

Financial Report Processing

Process annual reports, 10-Ks, earnings transcripts to extract financial metrics and strategic insights.

Customer Support Ticket Analysis

Analyze thousands of support tickets to identify common issues, trends, and resolution patterns.


Rate Limits

Tier      Requests per Minute  Max Input Size
Standard  10                   50MB
Premium   60                   Unlimited

Contact support@runcaptain.com to upgrade to Premium tier.
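
If you exceed your tier's budget, a simple retry with exponential backoff keeps batch jobs within the limit. A minimal sketch, assuming rate-limited requests are rejected with HTTP 429 (the exact status code is an assumption; this page does not specify it):

import time
import requests

def post_with_retry(url, max_retries=5, **kwargs):
    # POST with exponential backoff on rate-limit responses (429 assumed)
    for attempt in range(max_retries):
        response = requests.post(url, **kwargs)
        if response.status_code != 429:
            return response
        time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...
    return response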



Choosing the Right Endpoint

Use this guide to select the best endpoint for your use case:

/v1/chat/completions (OpenAI SDK)

Best for:

  • Standard contexts (< 1 MB)
  • Contexts that fit in memory
  • Using the official OpenAI Python SDK
  • A familiar OpenAI-compatible interface

Limits:

  • No size limits - Captain handles any file size automatically

Example use case:

from openai import OpenAI

client = OpenAI(base_url="https://api.runcaptain.com/v1", api_key="...")
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "Summarize this"}],
    extra_body={"captain": {"context": "Your text here..."}}
)

/v1/chat/completions/upload (Multipart Upload)

Best for:

  • Very large files (1 MB - 100 MB+)
  • Files already on disk
  • Maximum performance for huge contexts
  • Parallel processing of massive texts

Advantages:

  • Direct file upload (no encoding overhead)
  • Optimized for 10 MB+ files
  • Automatic parallel processing
  • Streaming starts immediately after upload

Example use case:

import requests

with open('bible.txt', 'rb') as f:
    response = requests.post(
        "https://api.runcaptain.com/v1/chat/completions/upload",
        headers={"Authorization": f"Bearer {API_KEY}", "X-Organization-ID": ORG_ID},
        files={'file': f},
        data={'messages': '[{"role":"user","content":"Summarize"}]', 'stream': 'true'},
        stream=True
    )