Captain API

The Captain API lets you process large contexts (millions of tokens) and ask questions about them using the familiar OpenAI SDK. Unlike traditional LLMs with limited context windows, Captain handles unlimited context through intelligent chunking, parallel processing, and generative merging.


Key Features

  • Unlimited Context: Process millions of tokens in a single request
  • Multiple Input Methods: Inline text or file upload
  • Real-time Streaming: Get responses as they’re generated via Server-Sent Events
  • OpenAI SDK Compatible: Drop-in replacement for OpenAI API
  • Tool Calling Support: Enable the LLM to call functions and tools during processing
  • Automatic Optimization: Large inputs handled automatically behind the scenes
  • Intelligent Processing: 30-40% chunk overlap for accuracy with parallel LLM processing

Authentication

All requests require authentication:

Authorization: Bearer YOUR_API_KEY
X-Organization-ID: YOUR_ORG_UUID


Streaming Responses

Enable real-time streaming to receive responses as they’re generated using Server-Sent Events (SSE).

Enable Streaming

Set stream=true in your request:

import requests

# Configuration
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Organization-ID": ORG_ID
}

response = requests.post(
    "https://api.runcaptain.com/v1/responses",
    headers=headers,
    data={
        'input': 'Large text content...',
        'query': 'What are the main themes?',
        'stream': 'true'
    },
    stream=True  # Important: enable streaming in the requests library
)

# Process streamed chunks
for line in response.iter_lines():
    if line:
        line_text = line.decode('utf-8')
        if line_text.startswith('data: '):
            data = line_text[6:]  # Remove 'data: ' prefix
            print(data, end='', flush=True)

Stream Response Format

Streaming responses use Server-Sent Events (SSE) format:

data: {"type": "chunk", "data": "The document explores"}
data: {"type": "chunk", "data": " three main themes:"}
data: {"type": "chunk", "data": " 1) Cloud computing evolution..."}
event: complete
data: {"status": "success", "request_id": "resp_1729876543_a1b2c3d4"}

Stream Events

Event Type | Description
chunk      | Content chunk (streamed response text)
complete   | Stream finished successfully
error      | Error occurred during processing
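
Each data line carries JSON, so a consumer can decode the payload and branch on the event type. A minimal sketch that extends the streaming example above (the exact shape of the error payload is an assumption):

import json

event_name = 'message'  # SSE default when no 'event:' line is given
for line in response.iter_lines():
    if not line:
        continue
    line_text = line.decode('utf-8')
    if line_text.startswith('event: '):
        event_name = line_text[7:]  # e.g. 'complete' or 'error'
    elif line_text.startswith('data: '):
        payload = json.loads(line_text[6:])
        if event_name == 'complete':
            print(f"\n[done: {payload['request_id']}]")
        elif event_name == 'error':
            raise RuntimeError(payload)  # assumed error payload shape
        elif payload.get('type') == 'chunk':
            print(payload['data'], end='', flush=True)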

POST /v1/responses

Direct HTTP endpoint for infinite context processing. Use this endpoint when making direct HTTP requests without the OpenAI SDK.

Authentication

All requests require authentication via headers:

Authorization: Bearer YOUR_API_KEY
X-Organization-ID: YOUR_ORG_UUID

Parameters

Parameter | Type   | Required | Description
input     | string | Yes      | Your context/document text (unlimited size)
query     | string | Yes      | The question to ask about the context
stream    | string | No       | Enable streaming: "true" or "false" (default: "false")

Request Example (Python)

import requests

# Configuration
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
BASE_URL = "https://api.runcaptain.com"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Organization-ID": ORG_ID
}

# Non-streaming request
response = requests.post(
    f"{BASE_URL}/v1/responses",
    headers=headers,
    data={
        'input': 'Your large document text here...',
        'query': 'What are the main themes?'
    }
)

result = response.json()
print(result['response'])

Request Example (JavaScript)

const API_KEY = 'cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
const ORG_ID = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx';
const BASE_URL = 'https://api.runcaptain.com';

const response = await fetch(`${BASE_URL}/v1/responses`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'X-Organization-ID': ORG_ID,
    'Content-Type': 'application/x-www-form-urlencoded'
  },
  body: new URLSearchParams({
    'input': 'Your large document text here...',
    'query': 'What are the main themes?'
  })
});

const result = await response.json();
console.log(result.response);

Request Example (cURL)

$ curl -X POST https://api.runcaptain.com/v1/responses \
    -H "Authorization: Bearer cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
    -H "X-Organization-ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
    -d "input=Your large document text here..." \
    -d "query=What are the main themes?"

Streaming Example

response = requests.post(
    f"{BASE_URL}/v1/responses",
    headers=headers,
    data={
        'input': 'Your large document text here...',
        'query': 'What are the main themes?',
        'stream': 'true'
    },
    stream=True  # Important: enable streaming in requests
)

for line in response.iter_lines():
    if line:
        line_text = line.decode('utf-8')
        if line_text.startswith('data: '):
            data = line_text[6:]  # Remove 'data: ' prefix
            print(data, end='', flush=True)

Response Format

Non-Streaming Response:

{
  "status": "success",
  "response": "The document explores three main themes: cloud computing evolution, security best practices, and cost optimization strategies.",
  "request_id": "resp_1729876543_a1b2c3d4"
}

Streaming Response (SSE):

data: {"type": "chunk", "data": "The document explores"}
data: {"type": "chunk", "data": " three main themes:"}
data: {"type": "chunk", "data": " cloud computing evolution,"}
event: complete
data: {"status": "success", "request_id": "resp_1729876543_a1b2c3d4"}

SDK Integration

Captain supports multiple SDK approaches for different use cases:

Supported SDKs

  1. Python SDK - Official OpenAI Python SDK with extra_body
  2. JavaScript SDK - Official OpenAI JavaScript SDK with extra_body (⭐ Recommended for JS/TS)
  3. Vercel AI SDK - Vercel’s AI SDK with custom header approach
  4. Direct HTTP - Maximum control with fetch/requests

Python SDK

Use Captain with the official OpenAI Python SDK for a familiar developer experience.

Installation

$ pip install openai

Captain separates your instructions from your context using the OpenAI SDK’s standard message format combined with the extra_body parameter.

Key Concepts:

  • System messages: Provide instructions to the AI (e.g., “You are a helpful assistant…”)
  • User messages: Contain your query or question
  • extra_body.captain.context: Provides your large context/documents (can be millions of tokens)
#!/usr/bin/env python3
"""
Captain OpenAI-Compatible Client - Recommended Approach
"""
from openai import OpenAI

# Configuration
BASE_URL = "https://api.runcaptain.com/v1"
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Initialize OpenAI client pointing to Captain
client = OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY,
    default_headers={
        "X-Organization-ID": ORG_ID
    }
)

# Load your large document
print("Loading text file...")
with open("large_document.txt", "r") as f:
    context = f.read()

print(f"Loaded {len(context):,} characters")
print("-" * 50)

# Captain approach: instructions in messages, context in extra_body
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant specialized in document analysis."},
        {"role": "user", "content": "What are the main themes in this document?"}
    ],
    stream=True,
    temperature=0.7,
    extra_body={
        "captain": {
            "context": context  # Large context goes here
        }
    }
)

# Stream the response
print("Response: ", end="", flush=True)
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

print("\n" + "-" * 50)
print("Done!")

Why use this approach:

  • ✅ Clean separation: instructions vs. context
  • ✅ Your system prompts pass directly to the AI (not replaced)
  • ✅ Context properly chunked and processed for unlimited size
  • ✅ Compatible with OpenAI SDK patterns

Alternative: File Upload Endpoint (for very large files)

For very large files (>10MB), use the dedicated multipart upload endpoint:

#!/usr/bin/env python3
"""
Captain File Upload for Very Large Contexts (>10MB)
"""
import json
import requests

# Configuration
BASE_URL = "https://api.runcaptain.com/v1"
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Load large file
with open("very_large_file.txt", "r", encoding="utf-8") as f:
    context = f.read()

context_size = len(context.encode('utf-8'))
print(f"Uploading {context_size:,} bytes via multipart...")

# Prepare multipart form data
url = f"{BASE_URL}/chat/completions/upload"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Organization-ID": ORG_ID
}

files = {
    'file': ('context.txt', context.encode('utf-8'), 'text/plain')
}

data = {
    'messages': json.dumps([
        {"role": "user", "content": "What are the main themes?"}
    ]),
    'model': 'captain-voyager-latest',
    'stream': 'true',
    'temperature': '0.7'
}

# Make streaming request
response = requests.post(url, headers=headers, files=files, data=data, stream=True)

# Parse SSE stream
print("Response: ", end="", flush=True)
for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: '):
            data_str = line[6:]
            if data_str == '[DONE]':
                break
            try:
                chunk_data = json.loads(data_str)
                if 'choices' in chunk_data:
                    delta = chunk_data['choices'][0].get('delta', {})
                    if 'content' in delta and delta['content']:
                        print(delta['content'], end="", flush=True)
            except json.JSONDecodeError:
                pass

print("\n" + "-" * 50)
print("Done!")

When to use this endpoint:

  • ✅ Very large files where loading into memory first is impractical
  • ✅ When you want to upload files directly via multipart form data
  • ✅ For workflows where you’re already using requests library instead of OpenAI SDK

Note: Captain handles files of any size automatically - you only need this endpoint if you want to use multipart uploads instead of the standard OpenAI SDK approach.

Message Roles Explained

System Role ({"role": "system", ...}):

  • Provides instructions and guidance to the AI (OPTIONAL)
  • Sets the AI’s behavior and personality
  • Examples: “You are a legal expert”, “Be concise”, “Focus on security aspects”
  • Passes directly to the AI model (not replaced by Captain)
  • Completely optional - omit to use Captain’s default helpful persona
  • When provided, your instructions take priority over Captain’s defaults

User Role ({"role": "user", ...}):

  • Contains your query or question
  • What you want to know about the context
  • Examples: “What are the main themes?”, “List all vulnerabilities”

Context (extra_body.captain.context):

  • Your large documents, text, or data to analyze
  • Can be millions of tokens (unlimited size)
  • Automatically chunked and processed by Captain
  • Examples: Full contracts, codebases, research papers

Example 1: Custom System Prompt (Define Your Own Role)

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        {"role": "system", "content": "You are Dr. Watson, a medical expert specializing in research analysis"},
        {"role": "user", "content": "What are the key findings?"}
    ],
    extra_body={
        "captain": {
            "context": medical_research_papers  # Large document(s)
        }
    }
)
# AI responds as Dr. Watson with your custom medical expertise
Example 2: Captain’s Default Persona (No System Prompt)

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        {"role": "user", "content": "What are the key findings?"}
    ],
    extra_body={
        "captain": {
            "context": medical_research_papers  # Large document(s)
        }
    }
)
# AI responds with Captain's default helpful, informative persona

Important: Context must be provided via extra_body. Do not place large documents in system or user messages.


JavaScript SDK

Use the official OpenAI JavaScript SDK with Captain - recommended for most TypeScript/JavaScript projects.

Installation

$ npm install openai

Basic Example

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.CAPTAIN_API_KEY,
  baseURL: 'https://api.runcaptain.com/v1',
  defaultHeaders: {
    'X-Organization-ID': process.env.CAPTAIN_ORG_ID,
  },
});

const context = `
Company Policies:
- Remote work: Allowed 3 days/week
- Vacation: 20 days per year
`;

const response = await client.chat.completions.create({
  model: 'captain-voyager-latest',
  messages: [
    { role: 'user', content: "What's the remote work policy?" }
  ],
  extra_body: {
    captain: {
      context: context
    }
  },
});

console.log(response.choices[0].message.content);

Streaming

const stream = await client.chat.completions.create({
  model: 'captain-voyager-latest',
  messages: [
    { role: 'user', content: "Summarize this document" }
  ],
  stream: true,
  extra_body: {
    captain: {
      context: largeDocument
    }
  },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Vercel AI SDK

Use Vercel’s AI SDK with Captain via custom header approach or upload endpoint.

Note:

  • For small contexts (<4KB): Use base64-encoded X-Captain-Context header
  • For large contexts (>4KB): Use /v1/chat/completions/upload endpoint with FormData

Installation

$ npm install @ai-sdk/openai ai

Small Context Example

import { createOpenAI } from '@ai-sdk/openai';
import { streamText } from 'ai';

const context = `
Company Policies:
- Vacation: 20 days per year
- Remote work: 3 days per week
`;

// Base64 encode the context for header transmission
const contextBase64 = Buffer.from(context).toString('base64');

const captain = createOpenAI({
  apiKey: process.env.CAPTAIN_API_KEY,
  baseURL: 'https://api.runcaptain.com/v1',
  headers: {
    'X-Organization-ID': process.env.CAPTAIN_ORG_ID,
    'X-Captain-Context': contextBase64,
  },
});

const { textStream } = await streamText({
  model: captain.chat('captain-voyager-latest'),
  messages: [
    { role: 'user', content: 'What is the vacation policy?' }
  ],
});

for await (const chunk of textStream) {
  process.stdout.write(chunk);
}

Large Context Example (Upload Endpoint)

For contexts larger than ~4KB, use the upload endpoint:

const largeContext = `...your large document...`;

const formData = new FormData();
const blob = new Blob([largeContext], { type: 'text/plain' });
formData.append('file', blob, 'context.txt');
formData.append('messages', JSON.stringify([
  { role: 'user', content: 'Summarize this document' }
]));
formData.append('model', 'captain-voyager-latest');
formData.append('stream', 'true');

const response = await fetch('https://api.runcaptain.com/v1/chat/completions/upload', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.CAPTAIN_API_KEY}`,
    'X-Organization-ID': process.env.CAPTAIN_ORG_ID,
  },
  body: formData
});

// Parse SSE stream...

For complete Vercel AI SDK documentation, see Vercel AI SDK Guide.


Direct HTTP Fetch

For maximum control, use direct HTTP requests with the captain parameter.

Basic Example

const API_KEY = 'cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
const ORG_ID = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx';

const response = await fetch('https://api.runcaptain.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'X-Organization-ID': ORG_ID,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'captain-voyager-latest',
    messages: [
      { role: 'user', content: 'What is the vacation policy?' }
    ],
    captain: {
      context: context
    }
  })
});

const result = await response.json();
console.log(result.choices[0].message.content);

SDK Parameters

Common parameters across all SDKs:

Parameter   | Type    | Default                  | Description
model       | string  | "captain-voyager-latest" | Model to use (currently only captain-voyager-latest)
messages    | array   | Required                 | Array of message objects with role and content
temperature | float   | 0.7                      | Randomness (0.0-2.0)
max_tokens  | integer | 16000                    | Maximum tokens in response
stream      | boolean | false                    | Enable streaming responses
top_p       | float   | 0.95                     | Nucleus sampling parameter
tools       | array   | null                     | Array of tool definitions for function calling
tool_choice | string  | "auto"                   | Control tool usage: "auto", "none", or tool name
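
As an illustration, these parameters slot directly into a chat.completions.create call. The sketch below assumes the client and context variables from the Python SDK section; the parameter values are arbitrary:

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "List the key risks"}],
    temperature=0.2,   # lower temperature for more deterministic output
    max_tokens=2000,   # cap the response length
    top_p=0.95,
    stream=False,
    extra_body={"captain": {"context": context}}
)
print(response.choices[0].message.content)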

POST /v1/chat/completions

Standard OpenAI-compatible chat completions endpoint. Use this for contexts that fit in memory (< 1 MB) or when using the OpenAI Python SDK.

Parameters

Parameter   | Type    | Required | Description
model       | string  | No       | Model name (default: "captain-voyager-latest")
messages    | array   | Yes      | Array of message objects with role and content
temperature | float   | No       | Sampling temperature 0.0-2.0 (default: 0.7)
max_tokens  | integer | No       | Maximum response tokens (default: 16000)
stream      | boolean | No       | Enable streaming responses (default: false)
top_p       | float   | No       | Nucleus sampling 0.0-1.0 (default: 0.95)
tools       | array   | No       | Tool definitions for function calling (default: null)
tool_choice | string  | No       | Control tool usage: "auto", "none" (default: "auto")

Captain-specific extensions (in extra_body):

Parameter       | Type   | Description
captain.context | string | Large text context (alternative to system messages)

Request Example (Using OpenAI SDK)

See the SDK Integration section above for complete examples.

Response Format

Streaming response (SSE format):

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"captain-voyager-latest","choices":[{"index":0,"delta":{"content":"Hello"}}]}
data: [DONE]

Non-streaming response:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "captain-voyager-latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The document explores three main themes..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 150000,
    "completion_tokens": 500,
    "total_tokens": 150500
  }
}

POST /v1/chat/completions/upload

Note: As of the latest update, the standard /v1/chat/completions endpoint automatically handles large contexts (>250KB). This upload endpoint is only required for even larger text files.

Upload extremely large files (1MB+) directly with your chat completion request using multipart form data.

Use this endpoint for:

  • Explicit multipart file uploads
  • Legacy integrations requiring direct file upload
  • When you want manual control over file upload process

Parameters (Multipart Form Data)

Parameter   | Type   | Required | Description
file        | file   | Yes      | Text file to process (supports .txt, can be 100MB+)
messages    | string | Yes      | JSON-encoded array of chat messages
model       | string | No       | Model name (default: "captain-voyager-latest")
stream      | string | No       | Enable streaming: "true" or "false" (default: "true")
temperature | string | No       | Sampling temperature (default: "0.7")
max_tokens  | string | No       | Maximum response tokens (default: 16000)

Note: Form data values must be strings. Boolean values like stream should be sent as "true" or "false".

Request Example (HTTP Multipart Upload)

Note: This uses direct HTTP requests, not the OpenAI SDK. For OpenAI SDK usage, use the standard /v1/chat/completions endpoint which now handles large contexts automatically.

#!/usr/bin/env python3
"""
Captain HTTP API - Multipart File Upload for Large Contexts
"""
import json
import requests

# Configuration
BASE_URL = "https://api.runcaptain.com/v1"
API_KEY = "cap_prod_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ORG_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Load large text file (can be 100MB+)
print("Loading text file...")
file_path = "large_document.txt"
with open(file_path, "r", encoding="utf-8") as f:
    context = f.read()

context_size = len(context.encode('utf-8'))
print(f"Loaded {len(context):,} characters ({context_size:,} bytes)")
print("-" * 50)

# Prepare request
url = f"{BASE_URL}/chat/completions/upload"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Organization-ID": ORG_ID
}

messages = [
    {
        "role": "user",
        "content": "What are the main themes in this corpus?",
    }
]

# Prepare multipart form data
files = {
    'file': ('context.txt', context.encode('utf-8'), 'text/plain')
}

data = {
    'messages': json.dumps(messages),
    'model': 'captain-voyager-latest',
    'stream': 'true',  # Enable streaming
    'temperature': '0.7'
}

print(f"Uploading {context_size:,} bytes via multipart...")

# Make streaming request
response = requests.post(url, headers=headers, files=files, data=data, stream=True)

if response.status_code != 200:
    print(f"Error: {response.status_code}")
    print(response.text)
    exit(1)

print("Response: ", end="", flush=True)

# Parse SSE stream (OpenAI format)
for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: '):
            data_str = line[6:]  # Remove 'data: ' prefix
            if data_str == '[DONE]':
                break
            try:
                chunk_data = json.loads(data_str)
                if 'choices' in chunk_data and len(chunk_data['choices']) > 0:
                    delta = chunk_data['choices'][0].get('delta', {})
                    if 'content' in delta and delta['content']:
                        print(delta['content'], end="", flush=True)
            except json.JSONDecodeError:
                pass

print("\n" + "-" * 50)
print("Done!")

How It Works

  1. File Upload: Your large file is processed automatically
  2. Parallel Processing: The file is split into chunks (80k tokens each; see the sketch below)
  3. Worker Execution: 15+ parallel workers process chunks simultaneously
  4. Compression: Each worker compresses its chunk to ~8-10k tokens
  5. Response Streaming: The reducer streams the final response in real-time
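
To make the chunking step concrete, here is a toy sketch of splitting with overlap. This is a conceptual illustration, not Captain's actual implementation: the 80k-token chunk size and 30-40% overlap come from the feature list above, and the 4-characters-per-token ratio is a rough assumption.

def split_with_overlap(text: str, chunk_tokens: int = 80_000, overlap: float = 0.35):
    """Split text into overlapping chunks, approximating a token as ~4 characters."""
    chunk_chars = chunk_tokens * 4           # rough chars-per-token assumption
    step = int(chunk_chars * (1 - overlap))  # each step leaves ~35% overlap
    return [text[i:i + chunk_chars] for i in range(0, len(text), step)]

# Each chunk would go to a parallel worker; a reducer then merges
# the compressed worker outputs into the final streamed response.
chunks = split_with_overlap(open("large_document.txt").read())
print(f"{len(chunks)} chunks")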

For a 4.6MB Bible text:

  • Creates 15 chunks of ~80k tokens each
  • Runs 15 parallel workers
  • Each worker processes ~20k tokens
  • Total processing time: ~10-15 seconds
  • Streams response as it’s generated

Response Format

Returns streaming response in OpenAI format with [DONE] marker:

data: {"id":"chatcmpl-resp_1729876543_a1b2c3d4","object":"chat.completion.chunk","created":1729876543,"model":"captain-voyager-latest","choices":[{"index":0,"delta":{"content":"The document"}}]}
data: {"id":"chatcmpl-resp_1729876543_a1b2c3d4","object":"chat.completion.chunk","created":1729876543,"model":"captain-voyager-latest","choices":[{"index":0,"delta":{"content":" explores"}}]}
data: {"id":"chatcmpl-resp_1729876543_a1b2c3d4","object":"chat.completion.chunk","created":1729876543,"model":"captain-voyager-latest","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]

Important Notes

  • Streaming Recommended: Always use stream=true for the best experience
  • Wait Time: The first chunk may take 10-30 seconds for very large files (worker startup time)
  • File Size: Supports files up to 100MB+
  • Encoding: Files must be UTF-8 encoded text
  • Timeout: The stream will wait up to 60 seconds for processing to start (see the timeout sketch below)
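
Because the first chunk can take tens of seconds, it is worth setting explicit connect and read timeouts on the client side. A minimal sketch using the requests library with the variables from the example above (the timeout values are illustrative choices, not API requirements):

try:
    response = requests.post(
        url, headers=headers, files=files, data=data,
        stream=True,
        timeout=(10, 120)  # 10s to connect, up to 120s between streamed bytes
    )
    response.raise_for_status()
except requests.Timeout:
    print("Stream timed out before processing started; consider retrying.")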

Error Responses

400 Bad Request

{
  "status": "error",
  "message": "Input text is required",
  "error_code": "MISSING_INPUT"
}

401 Unauthorized

{
  "status": "error",
  "message": "Invalid API key",
  "error_code": "INVALID_API_KEY"
}

413 Payload Too Large

{
  "status": "error",
  "message": "Input exceeds maximum size. Please use /v1/responses/upload endpoint for files >100MB",
  "error_code": "PAYLOAD_TOO_LARGE"
}

500 Internal Server Error

{
  "status": "error",
  "message": "Failed to process request",
  "error_code": "PROCESSING_ERROR",
  "request_id": "resp_1729876543_a1b2c3d4"
}
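
All four error bodies share the status/message/error_code shape, so a client can handle them uniformly. A minimal sketch for a requests response object (the mapping of status codes to exceptions is an illustrative choice):

def check_captain_response(response):
    """Raise a descriptive error for non-200 Captain API responses."""
    if response.status_code == 200:
        return
    try:
        err = response.json()
    except ValueError:
        response.raise_for_status()  # non-JSON error body
        return
    code, msg = err.get("error_code", "UNKNOWN"), err.get("message", "")
    if response.status_code == 401:
        raise PermissionError(f"{code}: {msg} (check API key and org ID)")
    if response.status_code == 413:
        raise ValueError(f"{code}: {msg} (switch to the upload endpoint)")
    raise RuntimeError(f"{code}: {msg} (request_id={err.get('request_id')})")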

Tool Calling

Enable the LLM to call functions and tools while processing your documents. Tool calling allows the model to request external operations (calculations, API calls, data lookups) during response generation.

Overview

Tool calling works with both streaming and non-streaming modes. When tools are provided, the LLM autonomously decides whether to use them based on your query and the available context.

Key Requirements:

  • All tools must have "strict": true in the function definition (Cerebras requirement)
  • Tools work with any context size (small or infinite)
  • Compatible with both streaming (stream=true) and non-streaming modes

Basic Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcaptain.com/v1",
    api_key="YOUR_API_KEY",
    default_headers={"X-Organization-ID": "YOUR_ORG_ID"}
)

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform basic arithmetic operations",
            "parameters": {
                "type": "object",
                "properties": {
                    "operation": {
                        "type": "string",
                        "enum": ["add", "subtract", "multiply", "divide"]
                    },
                    "a": {"type": "number"},
                    "b": {"type": "number"}
                },
                "required": ["operation", "a", "b"]
            },
            "strict": True  # Required for Captain/Cerebras
        }
    }
]

# Make request with tools
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        {"role": "system", "content": "Q1 Revenue: $125,000\nQ2 Revenue: $150,000"},
        {"role": "user", "content": "What's the total revenue for Q1 and Q2?"}
    ],
    tools=tools,
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Tool Definition Format

Each tool must follow OpenAI’s function calling schema:

{
    "type": "function",
    "function": {
        "name": "tool_name",            # Function name (alphanumeric + underscores)
        "description": "What it does",  # Clear description for the LLM
        "parameters": {                 # JSON Schema for parameters
            "type": "object",
            "properties": {
                "param_name": {
                    "type": "string|number|boolean|array|object",
                    "description": "Parameter description"
                }
            },
            "required": ["param_name"]
        },
        "strict": True                  # REQUIRED
    }
}

Complete Example: Multiple Tools

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcaptain.com/v1",
    api_key="YOUR_API_KEY",
    default_headers={"X-Organization-ID": "YOUR_ORG_ID"}
)

# Define multiple tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform arithmetic calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "operation": {
                        "type": "string",
                        "enum": ["add", "subtract", "multiply", "divide"]
                    },
                    "a": {"type": "number"},
                    "b": {"type": "number"}
                },
                "required": ["operation", "a", "b"]
            },
            "strict": True
        }
    },
    {
        "type": "function",
        "function": {
            "name": "analyze_sentiment",
            "description": "Analyze sentiment of text (positive, negative, neutral)",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "Text to analyze"
                    }
                },
                "required": ["text"]
            },
            "strict": True
        }
    }
]

# Load large document
with open("quarterly_report.txt") as f:
    context = f.read()

# Request with tools
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        {"role": "system", "content": "You are a financial analysis assistant."},
        {"role": "user", "content": "Calculate total Q1-Q4 revenue and analyze overall sentiment"}
    ],
    tools=tools,
    stream=True,
    extra_body={
        "captain": {
            "context": context  # Large quarterly report context
        }
    }
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Tool Choice Parameter

Control when the LLM should use tools:

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[...],
    tools=tools,
    tool_choice="auto",   # Let model decide (default)
    # tool_choice="none", # Never use tools
    stream=True
)

Streaming vs Non-Streaming with Tools

Streaming Mode:

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[...],
    tools=tools,
    stream=True  # Recommended for real-time feedback
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Non-Streaming Mode:

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[...],
    tools=tools,
    stream=False  # Get complete response at once
)

print(response.choices[0].message.content)

Tool Calling with Large Context

Tool calling works seamlessly with Captain’s infinite context processing. Simply provide your large document and tools - Captain handles everything automatically:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcaptain.com/v1",
    api_key="YOUR_API_KEY",
    default_headers={"X-Organization-ID": "YOUR_ORG_ID"}
)

# Load very large document (Bible, entire codebase, etc.)
with open("very_large_document.txt") as f:
    context = f.read()  # Captain handles any size automatically

# Define tools
tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Perform arithmetic calculations on values in the document",
        "parameters": {
            "type": "object",
            "properties": {
                "operation": {"type": "string", "enum": ["add", "subtract", "multiply", "divide"]},
                "a": {"type": "number"},
                "b": {"type": "number"}
            },
            "required": ["operation", "a", "b"]
        },
        "strict": True
    }
}]

# Use standard OpenAI SDK - Captain handles the large context + tool calling
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        {"role": "system", "content": "You are a financial analysis assistant."},
        {"role": "user", "content": "Calculate the total revenue across all quarters"}
    ],
    tools=tools,
    stream=True,
    extra_body={
        "captain": {
            "context": context  # Any size context
        }
    }
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Key Points:

  • Captain automatically handles large contexts (100M+ tokens)
  • No special upload needed - just pass your context via extra_body
  • Tools + infinite context work together seamlessly
  • The LLM can use tools while processing massive documents

How Tool Calling Works

  1. LLM Analysis: Model analyzes your query and available tools
  2. Tool Invocation: Model requests tool execution with specific parameters
  3. Result Processing: Tool results are fed back to the model
  4. Final Response: Model generates final answer using tool results

Important Notes:

  • Tools are logged but not executed by default (you see what the LLM would call; see the inspection sketch below)
  • The LLM autonomously decides which tools to use and when
  • Multiple tool calls may occur for complex queries
  • Works seamlessly with infinite context processing
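
Because tool calls are surfaced rather than executed, you can inspect what the model wanted to call through the standard OpenAI response fields. A minimal non-streaming sketch, assuming the response follows the usual OpenAI tool_calls shape and reusing client and tools from the examples above:

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "What's the total revenue?"}],
    tools=tools,
    stream=False
)

message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        # Each entry shows the function the model wanted and its JSON arguments
        print(f"Would call {call.function.name} with {call.function.arguments}")
else:
    print(message.content)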

Supported Models

Tool calling is supported on:

  • captain-voyager-latest (default)

Best Practices for Tool Calling

Do:

  • ✅ Provide clear, descriptive tool names and descriptions
  • ✅ Use specific parameter descriptions
  • ✅ Set "strict": True in all function definitions
  • ✅ Test with streaming mode for better UX
  • ✅ Keep tool parameters simple and well-defined

Don’t:

  • ❌ Forget "strict": True
  • ❌ Use vague tool descriptions
  • ❌ Define too many tools (keep it focused)
  • ❌ Expect tool auto-execution (currently logged only)

Troubleshooting

Tool not being called:

  • Ensure "strict": True is set
  • Make description more specific to your use case
  • Try being more explicit in your query

Invalid tool schema error:

  • Verify JSON schema format in parameters
  • Check all required fields are present
  • Ensure type values are valid JSON Schema types

Best Practices

Input Size Optimization

Captain handles files of any size automatically. You have two options:

  • Standard approach: Use OpenAI SDK with system messages or captain.context - works for any size
  • Multipart upload: Use /v1/chat/completions/upload for very large files if you prefer multipart form data

Streaming vs Non-Streaming

Use Streaming When:

  • You want real-time responses
  • Processing very large documents (reduces perceived latency)
  • Building chat interfaces
  • Need to show progress to users

Use Non-Streaming When:

  • You need the complete response at once
  • Processing in batch jobs
  • Storing responses in databases
  • Simpler implementation needed

Query Optimization

Good Queries:

  • “What are the main conclusions in section 3?”
  • “Summarize the methodology described in the paper”
  • “List all security vulnerabilities mentioned”

Avoid:

  • Vague queries: “Tell me about this”
  • Multiple questions: “What are the themes, findings, and recommendations?”
  • Yes/no questions without context: “Is this good?”

Better Approach:

  • Break complex queries into separate requests
  • Be specific about what sections or topics to focus on
  • Ask for structured output: “List the top 5…”

Use Cases

Legal Document Analysis

Process entire legal contracts, case files, or regulatory documents to extract key clauses, risks, and obligations.

Scientific Paper Review

Analyze research papers, extract methodologies, findings, and compare multiple papers simultaneously.

Code Repository Analysis

Upload entire codebases to understand architecture, identify patterns, or find specific implementations.

Financial Report Processing

Process annual reports, 10-Ks, earnings transcripts to extract financial metrics and strategic insights.

Customer Support Ticket Analysis

Analyze thousands of support tickets to identify common issues, trends, and resolution patterns.


Rate Limits

Tier     | Requests per Minute | Max Input Size
Standard | 10                  | 50MB
Premium  | 60                  | Unlimited

Contact support@runcaptain.com to upgrade to Premium tier.
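
If you run up against the per-minute ceiling, a simple client-side retry with exponential backoff keeps batch jobs flowing. This sketch assumes rate-limited requests return HTTP 429, which is not documented above and should be treated as an assumption:

import time
import requests

def post_with_backoff(url, max_retries=5, **kwargs):
    """Retry a POST with exponential backoff on (assumed) HTTP 429 responses."""
    for attempt in range(max_retries):
        response = requests.post(url, **kwargs)
        if response.status_code != 429:
            return response
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... before retrying
    return response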



Choosing the Right Endpoint

Use this guide to select the best endpoint for your use case:

/v1/chat/completions (OpenAI SDK)

Best for:

  • Standard contexts (< 1 MB)
  • Contexts that fit in memory
  • Using the official OpenAI Python SDK
  • Familiar OpenAI-compatible interface

Limits:

  • No size limits - Captain handles any file size automatically

Example use case:

from openai import OpenAI

client = OpenAI(base_url="https://api.runcaptain.com/v1", api_key="...")
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "Summarize this"}],
    extra_body={"captain": {"context": "Your text here..."}}
)

/v1/chat/completions/upload (Multipart Upload)

Best for:

  • Very large files (1 MB - 100 MB+)
  • Files already on disk
  • Maximum performance for huge contexts
  • Parallel processing of massive texts

Advantages:

  • Direct file upload (no encoding overhead)
  • Optimized for 10 MB+ files
  • Automatic parallel processing
  • Streaming starts immediately after upload

Example use case:

import requests

with open('bible.txt', 'rb') as f:
    response = requests.post(
        "https://api.runcaptain.com/v1/chat/completions/upload",
        headers={"Authorization": f"Bearer {API_KEY}", "X-Organization-ID": ORG_ID},
        files={'file': f},
        data={'messages': '[{"role":"user","content":"Summarize"}]', 'stream': 'true'},
        stream=True
    )