Captain API
The Captain API enables you to process large contexts (millions of tokens) and ask questions about them using the familiar OpenAI SDK. Unlike traditional LLMs with limited context windows, Captain can handle unlimited context through intelligent chunking, parallel processing, and generative merging.
Key Features
- Unlimited Context: Process millions of tokens in a single request
- Multiple Input Methods: Inline text or file upload
- Real-time Streaming: Get responses as they’re generated via Server-Sent Events
- OpenAI SDK Compatible: Drop-in replacement for OpenAI API
- Tool Calling Support: Enable LLM to call functions and tools during processing
- Automatic Optimization: Large inputs handled automatically behind the scenes
- Intelligent Processing: 30-40% chunk overlap for accuracy with parallel LLM processing
Authentication
All requests require authentication:
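For example, a minimal sketch in Python - the Bearer scheme, header name, and base URL are assumptions, so check your Captain dashboard for the exact values:

```python
import requests

# Hypothetical values: substitute your real API key and deployment URL.
API_KEY = "your-captain-api-key"
BASE_URL = "https://api.runcaptain.com"  # assumed base URL

# Assumed standard Bearer token header; include it on every request.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```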
Streaming Responses
Enable real-time streaming to receive responses as they’re generated using Server-Sent Events (SSE).
Enable Streaming
Set stream=true in your request:
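A sketch with the OpenAI Python SDK - the `base_url` is an assumption, and the `extra_body` context field is covered in the SDK Integration section below:

```python
from openai import OpenAI

# The base_url is an assumption; point the SDK at your Captain deployment.
client = OpenAI(
    api_key="your-captain-api-key",
    base_url="https://api.runcaptain.com/v1",
)

stream = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "Summarize the main themes."}],
    stream=True,  # enable Server-Sent Events streaming
    extra_body={"captain": {"context": "...your large document..."}},
)

for chunk in stream:
    # Some chunks (e.g., the final one) may carry no choices or an empty delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)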
Stream Response Format
Streaming responses use Server-Sent Events (SSE) format:
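Each event is a `data:` line carrying an OpenAI-style chunk object, terminated by a `[DONE]` marker. A sketch of the wire format (field values are illustrative):

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"The"}}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" answer"}}]}

data: [DONE]
```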
Stream Events
POST /v1/responses
Direct HTTP endpoint for infinite context processing. Use this endpoint when making direct HTTP requests without the OpenAI SDK.
Authentication
All requests require authentication via headers:
Parameters
Request Example (Python)
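A minimal sketch using `requests`. The exact body fields for this endpoint are not listed here, so the `input` and `captain.context` fields below are assumptions modeled on the OpenAI-compatible patterns used elsewhere in this document:

```python
import requests

response = requests.post(
    "https://api.runcaptain.com/v1/responses",  # assumed base URL
    headers={"Authorization": "Bearer your-captain-api-key"},
    json={
        "model": "captain-voyager-latest",
        "input": "What are the main conclusions?",          # hypothetical field
        "captain": {"context": open("report.txt").read()},  # hypothetical field
    },
)
print(response.json())
```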
Request Example (JavaScript)
Request Example (cURL)
Streaming Example
Response Format
Non-Streaming Response:
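A sketch assuming an OpenAI-compatible completion object (the exact shape for this endpoint may differ; values are illustrative):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "captain-voyager-latest",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The main conclusions are..."},
      "finish_reason": "stop"
    }
  ]
}
```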
Streaming Response (SSE):
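Identical to the SSE format shown under Stream Response Format above: a series of `data:` chunk events ending with `data: [DONE]`.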
SDK Integration
Captain supports multiple SDK approaches for different use cases:
Supported SDKs
- Python SDK → Official OpenAI Python SDK with `extra_body`
- JavaScript SDK → Official OpenAI JavaScript SDK with `extra_body` (⭐ Recommended for JS/TS)
- Vercel AI SDK → Vercel's AI SDK with a custom header approach
- Direct HTTP → Maximum control with fetch/requests
Python SDK
Use Captain with the official OpenAI Python SDK for a familiar developer experience.
Installation
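Captain works with the official OpenAI package:

```bash
pip install openai
```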
Using Captain with OpenAI SDK ⭐ Recommended
Captain separates your instructions from your context using the OpenAI SDK’s standard message format combined with the extra_body parameter.
Key Concepts:
- System messages: Provide instructions to the AI (e.g., “You are a helpful assistant…”)
- User messages: Contain your query or question
- `extra_body.captain.context`: Provides your large context/documents (can be millions of tokens)
Why use this approach:
- ✅ Clean separation: instructions vs. context
- ✅ Your system prompts pass directly to the AI (not replaced)
- ✅ Context properly chunked and processed for unlimited size
- ✅ Compatible with OpenAI SDK patterns
Alternative: File Upload Endpoint (for very large files)
For files so large that loading them into memory is impractical, use the dedicated multipart upload endpoint, documented under POST /v1/chat/completions/upload below.
When to use this endpoint:
- ✅ Very large files where loading into memory first is impractical
- ✅ When you want to upload files directly via multipart form data
- ✅ For workflows where you're already using the `requests` library instead of the OpenAI SDK
Note: Captain handles files of any size automatically - you only need this endpoint if you want to use multipart uploads instead of the standard OpenAI SDK approach.
Message Roles Explained
System Role ({"role": "system", ...}):
- Provides instructions and guidance to the AI (OPTIONAL)
- Sets the AI’s behavior and personality
- Examples: “You are a legal expert”, “Be concise”, “Focus on security aspects”
- Passes directly to the AI model (not replaced by Captain)
- Completely optional - omit to use Captain’s default helpful persona
- When provided, your instructions take priority over Captain’s defaults
User Role ({"role": "user", ...}):
- Contains your query or question
- What you want to know about the context
- Examples: “What are the main themes?”, “List all vulnerabilities”
Context (extra_body.captain.context):
- Your large documents, text, or data to analyze
- Can be millions of tokens (unlimited size)
- Automatically chunked and processed by Captain
- Examples: Full contracts, codebases, research papers
Example 1: Custom System Prompt (Define Your Own Role)
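A sketch (the `base_url` is an assumption; `contract.txt` is a placeholder):

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-captain-api-key",
    base_url="https://api.runcaptain.com/v1",  # assumed base URL
)

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        # Your system prompt passes directly to the AI (not replaced by Captain).
        {"role": "system", "content": "You are a legal expert. Focus on risks and obligations."},
        {"role": "user", "content": "What are the termination clauses in this contract?"},
    ],
    # The large document goes in extra_body, not in the messages.
    extra_body={"captain": {"context": open("contract.txt").read()}},
)
print(response.choices[0].message.content)
```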
Example 2: Captain’s Default Persona (No System Prompt)
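Omit the system message to use Captain's default helpful persona (a sketch, reusing the assumed client setup from Example 1):

```python
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[
        # No system message: Captain's default persona applies.
        {"role": "user", "content": "What are the main themes?"},
    ],
    extra_body={"captain": {"context": open("novel.txt").read()}},
)
print(response.choices[0].message.content)
```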
Important: Context must be provided via `extra_body`. Do not place large documents in system or user messages.
JavaScript SDK
Use the official OpenAI JavaScript SDK with Captain - recommended for most TypeScript/JavaScript projects.
Installation
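```bash
npm install openai
```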
Basic Example
Streaming
Vercel AI SDK
Use Vercel’s AI SDK with Captain via custom header approach or upload endpoint.
Note:
- For small contexts (<4KB): Use the base64-encoded `X-Captain-Context` header
- For large contexts (>4KB): Use the `/v1/chat/completions/upload` endpoint with FormData
Installation
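Assuming the standard packages (the provider package is an assumption based on the OpenAI-compatible custom-header approach described above):

```bash
npm install ai @ai-sdk/openai
```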
Small Context Example
Large Context Example (Upload Endpoint)
For contexts larger than ~4KB, use the upload endpoint:
For complete Vercel AI SDK documentation, see Vercel AI SDK Guide.
Direct HTTP Fetch
For maximum control, use direct HTTP requests with the captain parameter.
Basic Example
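A sketch in Python with `requests` (the base URL is an assumption). Note that in raw HTTP the `captain` object sits at the top level of the JSON body - `extra_body` is purely an SDK mechanism that merges its contents into the request body:

```python
import requests

response = requests.post(
    "https://api.runcaptain.com/v1/chat/completions",  # assumed base URL
    headers={
        "Authorization": "Bearer your-captain-api-key",
        "Content-Type": "application/json",
    },
    json={
        "model": "captain-voyager-latest",
        "messages": [{"role": "user", "content": "List all security vulnerabilities"}],
        "captain": {"context": open("codebase.txt").read()},  # top-level captain parameter
    },
)
print(response.json()["choices"][0]["message"]["content"])
```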
SDK Parameters
Common parameters across all SDKs:
POST /v1/chat/completions
Standard OpenAI-compatible chat completions endpoint. Use this for contexts that fit in memory (< 1 MB) or when using the OpenAI Python SDK.
Parameters
Captain-specific extensions (in extra_body):
Request Example (Using OpenAI SDK)
See the “OpenAI SDK Integration” section below for complete examples.
Response Format
Streaming response (SSE format):
Non-streaming response:
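Both match the OpenAI-compatible formats shown in the POST /v1/responses section above.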
POST /v1/chat/completions/upload
Note: As of the latest update, the standard /v1/chat/completions endpoint automatically handles large contexts (>250KB). This upload endpoint remains available for very large text files when you prefer to send them as multipart form data.
Upload extremely large files (1MB+) directly with your chat completion request using multipart form data.
Use this endpoint for:
- Explicit multipart file uploads
- Legacy integrations requiring direct file upload
- When you want manual control over file upload process
Parameters (Multipart Form Data)
Note: Form data values must be strings. Boolean values like stream should be sent as "true" or "false".
Request Example (HTTP Multipart Upload)
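A sketch with `requests` - the base URL and form field names are assumptions (the endpoint path and the strings-only rule come from this document):

```python
import requests

with open("bible.txt", "rb") as f:
    response = requests.post(
        "https://api.runcaptain.com/v1/chat/completions/upload",  # assumed base URL
        headers={"Authorization": "Bearer your-captain-api-key"},
        files={"file": f},                    # hypothetical field name
        data={
            "model": "captain-voyager-latest",
            "query": "Summarize this text",   # hypothetical field name
            "stream": "true",                 # booleans must be sent as strings
        },
        stream=True,
        timeout=120,  # the first chunk can take 10-30 seconds for very large files
    )

# Read the SSE stream line by line as it arrives.
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))
```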
Note: This uses direct HTTP requests, not the OpenAI SDK. For OpenAI SDK usage, use the standard /v1/chat/completions endpoint which now handles large contexts automatically.
How It Works
- File Upload: Your large file is processed automatically
- Parallel Processing: The file is split into chunks (80k tokens each)
- Worker Execution: 15+ parallel workers process chunks simultaneously
- Compression: Each worker compresses its chunk to ~8-10k tokens
- Response Streaming: The reducer streams the final response in real-time
For a 4.6MB Bible text:
- Creates 15 chunks of ~80k tokens each
- Runs 15 parallel workers
- Each worker processes one ~80k-token chunk
- Total processing time: ~10-15 seconds
- Streams response as it’s generated
Response Format
Returns streaming response in OpenAI format with [DONE] marker:
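For example (chunk shapes illustrative):

```
data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"In the beginning"}}]}

data: [DONE]
```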
Important Notes
- Streaming Recommended: Always use `stream=true` for the best experience
- Wait Time: First chunk may take 10-30 seconds for very large files (worker startup time)
- File Size: Supports files up to 100MB+
- Encoding: Files must be UTF-8 encoded text
- Timeout: The stream will wait up to 60 seconds for processing to start
Error Responses
400 Bad Request
401 Unauthorized
413 Payload Too Large
500 Internal Server Error
Tool Calling
Enable the LLM to call functions and tools while processing your documents. Tool calling allows the model to request external operations (calculations, API calls, data lookups) during response generation.
Overview
Tool calling works with both streaming and non-streaming modes. When tools are provided, the LLM autonomously decides whether to use them based on your query and the available context.
Key Requirements:
- All tools must have `"strict": true` in the function definition (Cerebras requirement)
- Tools work with any context size (small or infinite)
- Compatible with both streaming (`stream=true`) and non-streaming modes
Basic Example
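A sketch with the OpenAI Python SDK - the `get_stock_price` tool and the `base_url` are hypothetical:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-captain-api-key",
    base_url="https://api.runcaptain.com/v1",  # assumed base URL
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",  # hypothetical tool
        "description": "Get the current price for a stock ticker symbol",
        "strict": True,  # required for all Captain tools
        "parameters": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "Ticker symbol, e.g. AAPL"}
            },
            "required": ["ticker"],
            "additionalProperties": False,
        },
    },
}]

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "What is Apple trading at right now?"}],
    tools=tools,
)

# Tool calls are logged, not auto-executed: inspect what the model wanted to call.
print(response.choices[0].message.tool_calls)
```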
Tool Definition Format
Each tool must follow OpenAI’s function calling schema:
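The expected shape, shown as a Python dict (in OpenAI's strict mode, `additionalProperties: false` is also required):

```python
{
    "type": "function",
    "function": {
        "name": "function_name",
        "description": "What this function does and when the model should use it",
        "strict": True,  # required by Captain (Cerebras requirement)
        "parameters": {
            "type": "object",
            "properties": {
                "param_name": {
                    "type": "string",
                    "description": "What this parameter means",
                }
            },
            "required": ["param_name"],
            "additionalProperties": False,
        },
    },
}
```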
Complete Example: Multiple Tools
Tool Choice Parameter
Control when the LLM should use tools:
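A sketch assuming the standard OpenAI `tool_choice` values, continuing the basic example above:

```python
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "What is Apple trading at?"}],
    tools=tools,
    tool_choice="auto",  # default: the model decides whether to call a tool
    # tool_choice="none"      -> never call tools
    # tool_choice="required"  -> must call at least one tool
    # tool_choice={"type": "function", "function": {"name": "get_stock_price"}}
)
```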
Streaming vs Non-Streaming with Tools
Streaming Mode:
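Assuming standard OpenAI SDK streaming semantics, tool-call arguments arrive incrementally in each chunk's delta (sketch, reusing the client and tools from above):

```python
stream = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "What is Apple trading at?"}],
    tools=tools,
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.tool_calls:  # partial tool-call arguments stream in across chunks
        print(delta.tool_calls)
    elif delta.content:
        print(delta.content, end="", flush=True)
```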
Non-Streaming Mode:
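Without streaming, any tool calls arrive on the completed message object:

```python
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "What is Apple trading at?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```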
Tool Calling with Large Context
Tool calling works seamlessly with Captain’s infinite context processing. Simply provide your large document and tools - Captain handles everything automatically:
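For example, a sketch reusing the assumed client and tools from the basic example (`annual_reports.txt` is a placeholder):

```python
response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "Compare the revenue figures across these reports"}],
    tools=tools,
    # Millions of tokens are fine here: Captain chunks and processes automatically.
    extra_body={"captain": {"context": open("annual_reports.txt").read()}},
)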
Key Points:
- Captain automatically handles large contexts (100M+ tokens)
- No special upload needed - just pass your document via `extra_body.captain.context`
- Tools + infinite context work together seamlessly
- The LLM can use tools while processing massive documents
How Tool Calling Works
- LLM Analysis: Model analyzes your query and available tools
- Tool Invocation: Model requests tool execution with specific parameters
- Result Processing: Tool results are fed back to the model
- Final Response: Model generates final answer using tool results
Important Notes:
- Tools are logged but not executed by default (you see what the LLM would call)
- The LLM autonomously decides which tools to use and when
- Multiple tool calls may occur for complex queries
- Works seamlessly with infinite context processing
Supported Models
Tool calling is supported on:
- `captain-voyager-latest` (default)
Best Practices for Tool Calling
Do:
- ✅ Provide clear, descriptive tool names and descriptions
- ✅ Use specific parameter descriptions
- ✅ Set `"strict": True` in all function definitions
- ✅ Test with streaming mode for better UX
- ✅ Keep tool parameters simple and well-defined
Don’t:
- ❌ Forget `"strict": True`
- ❌ Use vague tool descriptions
- ❌ Define too many tools (keep it focused)
- ❌ Expect tool auto-execution (currently logged only)
Troubleshooting
Tool not being called:
- Ensure `"strict": True` is set
- Make the description more specific to your use case
- Try being more explicit in your query
Invalid tool schema error:
- Verify the JSON schema format in `parameters`
- Check that all required fields are present
- Ensure `type` values are valid JSON Schema types
Best Practices
Input Size Optimization
Captain handles files of any size automatically. You have two options:
- Standard approach: Use the OpenAI SDK with `extra_body.captain.context` - works for any size
- Multipart upload: Use `/v1/chat/completions/upload` for very large files if you prefer multipart form data
Streaming vs Non-Streaming
Use Streaming When:
- You want real-time responses
- Processing very large documents (reduces perceived latency)
- Building chat interfaces
- Need to show progress to users
Use Non-Streaming When:
- You need the complete response at once
- Processing in batch jobs
- Storing responses in databases
- Simpler implementation needed
Query Optimization
Good Queries:
- “What are the main conclusions in section 3?”
- “Summarize the methodology described in the paper”
- “List all security vulnerabilities mentioned”
Avoid:
- Vague queries: “Tell me about this”
- Multiple questions: “What are the themes, findings, and recommendations?”
- Yes/no questions without context: “Is this good?”
Better Approach:
- Break complex queries into separate requests
- Be specific about what sections or topics to focus on
- Ask for structured output: “List the top 5…”
Use Cases
Legal Document Analysis
Process entire legal contracts, case files, or regulatory documents to extract key clauses, risks, and obligations.
Scientific Paper Review
Analyze research papers, extract methodologies, findings, and compare multiple papers simultaneously.
Code Repository Analysis
Upload entire codebases to understand architecture, identify patterns, or find specific implementations.
Financial Report Processing
Process annual reports, 10-Ks, earnings transcripts to extract financial metrics and strategic insights.
Customer Support Ticket Analysis
Analyze thousands of support tickets to identify common issues, trends, and resolution patterns.
Rate Limits
Contact support@runcaptain.com to upgrade to Premium tier.
Related APIs
- Integrate Captain with Datalake - Index cloud storage buckets for persistent querying
- API Reference - Complete API documentation
- Getting Started - Quick start guide
Choosing the Right Endpoint
Use this guide to select the best endpoint for your use case:
/v1/chat/completions (OpenAI SDK)
Best for:
- Standard contexts (< 1 MB)
- Contexts that fit in memory
- Using the official OpenAI Python SDK
- Familiar OpenAI-compatible interface
Limits:
- No size limits - Captain handles any file size automatically
Example use case:
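Asking targeted questions about a single contract, research paper, or report loaded from disk - for example, "What are the termination clauses in this contract?"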
/v1/chat/completions/upload (Multipart Upload)
Best for:
- Very large files (1 MB - 100 MB+)
- Files already on disk
- Maximum performance for huge contexts
- Parallel processing of massive texts
Advantages:
- Direct file upload (no encoding overhead)
- Optimized for 10 MB+ files
- Automatic parallel processing
- Streaming starts immediately after upload
Example use case:
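Processing a 4.6MB text like the full Bible (see How It Works above): the file is split into chunks, compressed by parallel workers, and the answer streams back in roughly 10-15 seconds.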