Tool Calling Architecture
Technical documentation for Captain's tool calling implementation.
Overview
Captain implements OpenAI-compatible function calling (tool calling) with client-side execution. Tools are never executed on Captain's servers; instead, the API returns tool call requests that clients execute in their own environment.
Architecture Flow
```mermaid
sequenceDiagram
    participant Client
    participant API
    participant Lambda
    participant LLM
    Client->>API: POST /v1/chat/completions (with tools)
    API->>Lambda: Invoke reducer
    Lambda->>LLM: Generate with tools available
    LLM->>Lambda: Response with tool_calls
    Lambda->>API: Return tool call requests
    API->>Client: finish_reason="tool_calls"
    Note over Client: Client executes tools locally
    Client->>Client: Execute tool(s)
    Client->>API: Continue with tool results (optional)
```
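Because the format is OpenAI-compatible, the client half of this flow can be sketched with the OpenAI Python SDK. The base URL below is a placeholder, and `calculate_tool` and `execute_tool` are client-defined (see "Tool Definition Format" and "Security Considerations" later in this document):

```python
import json
from openai import OpenAI

# Placeholder base URL; point the SDK at Captain's API endpoint
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "What is 50 + 25?"}],
    tools=[calculate_tool],  # defined as in "Tool Definition Format" below
)

choice = response.choices[0]
if choice.finish_reason == "tool_calls":
    # Tools run locally, never on Captain's servers
    for tool_call in choice.message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = execute_tool(tool_call.function.name, args)  # client-defined
```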
Components
1. API Layer (chat_completions.py)
Location: captain-main-api/app/api/routes/chat_completions.py
Responsibilities:
- Receive tool definitions from the client
- Forward tools to the Lambda via Step Functions
- Detect tool call requests in the Lambda output
- Return tool calls in OpenAI-compatible format
Key Code:
```python
# Detect tool calls from Lambda
tool_calls_from_lambda = final_output.get("toolCalls")
requires_tool_execution = final_output.get("requiresToolExecution", False)

if requires_tool_execution and tool_calls_from_lambda:
    # Convert to OpenAI format
    tool_calls = [
        ToolCall(
            id=tc.get("id"),
            type="function",
            function=FunctionCall(
                name=tc.get("function", {}).get("name"),
                arguments=tc.get("function", {}).get("arguments")
            )
        )
        for tc in tool_calls_from_lambda
    ]
    message = ChatMessage(
        role="assistant",
        content=response_text or None,
        tool_calls=tool_calls
    )
    finish_reason = "tool_calls"
```
2. Lambda Reducer (reducer/index.ts)
Location: infinite-responses/lambdas/reducer/index.ts
Responsibilities:
- Receive tools from the API
- Pass tools to the generation function
- Capture tool call requests
- Include tool calls in the output
Key Code:
```typescript
// Pass tools to generation
const result = await generateWithToolCalling(
  systemPrompt,
  userPrompt,
  tools,
  16000,
  model
);

// Capture tool calls
toolCallsToReturn = (result as any).toolCalls;
requiresToolExecutionFlag = (result as any).requiresToolExecution || false;

// Include in output
const output: ReducerOutput = {
  // ... other fields
  toolCalls: toolCallsToReturn,
  requiresToolExecution: requiresToolExecutionFlag
};
```
3. Tool Calling Module (tool-calling.ts)
Location: infinite-responses/lambdas/_shared/tool-calling.ts
Responsibilities:
- Interface with the LLM for tool-augmented generation
- Detect when the model requests tool calls
- Return tool calls instead of executing them
- Format tool calls in an OpenAI-compatible structure
Key Code:
```typescript
// When model requests tools and no executor provided
if (!onToolCall) {
  console.log(`[ToolCalling] No executor provided - returning tool calls for client-side execution`);

  // Track these tool calls
  for (const toolCall of assistantMessage.tool_calls) {
    allToolCalls.push(toolCall as ToolCall);
  }

  return {
    text: assistantMessage.content || '',
    usage: {
      promptTokens: totalPromptTokens,
      completionTokens: totalCompletionTokens,
      totalTokens: totalPromptTokens + totalCompletionTokens
    },
    toolCallsMade: allToolCalls,
    requiresToolExecution: true
  };
}
```
Data Flow
1. Request with Tools
```json
{
  "model": "captain-voyager-latest",
  "messages": [
    {"role": "user", "content": "What is 50 + 25?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "calculate",
        "description": "Perform arithmetic",
        "parameters": {
          "type": "object",
          "properties": {
            "operation": {"type": "string"},
            "a": {"type": "number"},
            "b": {"type": "number"}
          }
        },
        "strict": true
      }
    }
  ]
}
```
2. Lambda Processing
```typescript
// Tool calling parameters passed to Lambda
{
  requestId: "chat-abc123",
  query: "What is 50 + 25?",
  toolsEnabled: true,
  tools: [...],
  toolChoice: "auto"
}
```
3. LLM Response
```json
{
  "role": "assistant",
  "content": "",
  "tool_calls": [
    {
      "id": "call_123",
      "type": "function",
      "function": {
        "name": "calculate",
        "arguments": "{\"operation\": \"add\", \"a\": 50, \"b\": 25}"
      }
    }
  ]
}
```
4. Lambda Output
```typescript
{
  requestId: "chat-abc123",
  summary: "", // Empty when tool calls required
  toolCalls: [
    {
      id: "call_123",
      type: "function",
      function: {
        name: "calculate",
        arguments: "{\"operation\": \"add\", \"a\": 50, \"b\": 25}"
      }
    }
  ],
  requiresToolExecution: true
}
```
5. API Response
```json
{
  "id": "chatcmpl-abc123",
  "choices": [
    {
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_123",
            "type": "function",
            "function": {
              "name": "calculate",
              "arguments": "{\"operation\": \"add\", \"a\": 50, \"b\": 25}"
            }
          }
        ]
      }
    }
  ]
}
```
Design Decisions
Client-Side Execution
Why: Security, flexibility, and control
- ✅ Security: Client controls what code executes and has access to
- ✅ Flexibility: Client can use any tools (databases, APIs, local files)
- ✅ Isolation: Tool failures don't affect Captain infrastructure
- ✅ Performance: No need for Lambda to wait for external API calls
Trade-off: Requires client to handle tool execution logic
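In practice this amounts to a small dispatch table on the client. A minimal sketch, with a hypothetical handler for the calculate tool used throughout this document:

```python
# Hypothetical client-side handlers keyed by tool name
def calculate(operation: str, a: float, b: float) -> float:
    ops = {"add": a + b, "subtract": a - b, "multiply": a * b}
    return ops[operation]

TOOL_HANDLERS = {"calculate": calculate}

def execute_tool(name: str, args: dict):
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Unknown tool: {name}")
    return handler(**args)
```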
Single-Turn Pattern
Current Implementation: Lambda returns tool calls and completes execution
Why:
- Lambda functions are stateless
- Step Functions execute once per request
- Simpler architecture without state management
Multi-Turn Support: Handled by frameworks such as the Vercel AI SDK via its maxSteps parameter
OpenAI Compatibility
Format: Matches OpenAI's function calling API exactly
Benefits:
- Works with the OpenAI SDK without modifications
- Compatible with the Vercel AI SDK, LangChain, etc.
- Familiar to developers already using OpenAI
Performance Characteristics
Latency
| Scenario | Latency | Notes |
|---|---|---|
| No tools | ~2-5s | Standard generation |
| With tools (no call) | ~2-5s | Same as no tools |
| With tools (call made) | ~3-6s | +1-2s for tool call detection |
Token Usage
- Input tokens: Query + context + tool definitions
- Output tokens: Tool call JSON (typically 20-100 tokens)
- No execution tokens: Tools execute client-side
Scaling
- ✅ Scales horizontally (stateless Lambdas)
- ✅ No tool execution bottleneck
- ✅ Client handles execution concurrency
Implementation Details
Tool Definition Format
Captain uses OpenAI's tool definition format, with one additional requirement:
```jsonc
{
  "type": "function",
  "function": {
    "name": "function_name",
    "description": "Clear description",
    "parameters": {
      "type": "object",
      "properties": {...},
      "required": [...]
    },
    "strict": true // REQUIRED: Must be true
  }
}
```
The strict: true parameter is required for compatibility with the underlying LLM (Cerebras).
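For example, a small client-side helper (hypothetical, not part of Captain's API) can stamp the flag onto every definition so it is never forgotten:

```python
def make_tool(name: str, description: str, parameters: dict) -> dict:
    # Captain requires "strict": true on every tool definition
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
            "strict": True,
        },
    }

calculate_tool = make_tool(
    name="calculate",
    description="Perform arithmetic",
    parameters={
        "type": "object",
        "properties": {
            "operation": {"type": "string"},
            "a": {"type": "number"},
            "b": {"type": "number"},
        },
    },
)
```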
Message Roles
Captain supports all OpenAI message roles:
- system: System instructions
- user: User messages
- assistant: Assistant responses
- tool: Tool result messages (for multi-turn)
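A multi-turn message array that feeds a tool result back uses all four roles; the tool message's tool_call_id must match the id from the assistant's tool_calls entry:

```python
messages = [
    {"role": "system", "content": "You are a calculator assistant."},
    {"role": "user", "content": "What is 50 + 25?"},
    # Assistant turn returned by the API, echoed back verbatim
    {"role": "assistant", "content": None, "tool_calls": [...]},
    # Client-side tool result, keyed to the assistant's tool call
    {"role": "tool", "tool_call_id": "call_123", "content": "75"},
]
```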
Finish Reasons
| Reason | Meaning |
|---|---|
| stop | Normal completion without tools |
| tool_calls | Model wants to call one or more tools |
| length | Hit max token limit |
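Clients typically branch on this field; a minimal sketch:

```python
choice = response.choices[0]
if choice.finish_reason == "tool_calls":
    handle_tool_calls(choice.message.tool_calls)  # client-defined handler
elif choice.finish_reason == "length":
    # Output was truncated; consider raising max_tokens or shortening input
    ...
else:  # "stop"
    print(choice.message.content)
```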
Error Handling
Tool Definition Errors
```python
# API validates tools before sending to Lambda
if tools:
    for tool in tools:
        if tool.get("function", {}).get("strict") is not True:
            raise HTTPException(
                status_code=400,
                detail="Tool 'strict' parameter must be True"
            )
```
LLM Errors
```typescript
// Lambda catches generation errors
try {
  const result = await generateWithToolCalling(...)
} catch (error) {
  console.error('[reducer] Tool calling failed:', error);
  // Return error in output
}
```
Client Errors
```python
# Client handles tool execution errors
try:
    result = execute_tool(tool_name, args)
except Exception as e:
    result = {"error": str(e)}
```
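If the client continues the conversation, the error payload can be sent back to the model as a tool message instead of being dropped (a sketch; server-side continuation is discussed under Limitations):

```python
import json

messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result),  # {"error": "..."} from the handler above
})
```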
Limitations
1. Multi-Turn Continuation
Current: Lambda doesn't support receiving tool results for continuation
Workaround: Use frameworks with automatic multi-turn (Vercel AI SDK with maxSteps)
Future: May implement conversation state storage (Redis/DynamoDB)
2. Tool Parameter Extraction
Issue: The model may occasionally return empty or incomplete parameters
Cause: LLM behavior, not an API limitation
Mitigation:
- Use clear, explicit tool descriptions
- Provide examples in system prompts
- Handle empty parameters gracefully client-side (see the sketch below)
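A defensive argument parser covers the empty-parameter case noted above:

```python
import json

def parse_arguments(raw: str) -> dict:
    # The model occasionally emits "" or truncated JSON for arguments;
    # fall back to an empty dict instead of crashing
    try:
        return json.loads(raw) if raw else {}
    except json.JSONDecodeError:
        return {}
```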
3. Streaming with Tools
Current: Tool calls returned after generation completes
Future: May support streaming tool calls as they're generated
Security Considerations
1. Tool Validation
Client should validate tool calls before execution:
```python
ALLOWED_TOOLS = {"calculate", "query_database"}

def execute_tool(tool_name, args):
    if tool_name not in ALLOWED_TOOLS:
        raise ValueError(f"Tool {tool_name} not allowed")
    # Validate args against the tool's schema, then execute
    # with the minimum permissions required
    ...
```
2. Parameter Sanitization
```python
import re

def sanitize_sql_query(query: str) -> str:
    # Reject queries containing destructive SQL keywords
    # (word-boundary match, so e.g. "UPDATED_AT" is not flagged)
    forbidden = ["DROP", "DELETE", "INSERT", "UPDATE"]
    for word in forbidden:
        if re.search(rf"\b{word}\b", query, re.IGNORECASE):
            raise ValueError(f"Forbidden SQL operation: {word}")
    return query
```
3. Rate Limiting
```python
import time

_last_call: dict[tuple, float] = {}

def rate_limit_tool(user_id: str, tool_name: str, min_interval: float = 2.0) -> bool:
    # Allow at most one call per (user, tool) every `min_interval` seconds
    now = time.time()
    if now - _last_call.get((user_id, tool_name), 0.0) < min_interval:
        return False
    _last_call[(user_id, tool_name)] = now
    return True
```
Testing
Unit Tests
```python
def test_tool_call_detection():
    # `client` is an OpenAI SDK client pointed at Captain's API
    response = client.chat.completions.create(
        model="captain-voyager-latest",
        messages=[{"role": "user", "content": "Calculate 5 + 3"}],
        tools=[calculate_tool]
    )
    assert response.choices[0].finish_reason == "tool_calls"
    assert len(response.choices[0].message.tool_calls) == 1
    assert response.choices[0].message.tool_calls[0].function.name == "calculate"
```
Integration Tests
```typescript
import { generateText } from 'ai';

test('tool execution with Vercel AI SDK', async () => {
  const result = await generateText({
    model: captain.chat('captain-voyager-latest'),
    messages: [{ role: 'user', content: 'What is 5 + 3?' }],
    tools: { calculate: {...} },
    maxSteps: 5
  });
  expect(result.toolCalls?.length).toBeGreaterThan(0);
  expect(result.text).toContain('8');
});
```
Monitoring
Key Metrics
- Tool call rate: % of requests that trigger tool calls (see the sketch after this list)
- Tool execution latency: Time spent in client-side execution
- Tool error rate: % of tool executions that fail
- Multi-turn depth: Average number of tool calls per conversation
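As an illustration, the tool call rate can be derived from stored response records (the record shape here is an assumption):

```python
def tool_call_rate(records: list[dict]) -> float:
    # Fraction of completions that finished with finish_reason == "tool_calls"
    if not records:
        return 0.0
    hits = sum(1 for r in records if r.get("finish_reason") == "tool_calls")
    return hits / len(records)
```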
Logging
```typescript
// Lambda logs tool calling activity
console.log(`[ToolCalling] Model requesting ${toolCalls.length} tool call(s)`);
console.log(`[reducer] Tool calls detected - ${toolCalls.length} tool(s) need client-side execution`);
```
Future Enhancements
Planned
- Streaming tool calls: Stream tool call JSON as generated
- Conversation state: Store conversation state for true multi-turn
- Tool choice enforcement: Better support for the tool_choice parameter
- Parallel tool execution: Request multiple tools simultaneously
Under Consideration
- Server-side tool execution: Optional sandbox for safe tools
- Tool result validation: Validate tool results before continuation
- Tool usage analytics: Detailed metrics on tool performance
Support
For technical questions or issues:
- 📧 Email: support@runcaptain.com
- 📖 Documentation: docs.runcaptain.com