Tool Calling Architecture

Technical documentation for Captain's tool calling implementation.

Overview

Captain implements OpenAI-compatible function calling (tool calling) with client-side execution. This means tools are never executed on Captain's servers - instead, the API returns tool call requests that clients execute in their own environment.

Architecture Flow

sequenceDiagram
    participant Client
    participant API
    participant Lambda
    participant LLM

    Client->>API: POST /v1/chat/completions (with tools)
    API->>Lambda: Invoke reducer
    Lambda->>LLM: Generate with tools available
    LLM->>Lambda: Response with tool_calls
    Lambda->>API: Return tool call requests
    API->>Client: finish_reason="tool_calls"

    Note over Client: Client executes tools locally

    Client->>Client: Execute tool(s)
    Client->>API: Continue with tool results (optional)
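
In practice, the client side of this loop is: send the request with tools, check finish_reason, run the requested function locally, and use the result. A minimal sketch using the standard OpenAI Python SDK is shown below; the base URL, API key, and the calculate implementation are illustrative placeholders, not part of Captain's API.

import json
from openai import OpenAI

# Point the standard OpenAI client at Captain's OpenAI-compatible endpoint
# (the base URL here is a placeholder).
client = OpenAI(base_url="https://captain.example.com/v1", api_key="YOUR_API_KEY")

# Hypothetical local tool implementation; never sent to or run on Captain's servers.
def calculate(operation: str, a: float, b: float) -> float:
    return a + b if operation == "add" else a - b

calculate_tool = {
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Perform arithmetic",
        "parameters": {
            "type": "object",
            "properties": {
                "operation": {"type": "string"},
                "a": {"type": "number"},
                "b": {"type": "number"},
            },
        },
        "strict": True,
    },
}

response = client.chat.completions.create(
    model="captain-voyager-latest",
    messages=[{"role": "user", "content": "What is 50 + 25?"}],
    tools=[calculate_tool],
)

choice = response.choices[0]
if choice.finish_reason == "tool_calls":
    for tool_call in choice.message.tool_calls:
        args = json.loads(tool_call.function.arguments or "{}")
        print(calculate(**args))  # tool executes entirely client-side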

Components

1. API Layer (chat_completions.py)

Location: captain-main-api/app/api/routes/chat_completions.py

Responsibilities:

  • Receive tool definitions from client
  • Forward tools to Lambda via Step Functions
  • Detect tool call requests in Lambda output
  • Return tool calls in OpenAI-compatible format

Key Code:

# Detect tool calls from Lambda
tool_calls_from_lambda = final_output.get("toolCalls")
requires_tool_execution = final_output.get("requiresToolExecution", False)

if requires_tool_execution and tool_calls_from_lambda:
    # Convert to OpenAI format
    tool_calls = [
        ToolCall(
            id=tc.get("id"),
            type="function",
            function=FunctionCall(
                name=tc.get("function", {}).get("name"),
                arguments=tc.get("function", {}).get("arguments")
            )
        )
        for tc in tool_calls_from_lambda
    ]

    message = ChatMessage(
        role="assistant",
        content=response_text or None,
        tool_calls=tool_calls
    )
    finish_reason = "tool_calls"
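
The ChatMessage, ToolCall, and FunctionCall models used above are not shown in this document; a minimal Pydantic sketch of the shapes the snippet assumes (field names inferred from the code and from OpenAI's response format, not taken from Captain's source):

from typing import List, Optional
from pydantic import BaseModel

class FunctionCall(BaseModel):
    name: str
    arguments: str  # JSON-encoded string, as in OpenAI's format

class ToolCall(BaseModel):
    id: str
    type: str = "function"
    function: FunctionCall

class ChatMessage(BaseModel):
    role: str
    content: Optional[str] = None
    tool_calls: Optional[List[ToolCall]] = None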

2. Lambda Reducer (reducer/index.ts)

Location: infinite-responses/lambdas/reducer/index.ts

Responsibilities:

  • Receive tools from API
  • Pass tools to generation function
  • Capture tool call requests
  • Include tool calls in output

Key Code:

// Pass tools to generation
const result = await generateWithToolCalling(
  systemPrompt,
  userPrompt,
  tools,
  16000,
  model
);

// Capture tool calls
toolCallsToReturn = (result as any).toolCalls;
requiresToolExecutionFlag = (result as any).requiresToolExecution || false;

// Include in output
const output: ReducerOutput = {
  // ... other fields
  toolCalls: toolCallsToReturn,
  requiresToolExecution: requiresToolExecutionFlag
};

3. Tool Calling Module (tool-calling.ts)

Location: infinite-responses/lambdas/_shared/tool-calling.ts

Responsibilities:

  • Interface with LLM for tool-augmented generation
  • Detect when model requests tool calls
  • Return tool calls instead of executing them
  • Format tool calls in OpenAI-compatible structure

Key Code:

// When model requests tools and no executor provided
if (!onToolCall) {
  console.log(`[ToolCalling] No executor provided - returning tool calls for client-side execution`);

  // Track these tool calls
  for (const toolCall of assistantMessage.tool_calls) {
    allToolCalls.push(toolCall as ToolCall);
  }

  return {
    text: assistantMessage.content || '',
    usage: {
      promptTokens: totalPromptTokens,
      completionTokens: totalCompletionTokens,
      totalTokens: totalPromptTokens + totalCompletionTokens
    },
    toolCallsMade: allToolCalls,
    requiresToolExecution: true
  };
}

Data Flow

1. Request with Tools

{
  "model": "captain-voyager-latest",
  "messages": [
    {"role": "user", "content": "What is 50 + 25?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "calculate",
        "description": "Perform arithmetic",
        "parameters": {
          "type": "object",
          "properties": {
            "operation": {"type": "string"},
            "a": {"type": "number"},
            "b": {"type": "number"}
          }
        },
        "strict": true
      }
    }
  ]
}

2. Lambda Processing

// Tool calling parameters passed to Lambda
{
  requestId: "chat-abc123",
  query: "What is 50 + 25?",
  toolsEnabled: true,
  tools: [...],
  toolChoice: "auto"
}

3. LLM Response

{
  "role": "assistant",
  "content": "",
  "tool_calls": [
    {
      "id": "call_123",
      "type": "function",
      "function": {
        "name": "calculate",
        "arguments": "{\"operation\": \"add\", \"a\": 50, \"b\": 25}"
      }
    }
  ]
}

4. Lambda Output

{
  requestId: "chat-abc123",
  summary: "",  // Empty when tool calls required
  toolCalls: [
    {
      id: "call_123",
      type: "function",
      function: {
        name: "calculate",
        arguments: "{\"operation\": \"add\", \"a\": 50, \"b\": 25}"
      }
    }
  ],
  requiresToolExecution: true
}

5. API Response

{
  "id": "chatcmpl-abc123",
  "choices": [
    {
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_123",
            "type": "function",
            "function": {
              "name": "calculate",
              "arguments": "{\"operation\": \"add\", \"a\": 50, \"b\": 25}"
            }
          }
        ]
      }
    }
  ]
}

Design Decisions

Client-Side Execution

Why: Security, flexibility, and control

  • Security: Client controls what code executes and has access to
  • Flexibility: Client can use any tools (databases, APIs, local files)
  • Isolation: Tool failures don't affect Captain infrastructure
  • Performance: No need for Lambda to wait for external API calls

Trade-off: Requires client to handle tool execution logic

Single-Turn Pattern

Current Implementation: Lambda returns tool calls and completes execution

Why:

  • Lambda functions are stateless
  • Step Functions execute once per request
  • Simpler architecture without state management

Multi-Turn Support: Handled by frameworks such as the Vercel AI SDK via its maxSteps parameter

OpenAI Compatibility

Format: Matches OpenAI's function calling API exactly

Benefits:

  • Works with OpenAI SDK without modifications
  • Compatible with Vercel AI SDK, LangChain, etc.
  • Familiar to developers already using OpenAI

Performance Characteristics

Latency

Scenario                  Latency   Notes
No tools                  ~2-5s     Standard generation
With tools (no call)      ~2-5s     Same as no tools
With tools (call made)    ~3-6s     +1-2s for tool call detection

Token Usage

  • Input tokens: Query + context + tool definitions
  • Output tokens: Tool call JSON (typically 20-100 tokens)
  • No execution tokens: Tools execute client-side

Scaling

  • ✅ Scales horizontally (stateless Lambdas)
  • ✅ No tool execution bottleneck
  • ✅ Client handles execution concurrency

Implementation Details

Tool Definition Format

Captain uses OpenAI's tool definition format with one requirement:

{
  "type": "function",
  "function": {
    "name": "function_name",
    "description": "Clear description",
    "parameters": {
      "type": "object",
      "properties": {...},
      "required": [...]
    },
    "strict": true  // REQUIRED: Must be true
  }
}

The strict: true parameter is required for compatibility with the underlying LLM (Cerebras).
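
Clients that build tool definitions dynamically can normalize them before sending; a small helper (illustrative, not part of any Captain SDK):

def ensure_strict(tools: list) -> list:
    """Return copies of the tool definitions with 'strict': True set on each function."""
    return [
        {**tool, "function": {**tool.get("function", {}), "strict": True}}
        for tool in tools
    ]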

Message Roles

Captain supports all OpenAI message roles:

  • system: System instructions
  • user: User messages
  • assistant: Assistant responses
  • tool: Tool result messages (for multi-turn)
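
A tool result message follows OpenAI's shape: role "tool", the tool_call_id it answers, and the result serialized as a string. For example (values illustrative). Note that, per the Limitations section, the Lambda does not currently consume these for continuation; they are used by client-side frameworks that manage the multi-turn loop.

# Tool result message appended by the client after executing call_123
tool_result_message = {
    "role": "tool",
    "tool_call_id": "call_123",
    "content": "75"
}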

Finish Reasons

Reason       Meaning
stop         Normal completion without tools
tool_calls   Model wants to call one or more tools
length       Hit max token limit
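
Clients typically branch on this field before reading the message; a minimal sketch (the print statements stand in for real handling):

def handle_response(response) -> None:
    choice = response.choices[0]
    if choice.finish_reason == "tool_calls":
        # Model is requesting client-side tool execution
        for tool_call in choice.message.tool_calls:
            print("execute:", tool_call.function.name, tool_call.function.arguments)
    elif choice.finish_reason == "length":
        # Output was truncated at the max token limit; retry with a larger limit or shorter prompt
        print("truncated:", choice.message.content)
    else:  # "stop"
        print(choice.message.content)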

Error Handling

Tool Definition Errors

# API validates tools before sending to Lambda
if tools:
    for tool in tools:
        if tool.get("function", {}).get("strict") is not True:
            raise HTTPException(
                status_code=400,
                detail="Tool 'strict' parameter must be True"
            )

LLM Errors

// Lambda catches generation errors
try {
  const result = await generateWithToolCalling(...)
} catch (error) {
  console.error('[reducer] Tool calling failed:', error);
  // Return error in output
}

Client Errors

# Client handles tool execution errors
try:
    result = execute_tool(tool_name, args)
except Exception as e:
    result = {"error": str(e)}

Limitations

1. Multi-Turn Continuation

Current: Lambda doesn't support receiving tool results for continuation

Workaround: Use frameworks with automatic multi-turn (Vercel AI SDK with maxSteps)

Future: May implement conversation state storage (Redis/DynamoDB)

2. Tool Parameter Extraction

Issue: Model may occasionally return empty/incomplete parameters

Cause: LLM behavior, not API limitation

Mitigation:

  • Use clear, explicit tool descriptions
  • Provide examples in system prompts
  • Handle empty parameters gracefully client-side (see the sketch below)
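
A defensive way to handle the last point is to treat the arguments string as untrusted JSON; a minimal sketch:

import json
from typing import Optional

def parse_tool_arguments(arguments: Optional[str]) -> dict:
    """Parse a tool call's arguments string, tolerating empty or malformed JSON."""
    if not arguments:
        return {}
    try:
        parsed = json.loads(arguments)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}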

3. Streaming with Tools

Current: Tool calls returned after generation completes

Future: May support streaming tool calls as they're generated

Security Considerations

1. Tool Validation

Client should validate tool calls before execution:

ALLOWED_TOOLS = {"calculate", "query_database"}

def execute_tool(tool_name, args):
    if tool_name not in ALLOWED_TOOLS:
        raise ValueError(f"Tool {tool_name} not allowed")

    # Validate args
    # Execute with proper permissions

2. Parameter Sanitization

def sanitize_sql_query(query: str) -> str:
    # Remove dangerous SQL
    forbidden = ["DROP", "DELETE", "INSERT", "UPDATE"]
    for word in forbidden:
        if word in query.upper():
            raise ValueError(f"Forbidden SQL operation: {word}")
    return query

3. Rate Limiting

from collections import defaultdict, deque
from time import time

# Simple in-memory sliding window: at most 10 calls per (user, tool) per 60 seconds
_call_history = defaultdict(deque)

def rate_limit_tool(user_id: str, tool_name: str) -> bool:
    calls, now = _call_history[(user_id, tool_name)], time()
    while calls and now - calls[0] > 60:
        calls.popleft()  # discard calls outside the window
    if len(calls) >= 10:
        return False
    calls.append(now)
    return True

Testing

Unit Tests

def test_tool_call_detection():
    response = client.chat.completions.create(
        model="captain-voyager-latest",
        messages=[{"role": "user", "content": "Calculate 5 + 3"}],
        tools=[calculate_tool]
    )

    assert response.choices[0].finish_reason == "tool_calls"
    assert len(response.choices[0].message.tool_calls) == 1
    assert response.choices[0].message.tool_calls[0].function.name == "calculate"

Integration Tests

import { generateText } from 'ai';

test('tool execution with Vercel AI SDK', async () => {
  const result = await generateText({
    model: captain.chat('captain-voyager-latest'),
    messages: [{ role: 'user', content: 'What is 5 + 3?' }],
    tools: { calculate: {...} },
    maxSteps: 5
  });

  expect(result.toolCalls?.length).toBeGreaterThan(0);
  expect(result.text).toContain('8');
});

Monitoring

Key Metrics

  • Tool call rate: % of requests that trigger tool calls
  • Tool execution latency: Time spent in client-side execution
  • Tool error rate: % of tool executions that fail
  • Multi-turn depth: Average number of tool calls per conversation

Logging

// Lambda logs tool calling activity
console.log(`[ToolCalling] Model requesting ${toolCalls.length} tool call(s)`);
console.log(`[reducer] Tool calls detected - ${toolCalls.length} tool(s) need client-side execution`);

Future Enhancements

Planned

  1. Streaming tool calls: Stream tool call JSON as generated
  2. Conversation state: Store conversation state for true multi-turn
  3. Tool choice enforcement: Better support for tool_choice parameter
  4. Parallel tool execution: Request multiple tools simultaneously

Under Consideration

  1. Server-side tool execution: Optional sandbox for safe tools
  2. Tool result validation: Validate tool results before continuation
  3. Tool usage analytics: Detailed metrics on tool performance

Support

For technical questions or issues: