Getting Started with Captain

Welcome to Captain! This guide will help you get started with indexing and querying your data.

Quick Start

Workflow Overview

Prerequisites:

1. Install Python Dependencies

pip install requests
# uuid is part of the Python standard library, so idempotency keys need no extra install

2. Get Your API Credentials

You'll need an API key from the Captain API Studio (format: cap_dev_...) and an Organization ID (UUID format), which is also available in the Studio.
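
One way to keep these credentials out of source code is to read them from environment variables. The sketch below is a minimal example, not part of Captain itself: the variable names CAPTAIN_API_KEY and CAPTAIN_ORG_ID and the helper load_credentials are assumptions made for illustration. It only checks that the key is present and that the Organization ID parses as a UUID.

```python
import os
import uuid


def load_credentials(env=os.environ):
    """Read Captain credentials from environment variables and sanity-check them.

    CAPTAIN_API_KEY and CAPTAIN_ORG_ID are hypothetical variable names chosen
    for this sketch; they are not defined by Captain.
    """
    api_key = env.get("CAPTAIN_API_KEY", "")
    org_id = env.get("CAPTAIN_ORG_ID", "")

    if not api_key:
        raise ValueError("CAPTAIN_API_KEY is not set")

    # The guide says the Organization ID is a UUID; uuid.UUID raises
    # ValueError if the string cannot be parsed as one.
    uuid.UUID(org_id)

    return api_key, org_id
```

The later examples could then start with API_KEY, ORG_ID = load_credentials().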

Using the API

Step 1: Create a Database

Databases are containers for your indexed files. Each database is scoped to your API key's environment.

import requests

# BASE_URL, ORG_ID, and API_KEY are assumed to be set to your own values.
response = requests.post(
    f"{BASE_URL}/api/v1/create-database",
    data={
        'organization_id': ORG_ID,
        'api_key': API_KEY,
        'database_name': 'contracts_2024'
    }
)

You can also use the /delete-database and /list-databases endpoints to manage separate databases for different users or projects.
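
The three database endpoints can be wrapped in one small helper. The sketch below only builds the URL and form fields; it assumes, without confirmation from this guide, that /delete-database and /list-databases accept the same form fields as /create-database. The helper name database_request is hypothetical, and the API Reference remains the authority on the actual parameters and HTTP methods.

```python
def database_request(base_url, endpoint, api_key, org_id, database_name=None):
    """Build the URL and form fields for a database-management call.

    Assumption: /delete-database and /list-databases take the same form
    fields as /create-database. Check the API Reference before relying on it.
    """
    allowed = {"create-database", "delete-database", "list-databases"}
    if endpoint not in allowed:
        raise ValueError(f"unknown endpoint: {endpoint}")

    data = {"organization_id": org_id, "api_key": api_key}
    # Listing does not target a single database, so the name is optional.
    if database_name is not None:
        data["database_name"] = database_name

    return f"{base_url}/api/v1/{endpoint}", data
```

If the assumption holds, the result could be sent with requests.post(url, data=data), mirroring the create-database call above.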

Step 2: Index Your S3 Bucket

Index the files from your S3 bucket into your Captain database. If the database already contains indexed files, the /index-all endpoint removes them and then indexes every file in the bucket.

from urllib.parse import quote

response = requests.post(
    f"{BASE_URL}/api/v1/index-all",
    data={
        'database_name': 'contracts_2024',
        'bucket_name': 'my-s3-bucket',
        'aws_access_key_id': 'YOUR_AWS_KEY',
        'aws_secret_access_key': quote('YOUR_AWS_SECRET', safe=''),
        'bucket_region': 'us-east-1',
        'api_key': API_KEY,
        'organization_id': ORG_ID,
    }
)

job_id = response.json()['job_id']
print(f"Indexing started! Job ID: {job_id}")
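
The snippet above reads job_id directly, which raises an unhelpful KeyError if the request was rejected. A minimal sketch of a more defensive version follows; the helper name extract_job_id is hypothetical, and the only response field it relies on is job_id, as used above.

```python
def extract_job_id(status_code, payload):
    """Return the job ID from an /index-all response, or raise a clear error.

    status_code is the HTTP status and payload is the decoded JSON body.
    """
    if status_code >= 400:
        raise RuntimeError(f"index-all failed with HTTP {status_code}: {payload}")

    job_id = payload.get("job_id")
    if not job_id:
        raise RuntimeError(f"response did not include a job_id: {payload}")

    return job_id
```

With requests, this would be called as extract_job_id(response.status_code, response.json()).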

Step 3: Monitor Indexing Progress

Check the status of your indexing job by polling the /indexing-status endpoint.

import time

job_id = "your_job_id_here"

while True:
    response = requests.get(
        f"{BASE_URL}/api/v1/indexing-status/{job_id}",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json"
        }
    )

    result = response.json()
    if result.get('completed'):
        print("Indexing complete!")
        break

    print(f"Status: {result.get('status')}")
    time.sleep(3)
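
The loop above runs forever if the job never reports completion. One possible refinement is to bound the wait with a deadline. The sketch below is an illustration under stated assumptions: wait_for_indexing and its parameters are hypothetical names, and it relies only on the completed field shown above. The status fetch is passed in as a callable so the polling logic stays independent of the HTTP call.

```python
import time


def wait_for_indexing(fetch_status, timeout=600.0, interval=3.0, sleep=time.sleep):
    """Poll until the job reports completion or the timeout elapses.

    fetch_status is any zero-argument callable that returns the decoded
    JSON body of /indexing-status/{job_id} as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        if result.get("completed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"indexing not complete after {timeout} seconds")
        sleep(interval)
```

Here fetch_status could be a small lambda wrapping the requests.get call from the loop above and returning response.json().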

Step 4: Query Your Data

Ask questions about your indexed data using the /query endpoint.

from urllib.parse import quote
import uuid

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/x-www-form-urlencoded",
    "X-Organization-ID": ORG_ID,
    # uuid.uuid7() requires Python 3.14+; on older versions uuid.uuid4() also yields a unique key
    "Idempotency-Key": str(uuid.uuid7()),
}

response = requests.post(
    f"{BASE_URL}/api/v1/query",
    headers=headers,
    data={
        'query': quote("What contracts mention termination clauses?"),
        'database_name': 'contracts_2024',
    }
)

print(response.json())
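
An idempotency key is generally only useful when a retry of the same request sends the same key again. The sketch below shows one way to arrange that, assuming Captain deduplicates on the Idempotency-Key header (this guide does not spell out the exact semantics). The helper build_query_headers is a hypothetical name, and it uses uuid.uuid4() rather than uuid7() so that it runs on any Python 3 version.

```python
import uuid


def build_query_headers(api_key, org_id, idempotency_key=None):
    """Build /query headers, generating an idempotency key only when none is given.

    Pass the key from the first attempt back in when retrying, so the
    retry carries the same Idempotency-Key header.
    """
    if idempotency_key is None:
        idempotency_key = str(uuid.uuid4())

    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/x-www-form-urlencoded",
        "X-Organization-ID": org_id,
        "Idempotency-Key": idempotency_key,
    }
```

On a retry, passing idempotency_key=first_headers["Idempotency-Key"] reuses the key from the first attempt, while a new question should get a fresh key.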

Using the Demo Client

We provide a comprehensive demo client that showcases all Captain features: https://github.com/runcaptain/demo/

# Download the demo client
wget https://raw.githubusercontent.com/runcaptain/demo/main/captain_demo.py

# Run the interactive demo
python captain_demo.py

Important Concepts

Database Names are Unique

Database names must be unique within your organization.

Indexing Behavior

When you re-index a bucket:

  • All previously indexed files will be removed

  • All files from the bucket will be indexed and added to the database

Supported File Types

Captain supports an allow-list of file types including:

  • Documents

  • Images

  • Spreadsheets

  • Code

  • PowerPoints

(see API Reference for the full allow-list)

Unsupported file types (such as videos) are rejected individually during indexing. Supported files in the same bucket are still indexed.
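
To anticipate which files would be rejected, you could partition the object keys in your bucket by extension before indexing. The sketch below is illustrative only: EXAMPLE_ALLOWED_EXTENSIONS is a placeholder set, not Captain's real allow-list (see the API Reference for that), and partition_keys is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Placeholder allow-list for illustration only; the real list is in the API Reference.
EXAMPLE_ALLOWED_EXTENSIONS = {".pdf", ".docx", ".png", ".csv", ".py", ".pptx"}


def partition_keys(keys, allowed=EXAMPLE_ALLOWED_EXTENSIONS):
    """Split S3 object keys into (accepted, rejected) lists by file extension."""
    accepted, rejected = [], []
    for key in keys:
        # Compare extensions case-insensitively; keys without one are rejected.
        suffix = PurePosixPath(key).suffix.lower()
        (accepted if suffix in allowed else rejected).append(key)
    return accepted, rejected
```

This mirrors the per-file behavior described above: each key is judged on its own, so one unsupported file does not affect the rest.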

Next Steps

  • Explore the API Reference for detailed endpoint documentation

  • Review authentication best practices

  • Learn about advanced query options

Getting Help

Need assistance? Contact us at support@runcaptain.com or call us at +1 (260) CAP-TAIN.