Getting Started with Captain

Welcome to Captain! This guide will help you get started with indexing and querying your data.

Quick Start

Workflow Overview

Prerequisites:

1. Install Python Dependencies

pip install requests
# No extra install is needed for idempotency keys: uuid ships with the Python standard library.

2. Get Your API Credentials

You'll need an API key (format: cap_dev_...) and an Organization ID (UUID format). Both are available in the Captain API Studio.
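One way to keep these credentials out of source code is to read them from environment variables and check their shape before use. The sketch below is illustrative only: the variable names CAPTAIN_API_KEY and CAPTAIN_ORG_ID and the helper load_credentials are assumptions made for this example, not part of Captain.

```python
import os
import uuid


def load_credentials(env=os.environ):
    """Read Captain credentials from environment variables (names are hypothetical)."""
    api_key = env.get("CAPTAIN_API_KEY", "").strip()
    org_id = env.get("CAPTAIN_ORG_ID", "").strip()
    if not api_key:
        raise ValueError("CAPTAIN_API_KEY is not set")
    try:
        # The Organization ID is documented as a UUID, so parsing it catches typos.
        org_id = str(uuid.UUID(org_id))
    except ValueError:
        raise ValueError("CAPTAIN_ORG_ID is not a valid UUID") from None
    return api_key, org_id
```

With both variables exported in your shell, `API_KEY, ORG_ID = load_credentials()` would supply the names used in the examples that follow.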

Using the API

Step 1: Create a Database

Databases are containers for your indexed files. Each database is scoped to your API key's environment.

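import requests

# BASE_URL, API_KEY, and ORG_ID are placeholders: set them to your Captain API
# base URL and the credentials from the Captain API Studio before running.
response = requests.post(
    f"{BASE_URL}/api/v1/create-database",
    data={
        'organization_id': ORG_ID,
        'api_key': API_KEY,
        'database_name': 'contracts_2024'
    }
)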

You can also use the /delete-database and /list-databases endpoints to manage separate databases for different users or projects.

Step 2: Index Your S3 Bucket

Index the files in your S3 bucket into your Captain database. The /index-all endpoint replaces the database contents: it first removes any previously indexed files, then indexes every file in the bucket.

from urllib.parse import quote

response = requests.post(
    f"{BASE_URL}/api/v1/index-all",
    data={
        'database_name': 'contracts_2024',
        'bucket_name': 'my-s3-bucket',
        'aws_access_key_id': 'YOUR_AWS_KEY',
        'aws_secret_access_key': quote('YOUR_AWS_SECRET', safe=''),
        'bucket_region': 'us-east-1',
        'api_key': API_KEY,
        'organization_id': ORG_ID,
    }
)

response.raise_for_status()  # surface HTTP errors here rather than as a KeyError below
job_id = response.json()['job_id']
print(f"Indexing started! Job ID: {job_id}")

Step 3: Monitor Indexing Progress

Check the status of your indexing job by polling the /indexing-status endpoint.

import time

job_id = "your_job_id_here"  # the job_id returned by /index-all in Step 2

while True:
    response = requests.get(
        f"{BASE_URL}/api/v1/indexing-status/{job_id}",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json"
        }
    )

    result = response.json()
    if result.get('completed'):
        print("Indexing complete!")
        break

    print(f"Status: {result.get('status')}")
    time.sleep(3)
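The loop above keeps polling indefinitely if the job never reports completion. A bounded variant might look like the following sketch. The helper name wait_for_indexing, its parameters, and the default timeout are assumptions made for illustration; only the completed field comes from the example above.

```python
import time


def wait_for_indexing(fetch_status, timeout=600.0, interval=3.0, sleep=time.sleep):
    """Poll fetch_status() until it reports completion or the timeout elapses.

    fetch_status is any zero-argument callable returning the parsed JSON dict,
    for example a lambda wrapping the requests.get call shown above.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        if result.get("completed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"indexing not complete after {timeout} seconds")
        sleep(interval)
```

Passing the status call in as a function keeps the waiting logic separate from the HTTP details, so it can be exercised with canned responses.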

Step 4: Query Your Data

Ask questions about your indexed data using the /query endpoint.

from urllib.parse import quote
import uuid

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/x-www-form-urlencoded",
    "X-Organization-ID": ORG_ID,
    # uuid.uuid7() requires Python 3.14+; on older versions use uuid.uuid4()
    "Idempotency-Key": str(uuid.uuid7()),
}

response = requests.post(
    f"{BASE_URL}/api/v1/query",
    headers=headers,
    data={
        'query': quote("What contracts mention termination clauses?"),
        'database_name': 'contracts_2024',
    }
)

print(response.json())
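An idempotency key generally only helps if the same key accompanies every retry of the same logical request. The sketch below generates the key once and reuses it across attempts. The helper query_with_retries and its parameters are hypothetical, and how Captain handles a repeated key is not described in this guide, so treat the server-side behavior as an assumption.

```python
import uuid


def query_with_retries(send, max_attempts=3, retry_on=(ConnectionError,)):
    """Call send(idempotency_key) up to max_attempts times with one shared key.

    send is a caller-supplied callable that performs the POST using the given
    value for the Idempotency-Key header. retry_on lists the exception types
    treated as transient.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    key = str(uuid.uuid4())  # generated once, outside the retry loop
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(key)
        except retry_on as exc:
            last_error = exc
    raise last_error
```

When send wraps requests.post, pass retry_on=(requests.exceptions.ConnectionError,), since that class is not a subclass of Python's built-in ConnectionError.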

Using the Demo Client

We provide a comprehensive demo client that showcases all Captain features: https://github.com/runcaptain/demo

# Download the demo client
wget https://raw.githubusercontent.com/runcaptain/demo/main/captain_demo.py

# Run the interactive demo
python captain_demo.py

Important Concepts

Database Names are Unique

Database names must be unique within your organization.

Indexing Behavior

When you re-index a bucket:

All previously indexed files will be removed
All files from the bucket will be indexed and added to the database
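Put differently, re-indexing replaces the database contents rather than merging into them. The toy model below illustrates the net effect with Python sets. It does not call Captain, and the function name and file names are invented for the example.

```python
def reindex(previously_indexed, bucket_files):
    """Toy model of a re-index: old entries are dropped, bucket contents take over."""
    no_longer_present = set(previously_indexed) - set(bucket_files)
    return set(bucket_files), no_longer_present


indexed, dropped = reindex({"a.pdf", "b.pdf"}, {"b.pdf", "c.pdf"})
print(sorted(indexed))  # ['b.pdf', 'c.pdf']
print(sorted(dropped))  # ['a.pdf']
```

A file that was indexed earlier but has since been deleted from the bucket (a.pdf here) is no longer in the database after the re-index.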

Supported File Types

Captain supports an allow-list of file types including:

Documents
Images
Spreadsheets
Code
PowerPoints

(see API Reference for the full allow-list)

Files of unsupported types (such as videos) are rejected individually during indexing. Files of supported types in the same bucket are still indexed.
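To anticipate which files are likely to be rejected, you could partition your object keys by extension before indexing. This is a sketch under stated assumptions: the extension set below is a placeholder invented for illustration, not Captain's actual allow-list, and split_by_support is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Placeholder allow-list for illustration only; see the API Reference for the real one.
ASSUMED_ALLOWED = {".pdf", ".docx", ".png", ".xlsx", ".py", ".pptx"}


def split_by_support(keys, allowed=ASSUMED_ALLOWED):
    """Partition S3 object keys into (likely accepted, likely rejected) by extension."""
    accepted, rejected = [], []
    for key in keys:
        suffix = PurePosixPath(key).suffix.lower()
        (accepted if suffix in allowed else rejected).append(key)
    return accepted, rejected
```

For example, `split_by_support(["docs/a.PDF", "media/clip.mp4"])` puts the PDF in the first list and the video in the second.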

Next Steps

Explore the API Reference for detailed endpoint documentation
Review authentication best practices
Learn about advanced query options

Getting Help

Need assistance? Contact us at support@runcaptain.com or call us at +1 (260) CAP-TAIN.