Getting Started with Captain
Welcome to Captain! This guide will help you get started with indexing and querying your data.
Quick Start
Workflow Overview
Prerequisites:
1. Install Python Dependencies
2. Get Your API Credentials
You'll need an API Key from the Captain API Studio (format: cap_dev_...) and an Organization ID (UUID format), which is also available in the Studio.
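The snippets in this guide refer to BASE_URL, API_KEY, and ORG_ID without defining them. One way to supply them is through environment variables, sketched below. The variable names (CAPTAIN_BASE_URL, CAPTAIN_API_KEY, CAPTAIN_ORG_ID) and the helper itself are hypothetical, not part of Captain; the only check taken from this guide is that the Organization ID is a UUID.

```python
import os
import uuid

# Hypothetical environment variable names; pick whatever fits your setup.
REQUIRED = ("CAPTAIN_BASE_URL", "CAPTAIN_API_KEY", "CAPTAIN_ORG_ID")


def load_captain_config(env=None):
    """Collect the three settings used throughout this guide."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"Missing settings: {', '.join(missing)}")
    # The Organization ID is described as a UUID, so fail early on anything else.
    uuid.UUID(env["CAPTAIN_ORG_ID"])
    return {
        "base_url": env["CAPTAIN_BASE_URL"].rstrip("/"),
        "api_key": env["CAPTAIN_API_KEY"],
        "org_id": env["CAPTAIN_ORG_ID"],
    }


# Typical use, once the variables are exported in your shell:
# config = load_captain_config()
# BASE_URL, API_KEY, ORG_ID = config["base_url"], config["api_key"], config["org_id"]
```

Keeping credentials out of the source file also makes it harder to commit them by accident.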
Using the API
Step 1: Create a Database
Databases are containers for your indexed files. Each database is scoped to your API key's environment.
import requests

# BASE_URL, API_KEY, and ORG_ID must be set before running this snippet.
response = requests.post(
    f"{BASE_URL}/api/v1/create-database",
    data={
        'organization_id': ORG_ID,
        'api_key': API_KEY,
        'database_name': 'contracts_2024',
    },
)
You can also use the /delete-database and /list-databases endpoints to manage separate databases for different users or projects.
Step 2: Index Your S3 Bucket
Upload files from your S3 bucket into your Captain database. If there are any previously indexed files in your Captain database, the /index-all endpoint will remove them and then index all files from the bucket.
from urllib.parse import quote

# Start an indexing job covering every file in the bucket.
# Replace the bucket name, region, and AWS credentials with your own.
response = requests.post(
    f"{BASE_URL}/api/v1/index-all",
    data={
        'database_name': 'contracts_2024',
        'bucket_name': 'my-s3-bucket',
        'aws_access_key_id': 'YOUR_AWS_KEY',
        # safe='' percent-encodes every reserved character, including '/'.
        'aws_secret_access_key': quote('YOUR_AWS_SECRET', safe=''),
        'bucket_region': 'us-east-1',
        'api_key': API_KEY,
        'organization_id': ORG_ID,
    },
)
job_id = response.json()['job_id']
print(f"Indexing started! Job ID: {job_id}")
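The request above passes safe='' to quote. A short illustration of what that argument changes, using a made-up string rather than a real credential:

```python
from urllib.parse import quote

# Made-up value containing characters that are common in secrets.
secret = "abc/DEF+ghi="

# By default quote() treats '/' as safe and leaves it untouched.
default_encoded = quote(secret)

# With safe='' every reserved character is percent-encoded.
fully_encoded = quote(secret, safe="")

print(default_encoded)  # abc/DEF%2Bghi%3D
print(fully_encoded)    # abc%2FDEF%2Bghi%3D
```

Letters, digits, and the characters _ . - ~ are never encoded, so a secret made up only of those passes through unchanged.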
Step 3: Monitor Indexing Progress
Check the status of your indexing job by polling the /indexing-status endpoint.
import time

job_id = "your_job_id_here"

# Poll every 3 seconds until the job reports completion.
while True:
    response = requests.get(
        f"{BASE_URL}/api/v1/indexing-status/{job_id}",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    result = response.json()
    if result.get('completed'):
        print("Indexing complete!")
        break
    print(f"Status: {result.get('status')}")
    time.sleep(3)
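The loop above runs until the job completes, so a stalled job would keep it polling indefinitely. A minimal sketch of the same loop with an upper time limit follows. The helper name, the default timeout, and the fetch_status callable are assumptions for illustration; only the 'completed' field and the 3-second interval come from the example above.

```python
import time


def wait_for_indexing(fetch_status, timeout=600.0, interval=3.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports completion or the timeout elapses.

    fetch_status is any zero-argument callable returning the parsed status
    dictionary, for example a lambda wrapping the GET request shown above.
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        if result.get("completed"):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"Indexing not finished after {timeout} seconds")
        sleep(interval)
```

Passing the request in as a callable keeps the helper independent of any HTTP library and lets it be exercised with canned responses.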
Step 4: Query Your Data
Ask questions about your indexed data using the /query endpoint.
from urllib.parse import quote
import uuid

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/x-www-form-urlencoded",
    "X-Organization-ID": ORG_ID,
    # uuid.uuid7() requires Python 3.14 or newer.
    "Idempotency-Key": str(uuid.uuid7()),
}

# Send the question as form data against the indexed database.
response = requests.post(
    f"{BASE_URL}/api/v1/query",
    headers=headers,
    data={
        'query': quote("What contracts mention termination clauses?"),
        'database_name': 'contracts_2024',
    },
)
print(response.json())
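The query request carries an Idempotency-Key header. By general convention such a key lets a server recognise a repeated request, so a retry should reuse the key of the original attempt instead of generating a new one. How Captain treats repeated keys is not described in this guide, so the sketch below is an assumption-laden illustration; the helper name and retry policy are hypothetical.

```python
import uuid


def send_with_retries(send, max_attempts=3, retry_on=(OSError,)):
    """Call send(key) until it succeeds, reusing one idempotency key throughout.

    send is any callable that takes the key and performs the request.
    retry_on defaults to OSError, which also covers the exceptions raised by
    the requests library (requests.RequestException derives from IOError).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    key = str(uuid.uuid4())  # generated once, shared by every attempt
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(key)
        except retry_on as error:
            last_error = error
    raise last_error
```

In practice send would build the headers shown above with "Idempotency-Key" set to the key it receives and then issue the POST.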
Using the Demo Client
We provide a comprehensive demo client that showcases all Captain features: https://github.com/runcaptain/demo
# Download the demo client
wget https://raw.githubusercontent.com/runcaptain/demo/main/captain_demo.py
# Run the interactive demo
python captain_demo.py
Important Concepts
Database Names are Unique
Database names must be unique within your organization.
Indexing Behavior
When you re-index a bucket:
- All previously indexed files will be removed
- All files from the bucket will be indexed and added to the database
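The replace-on-re-index behaviour described above can be pictured with a small local model. This is only an illustration of the described semantics using plain Python sets; it does not call Captain, and the function name is hypothetical.

```python
def contents_after_reindex(previously_indexed, bucket_files):
    """Model a re-index: old entries are dropped and the bucket replaces them."""
    bucket = set(bucket_files)
    dropped = set(previously_indexed) - bucket
    return {
        "indexed": sorted(bucket),
        "no_longer_indexed": sorted(dropped),
    }


before = ["a.pdf", "b.pdf"]
bucket_now = ["b.pdf", "c.pdf"]
print(contents_after_reindex(before, bucket_now))
# {'indexed': ['b.pdf', 'c.pdf'], 'no_longer_indexed': ['a.pdf']}
```

The practical consequence is that a file deleted from the bucket disappears from the database at the next re-index.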
Supported File Types
Captain supports an allow-list of file types including:
- Documents
- Images
- Spreadsheets
- Code
- PowerPoints
(see API Reference for the full allow-list)
Unsupported file types (such as videos) are rejected individually during indexing; supported files in the same bucket are still indexed.
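To estimate ahead of time which files in a bucket are likely to be rejected, the object keys can be split by extension. The sketch below is a local pre-check only. The extension set is a hypothetical subset chosen for illustration, not Captain's actual allow-list, which is in the API Reference.

```python
from pathlib import PurePosixPath

# Hypothetical subset for illustration; consult the API Reference for the real list.
EXAMPLE_ALLOWED = {".pdf", ".docx", ".png", ".xlsx", ".py", ".pptx"}


def split_by_extension(keys, allowed=EXAMPLE_ALLOWED):
    """Return (accepted, rejected) lists of object keys based on file extension."""
    accepted, rejected = [], []
    for key in keys:
        # S3 keys use '/' separators, so PurePosixPath parses them on any OS.
        suffix = PurePosixPath(key).suffix.lower()
        (accepted if suffix in allowed else rejected).append(key)
    return accepted, rejected


accepted, rejected = split_by_extension(
    ["contracts/a.PDF", "media/intro.mp4", "notes"]
)
print(accepted)  # ['contracts/a.PDF']
print(rejected)  # ['media/intro.mp4', 'notes']
```

Keys without an extension land in the rejected list here; whether Captain treats them the same way is not stated in this guide.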
Next Steps
- Explore the API Reference for detailed endpoint documentation
- Review authentication best practices
- Learn about advanced query options
Getting Help
Need assistance? Contact us at support@runcaptain.com or call us at +1 (260) CAP-TAIN.