AWS S3 with cross-account IAM role

Captain can read your S3 buckets without you ever sharing AWS access keys. You create an IAM role in your AWS account that trusts a Captain-owned principal, Captain calls sts:AssumeRole on it (with a Captain-issued external ID) when an indexing job runs, and the temporary credentials are discarded the moment the job finishes.

This is the auth method we recommend for production indexing and for customers operating under formal compliance programs (SOC 2, ISO 27001, HIPAA, FedRAMP). It follows AWS’s recommended cross-account access pattern, so enterprise security reviewers are usually already familiar with it.

Before you start, you need to contact Captain to receive an external ID.

The external ID is a short string we generate server-side per organization and pin to your AWS integration. Without it, your IAM role’s trust policy can’t be set up correctly. Email support@runcaptain.com (or message your Captain account manager) with your organization ID. We’ll mint your external ID and send it back, usually same-day.

It looks like cap_org_xxxxxxxxxxxxxxxxxxxxxxxx (24 hex characters after the prefix). Each organization gets a unique one. Don’t reuse another customer’s external ID and don’t generate your own. Captain will reject the indexing call if the value doesn’t match what we have stored.
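To catch copy-paste mistakes before the value reaches your trust policy, you can run a local shape check. This is a minimal Python sketch that assumes the format described above: the cap_org_ prefix followed by exactly 24 hex characters. Accepting either letter case is our assumption, and the function name is hypothetical. Passing the check only means the value is well-formed; Captain still compares it against the value stored for your organization.

```python
import re

# Assumed shape: "cap_org_" + exactly 24 hex characters (either case accepted here).
EXTERNAL_ID_PATTERN = re.compile(r"cap_org_[0-9a-fA-F]{24}")


def looks_like_external_id(value: str) -> bool:
    """Return True if `value` matches the cap_org_<24-hex> shape exactly.

    fullmatch() rejects leading/trailing whitespace and stray quote characters,
    the most common copy-paste mistakes. This checks shape only, not validity.
    """
    return EXTERNAL_ID_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    print(looks_like_external_id("cap_org_0123456789abcdef01234567"))   # True
    print(looks_like_external_id(" cap_org_0123456789abcdef01234567"))  # False (leading space)
```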

Why use it

Cross-account role assumption gives you three things:

  • Short-lived session credentials. Captain reads your bucket using AWS STS session credentials it mints on demand and discards when the job finishes. This is the standard AWS cross-account pattern. The role you create on your side is the only durable artifact, and you control it.
  • Full CloudTrail visibility. Every Captain session shows up in your AWS account as an AssumeRole event from a known Captain principal, with a session name traceable back to a specific indexing job, followed by the corresponding s3:GetObject and s3:ListBucket calls (recorded once you enable S3 data-event logging on a trail).
  • You own the permissions surface. Want to scope to one bucket? One prefix? Layer in KMS conditions? Restrict by source IP or time of day? Standard IAM, all configured in your account.

You can revoke Captain’s access at any time by editing your role’s trust policy or deleting the role outright. No coordination with Captain is required.

How it works

You create a role in your AWS account. Its trust policy names a single Captain principal (captain-cross-account-ingestion) and requires the external ID Captain issued you. When an indexing job runs, Captain calls sts:AssumeRole on your role, gets a short-lived session (≤ 1 hour, AWS-enforced), reads the bucket, and discards the session when the job finishes. Long jobs re-assume automatically as the session approaches expiry. Your trust policy gates assumption on both the Captain principal and the external ID matching, so the role can’t be assumed by anyone else.
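To make the session lifecycle concrete, here is a hedged Python sketch of the caller side of this pattern using boto3's STS assume_role call. It illustrates the pattern and is not Captain's implementation: the helper names and the 5-minute refresh margin are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical margin: re-assume once fewer than 5 minutes remain on the session.
REFRESH_MARGIN = timedelta(minutes=5)


def needs_refresh(expiration: datetime, now: datetime,
                  margin: timedelta = REFRESH_MARGIN) -> bool:
    """True when the current session is within `margin` of its expiry."""
    return expiration - now <= margin


def assume_customer_role(sts_client, role_arn: str, external_id: str,
                         session_name: str) -> dict:
    """Call sts:AssumeRole the way a cross-account caller would.

    `sts_client` is a boto3 STS client (boto3.client("sts")). Returns the
    Credentials dict: AccessKeyId, SecretAccessKey, SessionToken, Expiration.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,  # e.g. "captain-customer-<id>"
        ExternalId=external_id,        # must equal sts:ExternalId in the trust policy
        DurationSeconds=3600,          # 1 hour, the cap for role-chained sessions
    )
    return response["Credentials"]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(needs_refresh(now + timedelta(minutes=3), now))  # True
```

With a real client, boto3 returns Expiration as a timezone-aware datetime, so needs_refresh(creds["Expiration"], datetime.now(timezone.utc)) tells a long-running job when to call assume_customer_role again.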

Setup

Step 1: Get your external ID from Captain

Email support@runcaptain.com with your Captain organization ID, or ask your account manager directly if you’re already in a pilot conversation. We’ll reply with a value of the form cap_org_<24-hex>, usually same-day. Save it; you’ll paste it into your IAM trust policy in step 2.

Step 2: Create the IAM role in your AWS account

  1. Open the AWS Console → IAM → Roles → Create role.
  2. Trusted entity type: select Custom trust policy, not “AWS account.” The “AWS account” form doesn’t let you bind to a specific role and external ID cleanly.
  3. Paste the following trust policy verbatim, replacing <EXTERNAL_ID_FROM_STEP_1> with the value Captain sent you:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::149896762962:role/captain-cross-account-ingestion"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<EXTERNAL_ID_FROM_STEP_1>"
        }
      }
    }
  ]
}

Paste the Principal.AWS value exactly as shown. Don’t substitute your own AWS account ID, a Captain root ARN, or any other principal. This is the only Captain identity we authorize to read customer S3 buckets, and AssumeRole will fail with AccessDenied if your trust policy names anything else.

  4. Click Next. On the “Add permissions” page, click Create policy (this opens a new tab). In the JSON tab, paste the following, replacing <YOUR_BUCKET_NAME> with the bucket(s) you want Captain to index:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<YOUR_BUCKET_NAME>"
    },
    {
      "Sid": "GetObjects",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<YOUR_BUCKET_NAME>/*"
    }
  ]
}

Save this policy as something like CaptainS3ReadAccess. If your bucket uses a customer-managed KMS key, also grant kms:Decrypt on that key’s ARN.

  5. Back in the role-creation tab, refresh the policy list, check the CaptainS3ReadAccess policy you just created, and click Next.

  6. Name the role (we suggest CaptainS3ReadRole), review it, and click Create role.

  7. After the role is created, copy its ARN. It looks like arn:aws:iam::123456789012:role/CaptainS3ReadRole.
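If you manage IAM in code rather than in the console, the steps above can be sketched with boto3's IAM create_role and put_role_policy calls. This sketch assumes you pass the permissions policy in as a dict, and it differs from the console flow in one way: it attaches the permissions as an inline role policy instead of a separate managed CaptainS3ReadAccess policy. The function names are hypothetical.

```python
import json

# Principal and trust-policy shape taken from step 2 of this guide.
CAPTAIN_PRINCIPAL = "arn:aws:iam::149896762962:role/captain-cross-account-ingestion"


def trust_policy(external_id: str) -> dict:
    """Trust policy letting only the Captain role assume this role, and only with your external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": CAPTAIN_PRINCIPAL},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def create_captain_role(iam_client, external_id: str, permissions_policy: dict,
                        role_name: str = "CaptainS3ReadRole") -> str:
    """Create the role, attach permissions inline, and return the role ARN.

    `iam_client` is a boto3 IAM client (boto3.client("iam")).
    """
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy(external_id)),
        Description="Read-only S3 access for Captain indexing",
    )
    # Inline policy (the console flow uses a separate managed policy instead).
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName="CaptainS3ReadAccess",
        PolicyDocument=json.dumps(permissions_policy),
    )
    return role["Role"]["Arn"]
```

Called as create_captain_role(boto3.client("iam"), external_id, permissions_policy), it returns the role ARN you send to Captain in step 3.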

Step 3: Send Captain your role ARN

Reply to the same Captain thread with the role ARN from step 2. We’ll record it on our side, your integration flips from pending to active, and you can run your first indexing call.

Step 4: Run your first indexing call

Use the role ARN and external ID in place of the aws_access_key_id / aws_secret_access_key fields:

import requests

BASE_URL = "https://api.runcaptain.com"
API_KEY = "your_api_key"
ORG_ID = "your_organization_id"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Organization-ID": ORG_ID,
    "Content-Type": "application/json",
}

response = requests.post(
    f"{BASE_URL}/v2/collections/my_documents/index/s3",
    headers=headers,
    json={
        "bucket_name": "my-documents-bucket",
        "bucket_region": "us-east-1",
        # Role-based auth replaces aws_access_key_id / aws_secret_access_key
        "auth": {
            "type": "assume_role",
            "role_arn": "arn:aws:iam::123456789012:role/CaptainS3ReadRole",
            "external_id": "cap_org_xxxxxxxxxxxxxxxxxxxxxxxx",
        },
        "processing_type": "advanced",
        "skip_existing": True,
    },
    timeout=60.0,
)

if response.status_code in (200, 201):
    data = response.json()
    print(f"Job started! ID: {data['job_id']}")
else:
    # Print the body too, so validation errors (e.g. HTTP 422) are visible
    print(f"Error: {response.status_code} {response.text}")

The same auth block works on all three S3 endpoints:

  • POST /v2/collections/{name}/index/s3: index everything in a bucket
  • POST /v2/collections/{name}/index/s3/file: index a single object
  • POST /v2/collections/{name}/index/s3/directory: index everything under a path (treated as a prefix)

These endpoints also accept aws_access_key_id + aws_secret_access_key in place of the auth block. Use whichever fits. A single request must pick one shape, not both (sending both returns HTTP 422).
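Because a request must carry exactly one credential shape, a small client-side guard can catch the mistake before the API returns HTTP 422. This is a minimal Python sketch; the helper name is hypothetical, and the field names are the ones shown in this guide.

```python
from typing import Optional


def s3_auth_fields(role_arn: Optional[str] = None, external_id: Optional[str] = None,
                   access_key_id: Optional[str] = None,
                   secret_access_key: Optional[str] = None) -> dict:
    """Return the auth-related fields for an index/s3 request body.

    Enforces the documented rule: either the `auth` block (assume_role) or
    aws_access_key_id + aws_secret_access_key, never both.
    """
    use_role = role_arn is not None or external_id is not None
    use_keys = access_key_id is not None or secret_access_key is not None
    if use_role and use_keys:
        raise ValueError("pick one auth shape: assume_role or access keys, not both")
    if use_role:
        if not (role_arn and external_id):
            raise ValueError("assume_role needs both role_arn and external_id")
        return {"auth": {"type": "assume_role", "role_arn": role_arn,
                         "external_id": external_id}}
    if use_keys:
        if not (access_key_id and secret_access_key):
            raise ValueError("access-key auth needs both the key ID and the secret")
        return {"aws_access_key_id": access_key_id,
                "aws_secret_access_key": secret_access_key}
    raise ValueError("no credentials supplied")
```

You would merge the result into the request body, for example {"bucket_name": "my-documents-bucket", "bucket_region": "us-east-1", **s3_auth_fields(role_arn=..., external_id=...)}.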

What success looks like

The API returns { "job_id": "...", "status": "pending" } immediately. Track progress via GET /v2/jobs/{job_id}. On success the job reports files indexed, chunks processed, and tokens used.
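Here is a minimal polling loop for GET /v2/jobs/{job_id}, written in Python against any requests-style session. The terminal status values (completed, failed), the polling interval, and the poll limit are assumptions; only pending is documented here, so check the jobs API reference for the actual status set.

```python
import time

# Assumption: these are the job's terminal "status" values. Only "pending" is documented above.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(session, base_url: str, headers: dict, job_id: str,
                 poll_seconds: float = 5.0, max_polls: int = 720) -> dict:
    """Poll GET /v2/jobs/{job_id} until the job reaches a terminal status.

    `session` is anything with a requests-style .get(), e.g. requests.Session().
    Returns the final job object; raises TimeoutError after `max_polls` attempts.
    """
    for _ in range(max_polls):
        response = session.get(f"{base_url}/v2/jobs/{job_id}",
                               headers=headers, timeout=30.0)
        response.raise_for_status()
        job = response.json()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

For example, wait_for_job(requests.Session(), BASE_URL, headers, data["job_id"]) reuses the headers from the indexing example above.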

In your AWS account, every Captain indexing call shows up in CloudTrail as:

  • An AssumeRole event on your role with RoleSessionName=captain-customer-<id>.
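  • One or more s3:ListBucket / s3:GetObject events using the assumed session credentials. These are S3 data events, which CloudTrail records only if a trail has data-event logging enabled for the bucket; without it you see the AssumeRole event but not the individual reads.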
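To spot these sessions programmatically, you can filter AssumeRole records by the captain-customer- session-name prefix. This is a hedged Python sketch that works on the Events list returned by CloudTrail's LookupEvents API. The function name is hypothetical, and the sketch relies on the standard requestParameters.roleSessionName field of the AssumeRole record.

```python
import json

SESSION_PREFIX = "captain-customer-"  # RoleSessionName prefix documented above


def captain_assume_role_events(events: list) -> list:
    """Return AssumeRole events whose session name starts with captain-customer-.

    `events` is the "Events" list from CloudTrail LookupEvents
    (boto3: cloudtrail.lookup_events); each item holds the raw record as a
    JSON string under "CloudTrailEvent".
    """
    matches = []
    for event in events:
        record = json.loads(event["CloudTrailEvent"])
        if record.get("eventName") != "AssumeRole":
            continue
        params = record.get("requestParameters") or {}
        session_name = params.get("roleSessionName", "")
        if session_name.startswith(SESSION_PREFIX):
            matches.append({
                "time": record.get("eventTime"),
                "role_arn": params.get("roleArn"),
                "session_name": session_name,
            })
    return matches
```

To fetch input, call boto3.client("cloudtrail").lookup_events(LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "AssumeRole"}]) and pass its "Events" list, following NextToken for further pages. Event history covers only management events from the last 90 days, so the S3 reads themselves are not in it; query your trail's data events for those.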

You can revoke Captain’s access at any time by deleting your IAM role, or by editing its trust policy to remove the Captain principal. New indexing jobs then fail immediately with AccessDenied. If you only edit the trust policy, sessions Captain has already obtained stay valid until they expire (at most 1 hour); to cut those off as well, use Revoke active sessions on the role in the IAM console.

Troubleshooting

Most failures are trust-policy typos. AWS error messages for AssumeRole are intentionally generic, so you can’t always tell which field is wrong from the error alone. Check these two first:

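  • Wrong principal: the indexing job fails with AccessDenied as soon as it starts because the trust policy on your role doesn’t allow the Captain principal. Check that Principal.AWS is exactly arn:aws:iam::149896762962:role/captain-cross-account-ingestion, with no typos.
  • Wrong external ID: the same AccessDenied at the same point, because the sts:ExternalId in your trust policy (or the external_id in your indexing request) doesn’t match the value Captain sent. Re-paste it exactly, with no leading or trailing whitespace and no extra quote characters inside the JSON string. If unsure, ask Captain to resend.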

s3:GetObject failures mean the role is being assumed correctly but the permissions policy on the role is missing the action. Double-check the bucket name in the policy matches the bucket you’re trying to index (both arn:aws:s3:::<bucket> for ListBucket and arn:aws:s3:::<bucket>/* for GetObject).

KMS-encrypted buckets need kms:Decrypt on the role’s permissions policy, scoped to the KMS key ARN. Without this, s3:GetObject returns the same AccessDenied. Your bucket’s encryption config is in the AWS Console under S3 → your bucket → Properties → Default encryption.
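Because AssumeRole errors don’t say which field is wrong, it can help to check the trust policy locally. This hedged Python sketch assumes the single-statement shape from step 2, and the function names are hypothetical. It reports a missing Captain principal, a missing sts:AssumeRole action, or a missing or mismatched sts:ExternalId.

```python
CAPTAIN_PRINCIPAL = "arn:aws:iam::149896762962:role/captain-cross-account-ingestion"


def _as_list(value) -> list:
    """IAM fields may be a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def trust_policy_problems(policy: dict, external_id: str) -> list:
    """Return readable problems with a trust policy; an empty list means it looks right."""
    for stmt in _as_list(policy.get("Statement")):
        principal = stmt.get("Principal")
        if stmt.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue
        if CAPTAIN_PRINCIPAL not in _as_list(principal.get("AWS")):
            continue
        # Found the statement that names Captain; check the rest of it.
        problems = []
        if "sts:AssumeRole" not in _as_list(stmt.get("Action")):
            problems.append("Captain statement does not allow sts:AssumeRole")
        condition = stmt.get("Condition", {}).get("StringEquals", {})
        configured = _as_list(condition.get("sts:ExternalId"))
        if not configured:
            problems.append("no StringEquals sts:ExternalId condition")
        elif external_id not in configured:
            # !r makes stray spaces or quotes visible
            problems.append(f"sts:ExternalId is {configured!r}, expected {external_id!r}")
        return problems
    return [f"no Allow statement names {CAPTAIN_PRINCIPAL}"]
```

With boto3, boto3.client("iam").get_role(RoleName="CaptainS3ReadRole")["Role"]["AssumeRolePolicyDocument"] returns the trust policy already decoded into a dict you can pass in.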

Operational notes

  • Rotating the external ID. If you ever need a new external ID (e.g., the old one was posted somewhere it shouldn’t have been), reply to your Captain thread asking for a rotation. We’ll mint a new one; you update the Condition.StringEquals.sts:ExternalId value in your trust policy. The old external ID stops working immediately, so indexing calls fail with AccessDenied until your trust policy and the external_id in your indexing requests both carry the new value.
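  • Multiple buckets. One IAM role can grant access to multiple buckets: in the permissions policy, add each bucket’s ARN (arn:aws:s3:::<bucket>) to the ListBucket statement and its object ARN (arn:aws:s3:::<bucket>/*) to the GetObjects statement. Your indexing call specifies which bucket to index per request.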
  • Multiple AWS accounts. If you have data spread across several AWS accounts, create one role per account using the same external ID. Each indexing call targets one role at a time.
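  • Session length. Because Captain assumes your role from its own IAM role (role chaining), AWS caps each session at 1 hour regardless of your role’s maximum session duration setting. Captain re-assumes automatically as the session nears expiry, so jobs longer than an hour run without manual intervention.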
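  • Auditing usage. Captain’s S3 reads appear in CloudTrail under the assumed-role identity arn:aws:sts::<your-account-id>:assumed-role/CaptainS3ReadRole/captain-customer-<id> (with your own role name), not under captain-cross-account-ingestion. Filter userIdentity.arn by your role name or the captain-customer- session prefix to see every Captain-initiated read on your S3 data.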
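For several buckets, or a bucket encrypted with a customer-managed KMS key, the permissions policy can be generated rather than hand-edited. This is a minimal Python sketch, assuming bucket names are passed without the arn:aws:s3::: prefix; the function name and the DecryptObjects Sid are illustrative.

```python
from typing import Optional


def captain_read_policy(buckets: list, kms_key_arn: Optional[str] = None) -> dict:
    """Build a permissions policy granting list + read on each bucket.

    s3:ListBucket applies to the bucket ARN; s3:GetObject applies to objects,
    so it needs the "/*" form. kms_key_arn is only needed for buckets that use
    a customer-managed KMS key.
    """
    statements = [
        {
            "Sid": "ListBucket",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": [f"arn:aws:s3:::{b}" for b in buckets],
        },
        {
            "Sid": "GetObjects",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": [f"arn:aws:s3:::{b}/*" for b in buckets],
        },
    ]
    if kms_key_arn:
        statements.append({
            "Sid": "DecryptObjects",
            "Effect": "Allow",
            "Action": "kms:Decrypt",
            "Resource": kms_key_arn,
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    import json
    print(json.dumps(captain_read_policy(["docs-a", "docs-b"]), indent=2))
```

Paste the printed JSON into the policy editor in place of the single-bucket version.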

Access keys are also supported

The aws_access_key_id + aws_secret_access_key flow works on the same endpoints. Some collections can use role-based auth while others use access keys, and you can switch a collection from one to the other at any time. Pick whichever fits the bucket and the team managing it. See the Connect Cloud Storage guide for the access-key flow.