# AWS S3 with cross-account IAM role

> Set up Captain to index your S3 buckets using cross-account role assumption. Captain authenticates to your bucket via short-lived AWS session credentials it mints on demand. Recommended for production indexing.

## Agent Quick Reference: S3 cross-account IAM role auth

* Captain principal customers must trust: `arn:aws:iam::149896762962:role/captain-cross-account-ingestion`.
* Customer-side role's trust policy must (a) name that principal as `Principal.AWS` and (b) require `sts:ExternalId = <captain-issued-id>` as a Condition.
* External ID is **NOT** customer-generated. Captain mints it server-side per organization. Customer must email [support@runcaptain.com](mailto:support@runcaptain.com) (or their account manager) to receive their `external_id` before they can finish setup.
* Request shape (the `auth` block replaces `aws_access_key_id` / `aws_secret_access_key`):
  ```json
  {
    "bucket_name": "<bucket>",
    "bucket_region": "us-east-1",
    "auth": {
      "type": "assume_role",
      "role_arn": "arn:aws:iam::<CUSTOMER_ACCT>:role/<ROLE_NAME>",
      "external_id": "<captain-issued cap_org_*>"
    },
    "processing_type": "advanced"
  }
  ```
* Endpoints: `POST /v2/collections/{name}/index/s3`, `/index/s3/file`, `/index/s3/directory`. Same `auth` block on all three.
* `aws_access_key_id` / `aws_secret_access_key` are still supported on the same endpoints. They are mutually exclusive with `auth`; pass exactly one shape (sending both, or neither, is rejected).
* Required IAM on customer role: `s3:ListBucket` on the bucket ARN, `s3:GetObject` on the bucket-contents ARN. Add `kms:Decrypt` if the bucket uses a customer-managed key.

Captain can read your S3 buckets without you ever sharing AWS access keys.
You create an IAM role in **your** AWS account that trusts a Captain-owned
principal, Captain calls `sts:AssumeRole` on it (with a Captain-issued
external ID) when an indexing job runs, and the temporary credentials are
discarded the moment the job finishes.

This is the auth method we recommend for production indexing and for
customers operating under formal compliance programs (SOC 2, ISO 27001,
HIPAA, FedRAMP). It follows AWS's recommended cross-account access
pattern, so enterprise security reviewers are usually already familiar
with it.

## Why use it

Cross-account role assumption gives you three things:

* **Short-lived session credentials.** Captain reads your bucket using
  AWS STS session credentials it mints on demand and discards when the
  job finishes. This is the standard AWS cross-account pattern. The
  role you create on your side is the only durable artifact, and you
  control it.
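* **Full CloudTrail visibility.** Each indexing job shows up in your AWS
  account as an `AssumeRole` event from a known Captain principal, with a
  session name traceable back to that job, followed by the corresponding
  `s3:ListBucket` and `s3:GetObject` calls. Object-level S3 reads such as
  `GetObject` are CloudTrail data events, so they are recorded only if
  data event logging is enabled for the bucket.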
* **You own the permissions surface.** Want to scope to one bucket? One
  prefix? Layer in KMS conditions? Restrict by source IP or time of day?
  Standard IAM, all configured in your account.

You can revoke Captain's access at any time by editing your role's trust
policy or deleting the role outright. No coordination with Captain is
required.

## How it works

```mermaid
flowchart LR
    subgraph captain["Captain's AWS account"]
        principal["arn:aws:iam::149896762962:<br />role/captain-cross-account-ingestion"]
    end
    subgraph customer["Your AWS account"]
        role["arn:aws:iam::YOUR_ACCT:<br />role/CaptainS3ReadRole"]
        bucket[("S3 bucket")]
    end
    principal -- "sts:AssumeRole<br />(with external_id)" --> role
    role -- "s3:ListBucket<br />s3:GetObject" --> bucket
```

You create a role in your AWS account. Its trust policy names a single
Captain principal (`captain-cross-account-ingestion`) and requires the
external ID Captain issued you. When an indexing job runs, Captain calls
`sts:AssumeRole` on your role, gets a short-lived session (≤ 1 hour,
AWS-enforced), reads the bucket, and discards the session when the job
finishes. Long jobs re-assume automatically as the session approaches
expiry. Your trust policy gates assumption on both *the Captain
principal* and *the external ID matching*, so the role can't be
assumed by anyone else.
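
The flow above can be sketched from the caller's side. This is an illustration only, not Captain's actual implementation: `build_assume_role_kwargs` and `needs_refresh` are hypothetical helpers, and the session name and the five-minute refresh margin are assumptions. The keyword arguments are the ones boto3's `sts.assume_role` accepts.

```python
from datetime import datetime, timedelta, timezone

MAX_SESSION_SECONDS = 3600  # the 1-hour ceiling described above


def build_assume_role_kwargs(role_arn: str, external_id: str, session_name: str) -> dict:
    """Return keyword arguments in the shape boto3's sts.assume_role accepts."""
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "ExternalId": external_id,
        "DurationSeconds": MAX_SESSION_SECONDS,
    }


def needs_refresh(expiration: datetime, now: datetime,
                  margin: timedelta = timedelta(minutes=5)) -> bool:
    """True once the session is within `margin` of expiry (the margin is an assumption)."""
    return now >= expiration - margin


kwargs = build_assume_role_kwargs(
    "arn:aws:iam::123456789012:role/CaptainS3ReadRole",
    "cap_org_xxxxxxxxxxxxxxxxxxxxxxxx",
    "captain-customer-example",  # hypothetical session name
)

issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(seconds=MAX_SESSION_SECONDS)
print(needs_refresh(expires, issued + timedelta(minutes=10)))  # False
print(needs_refresh(expires, issued + timedelta(minutes=56)))  # True
```

If the `ExternalId` in the request does not match the `sts:ExternalId` condition in your trust policy, STS rejects the call before any credentials are issued.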

## Setup

### Step 1: Get your external ID from Captain

Email **[support@runcaptain.com](mailto:support@runcaptain.com)** with your Captain organization ID, or
ask your account manager directly if you're already in a pilot
conversation. We'll reply with a value of the form `cap_org_<24-hex>`,
usually same-day. Save it; you'll paste it into your IAM trust policy
in step 2.

### Step 2: Create the IAM role in your AWS account

1. Open the [AWS Console](https://console.aws.amazon.com/) → **IAM** → **Roles** → **Create role**.
2. **Trusted entity type**: select **Custom trust policy**, not "AWS account." The "AWS account" form doesn't let you bind to a specific role and external ID cleanly.
3. Paste the following trust policy verbatim, replacing `<EXTERNAL_ID_FROM_STEP_1>` with the value Captain sent you:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::149896762962:role/captain-cross-account-ingestion"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<EXTERNAL_ID_FROM_STEP_1>"
        }
      }
    }
  ]
}
```

Paste the `Principal.AWS` value exactly as shown. Don't substitute your
own AWS account ID, a Captain root ARN, or any other principal. This
is the only Captain identity we authorize to read customer S3 buckets,
and AssumeRole will fail with `AccessDenied` if your trust policy names
anything else.

4. Click **Next**. On the "Add permissions" page, click **Create policy** (this opens a new tab). In the JSON tab, paste the following, replacing `<YOUR_BUCKET_NAME>` with the name of the bucket you want Captain to index. To cover several buckets, turn each `Resource` into an array with one ARN per bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<YOUR_BUCKET_NAME>"
    },
    {
      "Sid": "GetObjects",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<YOUR_BUCKET_NAME>/*"
    }
  ]
}
```

Save this policy as something like `CaptainS3ReadAccess`. If your
bucket uses a customer-managed KMS key, also grant `kms:Decrypt` on
that key's ARN.

5. Back in the role-creation tab, refresh the policy list, check the
   `CaptainS3ReadAccess` policy you just created, click **Next**.

6. Name the role (we suggest `CaptainS3ReadRole`), review, **Create role**.

7. After the role is created, copy its ARN. It looks like
   `arn:aws:iam::123456789012:role/CaptainS3ReadRole`.
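
If you would rather generate the trust policy than hand-edit it, the sketch below builds the same document as in step 3 and rejects values that don't look like `cap_org_<24-hex>`. `build_trust_policy` is a hypothetical helper, the external ID shown is a placeholder, and accepting both upper- and lowercase hex is an assumption.

```python
import json
import re

CAPTAIN_PRINCIPAL = "arn:aws:iam::149896762962:role/captain-cross-account-ingestion"
# Format from step 1: "cap_org_" followed by 24 hex characters.
# Accepting either letter case is an assumption.
EXTERNAL_ID_RE = re.compile(r"cap_org_[0-9a-fA-F]{24}")


def build_trust_policy(external_id: str) -> dict:
    """Build the trust policy from step 3 for a Captain-issued external ID."""
    if not EXTERNAL_ID_RE.fullmatch(external_id):
        raise ValueError("external ID should look like cap_org_<24-hex>")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": CAPTAIN_PRINCIPAL},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder value for illustration only; use the ID Captain sent you.
policy = build_trust_policy("cap_org_0123456789abcdef01234567")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the console's custom trust policy editor, or saved to a file and passed to `aws iam create-role` through `--assume-role-policy-document file://trust-policy.json`.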

### Step 3: Send Captain your role ARN

Reply to the same Captain thread with the role ARN from step 2. We'll
record it on our side, your integration flips from `pending` to `active`,
and you can run your first indexing call.

### Step 4: Run your first indexing call

Use the role ARN and external ID in place of the
`aws_access_key_id` / `aws_secret_access_key` fields:

```python
import requests

BASE_URL = "https://api.runcaptain.com"
API_KEY = "your_api_key"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

response = requests.post(
    f"{BASE_URL}/v2/collections/my_documents/index/s3",
    headers=headers,
    json={
        "bucket_name": "my-documents-bucket",
        "bucket_region": "us-east-1",
        "auth": {
            "type": "assume_role",
            "role_arn": "arn:aws:iam::123456789012:role/CaptainS3ReadRole",
            "external_id": "cap_org_xxxxxxxxxxxxxxxxxxxxxxxx"
        },
        "processing_type": "advanced",
        "skip_existing": True
    },
    timeout=60.0
)

if response.status_code in [200, 201]:
    data = response.json()
    print(f"Job started! ID: {data['job_id']}")
else:
    print(f"Error: {response.status_code} - {response.text}")
```

The same `auth` block works on all three S3 endpoints:

* `POST /v2/collections/{name}/index/s3`: index everything in a bucket
* `POST /v2/collections/{name}/index/s3/file`: index a single object
* `POST /v2/collections/{name}/index/s3/directory`: index everything under a path (treated as a prefix)

These endpoints also accept `aws_access_key_id` + `aws_secret_access_key`
in place of the `auth` block. Use whichever fits. A single request must
pick exactly one shape: provide the `auth` block, or the two access-key
fields, but not both and not neither.
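
One way to catch a both-or-neither mistake before the request leaves your machine is to build the body through a small helper. This is a sketch, not part of the Captain SDK: `build_s3_index_body` is hypothetical, and it assumes the access-key fields sit at the top level of the body and that `processing_type` defaults to `"advanced"` as in the example above.

```python
from typing import Optional


def build_s3_index_body(
    bucket_name: str,
    bucket_region: str,
    *,
    role_arn: Optional[str] = None,
    external_id: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    processing_type: str = "advanced",
) -> dict:
    """Build a request body that carries exactly one credential shape."""
    uses_role = role_arn is not None or external_id is not None
    uses_keys = aws_access_key_id is not None or aws_secret_access_key is not None
    if uses_role == uses_keys:  # both shapes given, or neither
        raise ValueError("pass either the auth block or the access-key pair, exactly one")

    body = {
        "bucket_name": bucket_name,
        "bucket_region": bucket_region,
        "processing_type": processing_type,
    }
    if uses_role:
        if role_arn is None or external_id is None:
            raise ValueError("assume_role auth needs both role_arn and external_id")
        body["auth"] = {
            "type": "assume_role",
            "role_arn": role_arn,
            "external_id": external_id,
        }
    else:
        if aws_access_key_id is None or aws_secret_access_key is None:
            raise ValueError("access-key auth needs both key fields")
        # Assumption: the key fields are top-level, as the text above implies.
        body["aws_access_key_id"] = aws_access_key_id
        body["aws_secret_access_key"] = aws_secret_access_key
    return body


body = build_s3_index_body(
    "my-documents-bucket",
    "us-east-1",
    role_arn="arn:aws:iam::123456789012:role/CaptainS3ReadRole",
    external_id="cap_org_xxxxxxxxxxxxxxxxxxxxxxxx",
)
print(sorted(body))  # ['auth', 'bucket_name', 'bucket_region', 'processing_type']
```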

## What success looks like

The API returns `{ "job_id": "...", "status": "pending" }` immediately.
Track progress via `GET /v2/jobs/{job_id}`. On success the job reports
files indexed, chunks processed, and tokens used.
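
Tracking a job usually means polling until it reaches a terminal state. The sketch below shows one way to do that. The source only documents the `pending` status, so the terminal values `completed` and `failed` are assumptions, and `wait_for_job` is a hypothetical helper. The HTTP call is injected as `fetch_job` so the loop can be exercised without a network.

```python
import time
from typing import Callable

# Assumed terminal states; only "pending" is documented above.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll `fetch_job` until the job reaches a terminal status or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        sleep(interval)


# Demo with canned responses instead of real HTTP calls.
responses = iter([{"status": "pending"}, {"status": "pending"}, {"status": "completed"}])
result = wait_for_job(lambda _job_id: next(responses), "job_123", sleep=lambda _s: None)
print(result["status"])  # completed
```

In real use, `fetch_job` could wrap `requests.get(f"{BASE_URL}/v2/jobs/{job_id}", headers=headers, timeout=30).json()`, reusing `BASE_URL` and `headers` from step 4.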

In your AWS account, every Captain indexing call shows up in CloudTrail as:

* An `AssumeRole` event on **your** role with `RoleSessionName=captain-customer-<id>`.
* One or more `s3:ListBucket` / `s3:GetObject` events using the assumed
  session credentials.
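
If you export CloudTrail records as JSON, the `AssumeRole` events described above can be picked out with a short filter. This is a sketch: `captain_assume_role_events` is a hypothetical helper, and the sample records are trimmed, hand-written stand-ins rather than real log output. It relies on the `eventName`, `requestParameters.roleArn`, and `requestParameters.roleSessionName` fields of CloudTrail's record format.

```python
SESSION_PREFIX = "captain-customer-"  # session-name prefix described above
ROLE_ARN = "arn:aws:iam::123456789012:role/CaptainS3ReadRole"


def captain_assume_role_events(records, role_arn):
    """Yield AssumeRole records for `role_arn` that carry a Captain session name."""
    for record in records:
        params = record.get("requestParameters") or {}
        if (
            record.get("eventName") == "AssumeRole"
            and params.get("roleArn") == role_arn
            and str(params.get("roleSessionName", "")).startswith(SESSION_PREFIX)
        ):
            yield record


# Trimmed, hand-written records for illustration only.
records = [
    {"eventName": "AssumeRole",
     "requestParameters": {"roleArn": ROLE_ARN, "roleSessionName": "captain-customer-42"}},
    {"eventName": "AssumeRole",
     "requestParameters": {"roleArn": ROLE_ARN, "roleSessionName": "ci-deploy"}},
    {"eventName": "GetObject",
     "requestParameters": {"bucketName": "my-documents-bucket"}},
]

matches = list(captain_assume_role_events(records, ROLE_ARN))
print([m["requestParameters"]["roleSessionName"] for m in matches])  # ['captain-customer-42']
```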

You can revoke Captain's access at any time by deleting your IAM role,
or by editing its trust policy to remove the Captain principal. New
indexing jobs will fail immediately with `AccessDenied`; jobs already in
flight can continue at most until the session credentials they already
hold expire (≤ 1 hour), since any re-assumption is denied as well.

## Troubleshooting

Most failures are trust-policy typos. AWS error messages for
`AssumeRole` are intentionally generic, so you can't always tell which
field is wrong from the error alone. Check these in order:

* **`AccessDenied` when the indexing job starts:** the trust policy on
  your role doesn't allow the Captain principal. Check that
  `Principal.AWS` is exactly `arn:aws:iam::149896762962:role/captain-cross-account-ingestion`,
  with no typos.
* **`AccessDenied` persists after the principal checks out:** the
  `sts:ExternalId` value in your trust policy doesn't match what Captain
  sent. AWS returns the same error for both cases. Re-paste the value
  exactly, with no leading or trailing whitespace and no extra quote
  characters inside the JSON string. If unsure, ask Captain to resend.
* **`s3:GetObject` failures:** the role is being assumed correctly but
  the permissions policy on the role is missing the action. Double-check
  the bucket name in the policy matches the bucket you're trying to
  index (both `arn:aws:s3:::<bucket>` for `ListBucket` and
  `arn:aws:s3:::<bucket>/*` for `GetObject`).
* **KMS-encrypted buckets:** the role needs `kms:Decrypt` in its
  permissions policy, scoped to the KMS key ARN. Without it,
  `s3:GetObject` fails with `AccessDenied` even when the S3 permissions
  are correct. Your bucket's
  encryption config is in the AWS Console under **S3** → your bucket →
  **Properties** → **Default encryption**.
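
Because the `AssumeRole` error doesn't say which field is wrong, it can help to check the trust policy JSON locally before retrying. The sketch below covers the first two checks in the list; `lint_trust_policy` is a hypothetical helper, not a Captain or AWS tool, and it inspects only `Principal.AWS` and the `StringEquals` condition on `sts:ExternalId`.

```python
CAPTAIN_PRINCIPAL = "arn:aws:iam::149896762962:role/captain-cross-account-ingestion"


def _as_list(value):
    """IAM lets several fields be a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def lint_trust_policy(policy: dict, expected_external_id: str) -> list:
    """Return human-readable problems found in a trust policy document."""
    candidates = [
        stmt for stmt in _as_list(policy.get("Statement"))
        if stmt.get("Effect") == "Allow"
        and "sts:AssumeRole" in _as_list(stmt.get("Action"))
    ]
    if not candidates:
        return ["no Allow statement for sts:AssumeRole"]

    problems = []
    for stmt in candidates:
        principal = stmt.get("Principal")
        aws = principal.get("AWS") if isinstance(principal, dict) else principal
        if _as_list(aws) != [CAPTAIN_PRINCIPAL]:
            problems.append(f"Principal.AWS is {aws!r}, expected {CAPTAIN_PRINCIPAL!r}")

        actual = stmt.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId")
        if actual is None:
            problems.append("missing StringEquals condition on sts:ExternalId")
        elif actual != expected_external_id:
            same_after_strip = isinstance(actual, str) and actual.strip() == expected_external_id
            hint = " (differs only by whitespace)" if same_after_strip else ""
            problems.append("sts:ExternalId does not match the Captain-issued value" + hint)
    return problems


# Example: a trailing space pasted into the external ID (placeholder value).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": CAPTAIN_PRINCIPAL},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "cap_org_0123456789abcdef01234567 "}},
    }],
}
problems = lint_trust_policy(policy, "cap_org_0123456789abcdef01234567")
print(problems)
```

An empty list means both checks passed, which points the investigation toward the permissions policy or KMS instead.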

## Operational notes

* **Rotating the external ID.** If you ever need a new external ID
  (e.g., the old one was posted somewhere it shouldn't have been), reply
  to your Captain thread asking for a rotation. We'll mint a new one;
  you update the `Condition.StringEquals.sts:ExternalId` value in your
  trust policy. The old external ID stops working immediately.
* **Multiple buckets.** One IAM role can grant access to multiple buckets;
  list each bucket's ARN in the permissions policy. Your indexing call
  specifies which one to index per request.
* **Multiple AWS accounts.** If you have data spread across several AWS
  accounts, create one role per account using the same external ID.
  Each indexing call targets one role at a time.
* **Session length.** Because Captain assumes your role from its own IAM
  role (role chaining), AWS caps each session at 1 hour. Captain
  re-assumes automatically as the session nears expiry, so jobs longer
  than an hour run without manual intervention.
* **Auditing usage.** Filter CloudTrail by `userIdentity.arn` containing
  `captain-cross-account-ingestion` to see every Captain-initiated read
  on your S3 data.
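
For the multiple-buckets case above, the permissions policy from step 2 extends by listing one ARN per bucket in each statement. The sketch below generates that document and optionally adds `kms:Decrypt`. `build_read_policy` is a hypothetical helper, and the bucket names, the `DecryptObjects` Sid, and the KMS key ARN are placeholders.

```python
import json


def build_read_policy(bucket_names, kms_key_arns=()):
    """Build a read-only policy covering every bucket in `bucket_names`."""
    buckets = list(bucket_names)
    if not buckets:
        raise ValueError("at least one bucket name is required")

    statements = [
        {
            "Sid": "ListBucket",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": [f"arn:aws:s3:::{name}" for name in buckets],
        },
        {
            "Sid": "GetObjects",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": [f"arn:aws:s3:::{name}/*" for name in buckets],
        },
    ]
    keys = list(kms_key_arns)
    if keys:
        # Only needed when a bucket uses a customer-managed KMS key.
        statements.append({
            "Sid": "DecryptObjects",
            "Effect": "Allow",
            "Action": "kms:Decrypt",
            "Resource": keys,
        })
    return {"Version": "2012-10-17", "Statement": statements}


policy = build_read_policy(
    ["my-documents-bucket", "my-archive-bucket"],
    kms_key_arns=["arn:aws:kms:us-east-1:123456789012:key/<KEY_ID>"],
)
print(json.dumps(policy, indent=2))
```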

## Access keys are also supported

The `aws_access_key_id` + `aws_secret_access_key` flow works on the same
endpoints. Some collections can use role-based auth while others use
access keys, and you can switch a collection from one to the other at
any time. Pick whichever fits the bucket and the team managing it. See
the [Connect Cloud Storage](./connect-cloud-storage) guide for the
access-key flow.