AWS S3 with cross-account IAM role
Captain can read your S3 buckets without you ever sharing AWS access keys.
You create an IAM role in your AWS account that trusts a Captain-owned
principal, Captain calls sts:AssumeRole on it (with a Captain-issued
external ID) when an indexing job runs, and the temporary credentials are
discarded the moment the job finishes.
This is the auth method we recommend for production indexing and for customers operating under formal compliance programs (SOC 2, ISO 27001, HIPAA, FedRAMP). It follows AWS’s recommended cross-account access pattern (an IAM role plus an external ID), which most enterprise security reviewers already know.
Before you start, you need to contact Captain to receive an external ID.
The external ID is a short string we generate server-side per organization and pin to your AWS integration. Without it, your IAM role’s trust policy can’t be set up correctly. Email support@runcaptain.com (or message your Captain account manager) with your organization ID. We’ll mint your external ID and send it back, usually same-day.
It looks like cap_org_xxxxxxxxxxxxxxxxxxxxxxxx (24 hex characters
after the prefix). Each organization gets a unique one. Don’t reuse
another customer’s external ID and don’t generate your own. Captain
will reject the indexing call if the value doesn’t match what we have
stored.
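If you keep the external ID in a secrets manager or config file, a quick shape check can catch the copy-paste mistakes listed under Troubleshooting (stray whitespace or quotes) before they surface as an opaque AccessDenied. Here is a minimal Python sketch. It assumes the format described above: the cap_org_ prefix plus exactly 24 hex characters, in either case. It checks only the shape. Only Captain can confirm that the value is the one issued to your organization.

```python
import re

# Assumed format, from the description above: "cap_org_" + 24 hex characters.
EXTERNAL_ID_PATTERN = re.compile(r"cap_org_[0-9a-fA-F]{24}")


def looks_like_external_id(value: str) -> bool:
    """Return True if `value` has the shape of a Captain external ID.

    fullmatch() requires the whole string to match, so leading or trailing
    whitespace and wrapping quotes are rejected.
    """
    return EXTERNAL_ID_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    print(looks_like_external_id("cap_org_0123456789abcdef01234567"))   # True
    print(looks_like_external_id(" cap_org_0123456789abcdef01234567"))  # False
```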
Why use it
Cross-account role assumption gives you three things:
- Short-lived session credentials. Captain reads your bucket using AWS STS session credentials it mints on demand and discards when the job finishes. This is the standard AWS cross-account pattern. The role you create on your side is the only durable artifact, and you control it.
- Full CloudTrail visibility. Every read shows up in your AWS account as an AssumeRole event from a known Captain principal, with a session name traceable back to a specific indexing job, followed by the corresponding s3:GetObject and s3:ListBucket calls. S3 object reads are CloudTrail data events, so they are recorded only if data-event logging is enabled for the bucket.
- You own the permissions surface. Want to scope to one bucket? One prefix? Layer in KMS conditions? Restrict by source IP or time of day? Standard IAM, all configured in your account.
You can revoke Captain’s access at any time by editing your role’s trust policy or deleting the role outright. No coordination with Captain is required.
How it works
You create a role in your AWS account. Its trust policy names a single
Captain principal (captain-cross-account-ingestion) and requires the
external ID Captain issued you. When an indexing job runs, Captain calls
sts:AssumeRole on your role, gets a short-lived session (≤ 1 hour,
AWS-enforced), reads the bucket, and discards the session when the job
finishes. Long jobs re-assume automatically as the session approaches
expiry. Your trust policy gates assumption on both the Captain
principal and the external ID matching, so the role can’t be
assumed by anyone else.
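A little date arithmetic shows how re-assumption works. This Python sketch illustrates the general pattern and is not Captain’s implementation. The 1-hour cap is the one stated above, and the 5-minute refresh margin is a hypothetical value. Each replacement session would appear in your CloudTrail as another AssumeRole event.

```python
from datetime import datetime, timedelta, timezone

# Upper bound on one assumed-role session, as stated above.
MAX_SESSION = timedelta(hours=1)
# Hypothetical safety margin: replace a session this long before it expires.
REFRESH_MARGIN = timedelta(minutes=5)


def needs_reassume(expiration: datetime, now: datetime,
                   margin: timedelta = REFRESH_MARGIN) -> bool:
    """True once the current session is within `margin` of expiring (or expired)."""
    return expiration - now <= margin


def sessions_for_job(job_length: timedelta,
                     margin: timedelta = REFRESH_MARGIN) -> int:
    """Number of AssumeRole calls a job of `job_length` would make if each
    session is replaced `margin` before its 1-hour limit."""
    usable = MAX_SESSION - margin
    # Ceiling division on timedeltas; even a very short job needs one session.
    return max(1, -(-job_length // usable))


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(needs_reassume(now + timedelta(minutes=3), now))  # True
    print(sessions_for_job(timedelta(hours=2)))              # 3
```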
Setup
Step 1: Get your external ID from Captain
Email support@runcaptain.com with your Captain organization ID, or
ask your account manager directly if you’re already in a pilot
conversation. We’ll reply with a value of the form cap_org_<24-hex>,
usually same-day. Save it; you’ll paste it into your IAM trust policy
in step 2.
Step 2: Create the IAM role in your AWS account
- Open the AWS Console → IAM → Roles → Create role.
- Trusted entity type: select Custom trust policy, not “AWS account.” The “AWS account” option trusts an entire account rather than Captain’s specific role, so it can’t express the principal-plus-external-ID binding this setup needs.
- Paste the following trust policy verbatim, replacing
<EXTERNAL_ID_FROM_STEP_1> with the value Captain sent you:
Paste the Principal.AWS value exactly as shown. Don’t substitute your
own AWS account ID, a Captain root ARN, or any other principal. This
is the only Captain identity we authorize to read customer S3 buckets,
and AssumeRole will fail with AccessDenied if your trust policy names
anything else.
- Click Next. On the “Add permissions” page, click Create policy (this opens a new tab). In the JSON tab, paste the following, replacing
<YOUR_BUCKET_NAME> with the bucket(s) you want Captain to index:
Save this policy as something like CaptainS3ReadAccess. If your
bucket uses a customer-managed KMS key, also grant kms:Decrypt on
that key’s ARN.
- Back in the role-creation tab, refresh the policy list, check the CaptainS3ReadAccess policy you just created, and click Next.
- Name the role (we suggest CaptainS3ReadRole), review it, and click Create role.
- After the role is created, copy its ARN. It looks like arn:aws:iam::123456789012:role/CaptainS3ReadRole.
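If you manage IAM as code instead of through the console, you can generate the two documents from step 2. Here is a minimal Python sketch that assumes the standard IAM policy grammar. The principal ARN is the one quoted under Troubleshooting. The condition key is the sts:ExternalId key named under Operational notes. The external ID, bucket names, and KMS key ARN are placeholders you supply. If this output ever differs from the policy Captain publishes, follow Captain’s version.

```python
import json
from typing import List, Optional

# Captain's principal, as quoted under Troubleshooting.
CAPTAIN_PRINCIPAL = "arn:aws:iam::149896762962:role/captain-cross-account-ingestion"


def trust_policy(external_id: str) -> dict:
    """Let only Captain's principal assume the role, and only with your external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": CAPTAIN_PRINCIPAL},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


def read_policy(buckets: List[str], kms_key_arn: Optional[str] = None) -> dict:
    """Read-only access: list each bucket, read its objects, and (optionally)
    decrypt with a customer-managed KMS key."""
    statements = [
        {"Effect": "Allow", "Action": "s3:ListBucket",
         "Resource": [f"arn:aws:s3:::{b}" for b in buckets]},
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": [f"arn:aws:s3:::{b}/*" for b in buckets]},
    ]
    if kms_key_arn:
        statements.append({"Effect": "Allow", "Action": "kms:Decrypt",
                           "Resource": kms_key_arn})
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Placeholder values: substitute your external ID and bucket name.
    print(json.dumps(trust_policy("cap_org_<24-hex>"), indent=2))
    print(json.dumps(read_policy(["<YOUR_BUCKET_NAME>"]), indent=2))
```

Keeping ListBucket on the bucket ARN and GetObject on the /* object ARN as separate statements mirrors the two resource forms the Troubleshooting section asks you to check.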
Step 3: Send Captain your role ARN
Reply to the same Captain thread with the role ARN from step 2. We’ll
record it on our side, your integration flips from pending to active,
and you can run your first indexing call.
Step 4: Run your first indexing call
In the request body, pass the role ARN and external ID (in an auth block) in place of the
aws_access_key_id / aws_secret_access_key fields.
The same auth block works on all three S3 endpoints:
- POST /v2/collections/{name}/index/s3: index everything in a bucket
- POST /v2/collections/{name}/index/s3/file: index a single object
- POST /v2/collections/{name}/index/s3/directory: index everything under a path (treated as a prefix)
These endpoints also accept aws_access_key_id + aws_secret_access_key
in place of the auth block. Use whichever fits. A single request must
pick one shape, not both (sending both returns HTTP 422).
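You can enforce the one-shape rule on the client before a request reaches the API. In this Python sketch, only aws_access_key_id and aws_secret_access_key are field names taken from this guide. The names bucket_name, auth, and role_arn and external_id inside auth are hypothetical placeholders, so check the API reference for the real request schema. The role-ARN pattern covers the usual arn:aws:iam::<12-digit account>:role/<name> form, including role paths, in the standard aws partition.

```python
import re
from typing import Optional, Tuple
from urllib.parse import quote

# IAM role ARN in the standard partition, optionally with a path.
ROLE_ARN_PATTERN = re.compile(r"arn:aws:iam::\d{12}:role/[\w+=,.@/-]+")


def build_s3_index_request(collection: str, bucket: str, *,
                           role_arn: Optional[str] = None,
                           external_id: Optional[str] = None,
                           aws_access_key_id: Optional[str] = None,
                           aws_secret_access_key: Optional[str] = None
                           ) -> Tuple[str, str, dict]:
    """Return (method, path, json_body) for POST /v2/collections/{name}/index/s3.

    Field names other than the two aws_* keys are hypothetical placeholders.
    """
    uses_role = role_arn is not None or external_id is not None
    uses_keys = aws_access_key_id is not None or aws_secret_access_key is not None
    if uses_role and uses_keys:
        # The API answers a mixed request with HTTP 422; fail before sending.
        raise ValueError("use the role auth block or access keys, not both")

    body: dict = {"bucket_name": bucket}  # hypothetical field name
    if uses_role:
        if not (role_arn and external_id):
            raise ValueError("role auth needs both role_arn and external_id")
        if not ROLE_ARN_PATTERN.fullmatch(role_arn):
            raise ValueError(f"not an IAM role ARN: {role_arn!r}")
        body["auth"] = {"role_arn": role_arn, "external_id": external_id}  # hypothetical keys
    elif uses_keys:
        if not (aws_access_key_id and aws_secret_access_key):
            raise ValueError("access-key auth needs both aws_* fields")
        body["aws_access_key_id"] = aws_access_key_id
        body["aws_secret_access_key"] = aws_secret_access_key
    else:
        raise ValueError("no credentials supplied")

    path = f"/v2/collections/{quote(collection, safe='')}/index/s3"
    return "POST", path, body
```

The /file and /directory endpoints would take the same credential fields plus whatever object key or prefix field the API defines. That field is not modelled here.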
What success looks like
The API returns { "job_id": "...", "status": "pending" } immediately.
Track progress via GET /v2/jobs/{job_id}. On success the job reports
files indexed, chunks processed, and tokens used.
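A small polling loop is enough to wait on a job. In this Python sketch, the HTTP call is passed in as fetch_job. It should GET /v2/jobs/{job_id} with your Captain credentials and return the decoded JSON. Only the pending status is documented above, so the terminal status names completed and failed are hypothetical.

```python
import time
from typing import Callable, Dict

# Only "pending" is documented above; these terminal names are hypothetical.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(job_id: str,
                 fetch_job: Callable[[str], Dict],
                 poll_seconds: float = 5.0,
                 timeout_seconds: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call fetch_job(job_id) until the job reaches a terminal status.

    fetch_job is expected to GET /v2/jobs/{job_id} and return the decoded JSON.
    Raises TimeoutError once about timeout_seconds of sleeping has elapsed.
    """
    waited = 0.0
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"job {job_id} still {job.get('status')!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing sleep in as a parameter keeps the loop easy to test without real delays.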
In your AWS account, every Captain indexing call shows up in CloudTrail as:
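- An AssumeRole event on your role with RoleSessionName=captain-customer-<id>.
- One or more s3:ListBucket / s3:GetObject events using the assumed session credentials. S3 object reads are CloudTrail data events, so they are recorded only if data-event logging is enabled for the bucket.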
You can revoke Captain’s access at any time by deleting your IAM role,
or by editing its trust policy to remove the Captain principal. New
indexing jobs will fail immediately with AccessDenied; jobs already in
flight finish with whatever credentials they already minted (≤ 1 hour).
Troubleshooting
Most failures are trust-policy typos. AWS error messages for
AssumeRole are intentionally generic, so you can’t always tell which
field is wrong from the error alone. Check these two first:
- AccessDenied when the indexing job starts: the trust policy on your role doesn’t allow the Captain principal. Check that Principal.AWS is exactly arn:aws:iam::149896762962:role/captain-cross-account-ingestion, with no typos.
- AccessDenied shortly after: the external ID (the sts:ExternalId value) in your trust policy doesn’t match what Captain sent. Re-paste it exactly (no whitespace, no quotes). If unsure, ask Captain to resend.
s3:GetObject failures mean the role is being assumed correctly but
the permissions policy on the role is missing the action. Double-check
the bucket name in the policy matches the bucket you’re trying to index
(both arn:aws:s3:::<bucket> for ListBucket and
arn:aws:s3:::<bucket>/* for GetObject).
KMS-encrypted buckets need kms:Decrypt on the role’s permissions
policy, scoped to the KMS key ARN. Without this, s3:GetObject returns
the same AccessDenied. Your bucket’s encryption config is in the AWS
Console under S3 → your bucket → Properties → Default encryption.
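When s3:GetObject keeps failing, you can check the permissions policy mechanically against the grants above. This Python sketch is deliberately simplistic. It counts only exact Allow statements and compares action names case-sensitively, although IAM itself ignores case. It does not evaluate wildcards, NotAction, conditions, or Deny statements. AWS’s IAM policy simulator remains the authoritative check.

```python
from typing import Dict, List, Optional


def _as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return value if isinstance(value, list) else [value]


def missing_read_grants(policy: Dict, bucket: str,
                        kms_key_arn: Optional[str] = None) -> List[str]:
    """Return the grants from the checks above that `policy` does not contain.

    Simplistic by design: only exact Allow statements count, and wildcards,
    NotAction/NotResource, conditions and Deny statements are not evaluated.
    """
    needed = [("s3:ListBucket", f"arn:aws:s3:::{bucket}"),
              ("s3:GetObject", f"arn:aws:s3:::{bucket}/*")]
    if kms_key_arn:
        needed.append(("kms:Decrypt", kms_key_arn))

    granted = set()
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            for resource in _as_list(stmt.get("Resource", [])):
                granted.add((action, resource))

    return [f"{action} on {resource}" for action, resource in needed
            if (action, resource) not in granted]
```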
Operational notes
- Rotating the external ID. If you ever need a new external ID (e.g., the old one was posted somewhere it shouldn’t have been), reply to your Captain thread asking for a rotation. We’ll mint a new one; you update the Condition.StringEquals.sts:ExternalId value in your trust policy. The old external ID stops working immediately.
- Multiple buckets. One IAM role can grant access to multiple buckets; list each bucket’s ARNs (arn:aws:s3:::<bucket> and arn:aws:s3:::<bucket>/*) in the permissions policy. Your indexing call specifies which one to index per request.
- Multiple AWS accounts. If you have data spread across several AWS accounts, create one role per account using the same external ID. Each indexing call targets one role at a time.
- Session length. Captain’s principal is itself an IAM role, so assuming your role is role chaining, which AWS caps at 1 hour per session. Captain re-assumes automatically as the session nears expiry, so jobs longer than an hour run without manual intervention.
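- Auditing usage. Filter CloudTrail by userIdentity.arn containing your role name (for example CaptainS3ReadRole) to see every Captain-initiated read on your S3 data. Reads made with Captain’s session are recorded under arn:aws:sts::<your-account-id>:assumed-role/<role-name>/captain-customer-<id>, not under the captain-cross-account-ingestion principal. The matching AssumeRole events name your role in requestParameters.roleArn.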
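A short script can pull Captain’s activity out of raw CloudTrail log files, for example files downloaded from your trail’s bucket. This Python sketch filters on your own role name and Captain’s captain-customer- session prefix, not on Captain’s principal. The reason is that S3 requests made with the assumed session are recorded under an ARN of the form arn:aws:sts::<your-account-id>:assumed-role/<role-name>/<session-name>. The role name CaptainS3ReadRole is the one suggested in step 2; substitute yours. S3 reads appear only if data-event logging is enabled for the bucket.

```python
import gzip
import json
from typing import Dict, Iterable, List

# Assumptions: the role uses the suggested name and Captain's session names
# start with "captain-customer-", as described in this guide.
ROLE_NAME = "CaptainS3ReadRole"
SESSION_PREFIX = "captain-customer-"


def captain_events(records: Iterable[Dict], role_name: str = ROLE_NAME) -> List[Dict]:
    """Return Captain's AssumeRole calls on your role and the S3 requests
    made with the resulting session, in their original order."""
    hits = []
    for rec in records:
        params = rec.get("requestParameters") or {}
        identity_arn = (rec.get("userIdentity") or {}).get("arn", "")
        if (rec.get("eventName") == "AssumeRole"
                and params.get("roleArn", "").endswith("/" + role_name)
                and params.get("roleSessionName", "").startswith(SESSION_PREFIX)):
            hits.append(rec)
        elif (rec.get("eventSource") == "s3.amazonaws.com"
              and f":assumed-role/{role_name}/{SESSION_PREFIX}" in identity_arn):
            hits.append(rec)
    return hits


def load_trail_file(path: str) -> List[Dict]:
    """Read one CloudTrail log file as delivered to S3 (gzipped JSON)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f).get("Records", [])
```

Usage: captain_events(load_trail_file("<downloaded-log>.json.gz")).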
Access keys are also supported
The aws_access_key_id + aws_secret_access_key flow works on the same
endpoints. Some collections can use role-based auth while others use
access keys, and you can switch a collection from one to the other at
any time. Pick whichever fits the bucket and the team managing it. See
the Connect Cloud Storage guide for the
access-key flow.