At 1:40 AM on a Thursday, our deployment bot failed for the third time in one week. Same root cause, different symptoms: AccessDenied in staging, a missing secret in production, and then a panic rollback because someone had rotated a key manually but forgot to update one repository out of six. We had done what many teams do under pressure: we kept adding CI secrets and telling ourselves we would clean it up later.
That night became our cutover point. We moved to a GitHub Actions OIDC secrets rotation setup with short-lived AWS credentials and automated secret rotation for the few static credentials we still needed. The result was not magical perfection, but it was dramatically calmer operations, fewer urgent Slack threads, and far less credential sprawl.
This guide is the practical playbook we used, including tradeoffs, failure modes, and the checks that kept us honest.
Why we stopped trusting long-lived CI secrets
Long-lived CI secrets are convenient, until they are not. They create three recurring problems:
- Blast radius grows silently: the same key is copied across repos and environments.
- Rotation gets skipped: manual rotations are risky and easy to postpone.
- Attribution becomes fuzzy: when one key is shared broadly, incident forensics gets messy.
GitHub’s OIDC flow allows workflows to exchange a short-lived token for cloud credentials without storing long-lived AWS keys in repository secrets. AWS IAM best-practice guidance also strongly favors temporary credentials for workloads. OWASP’s secrets management guidance aligns with this: centralize, automate rotation, and minimize manual handling.
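For context, the token a workflow presents to AWS is a signed JWT whose claims encode where the run came from. A trimmed, illustrative payload (values are placeholders); these are the claims your IAM trust policy conditions match against:

```json
{
  "iss": "https://token.actions.githubusercontent.com",
  "aud": "sts.amazonaws.com",
  "sub": "repo:your-org/your-repo:ref:refs/heads/main",
  "repository": "your-org/your-repo",
  "ref": "refs/heads/main"
}
```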
If you are building on related 7tech guides, this article pairs well with our posts on GitHub Actions deployment safety, OIDC-based AWS deploys, API security essentials, and Linux server hardening.
The architecture that worked for us
We split credentials into two groups:
- Cloud access credentials: replaced with temporary AWS credentials via OIDC.
- Application secrets (DB password, third-party API token): stored in AWS Secrets Manager with scheduled rotation where possible.
That distinction matters. OIDC removes static cloud keys from CI. It does not automatically remove all app-level secrets. Those still need lifecycle management, auditing, and rotation policy.
Step 1: Tight IAM trust policy for GitHub OIDC
The trust policy is your real security boundary. Keep it strict. Avoid broad wildcards unless you have a strong reason and compensating controls.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:ref:refs/heads/main"
        }
      }
    }
  ]
}
```
Tradeoff: strict branch-based subject conditions reduce abuse risk, but they can slow down experimentation across temporary branches. Our compromise was separate roles for production and non-production with different constraints.
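To keep the production and non-production subject patterns from drifting apart, it helps to spell out exactly which subject strings each role should accept. A minimal sketch (the org/repo names are placeholders; the environment-based subject format applies to runs that target a GitHub environment):

```python
def github_oidc_sub(org: str, repo: str, branch: str = None, environment: str = None) -> str:
    """Build the OIDC subject claim GitHub Actions presents for a run."""
    base = f"repo:{org}/{repo}"
    if environment is not None:
        # Runs targeting a GitHub environment use an environment-based subject.
        return f"{base}:environment:{environment}"
    if branch is not None:
        # Plain branch pushes use a ref-based subject.
        return f"{base}:ref:refs/heads/{branch}"
    # Wildcard fallback for non-production roles; keep this off prod roles.
    return f"{base}:*"

# Strict production pattern vs. a broader non-production one:
prod_sub = github_oidc_sub("your-org", "your-repo", branch="main")
dev_sub = github_oidc_sub("your-org", "your-repo")
print(prod_sub)  # repo:your-org/your-repo:ref:refs/heads/main
```

Writing the patterns down as code made it harder for the prod role to quietly accumulate wildcards.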
Step 2: Workflow with id-token permission, no static AWS keys
```yaml
name: deploy-api
on:
  push:
    branches: ["main"]
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gha-prod-deploy
          aws-region: ap-south-1
      - name: Pull runtime secret metadata
        run: |
          aws secretsmanager describe-secret \
            --secret-id prod/myapp/db \
            --query '{ARN:ARN,LastChangedDate:LastChangedDate}'
      - name: Deploy
        run: ./scripts/deploy.sh
```
Important detail: id-token: write only grants the workflow permission to request an OIDC token. It does not grant permission to mutate repository content by itself.
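The describe-secret step above returns JSON you can act on, for example failing the deploy when a secret looks stale because rotation silently stopped. A hedged sketch that parses captured output (the 30-day threshold is an illustrative assumption, not a recommendation):

```python
import json
from datetime import datetime, timedelta, timezone

def secret_is_stale(describe_output: str, max_age_days: int = 30) -> bool:
    """Return True if the secret has not changed within max_age_days."""
    meta = json.loads(describe_output)
    last_changed = datetime.fromisoformat(meta["LastChangedDate"])
    return datetime.now(timezone.utc) - last_changed > timedelta(days=max_age_days)

# Example with captured CLI output (timestamp is illustrative):
sample = ('{"ARN": "arn:aws:secretsmanager:ap-south-1:123456789012:secret:prod/myapp/db-AbCdEf",'
          ' "LastChangedDate": "2020-01-01T00:00:00+00:00"}')
print(secret_is_stale(sample))  # True: far older than 30 days
```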
Step 3: Secrets rotation for residual static secrets
For database and third-party credentials, we moved to Secrets Manager rotation on a schedule where the target system supported safe password change workflows. For systems that could not rotate safely yet, we added shorter expiration windows and explicit runbooks.
Tradeoff: automatic rotation can break fragile integrations if consumers cache credentials too long. We solved this by setting low but realistic cache TTLs, and by forcing reconnect logic in app clients before enforcing aggressive rotation intervals.
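The reconnect-before-rotation idea can be as small as a TTL-bounded cache that also refreshes after an auth failure. A minimal sketch of the client-side pattern (the fetch callable stands in for a real Secrets Manager lookup):

```python
import time

class CredentialCache:
    """Cache a credential for a bounded TTL; refresh on expiry or auth failure."""

    def __init__(self, fetch, ttl_seconds=300):
        self._fetch = fetch          # callable returning the current credential
        self._ttl = ttl_seconds
        self._value = None
        self._fetched_at = 0.0

    def get(self):
        if self._value is None or time.monotonic() - self._fetched_at > self._ttl:
            self.refresh()
        return self._value

    def refresh(self):
        # Called proactively on TTL expiry, or reactively after an auth error.
        self._value = self._fetch()
        self._fetched_at = time.monotonic()
```

On an authentication failure, clients call refresh() and retry once before surfacing the error, so a rotation window does not page anyone.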
A migration sequence that minimizes outages
- Inventory all secrets in CI, then label each as cloud-auth, app-secret, or obsolete.
- Introduce OIDC role in parallel and run dry deployments in staging.
- Remove static cloud keys from repo/org secrets only after successful parallel runs.
- Enable rotation for app secrets in waves, starting with lower-risk services.
- Add observability checks for STS assume-role failures and secret retrieval failures.
Do not attempt a big-bang migration unless your dependency map is very small. A staged approach catches hidden credential consumers that were never documented.
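The inventory step is easier to keep honest if the labeling pass is code rather than a spreadsheet. A toy sketch (the name patterns are assumptions about your own naming conventions):

```python
# Hypothetical name patterns; adjust to your own conventions.
CLOUD_AUTH_HINTS = ("AWS_ACCESS_KEY", "AWS_SECRET", "GCP_SA_KEY")
OBSOLETE_HINTS = ("LEGACY_", "OLD_", "_DEPRECATED")

def classify_secret(name: str) -> str:
    upper = name.upper()
    if any(hint in upper for hint in CLOUD_AUTH_HINTS):
        return "cloud-auth"      # candidate for replacement with OIDC
    if any(hint in upper for hint in OBSOLETE_HINTS):
        return "obsolete"        # candidate for deletion
    return "app-secret"          # needs lifecycle management and rotation

inventory = ["AWS_ACCESS_KEY_ID", "DB_PASSWORD", "LEGACY_DEPLOY_TOKEN"]
print({name: classify_secret(name) for name in inventory})
```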
One habit that helped was writing a short “credential contract” per service: where secrets come from, how refresh happens, and what fallback behavior is acceptable. It sounds bureaucratic, but it eliminated many ambiguous ownership questions during incidents.
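Roughly, a credential contract looked like this (the fields and service name are illustrative, not a standard):

```yaml
# credential-contract.yml (illustrative)
service: payments-api
secrets:
  - name: prod/payments/db
    source: aws-secrets-manager
    refresh: on-rotation-event   # how consumers pick up new values
    cache_ttl: 5m
    fallback: retry-with-refresh # acceptable behavior on auth failure
owner: team-payments
```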
Troubleshooting: what broke for us and how we fixed it
1) Not authorized to perform sts:AssumeRoleWithWebIdentity
Cause: mismatch in sub condition (wrong branch pattern or environment subject).
Fix: inspect the actual token claims used by the job context, then align trust policy exactly. Start strict, test, then widen deliberately only if required.
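Decoding the token payload during debugging makes the mismatch obvious, since the sub claim is just base64url-encoded JSON inside the JWT (no signature verification needed just to read it; a sketch, for debugging only):

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (debugging only)."""
    payload = token.split(".")[1]
    # base64url payloads may lack padding; add it back before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))

# Example with a fabricated, unsigned token:
fake = "e30." + base64.urlsafe_b64encode(
    json.dumps({"sub": "repo:your-org/your-repo:ref:refs/heads/main"}).encode()
).decode().rstrip("=") + ".sig"
print(jwt_claims(fake)["sub"])  # repo:your-org/your-repo:ref:refs/heads/main
```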
2) Deployment works on main but fails in reusable workflow
Cause: reusable workflow context changed expected subject format.
Fix: define dedicated role conditions for reusable workflow subjects, and avoid overloading one role for every path.
3) App failures right after secret rotation
Cause: the application kept old credentials cached in its in-memory connection pool, and the pool did not recycle connections promptly.
Fix: shorten credential cache TTL, add proactive connection refresh, and alert on auth failure spikes immediately after rotation windows.
4) Teams reintroduced secrets through “temporary” fixes
Cause: no policy enforcement on new repository secrets.
Fix: add CI policy checks and review gates for secret creation requests, plus a monthly drift review against your baseline.
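The monthly drift review reduces to a set difference against the approved baseline. A minimal sketch (in practice the current list would come from the GitHub API):

```python
def secret_drift(current_secrets, approved_baseline):
    """Return secret names that exist in repos but were never approved."""
    return sorted(set(current_secrets) - set(approved_baseline))

baseline = {"DB_PASSWORD", "STRIPE_API_KEY"}
current = {"DB_PASSWORD", "STRIPE_API_KEY", "TEMP_DEPLOY_KEY"}
print(secret_drift(current, baseline))  # ['TEMP_DEPLOY_KEY'] -> flag for review
```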
FAQ
Q1: Does OIDC completely eliminate secrets from CI/CD?
No. It removes long-lived cloud access keys from CI/CD, which is huge. But app-level credentials can still exist and must be managed with centralized storage and rotation.
Q2: Should I allow wildcard branches in the IAM trust policy?
Only if you truly need it. Wildcards speed developer flow, but they increase exposure. A safer pattern is separate roles: strict for production, broader for ephemeral environments.
Q3: How often should we rotate secrets?
There is no universal number. Choose intervals based on system fragility, credential criticality, and operational readiness. Faster rotation without resilient clients often causes avoidable incidents.
Actionable takeaways
- Implement GitHub Actions OIDC secrets rotation by first removing static cloud keys, then rotating remaining app secrets.
- Lock IAM trust policies to exact aud and sub claims, and split prod vs non-prod roles.
- Use temporary AWS credentials everywhere possible, especially in CI runners and external workloads.
- Treat Secrets Manager rotation as an application reliability project, not only a security checkbox.
- Add guardrails that prevent credential drift back into repositories after migration.
We did not end up with a perfect zero-secret world. We ended up with something better: fewer persistent credentials, clearer ownership, faster incident response, and a deployment pipeline we trust more than we did three months ago. If your CI/CD is still carrying old keys “just for now,” this is one of the highest-leverage cleanups you can make.
