The Day We Deleted 43 CI Secrets: A Practical Playbook for OIDC and Automated Secret Rotation


At 1:40 AM on a Thursday, our deployment bot failed for the third time in one week. Same underlying problem, different symptom each time: AccessDenied in staging, a missing secret in production, and then a panic rollback because someone had rotated a key manually but forgot to update one repository out of six. We had done what many teams do under pressure: we kept adding CI secrets and told ourselves we would clean it up later.

That night became our cutover point. We moved to GitHub Actions OIDC with short-lived AWS credentials, and we automated rotation for the few static credentials we still needed. The result was not perfection, but operations became dramatically calmer, with fewer urgent Slack threads and far less credential sprawl.

This guide is the practical playbook we used, including tradeoffs, failure modes, and the checks that kept us honest.

Why we stopped trusting long-lived CI secrets

Long-lived CI secrets are convenient, until they are not. They create three recurring problems:

  • Blast radius grows silently: the same key is copied across repos and environments.
  • Rotation gets skipped: manual rotations are risky and easy to postpone.
  • Attribution becomes fuzzy: when one key is shared broadly, incident forensics gets messy.

GitHub’s OIDC flow allows workflows to exchange a short-lived token for cloud credentials without storing long-lived AWS keys in repository secrets. AWS IAM best-practice guidance also strongly favors temporary credentials for workloads. OWASP’s secrets management guidance aligns with this: centralize, automate rotation, and minimize manual handling.

If you are building on related 7tech guides, this article pairs well with our posts on GitHub Actions deployment safety, OIDC-based AWS deploys, API security essentials, and Linux server hardening.

The architecture that worked for us

We split credentials into two groups:

  1. Cloud access credentials: replaced with temporary AWS credentials via OIDC.
  2. Application secrets (DB password, third-party API token): stored in AWS Secrets Manager with scheduled rotation where possible.

That distinction matters. OIDC removes static cloud keys from CI. It does not automatically remove all app-level secrets. Those still need lifecycle management, auditing, and rotation policy.

Step 1: Tight IAM trust policy for GitHub OIDC

The trust policy is your real security boundary. Keep it strict. Avoid broad wildcards unless you have a strong reason and compensating controls.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:ref:refs/heads/main"
        }
      }
    }
  ]
}

Tradeoff: strict branch-based subject conditions reduce abuse risk, but they can slow down experimentation across temporary branches. Our compromise was separate roles for production and non-production with different constraints.
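One way to keep the two roles consistent is to generate both trust policies from the same template. The following is a minimal sketch, not our exact tooling: the function name build_trust_policy is hypothetical, the account ID and repository are the placeholders from the policy above, and the non-production wildcard pattern is an assumption you should tighten to your own branching model.

```python
import json

OIDC_HOST = "token.actions.githubusercontent.com"


def build_trust_policy(account_id: str, repo: str, sub_suffix: str) -> dict:
    """Return an IAM trust policy dict for one GitHub repository.

    sub_suffix is the part of the sub claim after "repo:<org>/<repo>:",
    for example "ref:refs/heads/main" (exact) or "ref:refs/heads/*" (wildcard).
    """
    # StringLike is only needed when the subject contains a wildcard;
    # an exact subject can use the stricter StringEquals operator.
    operator = "StringLike" if "*" in sub_suffix else "StringEquals"
    condition = {"StringEquals": {f"{OIDC_HOST}:aud": "sts.amazonaws.com"}}
    condition.setdefault(operator, {})[f"{OIDC_HOST}:sub"] = f"repo:{repo}:{sub_suffix}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": condition,
            }
        ],
    }


# Production: exact branch match. Non-production: any branch (assumed pattern).
prod_policy = build_trust_policy("123456789012", "your-org/your-repo", "ref:refs/heads/main")
nonprod_policy = build_trust_policy("123456789012", "your-org/your-repo", "ref:refs/heads/*")
```

Printing either dict with json.dumps(policy, indent=2) gives a document you can review in a pull request before attaching it to a role. The broader non-production role should carry far fewer permissions than the production one, since its trust condition is looser.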

Step 2: Workflow with id-token permission, no static AWS keys

name: deploy-api
on:
  push:
    branches: ["main"]

permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gha-prod-deploy
          aws-region: ap-south-1

      - name: Pull runtime secret metadata
        run: |
          aws secretsmanager describe-secret \
            --secret-id prod/myapp/db \
            --query '{ARN:ARN,LastChangedDate:LastChangedDate}'

      - name: Deploy
        run: ./scripts/deploy.sh

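Important detail: id-token: write only grants the workflow permission to request an OIDC token. It does not grant permission to mutate repository content by itself. Also note that once you declare a permissions block, any scope you leave out is set to none, which is why contents: read is listed explicitly so that checkout keeps working.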

Step 3: Secrets rotation for residual static secrets

For database and third-party credentials, we moved to Secrets Manager rotation on a schedule where the target system supported safe password change workflows. For systems that could not rotate safely yet, we added shorter expiration windows and explicit runbooks.

Tradeoff: automatic rotation can break fragile integrations if consumers cache credentials too long. We solved this by setting low but realistic cache TTLs, and by forcing reconnect logic in app clients before enforcing aggressive rotation intervals.
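To make the cache TTL and reconnect idea concrete, here is a minimal sketch of a client-side secret cache. It is illustrative only: the SecretCache class and its fetch callable are hypothetical names, and fetch stands in for whatever actually loads the secret (for example, a Secrets Manager lookup). The clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class SecretCache:
    """Cache one secret value for a bounded time, with forced refresh."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # hypothetical loader supplied by the caller
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Re-fetch when nothing is cached yet or the TTL has elapsed.
        if self._value is None or self._clock() >= self._expires_at:
            self.refresh()
        return self._value

    def refresh(self) -> str:
        # Call this from reconnect logic after an authentication failure,
        # so a rotated secret is picked up without waiting for the TTL.
        self._value = self._fetch()
        self._expires_at = self._clock() + self._ttl
        return self._value
```

The design choice here is that the TTL bounds how stale a credential can get in the normal case, while refresh() gives the connection layer a way to react immediately when the database rejects an old password. The TTL should stay comfortably below the rotation interval.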

A migration sequence that minimizes outages

  1. Inventory all secrets in CI, then label each as cloud-auth, app-secret, or obsolete.
  2. Introduce OIDC role in parallel and run dry deployments in staging.
  3. Remove static cloud keys from repo/org secrets only after successful parallel runs.
  4. Enable rotation for app secrets in waves, starting with lower-risk services.
  5. Add observability checks for STS assume-role failures and secret retrieval failures.

Do not attempt a big-bang migration unless your dependency map is very small. A staged approach catches hidden credential consumers that were never documented.
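The inventory in step 1 of the sequence above can start as something very small. The sketch below assumes you already have a list of (repo, secret name, still referenced) tuples; how you collect that list, and the name patterns used to spot cloud keys, are assumptions to adapt to your own conventions. The function names are hypothetical.

```python
from collections import defaultdict

# Assumed name fragments that indicate static AWS keys; adjust as needed.
CLOUD_AUTH_MARKERS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def classify_secret(name: str, referenced: bool) -> str:
    """Label one CI secret as cloud-auth, app-secret, or obsolete."""
    if not referenced:
        return "obsolete"  # no workflow uses it any more
    if any(marker in name.upper() for marker in CLOUD_AUTH_MARKERS):
        return "cloud-auth"  # candidate for replacement by OIDC
    return "app-secret"  # candidate for managed storage and rotation


def build_inventory(secrets):
    """Group (repo, name, referenced) tuples by their label."""
    inventory = defaultdict(list)
    for repo, name, referenced in secrets:
        inventory[classify_secret(name, referenced)].append((repo, name))
    return dict(inventory)
```

Name matching is only a first pass. Anything labeled app-secret still deserves a manual look, because a cloud key stored under an unusual name would slip through this heuristic.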

One habit that helped was writing a short “credential contract” per service: where secrets come from, how refresh happens, and what fallback behavior is acceptable. It sounds bureaucratic, but it eliminated many ambiguous ownership questions during incidents.

Troubleshooting: what broke for us and how we fixed it

1) Not authorized to perform sts:AssumeRoleWithWebIdentity

Cause: mismatch in sub condition (wrong branch pattern or environment subject).
Fix: inspect the actual token claims used by the job context, then align trust policy exactly. Start strict, test, then widen deliberately only if required.
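For inspecting claims, a small debugging helper is usually enough. This is a sketch under assumptions: it decodes the token payload without verifying the signature, so it is for troubleshooting only, and the function names are hypothetical. It assumes you have already obtained the job's OIDC token as a string.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature.

    Debugging aid only: never use unverified claims for authorization.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def claim_mismatches(claims: dict, expected_sub: str,
                     expected_aud: str = "sts.amazonaws.com") -> list:
    """List differences between the token claims and the trust policy values."""
    problems = []
    if claims.get("sub") != expected_sub:
        problems.append(f"sub is {claims.get('sub')!r}, policy expects {expected_sub!r}")
    if claims.get("aud") != expected_aud:
        problems.append(f"aud is {claims.get('aud')!r}, policy expects {expected_aud!r}")
    return problems
```

A common finding is that a job bound to a GitHub environment presents a subject of the form repo:your-org/your-repo:environment:NAME rather than a branch ref, which an exact branch condition will reject. Log only the individual claims you need, not the raw token.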

2) Deployment works on main but fails in reusable workflow

Cause: the reusable workflow context changed the subject format that the trust policy expected.
Fix: define dedicated role conditions for reusable workflow subjects, and avoid overloading one role for every path.

3) App failures right after secret rotation

Cause: the application kept old credentials in memory, and the connection pool did not recycle connections promptly.
Fix: shorten credential cache TTL, add proactive connection refresh, and alert on auth failure spikes immediately after rotation windows.

4) Teams reintroduced secrets through “temporary” fixes

Cause: no policy enforcement on new repository secrets.
Fix: add CI policy checks and review gates for secret creation requests, plus a monthly drift review against your baseline.
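The drift review can be reduced to a set comparison between the secret names that exist today and an approved baseline. The sketch below is a minimal illustration: the function name is hypothetical, and it assumes you export the current secret names (names only, never values) from your own tooling before calling it.

```python
def find_secret_drift(current_names, baseline_names) -> dict:
    """Compare current repository secret names against the approved baseline."""
    current = set(current_names)
    baseline = set(baseline_names)
    return {
        # Present in the repository but never approved: needs review.
        "unexpected": sorted(current - baseline),
        # Approved but no longer present: baseline may be out of date.
        "missing": sorted(baseline - current),
    }
```

A scheduled job can fail when the unexpected list is non-empty, which turns a monthly manual review into an alert that fires the day a static key reappears.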

FAQ

Q1: Does OIDC completely eliminate secrets from CI/CD?

No. It removes long-lived cloud access keys from CI/CD, which is huge. But app-level credentials can still exist and must be managed with centralized storage and rotation.

Q2: Should I allow wildcard branches in the IAM trust policy?

Only if you truly need it. Wildcards speed up developer workflows, but they increase exposure. A safer pattern is separate roles: strict for production, broader for ephemeral environments.

Q3: How often should we rotate secrets?

There is no universal number. Choose intervals based on system fragility, credential criticality, and operational readiness. Faster rotation without resilient clients often causes avoidable incidents.

Actionable takeaways

  • Adopt GitHub Actions OIDC first to remove static cloud keys, then automate rotation for the remaining app secrets.
  • Lock IAM trust policies to exact aud and sub claims, and split prod vs non-prod roles.
  • Use temporary AWS credentials everywhere possible, especially in CI runners and external workloads.
  • Treat Secrets Manager rotation as an application reliability project, not only a security checkbox.
  • Add guardrails that prevent credential drift back into repositories after migration.

We did not end up with a perfect zero-secret world. We ended up with something better: fewer persistent credentials, clearer ownership, faster incident response, and a deployment pipeline we trust more than we did three months ago. If your CI/CD is still carrying old keys “just for now,” this is one of the highest-leverage cleanups you can make.


© 7Tech – Programming and Tech Tutorials