The Offboarding Incident: Replacing Fragile PAT Scripts with GitHub App Installation Tokens

At 3:17 AM, our deployment bot stopped replying.

Nothing was “down” yet, but releases were piling up and one Slack channel was filling with screenshots of 403 Resource not accessible by personal access token. The fun part: the engineer who originally created that token had left two weeks earlier, and an org policy cleanup finally removed their access. The scripts did not fail loudly, they failed weirdly. One job could still read repos, another could not write checks, and a third failed only in one org.

That incident forced a cleanup we should have done earlier: move automation off long-lived user tokens and onto GitHub App installation tokens.

This guide is the practical path we used, including what broke, what improved, and the tradeoffs you should accept upfront.

Why PAT-based automation fails at the worst time

Personal access tokens are still useful for quick tests, but they age poorly in team automation:

They are tied to a user lifecycle (role changes, offboarding, MFA resets).
They are easy to over-scope, especially with classic PATs.
Ownership is often unclear after six months.

GitHub’s own guidance leans this way too: for long-lived integrations, use a GitHub App; use PATs for short-lived scripts and testing. Fine-grained PATs are better than classic tokens, but they still inherit user coupling and feature limits in some scenarios.

The migration pattern that actually works

We did not rewrite everything. We kept existing workflows and swapped identity first.

Create a GitHub App with minimal permissions per task (do not start with broad write access).
Install it only on repos the automation really needs.
Generate short-lived installation tokens at runtime.
Migrate one workflow family at a time, then remove matching PAT secrets.

That sequence kept blast radius low and made rollback boring.

Design your permissions before you touch workflows

The biggest migration mistake is technical, but it starts as organizational: teams copy PAT-era behavior into app permissions. A better approach is to map each automation job to a small capability list, then map capabilities to app permissions.

Triage bot: issues write, pull requests write, metadata read.
Release note bot: contents read, pull requests read, discussions write (if needed).
Tagging bot: contents write, metadata read.

Do this in a spreadsheet first. It sounds boring, but it saves days of “why is this endpoint denied?” debugging later.

Also separate automations by trust level. If one bot comments on pull requests and another can create releases, they should not share one app identity. Separate apps can feel heavier initially, but they reduce blast radius and make audits easier.

Code pattern 1: Use a GitHub App token inside GitHub Actions

For workflow automation, this is the simplest and safest shape:

name: label-and-comment

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write

jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      - name: Generate installation token
        id: app-token
        uses: actions/create-github-app-token@v1
        with:
          app-id: ${{ secrets.BOT_APP_ID }}
          private-key: ${{ secrets.BOT_APP_PRIVATE_KEY }}

      - name: Label PR
        env:
          GH_TOKEN: ${{ steps.app-token.outputs.token }}
        run: |
          gh api \
            -X POST \
            repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/labels \
            -f labels[]='needs-review'

      - name: Add contextual comment
        env:
          GH_TOKEN: ${{ steps.app-token.outputs.token }}
        run: |
          gh api \
            -X POST \
            repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/comments \
            -f body='Automated triage complete. Security checks queued.'

Important detail: this token is short-lived (about one hour for installation tokens), so it naturally limits exposure compared to static secrets.

Code pattern 2: Server script to mint installation tokens for external jobs

If your automation runs outside Actions (self-hosted workers, cron jobs, queue consumers), mint tokens just-in-time:

#!/usr/bin/env bash
set -euo pipefail

APP_ID="${GITHUB_APP_ID}"
INSTALLATION_ID="${GITHUB_INSTALLATION_ID}"
KEY_PATH="${GITHUB_APP_PRIVATE_KEY_PATH}"

b64url() {
  openssl base64 -A | tr '+/' '-_' | tr -d '='
}

now=$(date +%s)
iat=$((now - 60))
exp=$((now + 540))  # keep JWT short

header='{"alg":"RS256","typ":"JWT"}'
payload="{\"iat\":${iat},\"exp\":${exp},\"iss\":\"${APP_ID}\"}"

unsigned="$(printf '%s' "$header" | b64url).$(printf '%s' "$payload" | b64url)"
sig=$(printf '%s' "$unsigned" | openssl dgst -sha256 -sign "$KEY_PATH" | b64url)
jwt="${unsigned}.${sig}"

token_json=$(curl -sS -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${jwt}" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  "https://api.github.com/app/installations/${INSTALLATION_ID}/access_tokens")

token=$(printf '%s' "$token_json" | jq -r '.token')
expires_at=$(printf '%s' "$token_json" | jq -r '.expires_at')

if [[ -z "$token" || "$token" == "null" ]]; then
  echo "failed to mint installation token" >&2
  echo "$token_json" >&2
  exit 1
fi

echo "token expires at: ${expires_at}" >&2
echo "$token"

Use that token for one job unit, then discard it. Do not cache it across workers unless you also handle expiry and invalidation cleanly.

Tradeoffs you should accept before migrating

More setup upfront: app registration, permissions model, installation scoping.
Permission debugging is stricter: which is good for security, annoying on day one.
Some edge APIs still differ: check endpoint compatibility and required permissions before swapping tokens blindly.

But in return, you get cleaner ownership boundaries, short-lived credentials, and automations that survive employee churn.

Troubleshooting: five failures we actually hit

1) “Resource not accessible by integration”

Usually means missing app permission or repo not included in installation scope. Fix in app settings, then reinstall or update installation access.

2) Token works in one repo, fails in another

Installation is likely “selected repositories,” not “all repositories,” and the second repo is missing.

3) Intermittent 401s after long jobs

Installation tokens expire quickly. Regenerate token per job phase or refresh before long-running API batches.

4) Secondary rate limiting during migration scripts

Burst loops can trigger limits. Serialize writes, back off on 429/retry-after, and avoid checking /rate_limit excessively.

5) Private key rotation anxiety

Keep two active keys during transition, rotate one environment at a time, and monitor token-mint success before deleting old key material.

Related 7Tech deep dives

FAQ

Should we delete all PATs immediately?

No. Keep a short transition window. Migrate critical automations first, then revoke PATs in small batches with rollback notes.

Is a fine-grained PAT still acceptable anywhere?

Yes, for short-lived personal scripts or narrow manual tasks. For shared, long-lived, organization automation, GitHub Apps are usually the safer default.

Can we use `GITHUB_TOKEN` instead of a GitHub App token?

Sometimes yes. For repo-local workflows, GITHUB_TOKEN is often enough. Use a GitHub App when you need cross-repo/org access, tighter identity boundaries, or external persistent services.

A practical 7-day migration rollout

If you need momentum without chaos, this pacing works well:

Day 1: inventory PAT-based jobs and mark business-critical ones.
Day 2: create app(s), define minimum permissions, and install on selected repos.
Day 3-4: migrate one workflow family and add explicit token-expiry handling.
Day 5: remove corresponding PAT secrets and verify no hidden dependencies remain.
Day 6-7: run offboarding simulation, rotate keys once, and document recovery steps.

By the end of week one, you should have fewer static credentials, clearer ownership, and fewer “works on my token” incidents.

Actionable takeaways

Pick one automation domain and migrate identity first, logic second.
Grant minimum app permissions and selected repo access, then expand only when needed.
Generate installation tokens just-in-time and design for expiry from day one.
Remove old PAT secrets immediately after each successful migration slice.
Track token-mint failures as a first-class reliability metric.

If your team has ever lost a night to a mystery 403, this migration pays for itself faster than most platform projects.