At 2:03 AM, the payment provider did exactly what it promised. It retried.
Our endpoint had already processed the first delivery, but one downstream API call timed out, and the provider never saw a clean 2xx. So it sent the same event again. And again. By 2:11 AM, three customer records had diverged, one invoice was double-marked in analytics, and support was asking why a “successful payment” triggered two onboarding emails.
If you have ever shipped webhook consumers in a hurry, you know this pattern. The bug is rarely in your happy path. It lives in the boundary between signature verification, deduplication, and background processing. This guide walks through a practical PHP webhook idempotency pattern that has held up in noisy real-world systems.
I am deliberately staying provider-agnostic, but I am grounding key choices in Stripe and GitHub webhook guidance, plus MySQL’s duplicate-key behavior. If you want related reliability context, our posts on partial failures, retry boundaries, and data trustworthiness pair nicely with this one.
The ingestion contract, in one sentence
Verify authenticity, persist a dedupe key atomically, return 2xx fast, then process asynchronously.
That sentence hides tradeoffs:
- Fast ack vs strict completion: a fast 2xx protects you from provider retry storms, but it means your queue/worker path becomes the real system of record.
- At-least-once delivery vs exactly-once dreams: most webhook providers are at-least-once by design, so idempotency belongs in your app, not in hope.
- DB-first durability vs in-memory speed: Redis-only dedupe is fast but can be fragile during restarts. A DB uniqueness constraint is slower but harder to fool.
Why teams still get duplicate effects even with “idempotency enabled”
In postmortems, I repeatedly see four mistakes:
- Comparing signatures with normal string equality (timing attack risk, and subtle parsing bugs).
- Hashing transformed request bodies after middleware normalization, instead of the raw bytes.
- Writing “seen event” state after business side effects, not before.
- Treating HTTP request idempotency keys as equivalent to webhook delivery idempotency keys. They solve related but different problems.
GitHub explicitly recommends validating X-Hub-Signature-256 with a constant-time compare and using delivery identifiers to defend against replay. Stripe similarly emphasizes verifying the signature against the raw body and returning success quickly before heavy logic. Those are not optional details; they are design constraints.
A resilient PHP endpoint shape
Below is a compact endpoint skeleton showing webhook signature verification, replay window checks, and atomic dedupe before queue enqueue. Keep this handler boring. Boring is good.
<?php
// webhook.php

function verifyHmacSha256(string $rawBody, string $header, string $secret, int $maxSkewSec = 300): bool {
    // Example header format: t=1714049690,v1=abcdef...
    $parts = [];
    foreach (explode(',', $header) as $chunk) {
        [$k, $v] = array_pad(explode('=', trim($chunk), 2), 2, null);
        if ($k !== null && $v !== null) $parts[$k] = $v;
    }
    if (!isset($parts['t'], $parts['v1'])) return false;
    $ts = (int) $parts['t'];
    if (abs(time() - $ts) > $maxSkewSec) return false; // replay window guard
    $signedPayload = $ts . '.' . $rawBody;
    $expected = hash_hmac('sha256', $signedPayload, $secret);
    // constant-time comparison
    return hash_equals($expected, $parts['v1']);
}

$rawBody = file_get_contents('php://input') ?: '';
$signature = $_SERVER['HTTP_X_PROVIDER_SIGNATURE'] ?? '';
$eventId = $_SERVER['HTTP_X_PROVIDER_DELIVERY'] ?? null; // provider delivery id
$secret = (string) getenv('WEBHOOK_SECRET');

if (!$eventId || $secret === '' || !verifyHmacSha256($rawBody, $signature, $secret)) {
    http_response_code(401);
    exit;
}

$pdo = new PDO(getenv('DB_DSN'), getenv('DB_USER'), getenv('DB_PASS'), [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$pdo->beginTransaction();
$insert = $pdo->prepare(
    'INSERT INTO webhook_inbox (provider, delivery_id, payload, received_at)
     VALUES (:provider, :delivery_id, :payload, NOW())
     ON DUPLICATE KEY UPDATE duplicate_count = duplicate_count + 1, last_seen_at = NOW()'
);
$insert->execute([
    ':provider' => 'stripe',
    ':delivery_id' => $eventId,
    ':payload' => $rawBody,
]);
$isNew = ($insert->rowCount() === 1); // MySQL typically reports 1 for an insert, 2 for a duplicate-key update

if ($isNew) {
    $job = $pdo->prepare('INSERT INTO webhook_jobs (delivery_id, status, available_at) VALUES (:id, "queued", NOW())');
    $job->execute([':id' => $eventId]);
}
$pdo->commit();

http_response_code(200);
Important nuance: MySQL reports affected rows differently for insert vs duplicate update. Use that behavior carefully, and test with your client settings. The safer pattern is to store explicit status columns and query by key in worker code rather than over-trusting rowCount() semantics.
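A minimal sketch of that safer pattern: classify "new vs duplicate" from whether the INSERT itself succeeded, not from rowCount(). It assumes the webhook_inbox table defined below; timestamps are passed in as parameters instead of NOW() so the logic stays portable across drivers.

```php
<?php
// Sketch: decide "new vs duplicate" without trusting rowCount().
// Assumes the webhook_inbox table from this article; timestamps are bound
// as parameters rather than NOW() so the logic is driver-portable.
function recordDelivery(PDO $pdo, string $provider, string $deliveryId, string $rawBody): bool
{
    $now = gmdate('Y-m-d H:i:s');
    try {
        $pdo->prepare(
            'INSERT INTO webhook_inbox (provider, delivery_id, payload, received_at)
             VALUES (:provider, :delivery_id, :payload, :now)'
        )->execute([
            ':provider' => $provider,
            ':delivery_id' => $deliveryId,
            ':payload' => $rawBody,
            ':now' => $now,
        ]);
        return true; // first delivery: the row did not exist before
    } catch (PDOException $e) {
        if ($e->getCode() === '23000') { // duplicate key: delivery already recorded
            $pdo->prepare(
                'UPDATE webhook_inbox
                 SET duplicate_count = duplicate_count + 1, last_seen_at = :now
                 WHERE provider = :provider AND delivery_id = :delivery_id'
            )->execute([':now' => $now, ':provider' => $provider, ':delivery_id' => $deliveryId]);
            return false;
        }
        throw $e;
    }
}
```

The boolean answer comes from whether the row could be inserted at all, which does not depend on client settings such as MYSQL_ATTR_FOUND_ROWS.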
Schema that makes duplicates boring
Use a dedicated inbox table, not your business tables, as the first landing zone. This gives you auditability, replay tooling, and cleaner incident response.
CREATE TABLE webhook_inbox (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    provider VARCHAR(32) NOT NULL,
    delivery_id VARCHAR(128) NOT NULL,
    payload JSON NOT NULL,
    received_at DATETIME NOT NULL,
    last_seen_at DATETIME NULL,
    duplicate_count INT NOT NULL DEFAULT 0,
    processed_at DATETIME NULL,
    processing_error TEXT NULL,
    UNIQUE KEY uniq_provider_delivery (provider, delivery_id)
);

CREATE TABLE webhook_jobs (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    delivery_id VARCHAR(128) NOT NULL,
    status ENUM('queued','running','done','dead') NOT NULL DEFAULT 'queued',
    attempts INT NOT NULL DEFAULT 0,
    available_at DATETIME NOT NULL,
    locked_at DATETIME NULL,
    worker_id VARCHAR(64) NULL,
    UNIQUE KEY uniq_delivery_job (delivery_id),
    KEY idx_status_available (status, available_at)
);
The duplicate key lives on (provider, delivery_id). That single constraint does more reliability work than many “advanced” middleware stacks.
Queue-first processing without lying to yourself
Once events are safely in the inbox, worker logic should be idempotent too. Do not assume “queued once” guarantees “applied once.” Worker crashes happen between side effects and commit boundaries.
A practical sequence:
- Lock one queued job with FOR UPDATE SKIP LOCKED (or your queue equivalent).
- Load the inbox record by delivery id.
- Apply the business mutation behind your own idempotency key (for example, a unique invoice_event_id in domain tables).
- Mark the inbox processed_at and the job status done in one transaction.
- On failure, increment attempts and back off with jitter. Dead-letter after a threshold.
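The locking and backoff steps can be sketched as follows. This assumes MySQL 8+ (for FOR UPDATE SKIP LOCKED) and the webhook_jobs table above; the backoff helper uses full jitter, one common choice rather than the only correct one.

```php
<?php
// Sketch of the claim-and-backoff steps. Assumes MySQL 8+ (FOR UPDATE
// SKIP LOCKED) and the webhook_jobs table from this article.
function claimJob(PDO $pdo, string $workerId): ?array
{
    $pdo->beginTransaction();
    $job = $pdo->query(
        "SELECT id, delivery_id, attempts FROM webhook_jobs
         WHERE status = 'queued' AND available_at <= NOW()
         ORDER BY available_at
         LIMIT 1
         FOR UPDATE SKIP LOCKED"
    )->fetch(PDO::FETCH_ASSOC);
    if (!$job) {
        $pdo->rollBack();
        return null; // nothing claimable right now
    }
    $pdo->prepare(
        "UPDATE webhook_jobs SET status = 'running', locked_at = NOW(), worker_id = :w
         WHERE id = :id"
    )->execute([':w' => $workerId, ':id' => $job['id']]);
    $pdo->commit();
    return $job;
}

// Exponential backoff with full jitter: a random delay in [0, base * 2^attempts],
// capped so retries never drift beyond an hour.
function backoffSeconds(int $attempts, int $baseSec = 30, int $capSec = 3600): int
{
    return random_int(0, min($capSec, $baseSec * (2 ** $attempts)));
}
```

Claiming inside a short transaction keeps competing workers from grabbing the same job, while SKIP LOCKED keeps them from blocking each other.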
If you are working through trust boundaries in CI/CD and runtime systems, this complements our artifact verification runbook. The same principle applies: verify inputs early, then gate side effects behind durable checks.
Troubleshooting: what breaks at 3 AM
1) “Signature mismatch” only in production
- Likely cause: reverse proxy or framework middleware modifies body encoding/newlines.
- Check: log hash of raw bytes at edge and app, compare lengths and checksum.
- Fix: verify against exact raw request body before JSON decode; enforce UTF-8 handling expectations from provider docs.
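For the edge-vs-app comparison above, one way to sketch the fingerprint (the log format is illustrative, not a standard):

```php
<?php
// Sketch: fingerprint the exact raw bytes so logs at the proxy and the app
// can be diffed when chasing "signature mismatch only in production".
function rawBodyFingerprint(string $rawBody): string
{
    return sprintf('len=%d sha256=%s', strlen($rawBody), hash('sha256', $rawBody));
}

// At the app edge, before any decoding or middleware normalization:
// error_log('webhook ' . rawBodyFingerprint(file_get_contents('php://input') ?: ''));
```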
2) Duplicate side effects despite unique inbox key
- Likely cause: dedupe exists only at ingress, not in domain mutation layer.
- Check: confirm business tables also have unique guards for event-driven writes.
- Fix: add domain-level idempotency key and make writes upsert-safe.
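One sketch of such a domain-level guard: the side effect runs only when a guard row is inserted for the first time. The domain_event_guard table and the callback are hypothetical stand-ins for your own schema.

```php
<?php
// Sketch: run a side effect at most once per event key by piggybacking on a
// unique constraint. Table domain_event_guard (event_key PRIMARY KEY) is a
// hypothetical stand-in for your schema.
function applyOnce(PDO $pdo, string $eventKey, callable $sideEffect): bool
{
    $pdo->beginTransaction();
    try {
        $pdo->prepare('INSERT INTO domain_event_guard (event_key) VALUES (:k)')
            ->execute([':k' => $eventKey]);
    } catch (PDOException $e) {
        $pdo->rollBack();
        if ($e->getCode() === '23000') {
            return false; // guard row exists: this event was already applied
        }
        throw $e;
    }
    $sideEffect($pdo); // domain write, inside the same transaction as the guard
    $pdo->commit();
    return true;
}
```

Because the guard insert and the domain write commit together, a crash between them rolls both back and the retry starts clean.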
3) Retry flood during downstream outage
- Likely cause: endpoint waits for heavy processing before returning 2xx.
- Check: p95 webhook response time and timeout correlation with provider redeliveries.
- Fix: move non-essential logic to worker path, return 2xx right after durable enqueue.
FAQ
Do I need both signature verification and IP allowlisting?
Yes, when feasible. Signature verification is your primary authenticity check. IP allowlisting can reduce noise, but provider IP ranges change and should not be your only control.
Can Redis replace database deduplication?
Redis can be a helpful fast path, but treat it as advisory unless persistence and failover are engineered carefully. For financial or account-critical flows, keep a durable uniqueness constraint in your primary datastore.
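A sketch of that advisory fast path using phpredis SET with NX/EX; treat a "seen" answer as probably-duplicate and let the database constraint make the final call.

```php
<?php
// Sketch: advisory duplicate check with phpredis. A "seen" answer is a hint,
// not proof; the durable uniqueness constraint in the database still decides.
function dedupeKey(string $provider, string $deliveryId): string
{
    return sprintf('wh:dedupe:%s:%s', $provider, $deliveryId);
}

function seenRecently(\Redis $redis, string $provider, string $deliveryId, int $ttlSec = 86400): bool
{
    // SET with NX/EX succeeds only for the first writer within the TTL window
    $fresh = $redis->set(dedupeKey($provider, $deliveryId), '1', ['nx', 'ex' => $ttlSec]);
    return $fresh === false;
}
```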
Should I drop old delivery IDs to save storage?
Yes, with a retention policy. Keep enough history to cover provider retry windows, audit needs, and incident forensics. Many teams keep full payloads for 7 to 30 days, then archive or compact to metadata-only records.
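A minimal purge sketch along those lines: only fully processed rows older than the cutoff are eligible, so unprocessed or failed deliveries stay available for investigation. The cutoff is passed as a parameter to keep the query driver-portable.

```php
<?php
// Sketch: retention purge for the inbox. Deletes only rows that were fully
// processed and are older than the cutoff; unprocessed rows are kept.
function purgeProcessedBefore(PDO $pdo, string $cutoff): int
{
    $stmt = $pdo->prepare(
        'DELETE FROM webhook_inbox
         WHERE processed_at IS NOT NULL AND received_at < :cutoff'
    );
    $stmt->execute([':cutoff' => $cutoff]);
    return $stmt->rowCount(); // number of rows purged
}

// Example: keep roughly 30 days of processed payloads
// purgeProcessedBefore($pdo, gmdate('Y-m-d H:i:s', time() - 30 * 86400));
```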
Actionable takeaways for this week
- Implement a single ingress invariant: verify signature on raw bytes, then atomically upsert inbox record.
- Add a unique domain idempotency key wherever webhook events create money movement, entitlements, or user comms.
- Measure webhook handler p95 and failure rate separately from worker success rate. Different SLOs, different bottlenecks.
- Run one replay drill from stored inbox data every sprint so recovery is practiced, not theoretical.
- Document provider-specific retry behavior and response timeout expectations next to your runbook.
If your current webhook path mixes verification, business writes, and third-party calls in one request cycle, do not rewrite everything at once. Split ingress and processing first. That single architectural move usually removes the worst reliability pain immediately.
