At 9:02 AM on a Monday, our “simple” partner API did exactly what we asked it to do and still broke the business. CPU looked normal. Database looked healthy. But support tickets exploded: premium customers were getting random 429 responses while a noisy integration from one tenant kept hammering us with retries.
The incident was not a DDoS. It was worse in one way: we were hurting our best users with our own traffic shaping mistakes. We had a global limit, but no meaningful partitioning. We returned 429, but our clients ignored Retry-After. And we had no guardrail to stop retry storms from multiplying the problem.
This runbook is how we fixed it in ASP.NET Core 9 with practical, layered controls: partitioned inbound limits, targeted endpoint policies, and polite outbound retry behavior. If you run a paid API, this is one of the highest-leverage reliability upgrades you can ship this quarter.
The architecture mistake most teams make
Many teams start with “100 requests per minute” and call it done. That seems reasonable until one identity, one token, or one NATed network consumes the whole budget. A single global bucket is easy to configure and hard to operate fairly.
Microsoft’s ASP.NET Core rate limiting middleware supports partitioned limiters so you can shape traffic by meaningful keys (for example API key, tenant ID, or authenticated user) instead of punishing everyone equally. That is the difference between “throttling” and “policy.”
Equally important: RFC 6585 defines 429 Too Many Requests and notes that the response may include a Retry-After hint. If your server emits it but clients ignore it, your limiter becomes a chaos amplifier rather than a protection layer.
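Concretely, a well-behaved rejection looks something like this on the wire (values illustrative, not from our production traffic):

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/problem+json

{ "type": "https://httpstatuses.com/429", "title": "Too many requests", "status": 429 }
```

The body tells a human what happened; the Retry-After header tells a machine when to come back. You need both, because SDKs rarely parse problem-detail bodies but most resilience libraries can read the header.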
A practical baseline policy for ASP.NET Core 9
Below is a production-style baseline I now prefer:
- A global partitioned fixed-window limiter (fairness by API key or user).
- A chained per-partition concurrency limiter (backpressure under bursts).
- An explicit rejection handler that emits machine-readable guidance.
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.OnRejected = async (context, ct) =>
    {
        // Lease metadata may contain a retry hint
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                Math.Ceiling(retryAfter.TotalSeconds).ToString();
        }

        context.HttpContext.Response.ContentType = "application/problem+json";
        await context.HttpContext.Response.WriteAsJsonAsync(new
        {
            type = "https://httpstatuses.com/429",
            title = "Too many requests",
            detail = "Rate limit exceeded for this identity. Retry later.",
            status = 429
        }, cancellationToken: ct);
    };

    // Identity extraction shared by both chained limiters
    static string GetPartitionKey(HttpContext httpContext)
    {
        var identity = httpContext.User.Identity?.IsAuthenticated == true
            ? httpContext.User.Identity!.Name!
            : httpContext.Request.Headers["X-Api-Key"].ToString();

        return string.IsNullOrWhiteSpace(identity) ? "anonymous" : identity;
    }

    // Chaining lives on PartitionedRateLimiter (not RateLimitPartition):
    // a fixed window for fairness, then a concurrency limiter for backpressure.
    options.GlobalLimiter = PartitionedRateLimiter.CreateChained(
        PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
            RateLimitPartition.GetFixedWindowLimiter(
                GetPartitionKey(httpContext),
                _ => new FixedWindowRateLimiterOptions
                {
                    PermitLimit = 120,
                    Window = TimeSpan.FromMinutes(1),
                    QueueLimit = 0,
                    AutoReplenishment = true
                })),
        PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
            RateLimitPartition.GetConcurrencyLimiter(
                GetPartitionKey(httpContext),
                _ => new ConcurrencyLimiterOptions
                {
                    PermitLimit = 30,
                    QueueLimit = 20,
                    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
                })));

    // Tighter policy for expensive endpoints
    options.AddSlidingWindowLimiter("report-export", limiterOptions =>
    {
        limiterOptions.PermitLimit = 6;
        limiterOptions.Window = TimeSpan.FromMinutes(1);
        limiterOptions.SegmentsPerWindow = 6;
        limiterOptions.QueueLimit = 0;
    });
});

var app = builder.Build();

app.UseRateLimiter();

app.MapGet("/api/export", () => Results.Ok("export started"))
    .RequireRateLimiting("report-export");

app.Run();
Tradeoff note: fixed windows are simpler and predictable for support teams, while token bucket and sliding window can smooth spikes better. Choose by operational clarity first, then optimize once you have real traffic data.
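If you later decide bursts need smoothing, the swap is mechanical. Here is a token bucket sketch that goes inside the same AddRateLimiter(options => ...) block; the policy name and numbers below are my own, tuned to match the 120/min baseline above:

```csharp
// Sketch only: "smooth-burst" and these numbers are assumptions, not values
// from the incident. Average throughput stays at 120/min, but up to 120
// tokens can be spent in a burst before the bucket drains.
options.AddTokenBucketLimiter("smooth-burst", limiterOptions =>
{
    limiterOptions.TokenLimit = 120;                               // max burst size
    limiterOptions.TokensPerPeriod = 20;                           // refill per period
    limiterOptions.ReplenishmentPeriod = TimeSpan.FromSeconds(10); // 20 * 6 = 120/min average
    limiterOptions.QueueLimit = 0;                                 // fail fast, same as baseline
    limiterOptions.AutoReplenishment = true;
});
```

The support-facing cost is that "how many requests do I get per minute?" no longer has a one-number answer, which is exactly the operational-clarity tradeoff above.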
Don’t stop at inbound controls: make clients behave
One of our worst findings during the incident was a client that retried 429 instantly, five times, from every worker. In effect, it multiplied rejection traffic while doing zero useful work.
The .NET resilience stack (built on Polly) gives you a clean pattern: inspect 429, honor server hints, and keep retry budgets finite.
using System.Net;
using Microsoft.Extensions.Http.Resilience;
using Polly;
using Polly.Retry;

builder.Services.AddHttpClient<PartnerApiClient>()
    .AddResilienceHandler("partner-api", pipeline =>
    {
        pipeline.AddRetry(new RetryStrategyOptions<HttpResponseMessage>
        {
            MaxRetryAttempts = 3,
            ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
                .HandleResult(r =>
                    r.StatusCode == HttpStatusCode.TooManyRequests ||
                    (int)r.StatusCode >= 500),
            DelayGenerator = args =>
            {
                // Honor Retry-After when present
                if (args.Outcome.Result?.Headers.RetryAfter?.Delta is TimeSpan serverDelay)
                    return new ValueTask<TimeSpan?>(serverDelay);

                // Fall back to a bounded exponential delay (1s, 2s, 4s, capped at 8s)
                var seconds = Math.Min(Math.Pow(2, args.AttemptNumber), 8);
                return new ValueTask<TimeSpan?>(TimeSpan.FromSeconds(seconds));
            }
        });
    });
When inbound and outbound policies align, the system recovers quickly instead of oscillating between overload and retry storms.
Rollout plan that does not surprise customers
Rate limits fail most often during rollout, not design. The safe pattern is progressive tightening with observability at each step:
- Observe-only phase (24 hours): keep limits permissive, emit what would have been rejected, and inspect partition distribution.
- Soft enforcement: enable real 429 only on clearly abusive partitions while preserving a generous baseline for normal traffic.
- Targeted hardening: tighten only hotspots (expensive endpoints, known noisy tenants), then re-check support and business metrics.
During rollout, watch four signals together, not in isolation: rejection rate, p95/p99 latency, successful requests per tenant tier, and retry amplification ratio. If rejection rate drops but tail latency rises, you probably over-queued concurrency. If rejection and retries both rise, your client behavior is still fighting your policy.
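The last signal deserves a concrete definition, since teams rarely track it by default. The name "retry amplification ratio" and this formulation are mine: requests observed at the server divided by unique logical operations clients attempted.

```csharp
// Retry amplification ratio: how many server-side requests each logical
// client operation generated. 1.0 means no retries; values well above 1
// during a rejection spike mean clients are fighting the limiter.
static double AmplificationRatio(long totalRequests, long uniqueOperations) =>
    uniqueOperations == 0 ? 0 : (double)totalRequests / uniqueOperations;

// The incident client retried each 429 five times from every worker, so
// 1,500 logical operations became roughly 9,000 requests: a ratio of 6.0.
```

You can approximate unique operations from an idempotency key or request ID if clients send one; without that, a per-client distinct-URL-per-window count is a rough but useful stand-in.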
A practical SLO guardrail I like: if new limiter policy causes more than a small agreed delta in paid-user success rate, auto-rollback to previous limits and page the owner. This prevents “technically correct” policies from silently degrading customer trust.
Tradeoffs that matter in real systems
Per-user fairness vs operational simplicity: partitioning by tenant or API key is fairer, but requires identity hygiene. If identity extraction is inconsistent across gateways and app code, you can create accidental shared buckets.
Fail-fast vs queued behavior: concurrency queues improve throughput for brief bursts, but can hurt interactive latency under sustained load. For user-facing APIs, favor lower queue limits and explicit 429 over long waiting rooms.
App-level limits vs edge limits: ASP.NET Core limiter protects your business logic. CDN/WAF limits protect your perimeter. You usually need both, because each solves a different layer of risk.
Where this fits with your existing 7Tech playbooks
If you already applied cache revalidation patterns from this ASP.NET output caching guide, rate limiting is the next logical layer. Caching reduces avoidable work, while rate limiting protects the unavoidable work.
If your queue and retry logic resembles the failure modes in this Node.js retry spiral post, enforce retry budgets now, before peak traffic does it for you.
For data consistency concerns after throttling, pair this with the outbox/inbox reliability blueprint. And if your deployment cadence causes policy drift, lock release flow with a merge queue workflow.
Troubleshooting: when limits are “configured” but users still suffer
1) Premium users still get random 429s
Likely cause: partition key falls back to shared values (for example all anonymous traffic).
Fix: verify key extraction order and add telemetry tags for partition key source (user, API key, tenant).
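One way to make the key source visible is System.Diagnostics.Metrics; this is a sketch, and the meter, counter, and tag names are my own, not an established convention:

```csharp
using System.Diagnostics.Metrics;

// Hypothetical telemetry helper: record every rejection with the extraction
// path that produced the partition key, so "anonymous" pile-ups show up
// immediately in your dashboards.
static class RateLimitTelemetry
{
    private static readonly Meter Meter = new("PartnerApi.RateLimiting");
    private static readonly Counter<long> Rejections =
        Meter.CreateCounter<long>("rate_limit.rejections");

    public static void RecordRejection(string keySource, string partitionKey) =>
        Rejections.Add(1,
            new KeyValuePair<string, object?>("key_source", keySource), // "user" | "api_key" | "anonymous"
            new KeyValuePair<string, object?>("partition", partitionKey));
}
```

Call RecordRejection from your OnRejected handler with whichever branch produced the key. If the anonymous bucket dominates rejections for authenticated traffic, your extraction order is wrong somewhere between gateway and app.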
2) P99 latency spikes after adding limiter
Likely cause: concurrency queue too deep, causing slow tail behavior.
Fix: lower QueueLimit, prefer fail-fast for interactive APIs, and reserve queueing for idempotent, background-friendly calls.
3) Clients keep hammering after 429
Likely cause: Retry-After not set or ignored.
Fix: emit Retry-After consistently, verify SDK behavior in integration tests, and cap retry attempts.
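One hedged way to "verify SDK behavior in integration tests" is to spin the app up in memory and assert the hint is actually emitted. This sketch assumes xUnit and Microsoft.AspNetCore.Mvc.Testing, neither of which the original setup mentions:

```csharp
using System.Net;
using Microsoft.AspNetCore.Mvc.Testing;
using Xunit;

// Requires `public partial class Program { }` at the bottom of Program.cs
// so WebApplicationFactory can see the top-level-statements entry point.
public class RetryAfterTests : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly WebApplicationFactory<Program> _factory;

    public RetryAfterTests(WebApplicationFactory<Program> factory) => _factory = factory;

    [Fact]
    public async Task Rejected_request_carries_RetryAfter()
    {
        var client = _factory.CreateClient();

        // The "report-export" policy allows 6/min, so the 7th call should be rejected.
        HttpResponseMessage last = null!;
        for (var i = 0; i < 7; i++)
            last = await client.GetAsync("/api/export");

        Assert.Equal(HttpStatusCode.TooManyRequests, last.StatusCode);

        // Window-based limiters attach RetryAfter lease metadata, which the
        // OnRejected handler turns into a header; assert it so regressions surface.
        Assert.True(last.Headers.Contains("Retry-After"));
    }
}
```

A matching test on the client side (a stub server that returns 429 with Retry-After, plus a timer around the retry pipeline) closes the loop from the other direction.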
4) You still fall over during attack-like bursts
Likely cause: app-level limiter is being asked to solve perimeter-level abuse.
Fix: keep app limiter for fairness, but enforce volumetric controls at CDN/WAF or cloud edge.
FAQ
Should I use token bucket or fixed window for APIs?
Use fixed window when you need very clear support semantics (“120 requests per minute”). Use token bucket when you need smoother burst handling. Both are valid if telemetry is in place.
Do I need separate limits per endpoint?
Not for every endpoint. Start with one global partitioned policy, then add tighter named policies only for expensive or abuse-prone routes like export, search, or auth flows.
Is 429 enough for security protection?
No. 429 is an application-level fairness and stability tool. Keep network-level controls (CDN/WAF/DDoS protection) for volumetric abuse, then use app limits for tenant isolation and graceful degradation.
Actionable takeaways for this week
- Ship one global partitioned rate limiter in ASP.NET Core 9 by API key or tenant, not by host header alone.
- Return consistent 429 + Retry-After responses with machine-readable error bodies.
- Add one stricter named policy for your most expensive endpoint (export/report/search).
- Update your .NET clients to honor the Retry-After header and cap retries at a small budget.
- Track rejected requests, queue depth, and partition hot spots before and after rollout.
If you take only one idea from this post, take this: rate limiting is not just a middleware checkbox. It is a fairness contract between your API and your customers. The moment you partition limits by identity and teach clients to retry responsibly, incidents get shorter, support gets calmer, and your best users stop paying the price for everyone else’s burst traffic.
