Cloud in 2026: Build a Zero-Trust Internal API Platform with AWS PrivateLink, mTLS, and Policy-as-Code

Most teams treat “internal APIs” as trusted by default because they run inside a VPC. In 2026, that assumption is risky. Lateral movement after a single credential leak is still one of the fastest ways to escalate an incident. A better pattern is zero-trust for service-to-service traffic: private network paths, strong workload identity, and explicit authorization checks on every call.

In this guide, you will build a practical AWS architecture for internal APIs using PrivateLink, mTLS certificates from AWS Private CA (formerly ACM Private CA), and policy-based access controls. The stack is designed for teams that need strong isolation without exposing APIs to the public internet.

Architecture overview

  • Provider account: Hosts the internal API behind a Network Load Balancer (NLB).
  • Consumer account(s): Access the API over AWS PrivateLink interface endpoints.
  • Identity: Client and server certificates for mTLS using ACM PCA.
  • Authorization: API layer checks SPIFFE-like client identity mapped from cert subject/SAN.
  • Observability: OpenTelemetry traces + structured logs + CloudWatch alarms.
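
The identity mapping in the list above can be sketched as a small parsing step. This is a minimal example, assuming client certificates carry a single URI SAN in SPIFFE form and that SANs arrive as a comma-separated string, which is how Node.js reports simple SANs in the subjectaltname field of getPeerCertificate(). The function name and the trust domain are hypothetical.

```javascript
// Hypothetical helper: extract a SPIFFE-style identity from a SAN string.
// Assumes the simple format, e.g.
// "DNS:orders.internal.company, URI:spiffe://company.internal/prod/orders-api".
function spiffeIdFromSan(subjectAltName, trustDomain) {
  if (typeof subjectAltName !== "string") return null;

  const uris = subjectAltName
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.startsWith("URI:"))
    .map((entry) => entry.slice("URI:".length));

  // Require exactly one URI SAN so the identity is unambiguous.
  if (uris.length !== 1) return null;

  const prefix = `spiffe://${trustDomain}/`;
  const id = uris[0];
  // Reject other trust domains and an empty workload path.
  if (!id.startsWith(prefix) || id.length === prefix.length) return null;

  return id;
}
```

Returning null for anything unexpected keeps the caller on a default-deny path: a certificate without a recognizable identity is treated the same as no certificate.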

Step 1: Define infrastructure with Terraform

Create a reusable module for the provider side: VPC endpoint service + NLB + target group.

Provider Terraform (simplified)

resource "aws_lb" "api_nlb" {
  name               = "internal-api-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.private_subnet_ids
}

# TCP passthrough: the NLB must not terminate TLS. If it did, the client
# certificate would never reach the API and mTLS could not be enforced there.
resource "aws_lb_target_group" "api_tg" {
  name     = "internal-api-tg"
  port     = 8443
  protocol = "TCP"
  vpc_id   = var.vpc_id

  # A TCP check only confirms the port accepts connections. An HTTPS check
  # would fail because the NLB presents no client certificate.
  health_check {
    protocol = "TCP"
  }
}

resource "aws_lb_listener" "tls" {
  load_balancer_arn = aws_lb.api_nlb.arn
  port              = 443
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api_tg.arn
  }
}

resource "aws_vpc_endpoint_service" "api" {
  acceptance_required        = true
  network_load_balancer_arns = [aws_lb.api_nlb.arn]

  allowed_principals = [
    "arn:aws:iam::123456789012:root",
    "arn:aws:iam::210987654321:root"
  ]
}

This creates a private endpoint service that only approved consumer accounts can connect to. Keep acceptance_required=true so new endpoint requests are explicitly approved.

Step 2: Enable mTLS identity

Private connectivity is not enough. You also need strong caller identity. Issue short-lived client certificates from ACM Private CA and enforce mTLS at your API runtime (or sidecar proxy like Envoy).

Node.js API mTLS verification

import https from "node:https";
import fs from "node:fs";

const server = https.createServer(
  {
    key: fs.readFileSync("/certs/server.key"),
    cert: fs.readFileSync("/certs/server.crt"),
    ca: fs.readFileSync("/certs/ca.crt"),
    requestCert: true,
    rejectUnauthorized: true
  },
  (req, res) => {
    const cert = req.socket.getPeerCertificate(true);

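    // req.socket is the TLSSocket; authorized is true only when the client
    // certificate chains to the configured CA.
    if (!req.socket.authorized || !cert?.subject?.CN) {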
      res.writeHead(401);
      return res.end(JSON.stringify({ error: "unauthorized" }));
    }

    const caller = cert.subject.CN; // e.g. svc.billing-api.prod
    const allowed = ["svc.orders-api.prod", "svc.reports-api.prod"];

    if (!allowed.includes(caller)) {
      res.writeHead(403);
      return res.end(JSON.stringify({ error: "forbidden", caller }));
    }

    res.writeHead(200, { "content-type": "application/json" });
    res.end(JSON.stringify({ ok: true, caller }));
  }
);

server.listen(8443, () => console.log("mTLS API listening on 8443"));

For production, avoid hardcoded allow-lists. Load authorization policy from DynamoDB, AWS AppConfig, or OPA/Rego bundles.
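
One way to replace the hardcoded list is to evaluate a policy document that is loaded at startup or refreshed periodically. The sketch below assumes a hypothetical JSON shape (caller identity mapped to allowed methods and a path prefix). It is not the data model of DynamoDB, AppConfig, or OPA, only an illustration of default-deny evaluation.

```javascript
// Hypothetical policy document, e.g. fetched from a config store and parsed as JSON.
const policy = {
  "svc.orders-api.prod": [{ methods: ["GET", "POST"], pathPrefix: "/v1/invoices" }],
  "svc.reports-api.prod": [{ methods: ["GET"], pathPrefix: "/v1/invoices" }]
};

// Default deny: a call is allowed only if some rule for this caller matches.
function isAllowed(policyDoc, caller, method, path) {
  // Own-property check so names like "constructor" never resolve to inherited values.
  const rules = Object.prototype.hasOwnProperty.call(policyDoc, caller)
    ? policyDoc[caller]
    : [];
  return rules.some(
    (rule) =>
      rule.methods.includes(method) &&
      // Match the prefix itself or a sub-path, but not "/v1/invoices-archive".
      (path === rule.pathPrefix || path.startsWith(rule.pathPrefix + "/"))
  );
}
```

In the server above, the allowed.includes(caller) check would become a call such as isAllowed(policy, caller, req.method, req.url), with the policy object swapped atomically whenever a new version is loaded.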

Step 3: Consumer account connection via PrivateLink

In each consumer account, create an interface VPC endpoint pointing to the provider service name.

resource "aws_vpc_endpoint" "provider_api" {
  vpc_id              = var.consumer_vpc_id
  service_name        = var.provider_service_name
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.consumer_subnet_ids
  private_dns_enabled = false
  security_group_ids  = [aws_security_group.endpoint.id]
}

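Then publish a private Route 53 record in the consumer VPC, for example orders.internal.company, that aliases the endpoint's regional DNS name. The hostname clients use must appear in the server certificate's SAN, otherwise TLS verification on the client side fails.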
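
On the consumer side, a call through the endpoint might be configured as follows. This is a sketch, assuming the private hostname from the example above and a hypothetical helper name; how the short-lived client key and certificate reach the workload is outside its scope.

```javascript
// Hypothetical helper: build options for https.request() on the consumer side.
// The PEM values would come from wherever the workload stores its client certificate.
function buildMtlsRequestOptions({ host, path, clientKeyPem, clientCertPem, caPem }) {
  return {
    host,                      // private record, e.g. orders.internal.company
    port: 443,                 // NLB listener port from Step 1
    path,
    method: "GET",
    key: clientKeyPem,         // client private key
    cert: clientCertPem,       // client certificate presented to the API
    ca: caPem,                 // private CA used to verify the server certificate
    servername: host,          // SNI and hostname verification use the private name
    rejectUnauthorized: true,  // never skip server verification
    minVersion: "TLSv1.2"
  };
}
```

The resulting object can be passed to https.request(options, callback) from node:https. Keeping rejectUnauthorized set to true on the client matters as much as on the server: mutual TLS only holds if both sides verify each other.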

Step 4: Add policy and runtime guardrails

  1. Use SCPs to prevent accidental public load balancers for internal API accounts.
  2. Enforce TLS 1.2+ and minimum key sizes through deployment checks.
  3. Rotate client certs frequently (7-30 days) and automate revocation workflows.
  4. Require endpoint acceptance approvals through CI/CD, not manual console clicks.
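
The rotation guardrail in item 3 can be backed by a deployment check. The sketch below is one possible shape, assuming a hypothetical function name and a 30-day upper bound taken from the range above; it works on validity dates only and does not parse certificates itself.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical deployment check: flag certificates that are expired or whose
// total validity period exceeds the rotation window.
// validFrom and validTo accept anything the Date constructor can parse.
function checkCertLifetime({ validFrom, validTo }, maxDays = 30, now = new Date()) {
  const start = new Date(validFrom);
  const end = new Date(validTo);
  if (Number.isNaN(start.getTime()) || Number.isNaN(end.getTime())) {
    return { ok: false, reason: "unparseable validity dates" };
  }
  if (end <= now) return { ok: false, reason: "expired" };

  const lifetimeDays = (end - start) / DAY_MS;
  if (lifetimeDays > maxDays) {
    return { ok: false, reason: `lifetime ${lifetimeDays.toFixed(1)}d exceeds ${maxDays}d` };
  }
  return { ok: true, reason: "within rotation window" };
}
```

Checking the full lifetime rather than only the remaining time catches long-lived certificates on the day they are issued, before they become a rotation problem.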

Simple CI gate (bash)

#!/usr/bin/env bash
set -euo pipefail

terraform plan -out=tfplan
terraform show -json tfplan > plan.json

# Fail if any aws_lb has internal=false in protected workspaces.
# any(...) checks every load balancer; without it, jq -e would only
# reflect the last value emitted.
if jq -e '
  [ .resource_changes[]?
    | select(.type == "aws_lb")
    | .change.after.internal ]
  | any(. == false)
' plan.json > /dev/null; then
  echo "Blocked: public load balancer detected in internal-api stack" >&2
  exit 1
fi

echo "Policy checks passed"
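
As the number of rules grows, the same gate can be expressed as a function that is easier to unit test than a shell pipeline. The sketch below assumes the plan JSON written by terraform show -json has already been parsed; the function name and the second rule (endpoint services must require acceptance) are illustrative additions.

```javascript
// Hypothetical policy function over the parsed output of `terraform show -json`.
// Returns one message per violation; an empty array means the plan passes.
function findPlanViolations(plan) {
  const violations = [];
  for (const rc of plan.resource_changes ?? []) {
    const after = rc.change?.after;
    if (!after) continue; // resource is being deleted, nothing to check

    if (rc.type === "aws_lb" && after.internal === false) {
      violations.push(`${rc.address}: load balancer is public`);
    }
    if (rc.type === "aws_vpc_endpoint_service" && after.acceptance_required !== true) {
      violations.push(`${rc.address}: acceptance_required must be true`);
    }
  }
  return violations;
}
```

A CI step would read plan.json, call findPlanViolations, print each message, and exit non-zero when the array is not empty.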

Step 5: Observe identity, latency, and denied calls

Track security and reliability together. A zero-trust setup is only successful if it is observable.

  • Security metrics: unauthorized/forbidden count by caller identity.
  • Reliability metrics: p95 latency, handshake failures, endpoint connection errors.
  • Change metrics: cert rotations, policy changes, endpoint accept/reject events.

Emit caller identity into logs for fast incident response, but never log private keys or raw certificate material.
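
A structured log entry for each authorization decision might look like the following. This is a sketch with hypothetical field names, and the in-memory counter stands in for whatever metrics client the service already uses.

```javascript
// Hypothetical log builder for authorization decisions. It records who called
// and what was decided, and deliberately omits raw certificate material.
function authDecisionLog({ caller, decision, method, path, certFingerprint }, now = new Date()) {
  return JSON.stringify({
    ts: now.toISOString(),
    event: "authz_decision",
    decision,                                  // "allow" | "unauthorized" | "forbidden"
    caller: caller ?? "unknown",               // identity from the verified certificate
    method,
    path,
    cert_fingerprint: certFingerprint ?? null  // a hash is safe to log; PEM data is not
  });
}

// In-memory counter keyed by decision and caller, standing in for a metrics client.
const deniedByCaller = new Map();
function countDenied(decision, caller) {
  if (decision === "allow") return;
  const key = `${decision}:${caller ?? "unknown"}`;
  deniedByCaller.set(key, (deniedByCaller.get(key) ?? 0) + 1);
}
```

Logging a certificate fingerprint instead of the certificate itself still lets responders tie a denied call to a specific issued certificate during an incident.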

Common pitfalls to avoid

  • Relying on source IP allow-lists as the primary identity model.
  • Skipping cert rotation automation, which turns mTLS into operational debt.
  • Sharing one client certificate across multiple services.
  • Treating PrivateLink as enough without application-level authorization.

Rollout checklist for week one

  • Protect one high-value API first (payments, auth, or billing).
  • Issue separate client certs per service and environment.
  • Enable denial logs and alert on unknown caller identities.
  • Run game-day tests: expired cert, revoked cert, and endpoint denial.
  • Measure p95 impact before and after mTLS enforcement.
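
For the last item, the before-and-after comparison can be computed from raw latency samples. The sketch below uses the nearest-rank definition of a percentile, which is one of several common definitions; monitoring tools may interpolate and report slightly different values. The function names are hypothetical.

```javascript
// Nearest-rank percentile: the smallest sample such that at least
// `percent` percent of the samples are less than or equal to it.
function percentile(samples, percent) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((percent * sorted.length) / 100));
  return sorted[rank - 1];
}

// Difference in p95 latency (ms) between two sample sets.
function p95Overhead(beforeMs, afterMs) {
  return percentile(afterMs, 95) - percentile(beforeMs, 95);
}
```

Comparing samples from the same traffic mix and time window keeps the difference attributable to mTLS rather than to load changes.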

Conclusion

This pattern gives you defense in depth for internal APIs: private transport with PrivateLink, cryptographic service identity with mTLS, and explicit authorization at request time. You can start with one critical API, validate latency overhead, then roll it out service by service.

If your team is moving toward multi-account AWS at scale, this architecture is a practical middle ground between basic VPC trust and a full service mesh rollout. It is incremental, auditable, and production-friendly.

© 7Tech – Programming and Tech Tutorials