Most teams treat “internal APIs” as trusted by default because they run inside a VPC. In 2026, that assumption is risky. Lateral movement after a single credential leak is still one of the fastest ways to escalate an incident. A better pattern is zero-trust for service-to-service traffic: private network paths, strong workload identity, and explicit authorization checks on every call.
In this guide, you will build a practical AWS architecture for internal APIs using PrivateLink, mTLS certificates from ACM Private CA, and policy-based access controls. The stack is designed for teams that need strong isolation without exposing APIs to the public internet.
Architecture overview
- Provider account: Hosts the internal API behind a Network Load Balancer (NLB).
- Consumer account(s): Access the API over AWS PrivateLink interface endpoints.
- Identity: Client and server certificates for mTLS, issued by ACM Private CA.
- Authorization: API layer checks a SPIFFE-like client identity mapped from the cert subject/SAN.
- Observability: OpenTelemetry traces + structured logs + CloudWatch alarms.
Step 1: Define infrastructure with Terraform
Create a reusable module for the provider side: VPC endpoint service + NLB + target group.
Provider Terraform (simplified)
resource "aws_lb" "api_nlb" {
  name               = "internal-api-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.private_subnet_ids
}
resource "aws_lb_target_group" "api_tg" {
  name     = "internal-api-tg"
  port     = 8443
  protocol = "TCP" # TCP passthrough so client certificates reach the API for mTLS
  vpc_id   = var.vpc_id
  health_check {
    # Plain TCP connect check: an HTTPS check would fail the handshake
    # because the NLB does not present a client certificate.
    protocol = "TCP"
  }
}
resource "aws_lb_listener" "tls" {
  load_balancer_arn = aws_lb.api_nlb.arn
  port              = 443
  # Do not terminate TLS at the NLB: terminating here would strip the
  # client certificate, so the API behind the NLB terminates mTLS itself.
  protocol = "TCP"
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api_tg.arn
  }
}
resource "aws_vpc_endpoint_service" "api" {
  acceptance_required        = true
  network_load_balancer_arns = [aws_lb.api_nlb.arn]
  allowed_principals = [
    "arn:aws:iam::123456789012:root",
    "arn:aws:iam::210987654321:root"
  ]
}
This creates a private endpoint service that only approved consumer accounts can connect to. Keep acceptance_required = true so new endpoint requests are explicitly approved.
Step 2: Enable mTLS identity
Private connectivity is not enough. You also need strong caller identity. Issue short-lived client certificates from ACM Private CA and enforce mTLS at your API runtime (or sidecar proxy like Envoy).
Node.js API mTLS verification
import https from "node:https";
import fs from "node:fs";
const server = https.createServer(
  {
    key: fs.readFileSync("/certs/server.key"),
    cert: fs.readFileSync("/certs/server.crt"),
    ca: fs.readFileSync("/certs/ca.crt"),
    requestCert: true,       // ask every client for a certificate
    rejectUnauthorized: true // refuse handshakes without a valid cert
  },
  (req, res) => {
    const cert = req.socket.getPeerCertificate(true);
    if (!req.socket.authorized || !cert?.subject?.CN) {
      res.writeHead(401);
      return res.end(JSON.stringify({ error: "unauthorized" }));
    }
    const caller = cert.subject.CN; // e.g. svc.billing-api.prod
    const allowed = ["svc.orders-api.prod", "svc.reports-api.prod"];
    if (!allowed.includes(caller)) {
      res.writeHead(403);
      return res.end(JSON.stringify({ error: "forbidden", caller }));
    }
    res.writeHead(200, { "content-type": "application/json" });
    res.end(JSON.stringify({ ok: true, caller }));
  }
);
server.listen(8443, () => console.log("mTLS API listening on 8443"));
For production, avoid hardcoded allow-lists. Load authorization policy from DynamoDB, AWS AppConfig, or OPA/Rego bundles.
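A policy-driven check can replace the hardcoded array with a lookup that denies by default. This is a minimal sketch; the policy document shape is an assumption, and in production it would be fetched from DynamoDB, AppConfig, or an OPA bundle and refreshed on an interval:

```javascript
// Sketch of policy-based authorization. The document shape (caller -> allowed
// "METHOD path" pairs) is illustrative, not a fixed format.
const policy = {
  "svc.orders-api.prod":  { allow: ["GET /orders", "POST /orders"] },
  "svc.reports-api.prod": { allow: ["GET /orders"] }
};

function isAuthorized(caller, method, path) {
  const entry = policy[caller];
  if (!entry) return false; // unknown identity: deny by default
  return entry.allow.includes(`${method} ${path}`);
}
```

The deny-by-default branch matters: a caller missing from the policy should produce a 403 and a log line, never a fallback to implicit trust.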
Step 3: Consumer account connection via PrivateLink
In each consumer account, create an interface VPC endpoint pointing to the provider service name.
resource "aws_vpc_endpoint" "provider_api" {
  vpc_id              = var.consumer_vpc_id
  service_name        = var.provider_service_name
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.consumer_subnet_ids
  private_dns_enabled = false
  security_group_ids  = [aws_security_group.endpoint.id]
}
Then publish a private Route 53 record in the consumer VPC, for example orders.internal.company, mapped to the endpoint ENI DNS name.
Step 4: Add policy and runtime guardrails
- Use SCPs to prevent accidental public load balancers for internal API accounts.
- Enforce TLS 1.2+ and minimum key sizes through deployment checks.
- Rotate client certs frequently (7-30 days) and automate revocation workflows.
- Require endpoint acceptance approvals through CI/CD, not manual console clicks.
Simple CI gate (bash)
#!/usr/bin/env bash
set -euo pipefail
terraform plan -out=tfplan
terraform show -json tfplan > plan.json
# Fail if any aws_lb has internal=false in protected workspaces
jq -e '
  [.resource_changes[]
    | select(.type == "aws_lb" and .change.after != null)
    | .change.after.internal == false]
  | any
' plan.json > /dev/null && {
  echo "Blocked: public load balancer detected in internal-api stack"
  exit 1
}
echo "Policy checks passed"
Step 5: Observe identity, latency, and denied calls
Track security and reliability together. A zero-trust setup is only successful if it is observable.
- Security metrics: unauthorized/forbidden count by caller identity.
- Reliability metrics: p95 latency, handshake failures, endpoint connection errors.
- Change metrics: cert rotations, policy changes, endpoint accept/reject events.
Emit caller identity into logs for fast incident response, but never log private keys or raw certificate material.
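A small sketch of that logging discipline, with illustrative field names: record the caller and a certificate fingerprint (enough to correlate incidents) while keeping key material and PEM bodies out of the log stream.

```javascript
// Sketch: structured denial log entry. `cert` is the object returned by
// Node's getPeerCertificate(); fingerprint256 and valid_to are real fields.
function denialLogEntry(caller, cert, reason) {
  return JSON.stringify({
    event: "authz.denied",
    caller,                           // cert subject CN, safe to log
    fingerprint: cert.fingerprint256, // identifies the cert without leaking it
    notAfter: cert.valid_to,          // helps spot expiry-related denials
    reason,
    ts: new Date().toISOString()
    // never include cert.raw, private keys, or full PEM bodies
  });
}
```

Alert on spikes of `authz.denied` with an unknown caller: that pattern is exactly what a leaked or misissued certificate looks like.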
Common pitfalls to avoid
- Relying on source IP allow-lists as the primary identity model.
- Skipping cert rotation automation, which turns mTLS into operational debt.
- Sharing one client certificate across multiple services.
- Treating PrivateLink as enough without application-level authorization.
Rollout checklist for week one
- Protect one high-value API first (payments, auth, or billing).
- Issue separate client certs per service and environment.
- Enable denial logs and alert on unknown caller identities.
- Run game-day tests: expired cert, revoked cert, and endpoint denial.
- Measure p95 impact before and after mTLS enforcement.
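For the expired-cert game day above, a helper like this can classify certs against the 7-30 day rotation window. The thresholds are illustrative; tune them to your rotation cadence:

```javascript
// Sketch: flag client certs nearing expiry. `validTo` is any date string
// Date can parse (e.g. the valid_to field from getPeerCertificate()).
function rotationStatus(validTo, now = new Date()) {
  const msLeft = new Date(validTo).getTime() - now.getTime();
  const daysLeft = msLeft / 86_400_000;
  if (daysLeft <= 0) return "expired";
  if (daysLeft <= 2) return "rotate-now"; // inside the alerting window
  return "ok";
}
```

Running this over the fleet's certs on a schedule turns "expired cert at 3 a.m." from an incident into a routine alert.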
Conclusion
This pattern gives you defense in depth for internal APIs: private transport with PrivateLink, cryptographic service identity with mTLS, and explicit authorization at request time. You can start with one critical API, validate latency overhead, then roll it out service by service.
If your team is moving toward multi-account AWS at scale, this architecture is a practical middle ground between basic VPC trust and a full service mesh rollout. It is incremental, auditable, and production-friendly.
