If your Java services still rely on dual writes to a database and Kafka, you are one deployment away from inconsistent data and painful incident calls. In this guide, you will build a production-ready event-driven order service using Spring Boot 3.4, Apache Kafka, PostgreSQL, and the outbox pattern so events are durable, replayable, and safe under retries. By the end, you will have an architecture that scales throughput without sacrificing correctness, plus practical code you can adapt to your own microservices.
Why the Outbox Pattern Still Matters in 2026
Event-driven systems promise loose coupling, but many teams still publish events directly inside request handlers. That works until one write succeeds and the other fails. The outbox pattern solves this by storing domain changes and event records in the same database transaction, then publishing outbox rows asynchronously.
It gives you:
- Atomicity: the business row and the event row commit together.
- Retry safety: failed publishes can be retried without losing data.
- Auditability: every emitted event has a durable source of truth.
Target Architecture
Request Path
- API receives POST /orders.
- Service writes orders and outbox_events in one transaction.
- API returns quickly with the order ID.
Async Publisher Path
- Background worker polls unprocessed outbox rows.
- Rows are published to Kafka with idempotent producer settings.
- Rows are marked processed only after broker acknowledgment.
This is similar in reliability spirit to robust queue pipelines you may have seen in Node.js background jobs and PHP notification pipelines, but here we optimize for Java + Kafka semantics.
Database Design for Reliable Event Publishing
Use a dedicated outbox table. Keep payload format stable (JSON schema or versioned contract), and include enough metadata for replay and debugging.
CREATE TABLE orders (
id UUID PRIMARY KEY,
customer_id UUID NOT NULL,
amount_cents BIGINT NOT NULL,
status VARCHAR(32) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE outbox_events (
id UUID PRIMARY KEY,
aggregate_type VARCHAR(64) NOT NULL,
aggregate_id UUID NOT NULL,
event_type VARCHAR(128) NOT NULL,
payload JSONB NOT NULL,
headers JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
processed_at TIMESTAMPTZ,
retry_count INT NOT NULL DEFAULT 0
);
CREATE INDEX idx_outbox_unprocessed
ON outbox_events (created_at)
WHERE processed_at IS NULL;

Key practice: avoid deleting outbox rows too early. Keep at least a retention window so forensic analysis remains possible.
Spring Boot Implementation
Transactional Write (Order + Outbox)
The service method should be the single source of truth for both writes. If one fails, both rollback.
@Service
@RequiredArgsConstructor
public class OrderService {
private final OrderRepository orderRepository;
private final OutboxRepository outboxRepository;
@Transactional
public UUID createOrder(CreateOrderRequest req) {
UUID orderId = UUID.randomUUID();
Order order = new Order(orderId, req.customerId(), req.amountCents(), "CREATED");
orderRepository.save(order);
OutboxEvent event = OutboxEvent.builder()
.id(UUID.randomUUID())
.aggregateType("Order")
.aggregateId(orderId)
.eventType("OrderCreated.v1")
.payload(Map.of(
"orderId", orderId,
"customerId", req.customerId(),
"amountCents", req.amountCents(),
"status", "CREATED"))
// Map.of rejects null values, so fall back when no trace ID is present
.headers(Map.of("traceId", Objects.requireNonNullElse(MDC.get("traceId"), "unknown")))
.build();
outboxRepository.save(event);
return orderId;
}
}

Publisher Worker with Idempotent Kafka Producer
Use a scheduled worker (or Spring Batch) that picks rows with processed_at IS NULL, locking them with FOR UPDATE SKIP LOCKED so concurrent workers never grab the same rows. Publish and mark processed inside a small transaction boundary.
@Component
@RequiredArgsConstructor
public class OutboxPublisher {
private final OutboxRepository outboxRepository;
private final KafkaTemplate<String, String> kafkaTemplate;
private final ObjectMapper mapper;
@Scheduled(fixedDelayString = "${outbox.poll-ms:500}")
public void publishBatch() {
List<OutboxEvent> batch = outboxRepository.findUnprocessedForUpdate(100);
for (OutboxEvent e : batch) {
try {
String key = e.getAggregateId().toString();
String value = mapper.writeValueAsString(e.getPayload());
kafkaTemplate.send("orders.events", key, value).get();
outboxRepository.markProcessed(e.getId());
} catch (Exception ex) {
outboxRepository.incrementRetry(e.getId());
}
}
}
}

Set Kafka producer properties for reliability:
- acks=all
- enable.idempotence=true
- max.in.flight.requests.per.connection=5
- Retry backoff tuned for the broker's latency profile
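The settings above can be collected into a plain java.util.Properties object before constructing the producer. A minimal sketch; the bootstrap address and serializer classes are illustrative placeholders:

```java
import java.util.Properties;

public class ReliableProducerConfig {

    // Builds producer properties tuned for durable, idempotent publishing.
    public static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers); // placeholder address
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Durability: wait for all in-sync replicas to acknowledge each write.
        props.setProperty("acks", "all");
        // Idempotence: the broker de-duplicates retried sends from this producer.
        props.setProperty("enable.idempotence", "true");
        // Up to 5 in-flight requests still preserves ordering when idempotence is on.
        props.setProperty("max.in.flight.requests.per.connection", "5");
        // Backoff between retries; tune to your broker's latency profile.
        props.setProperty("retry.backoff.ms", "200");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build("localhost:9092"));
    }
}
```

With Spring Boot you would normally express the same values under spring.kafka.producer in application properties; the effect on the underlying producer is identical.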
Operational Hardening Checklist
1) Backpressure and Alerting
Monitor outbox lag, unprocessed row count, and publish error rate. Alert on sustained lag growth. This pairs well with lessons from Linux eBPF monitoring.
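Outbox lag is simply the age of the oldest unprocessed row. A small sketch of the alerting arithmetic, assuming you have already queried the oldest unprocessed created_at (the method names here are illustrative):

```java
import java.time.Duration;
import java.time.Instant;

public class OutboxLag {

    // Lag = how long the oldest unprocessed event has been waiting.
    public static Duration lag(Instant oldestUnprocessedCreatedAt, Instant now) {
        return Duration.between(oldestUnprocessedCreatedAt, now);
    }

    // Fires when lag exceeds the alert threshold; wire this to your metrics system.
    public static boolean shouldAlert(Duration lag, Duration threshold) {
        return lag.compareTo(threshold) > 0;
    }

    public static void main(String[] args) {
        Instant oldest = Instant.parse("2026-01-01T00:00:00Z");
        Instant now = Instant.parse("2026-01-01T00:01:00Z");
        // 60s of lag against a 30s threshold should alert.
        System.out.println(shouldAlert(lag(oldest, now), Duration.ofSeconds(30))); // prints true
    }
}
```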
2) Schema Evolution
Version event types, for example OrderCreated.v1, OrderCreated.v2. Keep consumers backward compatible during migration windows.
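Consumers need to read that version suffix to dispatch to the right handler. A minimal parsing sketch under the naming convention above (the default-to-v1 rule for unversioned types is an assumption, not part of any standard):

```java
public class EventTypes {

    public record EventType(String name, int version) {}

    // Parses "OrderCreated.v2" into name "OrderCreated" and version 2.
    public static EventType parse(String eventType) {
        int dot = eventType.lastIndexOf(".v");
        if (dot < 0) {
            // Assumed convention: unversioned types default to v1 for compatibility.
            return new EventType(eventType, 1);
        }
        return new EventType(eventType.substring(0, dot),
                Integer.parseInt(eventType.substring(dot + 2)));
    }

    public static void main(String[] args) {
        System.out.println(parse("OrderCreated.v2"));
    }
}
```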
3) Security and Integrity
Use topic ACLs, encrypted transport, and signed build provenance in CI. If your pipeline runs on GitHub, apply hardened OIDC patterns from secure GitHub Actions workflows and follow token safety principles from modern token replay defense.
4) Disaster Recovery
Because events are persisted in PostgreSQL first, you can replay from outbox after broker incidents. Add a replay CLI with date-range and aggregate filters to support controlled recovery.
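The selection logic inside such a replay CLI boils down to filtering persisted rows by date range and aggregate type, oldest first. A sketch with an illustrative row shape mirroring the outbox table:

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

public class OutboxReplay {

    // Minimal event shape for replay filtering; fields mirror the outbox table.
    public record OutboxRow(String aggregateType, String eventType, Instant createdAt) {}

    // Selects rows in [from, to) for one aggregate type, oldest first,
    // so replay preserves the original publish order.
    public static List<OutboxRow> selectForReplay(List<OutboxRow> rows, String aggregateType,
                                                  Instant from, Instant to) {
        return rows.stream()
                .filter(r -> r.aggregateType().equals(aggregateType))
                .filter(r -> !r.createdAt().isBefore(from) && r.createdAt().isBefore(to))
                .sorted(Comparator.comparing(OutboxRow::createdAt))
                .toList();
    }
}
```

In production the filtering would happen in SQL rather than in memory, but the contract is the same: date range plus aggregate filter, ordered by creation time.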
Common Pitfalls to Avoid
- Publishing to Kafka inside the database transaction: the broker send is not covered by the DB commit, so this recreates the dual-write risk.
- Marking rows processed before broker acknowledgment.
- No deduplication strategy on consumers.
- Lack of partition-key consistency, causing out-of-order processing.
FAQ
Is CDC with Debezium better than a polling outbox worker?
Both are valid. Polling is simpler to start and often enough for moderate throughput. CDC improves latency and scales better when event volume becomes very high, but it adds infrastructure and operational complexity.
How do I prevent duplicate event handling?
Make consumers idempotent. Store a processed-event ID or business key and ignore repeats. Even with Kafka idempotence, end-to-end exactly-once semantics usually require consumer-side safeguards.
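The dedup check can be sketched as a guard around the side effect. This in-memory version is for illustration only; in production the processed-ID set would live in the consumer's own database, committed in the same transaction as the side effects:

```java
import java.util.HashSet;
import java.util.Set;

public class IdempotentConsumer {

    // Illustrative only: replace with a durable store in production.
    private final Set<String> processedEventIds = new HashSet<>();

    // Runs the side effect only the first time an event ID is seen.
    // Returns false for duplicate deliveries, which are skipped silently.
    public boolean process(String eventId, Runnable sideEffect) {
        if (!processedEventIds.add(eventId)) {
            return false; // duplicate delivery
        }
        sideEffect.run();
        return true;
    }
}
```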
Can I use this pattern without Kafka?
Yes. The outbox pattern is transport-agnostic. You can publish to RabbitMQ, NATS, or cloud event buses. The critical guarantee is still atomic DB write plus deferred publish.
What is a good first SLO for this system?
Start with 99% of outbox events published within 30 seconds and a hard error budget tied to failed publishes. This gives teams a measurable reliability target while scaling traffic.
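Checking that SLO is straightforward arithmetic over two counters you already track. A sketch, assuming you count total published events and those published within the 30-second target:

```java
public class OutboxSlo {

    // Fraction of events published within the target window (e.g. 30s).
    public static double withinTargetRatio(long publishedWithinTarget, long totalPublished) {
        if (totalPublished == 0) return 1.0; // no traffic: vacuously meeting the SLO
        return (double) publishedWithinTarget / totalPublished;
    }

    public static boolean meetsSlo(long withinTarget, long total, double objective) {
        return withinTargetRatio(withinTarget, total) >= objective;
    }

    public static void main(String[] args) {
        // 9,950 of 10,000 events within 30s is 99.5%, which meets a 99% objective.
        System.out.println(meetsSlo(9_950, 10_000, 0.99)); // prints true
    }
}
```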
When you implement this Java outbox pattern carefully, you get the best of both worlds: strong consistency at write time and scalable asynchronous event processing. That is the foundation for dependable microservices in 2026.