If you've been building with Event Sourcing, you've likely run into a wall: what happens when a business operation spans multiple aggregates or services? A single @Transactional annotation won't save you anymore. The database boundary is gone, and with it, the safety net of ACID transactions.
The instinctive answer is two-phase commit (2PC): a distributed protocol that coordinates a commit across multiple participants. It works on paper. In practice, it introduces tight coupling, holds resources locked while participants wait for the coordinator's decision, and lets a single slow node stall every other participant. Many microservice architectures avoid it for those reasons.
The alternative is the Saga pattern: a sequence of local transactions, each one publishing an event or message that triggers the next step. If any step fails, a series of compensating transactions rolls back the work already done. No global lock. No blocking coordinator. No two-phase commit.
This article builds a concrete Saga implementation in Spring Boot, covering both coordination strategies and the compensating transaction mechanism that makes eventual consistency safe.
The problem: distributed transactions across service boundaries
Consider a simple e-commerce checkout flow:
- Reserve inventory for the ordered items
- Charge the customer's payment method
- Create the shipment record
- Confirm the order
In a monolith with a single database, this is one transaction. In a microservices architecture (or even a well-modularized monolith with separate aggregates), each step touches a different service or bounded context. If step 3 fails after step 2 has already charged the customer, you have an inconsistency. The charge happened, the shipment didn't.
This is where Sagas come in.
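Before bringing in Spring, the core idea can be sketched in plain Java. The block below is a minimal in-memory illustration, not the implementation built later in this article: `LocalStep`, `MiniSaga` and `MiniSagaDemo` are hypothetical names, and each "local transaction" is just a `Runnable` that appends to a log.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration: a step is a local action plus the action that reverses its business effect.
record LocalStep(String name, Runnable action, Runnable compensation) {}

final class MiniSaga {
    // Runs the steps in order. Returns true if all succeeded, false if a step failed and compensation ran.
    static boolean run(List<LocalStep> steps) {
        Deque<LocalStep> completed = new ArrayDeque<>();
        for (LocalStep step : steps) {
            try {
                step.action().run();
                completed.push(step); // the most recent step ends up on top of the stack
            } catch (RuntimeException e) {
                // Compensate the steps that already completed, most recent first.
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }
}

final class MiniSagaDemo {
    // Simulates the checkout flow and returns the sequence of operations that were executed.
    static List<String> checkout(boolean shipmentFails) {
        List<String> log = new ArrayList<>();
        List<LocalStep> steps = List.of(
            new LocalStep("inventory", () -> log.add("reserve inventory"), () -> log.add("release inventory")),
            new LocalStep("payment", () -> log.add("charge payment"), () -> log.add("refund payment")),
            new LocalStep("shipment", () -> {
                if (shipmentFails) throw new IllegalStateException("carrier unavailable");
                log.add("create shipment");
            }, () -> log.add("cancel shipment"))
        );
        MiniSaga.run(steps);
        return log;
    }

    public static void main(String[] args) {
        // prints [reserve inventory, charge payment, refund payment, release inventory]
        System.out.println(checkout(true));
    }
}
```

When the shipment step fails, the charge is refunded and the reservation released, in that order. A real saga differs in two important ways: the steps run in different services, and the list of completed steps has to survive a restart.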
Choreography vs Orchestration: two coordination strategies
Before writing any code, you need to choose how the saga will be coordinated. The choice has significant architectural implications.
Choreography
In choreography, there is no central coordinator. Each service listens for events and decides what to do next. When InventoryReserved is published, the payment service picks it up and charges the customer. When PaymentCharged is published, the shipment service creates the record. And so on.
InventoryService → [InventoryReserved] → PaymentService
PaymentService → [PaymentCharged] → ShipmentService
ShipmentService → [ShipmentCreated] → OrderService
This is simple to implement and naturally decoupled. The downside is that the saga logic is distributed across services: understanding the full flow requires reading multiple codebases. Debugging failures is hard. Adding a new step means touching multiple services.
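To show what "no central coordinator" looks like in code, here is a small in-process sketch of the chain above. It assumes a hypothetical `EventBus` standing in for a real broker and a hypothetical `OrderPlaced` event that starts the flow. The event records reuse the names from the diagram but are defined locally for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-process stand-in for a message broker.
final class EventBus {
    private final Map<Class<?>, List<Consumer<Object>>> subscribers = new HashMap<>();
    final List<String> published = new ArrayList<>(); // names of published events, in order

    <E> void subscribe(Class<E> type, Consumer<E> handler) {
        subscribers.computeIfAbsent(type, t -> new ArrayList<>())
                   .add(event -> handler.accept(type.cast(event)));
    }

    void publish(Object event) {
        published.add(event.getClass().getSimpleName());
        for (Consumer<Object> handler : subscribers.getOrDefault(event.getClass(), List.of())) {
            handler.accept(event);
        }
    }
}

record OrderPlaced(String orderId) {}
record InventoryReserved(String orderId) {}
record PaymentCharged(String orderId) {}
record ShipmentCreated(String orderId) {}

final class ChoreographyDemo {
    static EventBus wire() {
        EventBus bus = new EventBus();
        // Each "service" knows only the event it reacts to and the event it emits.
        bus.subscribe(OrderPlaced.class, e -> bus.publish(new InventoryReserved(e.orderId())));      // inventory
        bus.subscribe(InventoryReserved.class, e -> bus.publish(new PaymentCharged(e.orderId())));   // payment
        bus.subscribe(PaymentCharged.class, e -> bus.publish(new ShipmentCreated(e.orderId())));     // shipment
        return bus;
    }

    public static void main(String[] args) {
        EventBus bus = wire();
        bus.publish(new OrderPlaced("order-1"));
        // prints [OrderPlaced, InventoryReserved, PaymentCharged, ShipmentCreated]
        System.out.println(bus.published);
    }
}
```

The full flow only exists as the sum of the three subscriptions. Here they sit next to each other. In a real system each one lives in a different codebase, which is exactly why the flow becomes hard to follow.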
Orchestration
In orchestration, a central Saga Orchestrator drives the flow. It sends commands to each service, waits for replies, and decides what to do next, including triggering compensating transactions on failure.
OrderSaga (Orchestrator)
→ command: ReserveInventory → InventoryService
← reply: InventoryReserved
→ command: ChargePayment → PaymentService
← reply: PaymentCharged
→ command: CreateShipment → ShipmentService
← reply: ShipmentCreated (or ShipmentFailed)
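→ command: RefundPayment → PaymentService (compensation, on ShipmentFailed)
→ command: ReleaseInventory → InventoryService (compensation, on ShipmentFailed)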
The saga logic lives in one place. Failures are handled explicitly. The flow is readable and testable in isolation. This is the approach we'll implement.
The project structure
src/
├── saga/
│   ├── OrderSaga.java              # Orchestrator: drives the flow
│   ├── OrderSagaState.java         # Persistent state of each saga instance
│   └── OrderSagaRepository.java
├── command/
│   ├── ReserveInventoryCommand.java
│   ├── ChargePaymentCommand.java
│   ├── CreateShipmentCommand.java
│   └── compensating/
│       ├── ReleaseInventoryCommand.java
│       └── RefundPaymentCommand.java
├── reply/
│   ├── InventoryReserved.java
│   ├── InventoryReservationFailed.java
│   ├── PaymentCharged.java
│   ├── PaymentFailed.java
│   ├── ShipmentCreated.java
│   └── ShipmentFailed.java
└── service/
    ├── InventoryService.java
    ├── PaymentService.java
    └── ShipmentService.java
Step 1: Model the saga state
A saga instance is a long-running process. Its state must be persisted: if the application restarts mid-saga, you need to resume where you left off.
public enum SagaStep {
    STARTED,
    INVENTORY_RESERVED,
    PAYMENT_CHARGED,
    SHIPMENT_CREATED,
    COMPLETED,
    COMPENSATING,
    COMPENSATED,
    FAILED
}

@Entity
@Table(name = "order_sagas")
public class OrderSagaState {

    @Id
    private String sagaId;
    private String orderId;

    @Enumerated(EnumType.STRING)
    private SagaStep currentStep;

    // Which forward steps completed, and therefore which compensations may be needed
    private boolean inventoryReserved;
    private boolean paymentCharged;

    // Data the orchestrator needs to build later commands
    private BigDecimal amount;
    private String customerId;
    private String address;

    private String failureReason;
    private Instant startedAt;
    private Instant lastUpdatedAt;

    // getters, setters, constructors
}
The important thing here is tracking which compensating actions are needed. inventoryReserved and paymentCharged are flags the orchestrator checks when deciding which compensations to trigger.
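As a rough illustration of how those two flags translate into a compensation plan, the sketch below uses a hypothetical `CompensationFlags` record in place of the JPA entity, plus a hypothetical `Compensation` enum and `CompensationPlanner` helper. None of these appear in the project itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified view of the two flags kept in OrderSagaState.
record CompensationFlags(boolean inventoryReserved, boolean paymentCharged) {}

enum Compensation { REFUND_PAYMENT, RELEASE_INVENTORY }

final class CompensationPlanner {
    // Returns the compensations to trigger, in reverse order of the forward flow
    // (payment was charged after inventory was reserved, so it is refunded first).
    static List<Compensation> plan(CompensationFlags flags) {
        List<Compensation> plan = new ArrayList<>();
        if (flags.paymentCharged()) {
            plan.add(Compensation.REFUND_PAYMENT);
        }
        if (flags.inventoryReserved()) {
            plan.add(Compensation.RELEASE_INVENTORY);
        }
        return plan;
    }
}
```

A saga that failed at the payment step has `inventoryReserved = true` and `paymentCharged = false`, so the plan contains only `RELEASE_INVENTORY`. Nothing that never happened gets undone.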
Step 2: Define commands and replies
Commands are instructions sent to a participant service. Replies are the outcomes the orchestrator receives.
// Commands
public record ReserveInventoryCommand(String sagaId, String orderId, List<OrderItem> items) {}
public record ChargePaymentCommand(String sagaId, String orderId, BigDecimal amount, String customerId) {}
public record CreateShipmentCommand(String sagaId, String orderId, String address) {}
// Compensating commands
public record ReleaseInventoryCommand(String sagaId, String orderId) {}
public record RefundPaymentCommand(String sagaId, String orderId, BigDecimal amount) {}
// Replies
public sealed interface SagaReply permits
        InventoryReserved, InventoryReservationFailed,
        PaymentCharged, PaymentFailed,
        ShipmentCreated, ShipmentFailed {}
public record InventoryReserved(String sagaId) implements SagaReply {}
public record InventoryReservationFailed(String sagaId, String reason) implements SagaReply {}
public record PaymentCharged(String sagaId) implements SagaReply {}
public record PaymentFailed(String sagaId, String reason) implements SagaReply {}
public record ShipmentCreated(String sagaId) implements SagaReply {}
public record ShipmentFailed(String sagaId, String reason) implements SagaReply {}
Again, a sealed interface gives us exhaustive pattern matching when processing replies: the compiler will tell us if we've forgotten a case.
Step 3: Build the orchestrator
The OrderSaga is the heart of this pattern. It drives the forward flow and, when something fails, initiates compensation in reverse order.
@Service
@Transactional
public class OrderSaga {

    private final OrderSagaRepository repository;
    private final CommandBus commandBus;

    public OrderSaga(OrderSagaRepository repository, CommandBus commandBus) {
        this.repository = repository;
        this.commandBus = commandBus;
    }

    // ---- Start ----

    public String start(String orderId, List<OrderItem> items, BigDecimal amount, String customerId, String address) {
        var sagaId = UUID.randomUUID().toString();
        var state = new OrderSagaState(sagaId, orderId, SagaStep.STARTED);
        // Store what later steps read back: the payment and shipment commands are built from this state
        state.setAmount(amount);
        state.setCustomerId(customerId);
        state.setAddress(address);
        repository.save(state);
        commandBus.send(new ReserveInventoryCommand(sagaId, orderId, items));
        return sagaId;
    }

    // ---- Forward flow ----
    // The handlers below modify a managed entity inside a transaction,
    // so the changes are flushed on commit without an explicit save().

    public void onInventoryReserved(InventoryReserved reply) {
        var state = load(reply.sagaId());
        state.setInventoryReserved(true);
        state.setCurrentStep(SagaStep.INVENTORY_RESERVED);
        commandBus.send(new ChargePaymentCommand(
            state.getSagaId(),
            state.getOrderId(),
            state.getAmount(),
            state.getCustomerId()
        ));
    }

    public void onPaymentCharged(PaymentCharged reply) {
        var state = load(reply.sagaId());
        state.setPaymentCharged(true);
        state.setCurrentStep(SagaStep.PAYMENT_CHARGED);
        commandBus.send(new CreateShipmentCommand(
            state.getSagaId(),
            state.getOrderId(),
            state.getAddress()
        ));
    }

    public void onShipmentCreated(ShipmentCreated reply) {
        var state = load(reply.sagaId());
        state.setCurrentStep(SagaStep.COMPLETED);
        // Publish OrderCompleted event, notify downstream systems, etc.
    }

    // ---- Compensation ----

    public void onInventoryReservationFailed(InventoryReservationFailed reply) {
        var state = load(reply.sagaId());
        state.setCurrentStep(SagaStep.FAILED);
        state.setFailureReason(reply.reason());
        // Nothing to compensate: inventory was never reserved
    }

    public void onPaymentFailed(PaymentFailed reply) {
        var state = load(reply.sagaId());
        state.setCurrentStep(SagaStep.COMPENSATING);
        state.setFailureReason(reply.reason());
        // Payment failed: release the inventory that was already reserved
        if (state.isInventoryReserved()) {
            commandBus.send(new ReleaseInventoryCommand(state.getSagaId(), state.getOrderId()));
        }
    }

    public void onShipmentFailed(ShipmentFailed reply) {
        var state = load(reply.sagaId());
        state.setCurrentStep(SagaStep.COMPENSATING);
        state.setFailureReason(reply.reason());
        // Shipment failed: refund payment and release inventory, in reverse order
        if (state.isPaymentCharged()) {
            commandBus.send(new RefundPaymentCommand(state.getSagaId(), state.getOrderId(), state.getAmount()));
        }
        if (state.isInventoryReserved()) {
            commandBus.send(new ReleaseInventoryCommand(state.getSagaId(), state.getOrderId()));
        }
    }

    private OrderSagaState load(String sagaId) {
        return repository.findById(sagaId)
            .orElseThrow(() -> new IllegalStateException("Saga not found: " + sagaId));
    }
}
Notice the compensation logic: it's explicit and ordered. When a ShipmentFailed reply arrives, we compensate in reverse order: refund first, then release inventory. The isPaymentCharged() and isInventoryReserved() checks prevent us from compensating actions that never actually happened.
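One thing the orchestrator does not show is how a saga moves from COMPENSATING to COMPENSATED. A possible approach, assuming participants acknowledge compensating commands (which the services in this article are not required to do), is to track which compensations are still outstanding. `CompensationTracker`, `CompensationKind` and `TrackedStep` are hypothetical names used only for this sketch.

```java
import java.util.EnumSet;
import java.util.Set;

enum CompensationKind { REFUND_PAYMENT, RELEASE_INVENTORY }

enum TrackedStep { COMPENSATING, COMPENSATED }

// Hypothetical helper: remembers which compensations have not been acknowledged yet.
final class CompensationTracker {
    private final Set<CompensationKind> pending;
    private TrackedStep step;

    CompensationTracker(Set<CompensationKind> requested) {
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case explicitly.
        this.pending = requested.isEmpty()
            ? EnumSet.noneOf(CompensationKind.class)
            : EnumSet.copyOf(requested);
        this.step = pending.isEmpty() ? TrackedStep.COMPENSATED : TrackedStep.COMPENSATING;
    }

    // Called when a participant acknowledges a compensating command.
    // Removing a kind that is no longer pending is a no-op, so duplicate acknowledgments are harmless.
    void acknowledge(CompensationKind kind) {
        pending.remove(kind);
        if (pending.isEmpty()) {
            step = TrackedStep.COMPENSATED;
        }
    }

    TrackedStep step() {
        return step;
    }
}
```

In a persistent version, the pending set would be stored alongside the saga state, so that a restart does not lose track of unacknowledged compensations.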
Step 4: Implement participant services
Each participant service handles a command, performs its local transaction, and sends a reply. The important thing is that each local transaction is atomic: if it succeeds, the reply is sent; if it fails, the reply is a failure message.
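@Service
@Transactional
public class InventoryService {

    private final InventoryRepository inventory;
    private final ReplyBus replyBus;

    public InventoryService(InventoryRepository inventory, ReplyBus replyBus) {
        this.inventory = inventory;
        this.replyBus = replyBus;
    }

    public void handle(ReserveInventoryCommand command) {
        try {
            // Validate every item before reserving anything. The exception is caught below,
            // so the transaction still commits: reserving while validating could persist
            // a partial reservation when a later item is out of stock.
            for (var item : command.items()) {
                var stock = inventory.findByProductId(item.productId())
                    .orElseThrow(() -> new InsufficientStockException(item.productId()));
                if (stock.getQuantity() < item.quantity()) {
                    throw new InsufficientStockException(item.productId());
                }
            }
            // All items are available: reserve them (assumes each product appears once per command)
            for (var item : command.items()) {
                inventory.findByProductId(item.productId())
                    .orElseThrow(() -> new InsufficientStockException(item.productId()))
                    .reserve(item.quantity());
            }
            replyBus.send(new InventoryReserved(command.sagaId()));
        } catch (InsufficientStockException e) {
            replyBus.send(new InventoryReservationFailed(command.sagaId(), e.getMessage()));
        }
    }

    public void handle(ReleaseInventoryCommand command) {
        inventory.releaseReservationFor(command.orderId());
        // No reply needed for compensating commands, or send a simple acknowledgment
    }
}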
@Service
@Transactional
public class PaymentService {

    private final PaymentGateway gateway;
    private final ReplyBus replyBus;

    public PaymentService(PaymentGateway gateway, ReplyBus replyBus) {
        this.gateway = gateway;
        this.replyBus = replyBus;
    }

    public void handle(ChargePaymentCommand command) {
        try {
            gateway.charge(command.customerId(), command.amount(), command.orderId());
            replyBus.send(new PaymentCharged(command.sagaId()));
        } catch (PaymentException e) {
            replyBus.send(new PaymentFailed(command.sagaId(), e.getMessage()));
        }
    }

    // Compensation: this command may be delivered more than once,
    // so the refund must be idempotent for a given orderId
    public void handle(RefundPaymentCommand command) {
        gateway.refund(command.orderId(), command.amount());
    }
}
Step 5: Wire the reply handler
The orchestrator needs to receive replies. In a simple in-process implementation, this is a direct method call. In a distributed system, replies come through a message broker (Kafka, RabbitMQ). Either way, the routing logic is the same:
@Service
public class SagaReplyHandler {

    private final OrderSaga saga;

    public SagaReplyHandler(OrderSaga saga) {
        this.saga = saga;
    }

    public void handle(SagaReply reply) {
        // Exhaustive over the sealed SagaReply hierarchy: no default branch needed
        switch (reply) {
            case InventoryReserved r -> saga.onInventoryReserved(r);
            case InventoryReservationFailed r -> saga.onInventoryReservationFailed(r);
            case PaymentCharged r -> saga.onPaymentCharged(r);
            case PaymentFailed r -> saga.onPaymentFailed(r);
            case ShipmentCreated r -> saga.onShipmentCreated(r);
            case ShipmentFailed r -> saga.onShipmentFailed(r);
        }
    }
}
If you're using Spring Kafka, this becomes a @KafkaListener. If you're using RabbitMQ, a @RabbitListener. The saga logic itself doesn't change, only the transport layer.
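The article never spells out `CommandBus`, so for the simple in-process case here is one way it might look. This is a sketch under assumptions: `InProcessCommandBus`, its `register` method and the `PingCommand` record are hypothetical, and the dispatch is synchronous, which a broker-backed bus would not be.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical command used only to exercise the bus.
record PingCommand(String sagaId) {}

// Hypothetical in-process bus: routes each command to the handler registered for its class.
final class InProcessCommandBus {
    private final Map<Class<?>, Consumer<Object>> handlers = new HashMap<>();

    <C> void register(Class<C> type, Consumer<C> handler) {
        handlers.put(type, command -> handler.accept(type.cast(command)));
    }

    void send(Object command) {
        Consumer<Object> handler = handlers.get(command.getClass());
        if (handler == null) {
            // Fail loudly: a silently dropped command would leave the saga waiting forever.
            throw new IllegalStateException("No handler for " + command.getClass().getSimpleName());
        }
        handler.accept(command); // synchronous: runs in the caller's thread
    }
}
```

Wiring would then look like `bus.register(ReserveInventoryCommand.class, inventoryService::handle)`. Because dispatch is synchronous, the participant and any reply it sends run inside the orchestrator's call stack, a difference from a real broker worth keeping in mind when reasoning about transactions.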
Compensating transactions: the semantics matter
A compensating transaction is not a rollback. This distinction is critical.
A database rollback undoes work as if it never happened. A compensating transaction acknowledges that the work did happen and explicitly reverses its business effect. If a payment was charged, a compensating transaction is a refund: a new financial event, not an erasure of the original one.
This means compensating transactions must be:
Idempotent: the same compensation may be triggered more than once if there's a network failure or a retry. Your refund endpoint must be safe to call multiple times with the same orderId.
Semantically meaningful: they produce real business events. An InventoryReleased event may need to trigger downstream notifications (e.g., "back in stock" alerts). Design compensations as first-class domain operations.
Monitored: if a compensating transaction itself fails, you have a serious problem. Track saga state in the database and build tooling to detect and alert on stuck sagas.
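The first two properties can be illustrated with a small append-only ledger. This is a minimal in-memory sketch, assuming at most one refund per order: `PaymentLedger` and `LedgerEntry` are hypothetical, and a real implementation would rely on a database with a unique constraint rather than a `HashSet`.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

record LedgerEntry(String orderId, String type, BigDecimal amount) {}

// Hypothetical in-memory ledger: entries are only ever appended, never removed.
final class PaymentLedger {
    private final List<LedgerEntry> entries = new ArrayList<>();
    private final Set<String> refundedOrders = new HashSet<>();

    void charge(String orderId, BigDecimal amount) {
        entries.add(new LedgerEntry(orderId, "CHARGE", amount));
    }

    // Returns true if a refund was recorded, false if this order had already been refunded.
    boolean refund(String orderId, BigDecimal amount) {
        if (!refundedOrders.add(orderId)) {
            return false; // duplicate delivery of the compensating command: nothing to do
        }
        // The refund is a new entry; the original charge stays in the ledger.
        entries.add(new LedgerEntry(orderId, "REFUND", amount.negate()));
        return true;
    }

    BigDecimal balance(String orderId) {
        return entries.stream()
            .filter(e -> e.orderId().equals(orderId))
            .map(LedgerEntry::amount)
            .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    List<LedgerEntry> entries() {
        return List.copyOf(entries);
    }
}
```

After one charge and two refund attempts for the same order, the ledger holds exactly two entries (the charge and one refund) and the balance is zero. The charge is still visible, which is the difference from a rollback.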
Choreography vs Orchestration: when to use which
Both strategies are valid. The choice depends on your team and your system.
Choose choreography when your flows are simple and unlikely to change, your services are already event-driven, and you want to keep each service fully autonomous. The lack of a central coordinator is a feature, not a bug: each service can evolve independently.
Choose orchestration when the flow is complex, involves many steps, or has non-trivial compensation logic. The orchestrator gives you a single place to read the full saga definition, add monitoring, handle retries, and detect stuck instances. For anything beyond three or four steps, orchestration pays for itself.
In practice, many systems end up with both: choreography for simple, stable flows and orchestration for complex, business-critical ones.
What you gain and what you give up
You gain:
- No distributed locks: each step is a local transaction. Services don't block each other.
- Failure isolation: a slow or failed service doesn't freeze the whole system.
- Explicitness: every possible outcome (including failures) is modeled as a first-class case.
- Auditability: saga state is persisted. You can query exactly where any given order is in its flow.
You give up:
- Immediate consistency: the system is eventually consistent. Between steps, the data is in an intermediate state. Design your read models and UI accordingly.
- Simplicity: this is significantly more complex than a @Transactional service method. Only adopt it where the complexity is justified.
- Isolation: unlike a database transaction, other processes can see intermediate states (inventory reserved but not yet charged). This sometimes requires careful UX design.
The Saga pattern is not the right default for every operation. Use it when you genuinely need to coordinate across service boundaries and the business impact of inconsistency is non-trivial.
Going further: production-ready Saga implementations
This implementation is deliberately minimal. In production, you'd want to consider:
- Idempotency keys: ensure that replaying a command (due to retries) doesn't execute the same operation twice. Store processed command IDs and check before executing.
- Saga timeouts: a saga that doesn't complete within a reasonable window is likely stuck. Add a scheduled job that detects and compensates timed-out sagas.
- Dead letter queues: if a compensating transaction itself fails, the message should land in a DLQ for manual intervention. Don't silently discard compensation failures.
- Axon Framework: if you want a production-grade, opinionated Saga infrastructure for the JVM, Axon handles orchestration, event routing, compensation, and deadlines out of the box.
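As an example of the timeout point, the detection logic can be sketched as a pure function over saga snapshots. `SagaSnapshot` and `StuckSagaDetector` are hypothetical, and the step names are plain strings mirroring the `SagaStep` enum so the block stays self-contained. In a Spring application this could be called from a scheduled job that loads the snapshots from the saga repository.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Set;

// Hypothetical read-only view of a persisted saga.
record SagaSnapshot(String sagaId, String step, Instant lastUpdatedAt) {}

final class StuckSagaDetector {
    // Steps after which a saga is finished and should never be reported as stuck.
    private static final Set<String> TERMINAL = Set.of("COMPLETED", "COMPENSATED", "FAILED");

    // Returns the sagas that are not in a terminal step and have not progressed within the timeout.
    static List<SagaSnapshot> findStuck(List<SagaSnapshot> sagas, Instant now, Duration timeout) {
        Instant cutoff = now.minus(timeout);
        return sagas.stream()
            .filter(s -> !TERMINAL.contains(s.step()))
            .filter(s -> s.lastUpdatedAt().isBefore(cutoff))
            .toList();
    }
}
```

Passing `now` in as a parameter, rather than calling `Instant.now()` inside, keeps the function deterministic and easy to test. What to do with a stuck saga (retry the last command, start compensation, or alert a human) is a separate decision.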
The code in this article is the foundation. The hard part, as always, starts when you hit production.
Additional Resources
- Saga Pattern: Chris Richardson's definitive reference on the pattern
- Choreography vs Orchestration: Temporal's in-depth comparison of both approaches
- Compensating Transactions: Microsoft's architecture reference on the semantics of compensation
- Axon Framework: mature JVM framework for Sagas, Event Sourcing and CQRS
- Implementing Event Sourcing in Spring Boot: the previous article in this series
Happy Coding!