Tags: microservices, domain-driven-design, event-driven-architecture, system-modernisation

From Monolith to Microservices: How to Decompose Without Destroying Everything

Microservices promise scalability, team autonomy, and resilience - but they also introduce distributed systems complexity that can cripple teams who aren't ready for it. The real question isn't whether to adopt microservices; it's whether your domain understanding is deep enough to decompose safely. This article explores how domain-driven design, event-driven patterns like sagas and process managers, and disciplined team topology work together to make decomposition a structured programme rather than a guessing game. We're honest about the costs and clear about when a well-structured monolith remains the better answer.

There is a well-worn pattern in enterprise technology: a monolith becomes painful, leadership hears the word "microservices", and a rewrite begins. Eighteen months later, the organisation has a distributed monolith - all the complexity of a distributed system, none of the autonomy benefits - and the original pain is worse.

The problem is rarely the architecture. It is the order of operations. Teams reach for a service boundary before they understand their domain boundaries. This article sets out a more disciplined path.

When Microservices Are the Wrong Answer

Before discussing how to decompose, it is worth being direct about when you should not.

If your team has fewer than 20 engineers, a microservices architecture will almost certainly slow you down. The operational overhead - service discovery, distributed tracing, inter-service authentication, independent deployment pipelines - consumes engineering capacity that a small team cannot afford.

If your domain model is still evolving rapidly, drawing service boundaries too early locks in decisions you will regret. A boundary that makes sense today becomes a seam that resists change tomorrow. A well-structured modular monolith, with clear internal boundaries, is frequently the better answer at this stage.

If your organisation has not yet invested in platform engineering - container orchestration, observability, CI/CD automation - adopting microservices before that foundation exists is building on sand.

The honest position: microservices are an organisational scaling tool as much as a technical one. If your team topology and platform maturity are not ready, the architecture will fight you.

Domain-Driven Design as the Decomposition Lens

For organisations that are ready, domain-driven design (DDD) provides the most reliable framework for drawing service boundaries. The core concept is the bounded context: a boundary within which a particular domain model is consistent and coherent.

Consider an e-commerce platform. "Customer" means something different in the order management context (a shipping address, a payment method) than in the marketing context (a segment, a preference profile) or the support context (a ticket history, an entitlement). These are not the same model. Forcing them into a single shared entity creates coupling that will surface as coordination overhead between teams and fragility in the codebase.
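To make that concrete, here is a minimal sketch (field and type names are hypothetical) of the same business concept modelled separately in two bounded contexts:

// Hypothetical: each context owns its own Customer model; nothing is shared.

// Order management context: what "customer" means when shipping and charging.
interface OrderCustomer {
  customerId: string
  shippingAddress: string
  paymentMethodToken: string
}

// Marketing context: what "customer" means when segmenting and targeting.
interface MarketingCustomer {
  customerId: string
  segment: 'new' | 'returning' | 'lapsed'
  emailPreferences: { newsletter: boolean; offers: boolean }
}

Only the identifier is shared; everything else is free to evolve independently within its own context.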

The practical starting point is an Event Storming workshop: a structured session where domain experts and engineers map the business events that flow through the system. What happens, in what order, and who cares? The output is a domain map that reveals natural boundaries, aggregates, and the language each part of the business actually uses.

This is slower than jumping to an architecture diagram. It is also the reason some decompositions succeed and most do not.

Sagas and Process Managers: Coordinating Across Boundaries

Once you have service boundaries, you face a new problem: business processes that span multiple services. A checkout flow might involve inventory, payments, fulfilment, and notifications. In a monolith, a transaction handles consistency. In a distributed system, you have no such guarantee.

This is where sagas come in. A saga is a sequence of local transactions, each publishing an event that triggers the next step. If a step fails, compensating transactions roll back the previous steps.

There are two implementation styles worth understanding:

  1. Choreography: each service listens for events and reacts independently. Simple to start with, but causality becomes hard to trace as the number of services grows. Debugging a failed checkout across six event streams without a clear orchestrator is genuinely painful. (A sketch follows this list.)
  2. Orchestration (process managers): a dedicated process manager service owns the saga state and issues commands to participants. Harder to build initially, but the business logic lives in one place and is testable in isolation.
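As a rough illustration of the choreography style (event names are hypothetical; the bus interface stands in for whatever broker client you actually use):

// Choreography sketch: each service reacts to events independently; no coordinator.
interface EventBus {
  subscribe(eventType: string, handler: (payload: { orderId: string }) => void): void
  publish(eventType: string, payload: { orderId: string }): void
}

function wireInventoryService(bus: EventBus) {
  bus.subscribe('OrderPlaced', ({ orderId }) => {
    // Reserve stock locally, then announce the outcome.
    bus.publish('InventoryReserved', { orderId })
  })
}

function wirePaymentService(bus: EventBus) {
  bus.subscribe('InventoryReserved', ({ orderId }) => {
    // Charge the customer; would publish PaymentCharged or PaymentFailed.
    bus.publish('PaymentCharged', { orderId })
  })
}

// No single place knows the full flow - that knowledge is spread across
// subscriptions, which is exactly why causality gets hard to trace.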

The discussion in the practitioner community right now, including active threads comparing sagas and process managers, reflects a real tension: choreography feels elegant in theory but becomes operationally complex at scale. Orchestration feels heavyweight but is far easier to observe, debug, and evolve. For long-running business processes with multiple failure modes, the process manager pattern is usually the right call.

// Simplified process manager state machine (runnable TypeScript sketch)
// Events arrive as a discriminated union so a single handle() can switch on type.
type CheckoutEvent =
  | { type: 'OrderPlaced'; orderId: string }
  | { type: 'InventoryReserved'; orderId: string }
  | { type: 'PaymentFailed'; orderId: string }
  | { type: 'PaymentCharged'; orderId: string }

type CheckoutCommand = { type: string; orderId: string }

class CheckoutProcessManager {
  state = 'AWAITING_ORDER'

  // In a real system, emit hands commands to a message broker.
  constructor(private emit: (command: CheckoutCommand) => void) {}

  handle(event: CheckoutEvent) {
    switch (event.type) {
      case 'OrderPlaced':
        this.state = 'RESERVING_INVENTORY'
        this.emit({ type: 'ReserveInventory', orderId: event.orderId })
        break
      case 'InventoryReserved':
        this.state = 'CHARGING_PAYMENT'
        this.emit({ type: 'ChargePayment', orderId: event.orderId })
        break
      case 'PaymentFailed':
        this.state = 'COMPENSATING'
        this.emit({ type: 'ReleaseInventory', orderId: event.orderId })
        break
      case 'PaymentCharged':
        this.state = 'FULFILLING'
        this.emit({ type: 'CreateFulfilment', orderId: event.orderId })
        break
    }
  }
}

The state is explicit. Every transition is observable. When something goes wrong at 2am, you know exactly where the process stalled.

Event Sourcing and CQRS: Powerful, But Choose Deliberately

Event sourcing stores the history of state changes as a sequence of events rather than the current state. Your order is not a row in a database; it is a log of things that happened to it: created, item added, payment captured, shipped.
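A minimal sketch of the idea (event names are illustrative): the current state of an order is derived by folding over its event log.

// Hypothetical order events; the log, not a row, is the source of truth.
type OrderEvent =
  | { type: 'OrderCreated'; orderId: string }
  | { type: 'ItemAdded'; sku: string; qty: number }
  | { type: 'PaymentCaptured'; amount: number }
  | { type: 'OrderShipped' }

interface OrderState {
  items: { sku: string; qty: number }[]
  paid: boolean
  shipped: boolean
}

// Rebuild current state by replaying the log from the beginning.
function replay(events: OrderEvent[]): OrderState {
  return events.reduce<OrderState>((state, event) => {
    switch (event.type) {
      case 'OrderCreated': return state
      case 'ItemAdded': return { ...state, items: [...state.items, { sku: event.sku, qty: event.qty }] }
      case 'PaymentCaptured': return { ...state, paid: true }
      case 'OrderShipped': return { ...state, shipped: true }
    }
  }, { items: [], paid: false, shipped: false })
}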

The benefits are real. You get a complete audit trail by default, the ability to replay events to rebuild state or populate new read models, and a natural fit with domain-driven design aggregates. The football match example that has been circulating recently illustrates this well: a match is not a final score, it is a series of events (kick-off, goal, foul, substitution) from which any state can be derived.

The costs are also real. Querying event-sourced systems requires CQRS (Command Query Responsibility Segregation): you separate the write model (events) from the read model (projections optimised for queries). This means maintaining projection code, handling eventual consistency in your UI, and dealing with schema evolution as events change over time.
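To make the read side concrete, here is a hedged sketch of a projection, reusing the OrderEvent type from the sketch above (the Map stands in for a real read store such as a denormalised table or document database):

// CQRS read side: fold write-side events into a query-friendly shape.
interface OrderSummary { orderId: string; itemCount: number; status: string }

const summaries = new Map<string, OrderSummary>()

// Called for each event as it is consumed from the stream. The view is
// eventually consistent with the write side - the trade-off your UI must tolerate.
function project(orderId: string, event: OrderEvent) {
  const current = summaries.get(orderId) ?? { orderId, itemCount: 0, status: 'created' }
  switch (event.type) {
    case 'OrderCreated':
      summaries.set(orderId, current)
      break
    case 'ItemAdded':
      summaries.set(orderId, { ...current, itemCount: current.itemCount + event.qty })
      break
    case 'PaymentCaptured':
      summaries.set(orderId, { ...current, status: 'paid' })
      break
    case 'OrderShipped':
      summaries.set(orderId, { ...current, status: 'shipped' })
      break
  }
}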

Use event sourcing where audit, temporal queries, or event replay provide clear business value. Do not apply it universally simply because you are adopting an event-driven architecture. A payment or compliance domain benefits enormously. A product catalogue probably does not.

Team Topology and Service Ownership

Service boundaries should reflect team boundaries, not the other way around. This is Conway's Law in practice: the architecture you produce will mirror your communication structure. If you draw service boundaries without aligning them to how your teams are organised and how they communicate, you will spend more time in cross-team coordination than you save in independent deployability.

A practical approach:

  1. Map your domains and bounded contexts first (Event Storming, domain mapping).
  2. Design team structures around those domains - stream-aligned teams that own a bounded context end to end.
  3. Then draw service boundaries to match team ownership.
  4. Platform engineering (the "platform team" in Team Topologies terminology) provides the container orchestration, observability, and deployment tooling that stream-aligned teams consume rather than build for themselves.

This sequence matters. Teams that own a domain context can make autonomous decisions about their service internals. Teams that are handed a service boundary they did not help define will fight it.

Service Mesh and Operational Readiness

As the number of services grows, cross-cutting concerns - mutual TLS between services, traffic management, retries, circuit breaking, distributed tracing - become unmanageable if each team handles them individually.

A service mesh (Istio and Linkerd are the most common choices) moves these concerns to the infrastructure layer. Each service gets a sidecar proxy that handles communication, leaving application code to focus on business logic.

This is powerful, but it is also another layer of complexity that requires platform engineering expertise to operate. Introducing a service mesh before your teams are comfortable with container orchestration and observability is a common mistake. Sequence the investment: get containerisation and CI/CD stable first, add observability second, introduce a service mesh when inter-service communication complexity justifies it.

A Structured Decomposition Programme

Pulling this together, a reliable decomposition looks like this:

  1. Domain mapping: run Event Storming workshops with domain experts to identify bounded contexts, aggregates, and domain events. Produce a context map showing relationships and integration patterns.
  2. Identify the seam: find the bounded context with the clearest boundary, the most independent team, and the highest business value from autonomy. Decompose this first. Do not attempt a big-bang rewrite.
  3. Strangle the monolith: use the strangler fig pattern - route traffic for the new bounded context to the new service while the monolith continues to handle everything else. Incrementally migrate, validate, and retire monolith code.
  4. Define your event contracts: establish how services communicate - synchronous REST or gRPC for queries, asynchronous events via a broker (Kafka, RabbitMQ) for state changes. Make event schemas explicit and versioned from the start (see the sketch after this list).
  5. Build observability before you need it: distributed tracing (OpenTelemetry), structured logging, and service-level dashboards must be in place before you decompose. You cannot debug a distributed system without them.
  6. Align team ownership: ensure each extracted service has a clear owning team with the autonomy to deploy independently.
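On step 4, a sketch of what an explicit, versioned event contract might look like (names are hypothetical; in practice a schema registry would typically enforce this):

// Hypothetical versioned envelope: every event names its schema version,
// so consumers can handle old and new shapes during a migration window.
interface EventEnvelope<T> {
  eventType: string       // e.g. 'order.placed'
  schemaVersion: number   // bumped on breaking change
  occurredAt: string      // ISO 8601 timestamp
  payload: T
}

// v1 and v2 payloads coexist while consumers migrate.
interface OrderPlacedV1 { orderId: string; total: number }
interface OrderPlacedV2 { orderId: string; total: number; currency: string }

// Upcast old versions to the current shape at the consumer edge.
function upcast(envelope: EventEnvelope<OrderPlacedV1 | OrderPlacedV2>): OrderPlacedV2 {
  if (envelope.schemaVersion === 1) {
    return { ...(envelope.payload as OrderPlacedV1), currency: 'GBP' } // assumed default
  }
  return envelope.payload as OrderPlacedV2
}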

How modernise.io Can Help

modernise.io works with enterprise and scale-up organisations through the full decomposition journey, starting where it has to start: domain understanding. We run structured Event Storming and context-mapping workshops that produce a domain map your architects, engineers, and product teams can all work from. From there, we design team topologies aligned to bounded contexts, define event contracts and saga patterns for cross-domain processes, and establish the platform engineering foundations - containerisation, service mesh, observability - that make independent deployment practical rather than theoretical. We implement CQRS and event sourcing where the domain justifies it, and we are direct when it does not. If your organisation is feeling the pain of a growing monolith but is not sure whether microservices are the right next step, or how to start without a costly false start, we can help you make that assessment and design a programme that matches your domain maturity, team structure, and risk appetite.