How to Destroy a Monolith?

How do you decommission a 500,000-line monolith with zero downtime? It starts with migrating the data, not refactoring the code.

#1about 2 minutes

The challenge of a high-traffic monolithic system

The system handles user flows for major brands with one million daily sessions, managing user data and provisioning it to partners.

#2about 3 minutes

Why the monolith was difficult to maintain

The application's 500,000 lines of code, heavily customized frameworks, and data center limitations made even small changes disproportionately difficult.

#3about 2 minutes

Why the code-first refactoring approach failed

An initial attempt at incremental refactoring and service extraction failed due to tight coupling with the legacy code and a complex data model.

#4about 3 minutes

Adopting a data-first migration strategy

The successful strategy involved migrating data to a new location first, which enabled rebuilding components and drastically simplifying the data model.

#5about 3 minutes

Implementing event-driven data synchronization with AWS

An event-driven pattern using AWS SQS and Lambda was created to sync every data change from the old system to the new database in parallel.

#6about 1 minute

Replacing user flows with a SaaS solution

Instead of rebuilding authentication and authorization, a software as a service (SaaS) solution like Ory was used to leverage expert knowledge and accelerate development.

#7about 1 minute

Building and securing the new microservices architecture

Splitting the monolith into microservices allowed for mature, granular control over security, including authentication at the network, path, and field level.

#8about 2 minutes

Executing a zero-downtime switchover process

A phased rollout and parallel systems enabled a switch to the new registration and login flows with zero downtime, changing the single source of truth seamlessly.

#9about 3 minutes

Adopting a modern tech stack for faster development

Using a cloud-native framework like Quarkus and focusing on writing "glue code" between services drastically reduced implementation time and improved maintainability.

#10about 1 minute

Testing against production with continuous deployment

The team tests new components against production services in the pipeline, relying on extensive monitoring, alerting, and a fast, continuous deployment process.

#11about 2 minutes

Balancing consistency with rebuilding over refactoring

Maintaining a consistent tech stack is a trade-off with innovation, and for complex legacy systems, rebuilding components is often more effective than refactoring.