SaaS Rewrite vs Refactor: How to Decide What to Do With a Broken Codebase
When should you rewrite your SaaS from scratch vs refactor what you have? A practical framework for founders facing a broken or slow codebase.
The rewrite conversation usually starts the same way. A founder comes to me and says something like "our codebase is a mess, we need to start over." Their developer has been saying "this is not scalable" for six months. Every new feature takes twice as long as it should. The team is demoralized.
And sometimes the rewrite is the right call. But most of the time it is not, and the founders who go that route without a clear framework end up six months later with the same product, half the runway, and a new codebase that is different but not better.
Here is how I think about this decision.
Why Rewrites Are So Appealing (and Often Wrong)
The appeal of a rewrite is obvious. You get to start clean. The mistakes of the past stay in the past. You pick a better stack, design better abstractions, and build the thing you wish you had built the first time.
The problem is that a rewrite does not just rebuild the code. It also has to rebuild everything the old code quietly handles: the bug fixes, the edge cases, the business logic embedded in ugly-looking code, the workarounds that were added for a specific customer three years ago. Most of that institutional knowledge lives only in the existing codebase, not in documentation and not in anyone's memory. When you rewrite, you unknowingly throw it away and then spend the next year discovering what you lost.
Joel Spolsky wrote about this in 2000, in "Things You Should Never Do, Part I," and called it the single worst strategic mistake a software company can make. The core argument holds: the old code is ugly, but it is also full of hard-won fixes you do not know about yet.
None of this means never rewrite. It means the bar should be higher than "the code is messy."
The Real Questions to Ask
Before deciding between a rewrite and a refactor, you need to answer these honestly.
What is actually slow or broken?
"The codebase is a mess" is not a technical diagnosis. What specifically cannot be done? New features take too long? Performance is bad in specific places? The deployment process is unreliable? The database schema makes certain queries impossible?
The specificity matters. A codebase can be ugly and still be entirely functional for the next two years. If you cannot point to concrete blockers, the problem might not be the code. It might be that the team is tired and the code is getting blamed.
What is the risk of the current system?
There is a difference between "painful to work in" and "one incident away from losing customer data." A codebase that is messy but stable is a very different situation from one that has fundamental security flaws, no backups, or architecture that makes it genuinely unsafe to run in production.
Risk-level issues sometimes do justify a rewrite or significant architectural replacement. Messiness alone does not.
How much of the architecture is the problem?
Sometimes the problem is not the code; it is the data model or the system design. If the database schema was designed by someone who did not understand the domain, you might not be able to fix it incrementally. Some architectural decisions are so deeply embedded that refactoring around them costs more than replacing them.
But architecture problems are usually narrow. It is rarely the entire codebase that needs to change. More often it is one area, one service, one table.
What is the cost of shipping little or nothing for 6-12 months?
A full rewrite typically takes longer than anyone expects, and for much of that time the new product lags behind the original on features. If you are in a competitive market, those are months your competitors spend shipping while you rebuild. That is a real strategic cost.
A Decision Framework
This table maps the situations I see most often to what I would usually recommend:
| Situation | Recommendation |
|---|---|
| Code is messy but ship pace is reasonable | Do not rewrite, improve incrementally |
| Specific features are slow or broken | Targeted refactor of those areas |
| Database schema is wrong for the domain | Migrate the data model, keep the rest |
| Security vulnerabilities in core auth/data handling | Fix those systems specifically |
| Team cannot explain how the system works | Rewrite is premature, document first |
| Stack is fundamentally outdated (e.g., Rails 4, no longer maintained) | Incremental upgrade or targeted rewrite of affected modules |
| System has fundamental design flaws (e.g., sync architecture for an async problem) | Architectural replacement, not necessarily a full rewrite |
| Codebase is under 18 months old | Almost never rewrite |
The pattern here is that most situations that feel like they need a full rewrite actually call for something more targeted. Full rewrites are justified when the existing system is fundamentally incapable of doing what the business needs, not when the code is hard to read.
When a Rewrite Is Actually the Right Answer
There are real situations where starting over makes sense.
When the tech debt is load-bearing. If every attempt to fix a problem in area A creates a new problem in area B, and you have traced this back to a structural decision that touches the entire system, you may genuinely be at a point where incremental improvement is not possible.
When the business domain has fundamentally changed. Some companies build their MVP for one use case and then discover their actual customers have completely different needs. The original codebase was built around assumptions that are now wrong. Refactoring toward a different domain model is sometimes harder than rebuilding.
When the codebase is abandoned. If the original developer is gone, the code is not documented, no one on the current team understands it, and it is failing in production regularly, starting over can be faster than reverse-engineering what you have.
When you are switching platforms. Moving from a monolith to microservices, from a server-rendered app to an API-first architecture, or from one database engine to a fundamentally different one sometimes requires something that looks more like a rewrite than a refactor.
Note what is not on this list: "the code is hard to read," "it uses an older version of the framework," "a new developer said they are not comfortable with it."
The Strangler Fig Pattern (What Usually Works Better)
The most effective way to replace a broken system is to do it incrementally, while the existing system keeps running.
The strangler fig pattern works like this: you build the new version alongside the old one, gradually routing traffic to the new implementation as each piece is finished. The old system does not get turned off until the new one has fully replaced it.
This approach is slower and less satisfying than a clean break. But it has a much higher success rate. You keep the product running for customers throughout the process. You discover edge cases in the old code before you throw it away. You can stop and change course at any point rather than committing to a 6-month black box.
Most successful "rewrites" in practice are really strangler fig migrations. The team that did it often calls it a rewrite because the end state looks different. But they did not actually stop shipping while they built it.
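The routing step at the heart of this pattern can be sketched in a few lines. This is a minimal illustration in Python, not a production router: the `StranglerRouter` class, the `/invoices` path, the `customer_id` field, and both handler functions are hypothetical names invented for this example. It assumes every request carries a customer identifier, so a given customer consistently lands on the same implementation while the rollout percentage is raised.

```python
import hashlib
from typing import Callable, Dict, Optional

Handler = Callable[[dict], dict]


def legacy_invoices(request: dict) -> dict:
    """Stand-in for the existing implementation, which keeps serving traffic."""
    return {"source": "legacy", "customer": request["customer_id"]}


def new_invoices(request: dict) -> dict:
    """Stand-in for the rebuilt implementation of the same endpoint."""
    return {"source": "new", "customer": request["customer_id"]}


class StranglerRouter:
    """Sends each request to the old or new handler, one path at a time."""

    def __init__(self) -> None:
        self._legacy: Dict[str, Handler] = {}
        self._new: Dict[str, Handler] = {}
        self._percent_new: Dict[str, int] = {}

    def register(self, path: str, legacy: Handler,
                 new: Optional[Handler] = None) -> None:
        # Every path starts at 0% so the old system stays the default.
        self._legacy[path] = legacy
        if new is not None:
            self._new[path] = new
        self._percent_new[path] = 0

    def set_rollout(self, path: str, percent_new: int) -> None:
        if not 0 <= percent_new <= 100:
            raise ValueError("percent_new must be between 0 and 100")
        if percent_new > 0 and path not in self._new:
            raise ValueError(f"no new handler registered for {path}")
        self._percent_new[path] = percent_new

    @staticmethod
    def _bucket(path: str, customer_id: str) -> int:
        # Stable hash: the same customer always falls in the same bucket (0-99).
        digest = hashlib.sha256(f"{path}:{customer_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def handle(self, path: str, request: dict) -> dict:
        if self._bucket(path, request["customer_id"]) < self._percent_new[path]:
            return self._new[path](request)
        return self._legacy[path](request)


router = StranglerRouter()
router.register("/invoices", legacy_invoices, new_invoices)
router.set_rollout("/invoices", 10)  # roughly 10% of customers see the new code
response = router.handle("/invoices", {"customer_id": "cust_42"})
```

Setting the rollout back to 0 returns all traffic to the old handler, which is what makes it possible to stop and change course. In a real system this decision often lives in a reverse proxy or feature-flag service rather than in application code; the in-process version here is only meant to show the idea.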
What to Do Right Now
If you are facing this decision, here is the sequence I would follow:
- Get a codebase audit first. Before deciding anything, have someone independent review the code with fresh eyes. The people closest to it have the most distorted view of how bad it actually is. An outside perspective usually reframes the problem.
- Define the specific blockers. Write down the three things the current system cannot do that you need it to do. If you cannot write a specific list, you do not have enough information to make the decision yet.
- Estimate both paths. How long would it take to fix the specific blockers through targeted refactoring? How long would a full rewrite take, and who would do it? Compare those honestly, including the opportunity cost of not shipping during the rewrite period.
- Consider the strangler fig. Almost always there is a way to start improving the highest-pain area without committing to rebuilding everything.
- Do not make this decision when morale is low. Rewrite decisions made in the middle of a rough patch are often really about team frustration, not technical necessity. Give it a month.
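For the "estimate both paths" step, it can help to put the comparison into numbers, even rough ones. The sketch below is one hypothetical way to do that in Python. Every figure in it (engineer-weeks, the overrun factors, the weekly costs) is a made-up placeholder to replace with your own estimates, and the `PathEstimate` and `total_cost` names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class PathEstimate:
    name: str
    engineer_weeks: float      # estimated effort in engineer-weeks
    overrun_factor: float      # how far you expect the estimate to slip (assumption)
    weeks_not_shipping: float  # calendar weeks with feature work paused


def total_cost(path: PathEstimate,
               weekly_engineer_cost: float,
               weekly_opportunity_cost: float) -> float:
    """Effort cost plus the cost of the weeks in which nothing new ships."""
    effort = path.engineer_weeks * path.overrun_factor * weekly_engineer_cost
    stalled = path.weeks_not_shipping * path.overrun_factor * weekly_opportunity_cost
    return effort + stalled


# Placeholder inputs, not benchmarks.
refactor = PathEstimate("targeted refactor", engineer_weeks=12,
                        overrun_factor=1.5, weeks_not_shipping=0)
rewrite = PathEstimate("full rewrite", engineer_weeks=48,
                       overrun_factor=2.0, weeks_not_shipping=26)

for path in (refactor, rewrite):
    cost = total_cost(path, weekly_engineer_cost=4_000,
                      weekly_opportunity_cost=10_000)
    print(f"{path.name}: {cost:,.0f}")
```

The exact totals matter less than the act of writing the assumptions down. Once the opportunity cost of not shipping is an explicit input, it becomes much harder to leave out of the comparison.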
The founders I have seen build durable products have all learned the same lesson at some point: shipping incrementally, even in an imperfect codebase, beats pausing to build the ideal one. Clean architecture does not help your customers. Working software does.
If you are not sure whether your situation calls for a rewrite or a targeted fix, a technical audit usually clarifies it fast.