Database Changes in Live Systems
Why schema changes break production and how experienced teams avoid it.
Introduction: Databases remember everything
You can redeploy code in minutes.
You can roll back services.
You can toggle feature flags.
But your database?
It remembers every decision you ever made.
Most production failures related to databases don’t come from performance or capacity.
They come from schema changes that didn’t respect existing data.
If APIs punish careless change, databases are unforgiving.
Why database changes are uniquely dangerous
Database changes affect:
Running code
Historical data
Batch jobs
Reports
External integrations
And they do it all at once.
Unlike code, database changes are often:
Irreversible
Shared across services
Hard to test realistically
This is why experienced engineers treat database evolution as a slow, deliberate process.
The biggest misconception: “It’s just a small migration”
There is no such thing as a small migration in production.
A single ALTER TABLE can:
Lock tables
Break reads
Corrupt assumptions
Expose hidden bugs
If a change touches existing data, it is not small.
Example 1: The dangerous NOT NULL
You add a new column:
ALTER TABLE orders
ADD COLUMN source_type VARCHAR(20) NOT NULL;
Looks reasonable.
But:
Old rows don’t have this value
Inserts fail
Reads fail
Reports fail
The system breaks in places you didn’t expect.
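To see why, consider an insert coming from code that was deployed before the migration (table and column names are the article's illustrative ones):

```sql
-- Old application code, still running, knows nothing about the new column:
INSERT INTO orders (id, customer_id, total)
VALUES (1001, 42, 99.50);
-- Fails immediately: source_type is NOT NULL with no default,
-- so every legacy insert path now errors out.
```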
The safer approach (always)
Add the column as nullable
Deploy code that handles NULL
Backfill existing data
Add the NOT NULL constraint later
This is not overengineering.
This is production hygiene.
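The four steps above might look like this in SQL (the backfill value 'unknown' is an illustrative assumption):

```sql
-- 1. Add the column as nullable (fast on most engines, no table rewrite)
ALTER TABLE orders ADD COLUMN source_type VARCHAR(20);

-- 2. Deploy code that treats NULL as a valid value (e.g. "unknown")

-- 3. Backfill existing rows
UPDATE orders SET source_type = 'unknown' WHERE source_type IS NULL;

-- 4. Only then tighten the constraint
ALTER TABLE orders ALTER COLUMN source_type SET NOT NULL;
```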
Expand → Migrate → Contract (non-negotiable pattern)
This pattern should be burned into memory.
Step 1: Expand
Add new schema elements without breaking existing ones.
ALTER TABLE orders
ADD COLUMN source_type VARCHAR(20);
Step 2: Migrate
Gradually update:
Old data
Old code paths
Old queries
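A common way to migrate old data without long locks is to backfill in small batches. A Postgres-flavored sketch, assuming an indexed primary key id and an illustrative batch size:

```sql
-- Run repeatedly (from a script or job) until it updates 0 rows.
-- Small batches keep row locks short and let replicas keep up.
UPDATE orders
SET source_type = 'unknown'
WHERE id IN (
    SELECT id FROM orders
    WHERE source_type IS NULL
    LIMIT 1000
);
```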
Step 3: Contract
Remove or restrict only after verification.
ALTER TABLE orders
ALTER COLUMN source_type SET NOT NULL;
Skipping steps is how outages happen.
Example 2: Renaming a column (don’t)
Renaming feels harmless:
ALTER TABLE users
RENAME COLUMN mobile TO phone;
But:
Old code still queries mobile
Old reports fail
Old scripts break silently
Safer alternative
Add a new column phone
Keep both in sync
Update code gradually
Remove mobile later
Yes, it’s extra work.
So is recovering from production incidents.
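Keeping both columns in sync can happen in application code or at the database level. A Postgres-style trigger sketch (Postgres 11+; function and trigger names are illustrative):

```sql
-- Expand: add the new column alongside the old one
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Keep writes to either column in sync while both code paths exist
CREATE OR REPLACE FUNCTION sync_phone() RETURNS trigger AS $$
BEGIN
    NEW.phone  := COALESCE(NEW.phone, NEW.mobile);
    NEW.mobile := COALESCE(NEW.mobile, NEW.phone);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_sync_phone
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION sync_phone();

-- Contract (much later): drop the trigger, then drop the mobile column
```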
Historical data is the real enemy
Most schema changes fail because of old data, not new code.
Old data:
Has missing fields
Has unexpected values
Was created under different rules
Your new logic must handle:
Nulls
Defaults
Inconsistent formats
Example 3: New validation on old data
You add validation:
email must not be empty
But historical data contains:
email = ""
Now:
Reads fail
Jobs crash
Admin screens break
New rules must coexist with old reality.
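One way to let a new rule coexist with old reality is a constraint that applies only to new writes. In Postgres, NOT VALID does exactly this (constraint name is illustrative):

```sql
-- Enforced for new INSERTs and UPDATEs; existing rows are not checked yet
ALTER TABLE users
ADD CONSTRAINT email_not_empty CHECK (email <> '') NOT VALID;

-- Later, after cleaning up historical rows, validate without blocking writes
ALTER TABLE users VALIDATE CONSTRAINT email_not_empty;
```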
Zero-downtime migrations are a discipline
Zero-downtime is not a tool.
It’s a mindset.
It means:
No locks that block traffic
No assumptions about clean data
No “quick fixes” during business hours
Rule:
If a migration cannot be safely paused or rolled back, it’s not ready.
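"No locks that block traffic" can be enforced mechanically. In Postgres, a session-level lock_timeout makes a migration fail fast instead of queueing behind live traffic (the timeout value is illustrative):

```sql
-- If the ALTER cannot get its lock within 2s, abort instead of
-- making every query behind it wait in the lock queue.
SET lock_timeout = '2s';

ALTER TABLE orders
ALTER COLUMN source_type SET NOT NULL;
-- On timeout: retry later, or run during a quieter window.
```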
Anti-patterns that hurt the most
Anti-pattern 1: Coupling code deploy with schema change
Deploying code and schema together assumes:
No rollback
No partial failure
Perfect timing
Reality disagrees.
Schema first.
Code second.
Cleanup last.
Anti-pattern 2: Deleting columns too early
Columns are rarely “unused”.
They are just used quietly.
Before deleting:
Log access
Search queries
Check reports
Wait longer than you think
Deletion is easy.
Recovery is not.
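"Search queries" and "check reports" can start with the database's own catalog. This Postgres query finds views that still read the column (schema and names follow the article's example):

```sql
-- Views that still reference users.mobile; each one is a consumer
-- that would break if the column were dropped today.
SELECT view_schema, view_name
FROM information_schema.view_column_usage
WHERE table_name = 'users'
  AND column_name = 'mobile';
```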
Anti-pattern 3: One migration script for everything
Large migrations fail mid-way.
Split them:
One change per script
One responsibility per migration
Clear rollback steps
Small migrations fail less catastrophically.
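In practice this means a series of tiny migration files, each with one responsibility and its own rollback (filenames are illustrative):

```sql
-- 001_add_source_type.sql        (rollback: DROP COLUMN source_type)
ALTER TABLE orders ADD COLUMN source_type VARCHAR(20);

-- 002_backfill_source_type.sql   (rollback: none needed, data-only)
UPDATE orders SET source_type = 'unknown' WHERE source_type IS NULL;

-- 003_require_source_type.sql    (rollback: DROP NOT NULL)
ALTER TABLE orders ALTER COLUMN source_type SET NOT NULL;
```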
How experienced teams treat database changes
They:
Treat migrations as production code
Review them carefully
Test them on real-like data
Run them slowly and deliberately
They assume:
Data is messy
Code will be rolled back
Something will go wrong
And they plan accordingly.
A simple checklist before database changes
Before touching the database, ask:
What old data exists?
What code reads this table?
Can this change be additive?
What happens if this runs slowly?
How do we roll back safely?
If rollback is unclear, stop.
Final thought: databases punish arrogance
Databases are honest.
They don’t care about deadlines, intentions, or confidence.
They only care about correctness.
If you respect existing data, databases will serve you well.
If you rush change, they will expose it—usually in production.
More such articles:
https://www.youtube.com/@maheshwarligade


