Database Changes in Live Systems


Why schema changes break production and how experienced teams avoid it.

Introduction: Databases remember everything

You can redeploy code in minutes.
You can roll back services.
You can toggle feature flags.

But your database?

It remembers every decision you ever made.

Most production failures related to databases don’t come from performance or capacity.
They come from schema changes that didn’t respect existing data.

If APIs punish careless change, databases are unforgiving.


Why database changes are uniquely dangerous

Database changes affect:

  • Running code

  • Historical data

  • Batch jobs

  • Reports

  • External integrations

And they do it all at once.

Unlike code, database changes are often:

  • Irreversible

  • Shared across services

  • Hard to test realistically

This is why experienced engineers treat database evolution as a slow, deliberate process.


The biggest misconception: “It’s just a small migration”

There is no such thing as a small migration in production.

A single ALTER TABLE can:

  • Lock tables

  • Break reads

  • Corrupt assumptions

  • Expose hidden bugs

If a change touches existing data, it is not small.


Example 1: The dangerous NOT NULL

You add a new column:

ALTER TABLE orders
ADD COLUMN source_type VARCHAR(20) NOT NULL;

Looks reasonable.

But:

  • Old rows don’t have this value

  • Inserts fail

  • Reads fail

  • Reports fail

The system breaks in places you didn’t expect.

The safer approach (always)

  1. Add the column as nullable

  2. Deploy code that handles NULL

  3. Backfill existing data

  4. Add NOT NULL constraint later

This is not overengineering.
This is production hygiene.
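The failure and the safe path can both be reproduced end to end. A minimal sketch using SQLite through Python's sqlite3 (the article's DDL is Postgres-flavored, so types and error messages differ here; the 'unknown' backfill value is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO orders (id) VALUES (1)")  # pre-existing row

# The dangerous version: adding NOT NULL directly is rejected,
# because the existing row has no value for the new column.
try:
    conn.execute("ALTER TABLE orders ADD COLUMN source_type TEXT NOT NULL")
except sqlite3.OperationalError as e:
    print("rejected:", e)

# Steps 1 and 3 of the safe path: nullable column first, then backfill.
# (Steps 2 and 4 -- NULL-tolerant code, then the constraint -- come later.)
conn.execute("ALTER TABLE orders ADD COLUMN source_type TEXT")
conn.execute("UPDATE orders SET source_type = 'unknown' WHERE source_type IS NULL")
```

Note that the database itself refuses the unsafe version here; in engines that accept it (with a default, or with implicit table rewrites), the breakage just moves to read paths and locks instead.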


Expand → Migrate → Contract (non-negotiable pattern)

This pattern should be burned into memory.

Step 1: Expand

Add new schema elements without breaking existing ones.

ALTER TABLE orders
ADD COLUMN source_type VARCHAR(20);

Step 2: Migrate

Gradually update:

  • Old data

  • Old code paths

  • Old queries

Step 3: Contract

Remove or restrict only after verification.

ALTER TABLE orders
ALTER COLUMN source_type SET NOT NULL;

Skipping steps is how outages happen.
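The Migrate step in particular is usually run in small batches, so that no single transaction holds locks long enough to block traffic. A sketch, again in SQLite via sqlite3; the batch size and the 'legacy' placeholder value are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, source_type TEXT)")
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(1, 1001)])

BATCH = 100  # small batches keep each transaction (and its locks) short

while True:
    with conn:  # one transaction per batch
        cur = conn.execute(
            "UPDATE orders SET source_type = 'legacy' "
            "WHERE id IN (SELECT id FROM orders WHERE source_type IS NULL LIMIT ?)",
            (BATCH,),
        )
    if cur.rowcount == 0:
        break  # nothing left to backfill

remaining = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE source_type IS NULL"
).fetchone()[0]
```

Because each batch commits independently, the backfill can be paused, resumed, or rate-limited at any point, which is exactly what a migration needs to be.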


Example 2: Renaming a column (don’t)

Renaming feels harmless:

ALTER TABLE users
RENAME COLUMN mobile TO phone;

But:

  • Old code still queries mobile

  • Old reports fail

  • Old scripts break silently

Safer alternative

  1. Add new column phone

  2. Keep both in sync

  3. Update code gradually

  4. Remove mobile later

Yes, it’s extra work.
So is recovering from production incidents.
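One way to keep both columns in sync during the transition is a database trigger, so that old code writing mobile transparently updates phone. A sketch in SQLite via sqlite3 (the trigger name is hypothetical, and a real migration would also need an insert trigger, plus the reverse direction once new code starts writing phone):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, mobile TEXT)")
conn.execute("INSERT INTO users (id, mobile) VALUES (1, '555-0100')")

# Step 1: add the new column and copy existing values across.
conn.execute("ALTER TABLE users ADD COLUMN phone TEXT")
conn.execute("UPDATE users SET phone = mobile")

# Step 2: keep both columns in sync while old code still writes `mobile`.
conn.executescript("""
CREATE TRIGGER sync_phone AFTER UPDATE OF mobile ON users
BEGIN
    UPDATE users SET phone = NEW.mobile WHERE id = NEW.id;
END;
""")

# Old code updates `mobile`; new code reading `phone` still sees the value.
conn.execute("UPDATE users SET mobile = '555-0199' WHERE id = 1")
```

Only after access logs show nothing reads mobile anymore does step 4, dropping the old column, become safe.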


Historical data is the real enemy

Most schema changes fail because of old data, not new code.

Old data:

  • Has missing fields

  • Has unexpected values

  • Was created under different rules

Your new logic must handle:

  • Nulls

  • Defaults

  • Inconsistent formats

Example 3: New validation on old data

You add validation:

email must not be empty

But historical data contains:

email = ""

Now:

  • Reads fail

  • Jobs crash

  • Admin screens break

New rules must coexist with old reality.
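In practice that means enforcing the new rule on writes, while reads tolerate the old reality. A minimal sketch in Python with sqlite3; validate_email and read_email are illustrative helpers, not a real framework API:

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (id, email) VALUES (1, '')")  # legacy row

def validate_email(email: str) -> str:
    """Enforce the new rule on writes only."""
    if not email:
        raise ValueError("email must not be empty")
    return email

def read_email(user_id: int) -> Optional[str]:
    """Reads tolerate old reality: treat a legacy '' the same as missing."""
    row = conn.execute("SELECT email FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] or None

# New writes are rejected, but the legacy row does not crash the read path.
try:
    validate_email("")
except ValueError:
    pass
```

The split matters: validation belongs at the write boundary, while every read path (jobs, reports, admin screens) must survive data created under the old rules.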


Zero-downtime migrations are a discipline

Zero-downtime is not a tool.
It’s a mindset.

It means:

  • No locks that block traffic

  • No assumptions about clean data

  • No “quick fixes” during business hours

Rule:
If a migration cannot be safely paused or rolled back, it’s not ready.


Anti-patterns that hurt the most

Anti-pattern 1: Coupling code deploy with schema change

Deploying code and schema together assumes:

  • No rollback

  • No partial failure

  • Perfect timing

Reality disagrees.

Schema first.
Code second.
Cleanup last.


Anti-pattern 2: Deleting columns too early

Columns are rarely “unused”.
They are just used quietly.

Before deleting:

  • Log access

  • Search queries

  • Check reports

  • Wait longer than you think

Deletion is easy.
Recovery is not.


Anti-pattern 3: One migration script for everything

Large migrations fail mid-way.

Split them:

  • One change per script

  • One responsibility per migration

  • Clear rollback steps

Small migrations fail less catastrophically.


How experienced teams treat database changes

They:

  • Treat migrations as production code

  • Review them carefully

  • Test them on real-like data

  • Run them slowly and deliberately

They assume:

  • Data is messy

  • Code will be rolled back

  • Something will go wrong

And they plan accordingly.


A simple checklist before database changes

Before touching the database, ask:

  1. What old data exists?

  2. What code reads this table?

  3. Can this change be additive?

  4. What happens if this runs slowly?

  5. How do we roll back safely?

If rollback is unclear, stop.


Final thought: databases punish arrogance

Databases are honest.

They don’t care about deadlines, intentions, or confidence.
They only care about correctness.

If you respect existing data, databases will serve you well.
If you rush change, they will expose it—usually in production.

More such articles:

https://medium.com/techwasti

https://www.youtube.com/@maheshwarligade

https://techwasti.com/series/spring-boot-tutorials

https://techwasti.com/series/go-language
