Database Migration Strategies for Zero-Downtime Deployments: A Step-by-Step Guide

Database schema changes are among the most nerve-wracking aspects of deploying applications. While your code can be swapped instantly, your database carries persistent state that can't simply be replaced. A single poorly planned migration can bring down your entire application, corrupt data, or leave you scrambling to roll back at 3 AM.

This guide walks you through proven strategies for executing database migrations without interrupting service, complete with practical examples and step-by-step instructions you can apply to your next deployment.

Why Database Migrations Are Different

When you deploy new application code, the old version stops and the new version starts. Simple enough. But databases don't work that way. Your database must:

  • Remain available while changes are applied
  • Support both old and new application code during the transition
  • Preserve data integrity throughout the process
  • Allow rollback if something goes wrong

The fundamental challenge is that your application code deploys in seconds, but database migrations can take minutes, hours, or even days for large tables. During that window, your old code and new code may both be running against a schema that's in flux.

The Golden Rule: Backward Compatibility

Before diving into specific strategies, understand this principle: every migration must be backward compatible with the currently running application code.

This means your migration should complete successfully while the old version of your application is still running. Only after the migration finishes should you deploy the new application code. This decoupling is what makes zero-downtime deployments possible.
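
As a quick illustration (the column names here are invented): the first statement below breaks the rule, because the release that is still running selects that column, while the second is invisible to old code:

-- Not backward compatible: the running release still reads this column
ALTER TABLE users DROP COLUMN last_login_ip;

-- Backward compatible: the running release never references the new column
ALTER TABLE users ADD COLUMN last_login_at TIMESTAMP;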

Strategy 1: The Expand-Contract Pattern

The expand-contract pattern (sometimes called "parallel change") is the most reliable approach for complex schema changes. It breaks dangerous migrations into safe, incremental steps.

How It Works

Instead of making a breaking change in one step, you expand the schema to support both old and new structures, migrate data, update the application, and then contract by removing the old structure.

Example: Renaming a Column

Let's say you need to rename user_name to username in your users table.

The Wrong Way (Causes Downtime)

ALTER TABLE users RENAME COLUMN user_name TO username;

This immediately breaks any running code that references user_name.

The Right Way (Zero Downtime)

Phase 1: Expand

Add the new column alongside the old one:

ALTER TABLE users ADD COLUMN username VARCHAR(255);

Phase 2: Migrate and Sync

Copy existing data and set up triggers to keep both columns in sync:

-- Copy existing data
UPDATE users SET username = user_name WHERE username IS NULL;

-- Create trigger to sync writes (PostgreSQL example)
CREATE OR REPLACE FUNCTION sync_username()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    -- New rows: old code sets only user_name, new code sets only username
    IF NEW.username IS NULL THEN
      NEW.username := NEW.user_name;
    ELSIF NEW.user_name IS NULL THEN
      NEW.user_name := NEW.username;
    END IF;
  ELSIF NEW.user_name IS DISTINCT FROM OLD.user_name THEN
    NEW.username := NEW.user_name;
  ELSIF NEW.username IS DISTINCT FROM OLD.username THEN
    NEW.user_name := NEW.username;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_username_trigger
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION sync_username();

Phase 3: Update Application

Deploy new application code that reads from username but writes to both columns. Then deploy again to read and write only to username.
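
At the SQL level, the intermediate release's queries look roughly like this (a sketch; in practice your ORM generates these statements):

-- Reads come from the new column
SELECT id, username FROM users WHERE id = 42;

-- Writes keep both columns populated until the contract phase
UPDATE users
SET username = 'new_handle', user_name = 'new_handle'
WHERE id = 42;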

Phase 4: Contract

Once you're confident the old column is no longer needed:

DROP TRIGGER sync_username_trigger ON users;
DROP FUNCTION sync_username();
ALTER TABLE users DROP COLUMN user_name;

When to Use Expand-Contract

This pattern is ideal for:

  • Renaming columns or tables
  • Changing column types
  • Splitting or merging tables
  • Any change that would break existing queries

Strategy 2: Additive-Only Migrations

The simplest zero-downtime migrations are purely additive. If you're only adding new things, the old application code simply ignores them.

Safe Additive Changes

These operations are inherently safe:

-- Adding a new table
CREATE TABLE notifications (...);

-- Adding a new nullable column
ALTER TABLE users ADD COLUMN phone_number VARCHAR(20);

-- Adding a new index (with CONCURRENTLY in PostgreSQL)
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

-- Adding a new column with a default value (PostgreSQL 11+, MySQL 8.0.12+)
ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active';

The Nullable Column Trick

When adding a required field, don't add it with NOT NULL immediately:

Phase 1: Add as nullable

ALTER TABLE orders ADD COLUMN tracking_number VARCHAR(100);

Phase 2: Deploy code that writes to the new column

Phase 3: Backfill existing records

UPDATE orders SET tracking_number = 'LEGACY-' || id WHERE tracking_number IS NULL;

Phase 4: Add the constraint

ALTER TABLE orders ALTER COLUMN tracking_number SET NOT NULL;
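
One caveat: on a large PostgreSQL table, SET NOT NULL scans every row while holding an exclusive lock. On PostgreSQL 12+ you can shorten that lock by first adding a NOT VALID check constraint, validating it without blocking writes, and only then setting NOT NULL, which lets the server skip the scan. A sketch:

-- Add the constraint without checking existing rows (brief lock only)
ALTER TABLE orders ADD CONSTRAINT tracking_number_not_null
  CHECK (tracking_number IS NOT NULL) NOT VALID;

-- Validate in the background; this allows concurrent writes
ALTER TABLE orders VALIDATE CONSTRAINT tracking_number_not_null;

-- PostgreSQL 12+ uses the validated constraint to skip the full-table scan
ALTER TABLE orders ALTER COLUMN tracking_number SET NOT NULL;

-- The check constraint is now redundant and can be dropped
ALTER TABLE orders DROP CONSTRAINT tracking_number_not_null;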

Strategy 3: Online Schema Migration Tools

For large tables where even ALTER TABLE causes locks, specialized tools perform migrations without blocking reads or writes.

pt-online-schema-change (Percona Toolkit for MySQL)

Creates a shadow table, copies data in chunks, and swaps tables atomically:

pt-online-schema-change \
  --alter "ADD COLUMN preferences JSON" \
  --execute \
  D=myapp,t=users

gh-ost (GitHub Online Schema Migration for MySQL)

Uses the binary log to capture changes, avoiding triggers:

gh-ost \
  --alter="ADD COLUMN preferences JSON" \
  --database="myapp" \
  --table="users" \
  --execute

pg_repack (PostgreSQL)

Rebuilds bloated tables and indexes to reclaim space while holding only brief locks (note that pg_repack reorganizes existing tables rather than applying schema changes):

pg_repack --table users --jobs 4 myapp

How These Tools Work

  1. Create a new table with the desired schema

  2. Copy data from the original table in small batches

  3. Capture changes to the original table during the copy

  4. Apply captured changes to the new table

  5. Atomically swap the tables

This approach keeps the original table fully operational throughout the migration.
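
To make these steps concrete, here is a deliberately simplified, trigger-based sketch in MySQL syntax, roughly what pt-online-schema-change automates. Column lists are abbreviated, only the INSERT trigger is shown, and the real tools add UPDATE/DELETE triggers, chunking, throttling, and safety checks:

-- Shadow table with the desired schema
CREATE TABLE _users_new LIKE users;
ALTER TABLE _users_new ADD COLUMN preferences JSON;

-- Capture ongoing writes to the original table while the copy runs
CREATE TRIGGER _users_sync_ins AFTER INSERT ON users FOR EACH ROW
  REPLACE INTO _users_new (id, email) VALUES (NEW.id, NEW.email);

-- Copy existing rows in small batches (one chunk shown)
INSERT IGNORE INTO _users_new (id, email)
SELECT id, email FROM users WHERE id BETWEEN 1 AND 1000;

-- Atomically swap the tables
RENAME TABLE users TO _users_old, _users_new TO users;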

Strategy 4: Blue-Green Database Deployments

For the most critical migrations, maintain two complete database environments and switch between them.

The Setup

Blue Environment: Your current production database
Green Environment: A replica that will receive the new schema

The Process

Step 1: Set up replication from Blue to Green (see the PostgreSQL sketch after these steps)

Step 2: Stop replication and apply migrations to Green

Step 3: Verify Green has the correct schema and data

Step 4: Switch application traffic to Green

Step 5: Green becomes the new Blue for future deployments
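
One way to implement Steps 1 and 2 on PostgreSQL 10+ is logical replication. The names and hostname below are placeholders, and Green must be created with the same tables up front, because logical replication copies rows, not schema changes:

-- On Blue (the current primary): requires wal_level = logical
CREATE PUBLICATION blue_pub FOR ALL TABLES;

-- On Green (the copy that will receive the new schema)
CREATE SUBSCRIPTION green_sub
  CONNECTION 'host=blue-db.internal dbname=myapp user=replicator'
  PUBLICATION blue_pub;

-- Step 2: stop applying changes before migrating Green's schema
ALTER SUBSCRIPTION green_sub DISABLE;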

Considerations

This approach requires:

  • Double the database infrastructure (temporarily)
  • Careful handling of writes during the switch
  • A reliable way to pause writes momentarily during cutover

It's typically reserved for major version upgrades or fundamental schema overhauls where other strategies are too risky.

Strategy 5: Feature Flags for Database Changes

Combine feature flags with your migrations to control the rollout and enable quick rollbacks.

Implementation Example (Ruby on Rails)

class User < ApplicationRecord
  def display_name
    if FeatureFlag.enabled?(:use_new_username_column)
      username
    else
      user_name
    end
  end
end

Rollout Sequence

  1. Deploy migration adding the new column
  2. Deploy code with feature flag (flag off by default)
  3. Enable flag for a small percentage of users
  4. Monitor for errors
  5. Gradually increase percentage to 100%
  6. Remove the flag and old column in a future deployment

This approach gives you a kill switch if something goes wrong with the new schema.

Handling Foreign Keys and Constraints

Constraints require special care because they can block concurrent operations.

Adding Foreign Keys Without Locking

PostgreSQL:

-- Add the constraint as NOT VALID (doesn't check existing rows)
ALTER TABLE orders 
ADD CONSTRAINT fk_orders_users 
FOREIGN KEY (user_id) REFERENCES users(id) 
NOT VALID;

-- Validate in the background
ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_users;

MySQL (8.0+):

-- Use ALGORITHM=INPLACE, LOCK=NONE where possible. INPLACE requires
-- foreign_key_checks to be disabled, which also skips validating existing rows.
SET foreign_key_checks = 0;
ALTER TABLE orders 
ADD CONSTRAINT fk_orders_users 
FOREIGN KEY (user_id) REFERENCES users(id),
ALGORITHM=INPLACE, LOCK=NONE;
SET foreign_key_checks = 1;

Removing Foreign Keys Safely

Always remove the constraint before removing the referenced column:

-- Step 1: Drop the constraint (MySQL before 8.0.19 uses DROP FOREIGN KEY)
ALTER TABLE orders DROP CONSTRAINT fk_orders_users;

-- Step 2: Later, drop the column if needed
ALTER TABLE users DROP COLUMN deprecated_field;

Large Data Migrations

When you need to transform millions of rows, batch processing prevents table locks and memory issues.

Batched Updates

# Ruby/Rails example
User.where(status: nil).find_in_batches(batch_size: 1000) do |batch|
  User.where(id: batch.map(&:id)).update_all(status: 'active')
  sleep(0.1) # Brief pause to reduce database load
end

-- Pure SQL with batching (PostgreSQL 11+, run outside an explicit
-- transaction so the COMMIT inside the DO block is allowed)
DO $$
DECLARE
  batch_size INT := 1000;
  affected INT;
BEGIN
  LOOP
    UPDATE users 
    SET status = 'active' 
    WHERE id IN (
      SELECT id FROM users 
      WHERE status IS NULL 
      LIMIT batch_size
      FOR UPDATE SKIP LOCKED
    );

    GET DIAGNOSTICS affected = ROW_COUNT;
    EXIT WHEN affected = 0;

    COMMIT;
    PERFORM pg_sleep(0.1);
  END LOOP;
END $$;

Background Jobs for Large Migrations

For very large tables, run migrations as background jobs:

class BackfillUserStatusJob < ApplicationJob
  def perform(start_id, end_id)
    User.where(id: start_id..end_id, status: nil)
        .update_all(status: 'active')
  end
end

# Enqueue jobs for each batch
User.select(:id).find_in_batches(batch_size: 10_000) do |batch|
  BackfillUserStatusJob.perform_later(batch.first.id, batch.last.id)
end

Creating Your Migration Checklist

Before every migration, answer these questions:

Pre-Migration

  • [ ] Is this migration backward compatible with the current application code?
  • [ ] Have you tested the migration on a production-like dataset?
  • [ ] Do you have a rollback plan?
  • [ ] Have you estimated how long the migration will take?
  • [ ] Are there any table locks, and if so, for how long?
  • [ ] Have you scheduled the migration during low-traffic periods?

During Migration

  • [ ] Are you monitoring database performance?
  • [ ] Is the application still responding normally?
  • [ ] Are there any unexpected locks or slow queries?
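
On PostgreSQL, one way to answer the last question is to ask the server directly which sessions are waiting on locks and which backends are blocking them (a generic query; adapt it to your monitoring setup):

-- Sessions currently waiting on a lock, and the PIDs blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;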

Post-Migration

  • [ ] Has the migration completed successfully?
  • [ ] Is the application functioning correctly?
  • [ ] Are there any orphaned records or data inconsistencies?
  • [ ] Have you documented what changed and when?

Integrating with Your Deployment Pipeline

Zero-downtime database migrations work best when integrated into your deployment workflow. Here's how to structure your pipeline:

  1. Run pre-deployment migrations: Changes that add new structures (expand phase)
  2. Deploy application code: New code that works with both old and new schemas
  3. Verify deployment: Confirm the application is healthy
  4. Run post-deployment migrations: Changes that remove old structures (contract phase)
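
Mapping this back to the column rename from Strategy 1, the split between steps 1 and 4 looks like this:

-- Step 1, pre-deployment (expand): safe while the old release is running
ALTER TABLE users ADD COLUMN username VARCHAR(255);

-- Step 4, post-deployment (contract): only after the new release is verified
ALTER TABLE users DROP COLUMN user_name;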

Using DeployHQ for Database Migrations

DeployHQ's deployment pipeline supports running migrations at the right point in your deployment:

SSH Commands: Execute migration scripts on your server before or after file deployment

Build Commands: Run migrations as part of your build process for platforms like Heroku

Deployment Notifications: Alert your team when migrations start and complete

For example, you might configure a pre-deployment SSH command:

cd /var/www/myapp && php artisan migrate --force

Or for Rails:

cd /var/www/myapp && RAILS_ENV=production bundle exec rake db:migrate

The key is separating backward-compatible migrations (run before deployment) from cleanup migrations (run after deployment is verified).

Common Pitfalls and How to Avoid Them

Pitfall 1: Not Testing with Production Data Volume

A migration that takes 2 seconds on your development database might take 2 hours in production. Always test with realistic data volumes.

Pitfall 2: Ignoring Index Creation Time

Creating indexes on large tables can lock the table for extended periods. In PostgreSQL, always use CREATE INDEX CONCURRENTLY. In MySQL, check if your version supports ALGORITHM=INPLACE, LOCK=NONE.
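
For example, on PostgreSQL the concurrent build cannot run inside a transaction block, and on MySQL you can request an in-place, non-locking build explicitly so the server errors out instead of silently falling back (index and table names below are illustrative):

-- PostgreSQL: build without blocking writes (cannot run inside a transaction)
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders (created_at);

-- MySQL: request an in-place, non-locking build; fails if unsupported
ALTER TABLE orders ADD INDEX idx_orders_created_at (created_at),
  ALGORITHM=INPLACE, LOCK=NONE;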

Pitfall 3: Forgetting About Read Replicas

If you use read replicas, ensure your application can handle replication lag during migrations. The replica might temporarily have a different schema than the primary.

Pitfall 4: Not Having a Rollback Plan

Every migration should have a corresponding rollback script. Test your rollback procedure before deploying.
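
A rollback script is often just the inverse statement, stored and tested alongside the forward migration (a trivial, made-up example below). Keep in mind that no rollback script can restore data removed by a destructive change, which is another reason to defer DROPs to the contract phase.

-- Forward migration
ALTER TABLE users ADD COLUMN timezone VARCHAR(50);

-- Corresponding rollback
ALTER TABLE users DROP COLUMN timezone;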

Pitfall 5: Running Migrations During Peak Traffic

Even "safe" migrations consume database resources. Schedule them during low-traffic periods when possible.

Framework-Specific Considerations

Ruby on Rails

Rails provides strong migration tooling but doesn't enforce zero-downtime patterns. Consider using the strong_migrations gem to catch unsafe migrations:

# Gemfile
gem 'strong_migrations'

This will warn you about potentially dangerous operations and suggest safer alternatives.

Laravel

Laravel's migrations are flexible but require discipline for zero-downtime deployments. Use the --pretend flag to preview SQL before running:

php artisan migrate --pretend

Django

Django's migration system handles dependencies well. For zero-downtime deployments, use RunSQL with separate forward and reverse operations:

migrations.RunSQL(
    sql="ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active'",
    reverse_sql="ALTER TABLE users DROP COLUMN status"
)

Conclusion

Zero-downtime database migrations require more planning than simply running ALTER TABLE, but the investment pays off. By following the expand-contract pattern, using additive-only changes where possible, and leveraging specialized tools for large tables, you can deploy database changes with confidence.

The key principles to remember:

  • Decouple migrations from application deployments
  • Ensure backward compatibility at every step
  • Use appropriate tools for your database and table sizes
  • Always have a rollback plan
  • Monitor throughout the process

With these strategies in your toolkit, database migrations become a routine part of deployment rather than a source of anxiety. Your users will never notice the sophisticated choreography happening behind the scenes—and that's exactly the point.


Looking to streamline your deployment pipeline and automate database migrations? DeployHQ helps development teams deploy code and run migrations reliably, with SSH command support for executing migration scripts at the perfect point in your deployment workflow.

A little bit about the author

Facundo | CTO | DeployHQ | Continuous Delivery & Software Engineering Leadership - As CTO at DeployHQ, Facundo leads the software engineering team, driving innovation in continuous delivery. Outside of work, he enjoys cycling and nature, accompanied by Bono 🐶.