data:database:migration

About

Code Shipping - Change and Deployment Pipeline - Development Lifecycle / Workflow with SQL database.

Database migration is the process of changing the structure of the database in an other with mostly DDL statement.

The flow sometimes will block any change that risks data-loss.

Articles Related

Methods

Their is several methods:

the creation of script that changes specifically a part of the schema (alter table add column)
the creation of a ghost table created with the full new SQL definition and filled up through database log or triggers.
the creation of a delta patch by the analysis of two schema (through SQL grammar or specific)

Script

Type

Versioned script. Each script contains a delta statement, and should be executed only once in the proper sequence.
Repeatable script. Rerun (overwrite when digest change) are typically used for
- (Re-)creating views/procedures/functions/packages/…
- Bulk reference data reinserts
Postprocessing scripts (compile, grant script)
Preprocessing script (data backup)
Patch script (out of order versioned script, first execution next migration)
Test script (test data)
Database specific script (for a specific database)
Undo script (undo the versioned script with the same version.)

Order of execution

Pre processing
Patch (executing all patch scripts out of order)
Versioned
Repeatable
Post processing. If a post-processing script is modified, all of them are executed again.
Test data (if in a test environment)

Runner

Native (Sqlplus,cmdline,…)
or code (JDBC, ODBC) with Error handler to customize the behavior when errors happen

Type:

Dry run: run without applying the changes.
Validate run: Create/copy in a validate schema and apply the changes on it
Out of order run: apply also the script not applied with a lower version number

Features

Unlike typing commands into an interactive SQL window or storing SQL scripts, migration tools keep a detailed history of how your database schema evolved

Methodology

One project for the migrations
Deploy Schema & Data Change-sets
You can think of each migration as being a new 'version' of the database.
A schema starts off with nothing in it, and each migration modifies it to add or remove tables, columns, or entries.

On databases that support transactions with statements that change the schema, migrations are wrapped in a transaction. If the database does not support this then when a migration fails the parts of it that succeeded will not be rolled back. You will have to rollback the changes that were made by hand.

Script (alter)

Script

Name

Prefix of the migration script: The only benefit of a timestamp over a number is that it almost eliminates the risk of “collision” and makes unnecessary to “reserve” a number over chat before creating the migration.

Rebase

By creating a script for each modification, if you want to retrieve the state of a database for a certain version, it can add a lot of file.

the first create statement
and all alter statement.

The list of alter statement can be so long that when you want to speed up the process of building a database, you are merging them in a create statement. With an automatic schema compare, every hundred or two alter file, the initial creation script is recreated (not concatenate all the script, an actual clean creation).

Structure

Deploy: A deploy directory containing all deploy script. Bv: alter table myTable add column
Revert: A revert directory containing the revert script. Bv: alter table myTable remove column
Test: A test script to test the deploy and revert action: select column from my Table = Succes

Language

Some tools are using a generic language in order to be database agnostic but most of the DBA wants to see the SQL scripts themselves.
Macro-language built-in for supporting conditional-compilation in your SQL.

Idempotent

Each migration itself is written to be idempotent, e.g. “don’t try to add the column if it’s already there”. Therefore, a specific order isn't necessary needed in the database migration script.

Column

Add: Almost all database engines can add columns for “free” because they don't go mutate existing rows.
Drop: Some can drop columns for “free” too by marking the field as obsolete and only bothering to remove it if the rows are touched.
Rename: Don’t do that.

Step

multi-step transition

A multi-step transition is

1. Add new database structure (new columns, new tables, whatever) but leave all the old structure in place
2. Update all servers with code that writes in the new format but understands how to read both the new and old structures
3. Migrate the data that only exists in the old structure
4. Get rid of the old stuff from the database
5. Get rid of the code that is responsible for reading the old format

Conceptually it's straightforward but it can take a long time in calendar days depending on the deployment schedule, it can be tough to keep track of what's been migrated, and the data migration will cause performance issues if you don't plan it properly (e.g. trying to do a migration that locks an important table).

You just have do it in a way where each individual change is backward compatible and you don't move on to the next change until the previous one is rolled out everywhere.

transaction

Most of the migration script are done in a transaction because nobody want a migration to partially succeed but some database transaction are automatically committing any change. Ex: CREATE INDEX CONCURRENTLY in Postgres databases can't run in a transaction.

Flyway do everything in a transaction but in the non transnational cases, the migration must be done manually and update Flyway's schema_version table, which is annoyingly complex (11 columns, some with no obvious purpose).

Rollback

By deploying small and often, rollback are barely needed. It’s often quicker to fix code and deploy than reverse a migration (especially across hundreds of databases). If we needed to reverse something, we could just push another migration negating whatever we did that went boom.

Why roll back when you can roll forward?
Nick Craver (Stack Overflow)

Method

Delta in release / Definition in trunk

The truth is always in the release. The trunk has always errors. Example

Error(2739,15): PL/SQL: ORA-00904: "WAT_EVER_COLUMN": invalid identifier

Tool

Framework	Script	Language	Desc
My Batis	Native	Java	Version management in the script name (undo management and prod script creation)
Flyway 2013	Native	Java	Version management in the script name
LiquiBase	Xml	Java	Database agnostic through the use of an XML file the databaseChangeLogFile
Ghost from Github			gh-ost is a triggerless online schema migration solution for MySQL using the binary log. Must see doc
DbDeploy	Native script	Java, C#	SQL delta scripts management with versioning in the file name
DBMaintain	Native	Java	Script with version in the file name
Roundhouse	Native	C#

DbMaestro,
iBatis Migrator,
Rails Migrations
AliaSql
Microsoft SSDT for SQL Server
Sqitch
https://github.com/realtymaps/dbsync - Javascript based on http://knexjs.org/

Log

Change Log is generally saved in a table within the target database.

Example for DbDeploy:

Column	Data Type	Description
ChangeId	INT	Auto-incrementing unique ID for each change entry
Folder	VARCHAR(256)	Versioned folder name (Example: v1.0).
ScriptNumber	SMALLINT	Sequential script number within folder.
ScriptName	VARCHAR(512)	File name including extension.
StartDate	DATETIME	Date and time script started.
CompleteDate	DATETIME	Date and time script ended even if it failed.
AppliedBy	VARCHAR(128)	User account that ran the script.
ScriptStatus	TINYINT	0 = Failure, 1 = Success, 2 = Problem Resolved, 3 = Started
ScriptOutput	VARCHAR(MAX)	Full output of the script execution.

Rule

The rules that I will search in a database migration tool:

No version in the file name. It means that you can apply any naming convention. Example flyway makes mandatory to have the number in the file name.
Version must be a metadata value in the sql file. A time for instance.
No rollback capability. If there is an error, stop the release, change the queries, commit them in the version repository and restart. If we needed to reverse something, we just push another migration negating whatever we did.
Compare the set of sql and their timestamp on the file system and in the log table before applying.

Documentation / Reference

https://news.ycombinator.com/item?id=11623784

Table of Contents

Database - Migration (Deployment|Versioning|Change Management)

About

Articles Related

Methods

Script

Type

Order of execution

Runner

Features

Methodology

Script (alter)

Script

Name

Rebase

Structure

Language

Idempotent

Column

Step

multi-step transition

transaction

Rollback

Method

Delta in release / Definition in trunk

Tool

Log

Rule

Documentation / Reference