About
Code Shipping - Change and Deployment Pipeline - Development Lifecycle / Workflow with SQL database.
Database migration is the process of changing the structure of the database in an other with mostly DDL statement.
The flow sometimes will block any change that risks data-loss.
See also: Data - Migration
Articles Related
Methods
Their is several methods:
- the creation of script that changes specifically a part of the schema (alter table add column)
- the creation of a ghost table created with the full new SQL definition and filled up through database log or triggers.
- the creation of a delta patch by the analysis of two schema (through SQL grammar or specific)
Script
Type
- Versioned script. Each script contains a delta statement, and should be executed only once in the proper sequence.
- Repeatable script. Rerun (overwrite when digest change) are typically used for
- (Re-)creating views/procedures/functions/packages/…
- Bulk reference data reinserts
- Postprocessing scripts (compile, grant script)
- Preprocessing script (data backup)
- Patch script (out of order versioned script, first execution next migration)
- Test script (test data)
- Database specific script (for a specific database)
- Undo script (undo the versioned script with the same version.)
Order of execution
- Pre processing
- Patch (executing all patch scripts out of order)
- Versioned
- Repeatable
- Post processing. If a post-processing script is modified, all of them are executed again.
- Test data (if in a test environment)
see also: RoundhousE-Script-Order
Runner
- Native (Sqlplus,cmdline,…)
- or code (JDBC, ODBC) with Error handler to customize the behavior when errors happen
Type:
- Dry run: run without applying the changes.
- Validate run: Create/copy in a validate schema and apply the changes on it
- Out of order run: apply also the script not applied with a lower version number
Features
- Unlike typing commands into an interactive SQL window or storing SQL scripts, migration tools keep a detailed history of how your database schema evolved
Methodology
- One project for the migrations
- Deploy Schema & Data Change-sets
- You can think of each migration as being a new 'version' of the database.
- A schema starts off with nothing in it, and each migration modifies it to add or remove tables, columns, or entries.
On databases that support transactions with statements that change the schema, migrations are wrapped in a transaction. If the database does not support this then when a migration fails the parts of it that succeeded will not be rolled back. You will have to rollback the changes that were made by hand.
Script (alter)
Script
Name
Prefix of the migration script: The only benefit of a timestamp over a number is that it almost eliminates the risk of “collision” and makes unnecessary to “reserve” a number over chat before creating the migration.
Rebase
By creating a script for each modification, if you want to retrieve the state of a database for a certain version, it can add a lot of file.
- the first create statement
- and all alter statement.
The list of alter statement can be so long that when you want to speed up the process of building a database, you are merging them in a create statement. With an automatic schema compare, every hundred or two alter file, the initial creation script is recreated (not concatenate all the script, an actual clean creation).
Structure
- Deploy: A deploy directory containing all deploy script. Bv: alter table myTable add column
- Revert: A revert directory containing the revert script. Bv: alter table myTable remove column
- Test: A test script to test the deploy and revert action: select column from my Table = Succes
Language
- Some tools are using a generic language in order to be database agnostic but most of the DBA wants to see the SQL scripts themselves.
- Macro-language built-in for supporting conditional-compilation in your SQL.
Idempotent
Each migration itself is written to be idempotent, e.g. “don’t try to add the column if it’s already there”. Therefore, a specific order isn't necessary needed in the database migration script.
Column
- Add: Almost all database engines can add columns for “free” because they don't go mutate existing rows.
- Drop: Some can drop columns for “free” too by marking the field as obsolete and only bothering to remove it if the rows are touched.
- Rename: Don’t do that.
Step
multi-step transition
A multi-step transition is
- 1. Add new database structure (new columns, new tables, whatever) but leave all the old structure in place
- 2. Update all servers with code that writes in the new format but understands how to read both the new and old structures
- 3. Migrate the data that only exists in the old structure
- 4. Get rid of the old stuff from the database
- 5. Get rid of the code that is responsible for reading the old format
Conceptually it's straightforward but it can take a long time in calendar days depending on the deployment schedule, it can be tough to keep track of what's been migrated, and the data migration will cause performance issues if you don't plan it properly (e.g. trying to do a migration that locks an important table).
You just have do it in a way where each individual change is backward compatible and you don't move on to the next change until the previous one is rolled out everywhere.
transaction
Most of the migration script are done in a transaction because nobody want a migration to partially succeed but some database transaction are automatically committing any change. Ex: CREATE INDEX CONCURRENTLY in Postgres databases can't run in a transaction.
Flyway do everything in a transaction but in the non transnational cases, the migration must be done manually and update Flyway's schema_version table, which is annoyingly complex (11 columns, some with no obvious purpose).
Rollback
By deploying small and often, rollback are barely needed. It’s often quicker to fix code and deploy than reverse a migration (especially across hundreds of databases). If we needed to reverse something, we could just push another migration negating whatever we did that went boom.
Why roll back when you can roll forward?
Method
Delta in release / Definition in trunk
The truth is always in the release. The trunk has always errors. Example
Error(2739,15): PL/SQL: ORA-00904: "WAT_EVER_COLUMN": invalid identifier
Tool
Framework | Script | Language | Desc |
---|---|---|---|
My Batis | Native | Java | Version management in the script name (undo management and prod script creation) |
Flyway 2013 | Native | Java | Version management in the script name |
LiquiBase | Xml | Java | Database agnostic through the use of an XML file the databaseChangeLogFile |
Ghost from Github | gh-ost is a triggerless online schema migration solution for MySQL using the binary log. Must see doc | ||
DbDeploy | Native script | Java, C# | SQL delta scripts management with versioning in the file name |
DBMaintain | Native | Java | Script with version in the file name |
Roundhouse | Native | C# |
- DbMaestro,
- iBatis Migrator,
- Rails Migrations
- Microsoft SSDT for SQL Server
- https://github.com/realtymaps/dbsync - Javascript based on http://knexjs.org/
Log
Change Log is generally saved in a table within the target database.
Example for DbDeploy:
Column | Data Type | Description |
---|---|---|
ChangeId | INT | Auto-incrementing unique ID for each change entry |
Folder | VARCHAR(256) | Versioned folder name (Example: v1.0). |
ScriptNumber | SMALLINT | Sequential script number within folder. |
ScriptName | VARCHAR(512) | File name including extension. |
StartDate | DATETIME | Date and time script started. |
CompleteDate | DATETIME | Date and time script ended even if it failed. |
AppliedBy | VARCHAR(128) | User account that ran the script. |
ScriptStatus | TINYINT | 0 = Failure, 1 = Success, 2 = Problem Resolved, 3 = Started |
ScriptOutput | VARCHAR(MAX) | Full output of the script execution. |
Rule
The rules that I will search in a database migration tool:
- No version in the file name. It means that you can apply any naming convention. Example flyway makes mandatory to have the number in the file name.
- Version must be a metadata value in the sql file. A time for instance.
- No rollback capability. If there is an error, stop the release, change the queries, commit them in the version repository and restart. If we needed to reverse something, we just push another migration negating whatever we did.
- Compare the set of sql and their timestamp on the file system and in the log table before applying.