
Migrations with Node.js and PostgreSQL

Every project that persists data inevitably has to face the fact that the structure of that data is not static and needs to adapt. The reasons for this are manifold: business requirements change, entities grow too large and need to be broken apart, or performance issues call for additional indices or denormalization. Today we will be looking at how to tackle the problem of managing structural changes in the context of Node.js and PostgreSQL.
Managing changes to the database and executing them safely is essential – a botched migration can lead to inconsistencies or data loss, or even bring down the entire system. For this reason, we establish the following hard constraints:
- All or nothing
  Just like most business operations, a migration must be performed in an all-or-nothing fashion: either the entire data set is migrated, or no changes are applied at all. While practically all relational databases are ACID-compliant and support transactions, not all of them do so for structural changes – the widely popular MySQL, for instance, does not.
- Rolling back
  Even if a migration ran successfully, it may still be necessary to quickly revert the changes it made. Suppose a migration introduced a database constraint that does not properly align with the business requirements or with assumptions made in other parts of the code, resulting in users not being able to actually save some records. Restoring the database from a recent backup is a risky option, as it will probably cause the loss of data unrelated to the issue at hand. Instead, a down migration that cleanly reverses the applied changes is preferable.
- Infrastructure as Code
  The structure of the database, i.e. the tables, columns, foreign keys, constraints and so on, is very closely coupled to the application itself. Even if object-relational mapping (ORM) is employed and the persistence implementation is cleanly decoupled from the rest of the code base, a structural change to the database will typically require at least some small change to the application itself or its configuration. For this reason, the migrations should be part of the code base so that the (desired) state of the database aligns with the implementation at any point in time.
The Weapon of Choice
Using an ORM like Sequelize usually gets us a migration framework for free, and many Database Abstraction Layers (DBAL) or query builders like knex also come with migration support baked in. However, unless supporting multiple databases is actually a requirement or a desired feature of the application, doing so will slow down development for little tangible benefit. Switching databases halfway through a project rarely happens, and when it does, it is never quite as seamless as the marketing slides of the abstraction layers would have us believe.
Consequently, in environments where support for or switching between databases is not an immediate concern, using non-generic tools leads to fewer layers of abstraction. This, in turn, results in code that is easier to understand and simpler to debug. On top of that, database-specific features, which typically require falling back to raw SQL when using a DBAL, are much less awkward to use.
In our example we will be using PostgreSQL, which is not only a very mature RDBMS but also widely available as a managed or even serverless solution on all major cloud platforms. Needless to say, it satisfies all the previously defined constraints, making it a rock-solid choice for our use case. On the software side of things, we go with pg-migrate (published on npm as node-pg-migrate), which integrates well with our fictitious TypeScript application.
Setting up pg-migrate
After installing the library, we must first configure how it connects to the database. This can be done in a variety of ways, such as defining a DATABASE_URL environment variable, using node-config or dotenv. While pg-migrate has built-in support for the latter two, they must be installed explicitly – if this is omitted, the library will not complain but just silently ignore the corresponding files.
Going forward, we will be using node-config and a PostgreSQL Docker container with its default settings and postgres as the password. A minimal configuration could look like the following sketch (by default, pg-migrate reads its connection settings from the db section of the configuration):
config/default.json
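```json
{
  "db": {
    "url": "postgres://postgres:postgres@localhost:5432/postgres"
  }
}
```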
The main ingredient here is the database URL (in a real-world scenario we wouldn’t store credentials in a plain-text config file, but that is a different story). Luckily, pg-migrate comes with TypeScript support right out of the box; all we have to do is point it to the right configuration file, i.e. the project’s tsconfig.json. Finally, merely as a cosmetic change, we want migration files to be prefixed with a human-readable date-and-time string instead of the default Unix timestamp.
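One way to wire all of this up is an npm script that passes the relevant CLI flags – a sketch, assuming the flag names of a recent version of the library (--tsconfig for TypeScript support, --migration-filename-format utc for the readable prefix):

```json
{
  "scripts": {
    "migrate": "node-pg-migrate --tsconfig tsconfig.json --migration-file-language ts --migration-filename-format utc"
  }
}
```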
Getting started
At its core, the way migrations work is pretty straightforward:
- each migration is a file in the /migrations directory which exports an up() and optionally a down() function
- a special pgmigrations table in the database (which is created automatically) keeps track of which migrations were already applied
- executing or reverting migrations simply means sequentially invoking the required up() and down() methods, respectively
Behind the scenes, pg-migrate takes care of enumerating and loading the migration files, detecting which of them (if any) need to run, and wrapping everything in a transaction. Once the setup is done, writing migrations becomes a very easy task.
Our application requires a simple user table, so let’s set this up by creating our first migration (the CLI binary is installed by npm as node-pg-migrate):
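```sh
npx node-pg-migrate create user-table
```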
Note that all this does is create a scaffold for our migration, so we should see a new file in the /migrations folder. The user-table argument is only used for generating the file name, and we can change it as desired until the migration is actually run. As a matter of fact, at this point pg-migrate has not yet talked to the database at all!
After removing some cruft, our migration file will look something like this (the exact scaffold may vary between versions of the library):
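```ts
import { MigrationBuilder } from 'node-pg-migrate';

export async function up(pgm: MigrationBuilder): Promise<void> {
  // migration code goes here
}
```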
The scaffolded down() function was intentionally removed for reasons that will be explained in a minute. As we can see, everything revolves around the MigrationBuilder that the library provides to us. There is a lot that can be done with it, but let’s start with something very basic (sketched below; the email column is our own example):
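```ts
import { MigrationBuilder } from 'node-pg-migrate';

export async function up(pgm: MigrationBuilder): Promise<void> {
  pgm.createTable('users', {
    id: 'id', // shorthand for a serial primary key column
    email: { type: 'varchar(255)', notNull: true },
    date_of_birth: { type: 'date' }, // nullable, since notNull is not set
  });
}
```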
This creates a new users table with three columns. The value id is a convenient shortcut for creating a primary key column of type serial. Not explicitly setting notNull to true makes date_of_birth nullable by default. It is important to understand that the command only instructs the library to create a table; it does not actually execute the corresponding statement directly (which is why we don’t have to await it).
Now let’s fire up a Docker container and test the migration:
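```sh
# PostgreSQL with its default settings and "postgres" as the password, as configured above
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgres
npx node-pg-migrate up
```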
The up command runs all pending migrations, which in our case is only one. If we try to run it again, pg-migrate will detect that it has already been applied and do nothing:
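```sh
npx node-pg-migrate up
# pg-migrate consults the pgmigrations table and finds nothing left to run
```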
What goes up…
Let’s assume that after deploying everything to production, the business team decides that they want to enable users to optionally add their real name to their profiles. As far as migrations are concerned, all we need to do is create a new one:
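```ts
import { MigrationBuilder } from 'node-pg-migrate';

export async function up(pgm: MigrationBuilder): Promise<void> {
  // nullable on purpose: existing users do not have a name yet
  pgm.addColumn('users', {
    name: { type: 'varchar(255)' }, // the type is our choice for this sketch
  });
}
```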
After applying the new migration – just as we did previously – we now have an additional name column that we can use. Great success!
…must come down
When testing the new feature on the staging environment, the business team realized that it would be much better to store given and family name separately. As it turns out, this is actually not trivial, so the entire feature was scrapped for the time being. Luckily, rolling this back in the database is as easy as pie:
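```sh
npx node-pg-migrate down
```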
Note that while up runs all pending migrations, down only reverts the most recent one.
But wait a minute – we never implemented the down() function! What is happening? As it turns out, many methods like pgm.createTable() or pgm.addColumn() “know” how to undo their own changes, which enables pg-migrate to conveniently generate the reverse migration for us. Unfortunately, many of the more frequently used functions, most prominently pgm.alterColumn() and pgm.alterTable(), do not fall into that category. Attempting to migrate down will then fail; it is in these circumstances that one has to implement the down() function manually:
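```ts
import { MigrationBuilder } from 'node-pg-migrate';

// a hypothetical example: widening the name column
export async function up(pgm: MigrationBuilder): Promise<void> {
  pgm.alterColumn('users', 'name', { type: 'varchar(500)' });
}

export async function down(pgm: MigrationBuilder): Promise<void> {
  // the library cannot know the previous type, so we restore it explicitly
  pgm.alterColumn('users', 'name', { type: 'varchar(255)' });
}
```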
A special case here is pgm.addConstraint(), which does not support inferring the down migration if we rely on the auto-generated constraint name, but works just fine if we don’t. There are also some features that are not yet supported by the MigrationBuilder (such as granting permissions or inserting, deleting or updating rows), in which case we have to resort to raw SQL statements with pgm.sql(). Executing arbitrary SQL obviously means that it is impossible for the library to automatically infer the down migration, so it will again throw an error and tell us as much if we ask it to do so. As a side note, pg-migrate does not come with support for escaping, so using an additional library such as pg-escape is recommended.
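To illustrate, a data fix via raw SQL could look like this (a sketch; the roles table and its contents are made up):

```ts
import { MigrationBuilder } from 'node-pg-migrate';

export async function up(pgm: MigrationBuilder): Promise<void> {
  // arbitrary SQL is passed through verbatim, so the down migration must be spelled out
  pgm.sql("INSERT INTO roles (name) VALUES ('admin')");
}

export async function down(pgm: MigrationBuilder): Promise<void> {
  pgm.sql("DELETE FROM roles WHERE name = 'admin'");
}
```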
Sometimes it is plain impossible to undo a migration, for instance if you drop a column or a table and there is no way to reconstruct the data once it is gone. In this case we can inform the library like so:
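```ts
import { MigrationBuilder } from 'node-pg-migrate';

export async function up(pgm: MigrationBuilder): Promise<void> {
  // once the column is dropped, its data cannot be reconstructed
  pgm.dropColumns('users', ['date_of_birth']);
}

// per the library's convention, exporting down as false marks the migration
// as irreversible; attempting to revert it will fail with an explicit error
export const down = false;
```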
Conclusion
If PostgreSQL is the database basket you put your eggs in and you don’t use an ORM that comes with a migration framework, pg-migrate is a rock-solid, lightweight and simple-to-use choice. There are still a few issues to solve in practice, such as how and when to actually run the migrations. While this is not trivial to answer in general terms, finding a practical solution in a concrete project with a known architecture is usually pretty straightforward.

Raphael Pigulla has been working in various areas of web development for over 20 years and has accompanied a large number of projects in both frontend and backend, at start-ups and large corporations, as a Principal Architect. In addition to agile software development, he is passionate about training and further education, areas in which he has been particularly involved in recent years, both professionally and on a voluntary basis.