Database migration – do not trust the framework

In theory, working with different databases is not a big problem, because the framework can handle everything for you. In real-world scenarios there are a lot of strange caveats. Last month my team worked on a huge and extraordinarily complex database migration. The process looked like something impossible, but “mission impossible” is exactly our kind of task. Of course, the challenge itself was not the most important thing in this situation. Thanks to this migration we learned a lot, and since then we are much more cautious when we work with databases, especially when we use backend frameworks.

You can ask: why? The framework should do almost everything for us, and using a different database of the same type should be quite easy and smooth. That is not always true. In most cases it is, but if you trust the framework completely, you will end up with terrible issues, delays and even some downtime. It is not worth it, so use our methods and check, step by step, everything you can check, to discover problems and fix them before you release the updated version. OK, let’s move to some details and real-world examples that explain these situations better.

Autoincrements

We must remember that not all databases offer auto-increment functionality. It is difficult to implement and can cause a lot of strange issues, especially when we need a more complex replication setup. We should not rely on this option but use an alternative – like UUIDs generated at the application level. If we forget about that, inserting a new record will fail because the primary key cannot be empty / null. If we try to duplicate a value, it will also fail because all values in this column must be unique.
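A minimal sketch of application-level UUID keys in an Eloquent model could look like the snippet below. The Order model name is an illustration, not something from the original text; the important parts are the non-incrementing string key and generating the value in the application, not in the database.

```php
<?php

use Illuminate\Database\Eloquent\Model;
use Illuminate\Support\Str;

// Illustrative model: primary key is a string generated by the application.
class Order extends Model
{
    public $incrementing = false;   // no database auto increment
    protected $keyType = 'string';  // primary key is a string, not an int

    protected static function booted(): void
    {
        static::creating(function (self $model) {
            // Generate the key in the application before the insert happens.
            if (empty($model->getKey())) {
                $model->{$model->getKeyName()} = (string) Str::uuid();
            }
        });
    }
}
```

The corresponding migration then has to define the primary key as a string column instead of an integer one.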

Real world example

MySQL handles auto increment without issues and we can use integer-based columns for such purposes. The problem starts when we want to migrate the database, because other engines do not offer this. For example CockroachDB, a Postgres-compatible database, offers only the SERIAL type. It provides a unique integer value for every new record, but the values are not sequential and can be really big. Instead of 1, 2, 3 you can see, for example, 3837564536272.

This caused another problem: in some legacy places the API sends these ids as integers from the backend to the JavaScript front end. The web app could not handle such big numbers and silently truncated them. In effect, some records were not available “because they did not exist” – of course everything was fine on the backend, the only problem was that the ids were too long for the front end’s integer handling.

It was not possible to handle all these places, i.e. to guarantee they would send a string instead of an integer, so the team decided to emulate MySQL-style auto increment – during saving, the script looks up the next value and uses a semaphore mechanism to lock similar operations and avoid issues with parallel saves.
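A rough sketch of such emulation, assuming a dedicated counters table whose row lock plays the role of the semaphore (the table and column names are hypothetical, not taken from the original text), could look like this:

```php
<?php

use Illuminate\Support\Facades\DB;

// Sketch of MySQL-style auto-increment emulation.
// Assumes a "counters" table with columns (name, value); the row lock taken
// inside the transaction prevents parallel saves from reserving the same value.
function nextId(string $counterName): int
{
    return DB::transaction(function () use ($counterName) {
        // Lock the counter row for the duration of the transaction.
        $counter = DB::table('counters')
            ->where('name', $counterName)
            ->lockForUpdate()
            ->first();

        $next = $counter->value + 1;

        DB::table('counters')
            ->where('name', $counterName)
            ->update(['value' => $next]);

        return $next;
    });
}

// Usage: reserve an id before inserting the record.
$id = nextId('orders');
DB::table('orders')->insert(['id' => $id, 'status' => 'new']);
```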

Migrations and data types

Another issue is related to data types. Frameworks like Laravel offer an elegant way to create all migrations and columns, and even the possibility to convert them. Reality is much more complex, because one type in Laravel can be represented by completely different types in different database engines – it depends on what options they offer. If you ignore this fact, you can hit a wall during a bigger migration and a lot of additional changes will be required.

Real world examples

Someone created a table with a status column using the boolean type. Looks fine. The system used MySQL, so instead of boolean, the TINYINT type was used. Still fine. But… during app development it was required to add an additional status and, in effect, the application was able to save the values 0, 1 and 2. For the existing MySQL system it was fully acceptable, but during the conversion to PostgreSQL there was an error, because… boolean accepts only true (1) or false (0) and does not allow any other value (2). Of course, it is a legacy issue, but it shows that some types can cause different and strange problems when we do not know what is under the hood.
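The difference is easy to see in the migration itself; a minimal sketch (the orders table name is illustrative) might be:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up(): void
    {
        Schema::create('orders', function (Blueprint $table) {
            $table->id();

            // On MySQL this becomes TINYINT(1), which happily stores 0, 1, 2…
            // On PostgreSQL it becomes a real BOOLEAN, which rejects 2.
            $table->boolean('status');

            // A portable alternative when more than two states are possible:
            // $table->unsignedTinyInteger('status');
        });
    }
};
```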

Another example: timestamps. This is a bit of magic in Laravel. In theory nice, in reality not especially, because under the hood it can map to quite different column types, with different ranges, depending on the database. From the API perspective “it does not matter”, because Laravel will wrap all of them in Carbon, but for the database it is a big difference. We cannot say it is safe to use this method. A better option is to explicitly use, for example, dateTime, because it is always the same. The same applies to the softDeletes option – nice from a developer’s perspective, but in reality you should avoid it in migrations. Why? Because under the hood a timestamp is used as well, and you will not know what effect you will get on different databases. If you add the soft delete column manually, you stay in control of this approach.
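For illustration, a migration that avoids both helpers and spells the columns out explicitly could look like this (the documents table name is an assumption):

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up(): void
    {
        Schema::create('documents', function (Blueprint $table) {
            $table->id();

            // Explicit dateTime columns instead of $table->timestamps():
            $table->dateTime('created_at')->nullable();
            $table->dateTime('updated_at')->nullable();

            // Manual soft-delete column instead of $table->softDeletes():
            $table->dateTime('deleted_at')->nullable();
        });
    }
};
```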

And the last one: uuid. Previously I wrote about auto increments and suggested using UUIDs instead of integers, but that does not mean you should use this method from Laravel. Why? Because you do not control the type. Under the hood it is char(36), and that can be fine… but it depends on the situation. In our scenario, because of backward compatibility, it was sometimes required to write shorter, but still unique, values. What did that mean? After retrieving such values from the database, they had additional trailing spaces… and it caused a lot of trouble. The better option was varchar(36), so we jumped to it. Again: it is probably a bit of an edge case, but a clear type definition makes everything much simpler.
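A minimal sketch of the explicit variant (the items table name is illustrative) could be:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up(): void
    {
        Schema::create('items', function (Blueprint $table) {
            // $table->uuid('id') would give CHAR(36) (or a native UUID type),
            // which can pad shorter legacy values with spaces on some engines.
            // An explicit VARCHAR(36) keeps the type identical everywhere:
            $table->string('id', 36)->primary();
            $table->string('name');
        });
    }
};
```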

At the end of this section I will add something about changing the type of an existing column – a lot of developers love this option, because it is just simple. You created an int column and need to change it to a string? No problem! Wrong approach… MySQL without strict mode will allow you to do that, but many other database engines will not, because such on-the-fly conversion is always risky. This direction looks safe, but try to invert it… string to int? What will be saved into the row? Hard to say. The better approach is to make such changes step by step:

  • First add a new column to the existing table
  • Then copy the data from the old column to the new one – convert it in a safe way in your codebase, not in the database
  • Save to both columns at the same time
  • Switch reading from the old column to the new one
  • Afterwards, stop saving to the old column and finally remove it

Such an approach is just safe. It requires more time, for sure, but do you focus on speed, or on quality? A rough sketch of the first two steps is shown below.
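This sketch only covers adding the new column and backfilling it, under the assumption of an orders table with an integer reference column being migrated to a string reference_str column (all names are hypothetical):

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up(): void
    {
        // Step 1: add the new column next to the old one.
        Schema::table('orders', function (Blueprint $table) {
            $table->string('reference_str')->nullable();
        });

        // Step 2: copy and convert the data in the codebase, in small chunks,
        // instead of letting the database guess the conversion.
        DB::table('orders')
            ->whereNull('reference_str')
            ->chunkById(500, function ($rows) {
                foreach ($rows as $row) {
                    DB::table('orders')
                        ->where('id', $row->id)
                        ->update(['reference_str' => (string) $row->reference]);
                }
            });
    }
};
```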

Driver impact

We should also consider the differences between database drivers at a very low level, which we cannot control in any way. If someone thinks a framework can provide a totally abstract and fully compatible layer… such a person makes a big mistake, because it is not true. Because of these even small differences, a lot of things can work differently, or not work at all, in some scenarios.

Real world example

The application had a lot of unit tests, and they did not use a real database, but of course bootstrapping the application was required. The main database was MySQL and everything worked correctly. After the migration to Postgres, unit tests executed on the CI level stopped working because of… lack of a database connection. At the same time, they worked correctly on the local environment with all databases enabled, and they also failed there when the new Postgres database was disabled.

After a deep investigation the team did not find any test that required real database access, so the situation was clear: the MySQL connection used the mysqlnd extension at the PDO level, which uses lazy connections, i.e. the connection is not opened before real usage. In effect, all tests worked correctly without a database, because the application did not even try to connect. With the Postgres PDO driver everything changed: that driver connects immediately and does not use lazy connections. In effect, tests without an available database started to fail, even if they did not need the database at all.

Unfortunately, this behavior is out of our control. It is not possible to configure it in the framework config or even in PHP settings, because everything happens under the hood, in the database drivers. The only way was to reconfigure the test setup to use an in-memory SQLite database as a “fake” database.
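One way to express that is a dedicated test connection in config/database.php; the connection name testing is an assumption, and how the test suite selects it (env variables, phpunit configuration) is left out of this sketch:

```php
<?php

// Sketch of config/database.php with a "fake" connection for unit tests.
return [
    'default' => env('DB_CONNECTION', 'mysql'),

    'connections' => [
        // In-memory SQLite: nothing to connect to eagerly, so the PDO driver's
        // connection behavior stops mattering for tests without a database.
        'testing' => [
            'driver'   => 'sqlite',
            'database' => ':memory:',
            'prefix'   => '',
        ],

        // ... the real mysql / pgsql connections stay here ...
    ],
];
```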