Docker is an amazing tool that can make a developer’s work simpler, faster and more efficient. It has a lot of advantages, and I have been using it very actively for some time. Unfortunately, I use it on a Mac, where it has some big drawbacks. From my perspective, the biggest is performance. Docker for Mac and Docker for Windows run a small virtual machine between your host and your containers. That is fine and necessary, but it always causes some performance degradation, especially for networking and I/O operations. This post is about volume caching methods: we can use them to speed up Docker and should do so whenever we can. Note that the Docker documentation describes these options for Docker Desktop for Mac, so that is where the gain is to be expected.
If you use Docker and have ever experienced performance issues, you probably know what volume caching is. We typically use a lot of volumes: to share config, to share code, to provide backups, to link resources. There are plenty of use cases for volumes. But a volume means I/O, sometimes light, sometimes heavy and intensive, and that means performance degradation, especially on Windows and Mac (it works much better on Linux, where containers run on the host kernel and use its file system natively). The problem is that, in the default configuration, volumes always ensure data consistency between host and containers: if you save something on the host or in a container, the change is immediately reflected on the other side. This also protects against data loss.
Docker provides two methods for caching and speeding things up. Neither provides such consistency, but both can improve performance a lot. These two methods are “cached” and “delegated”. According to the official Docker documentation (April 2020):
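As a minimal sketch of that default, assume a Compose file with a hypothetical service named app and hypothetical ./config and ./data directories on the host (none of these names come from a real project). Leaving the option out gives full consistency, and the same mode can also be spelled out as consistent:

```yaml
# docker-compose.yml (illustrative; service name, image and paths are assumptions)
version: "3.7"
services:
  app:
    image: nginx:alpine
    volumes:
      # No option given: host and container always see the same content (default).
      - ./config:/etc/app/config
      # The same default behaviour, written out explicitly.
      - ./data:/var/app/data:consistent
```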
The delegated configuration provides the weakest set of guarantees. For directories mounted with delegated the container’s view of the file system is authoritative, and writes performed by containers may not be immediately reflected on the host file system. In situations such as NFS asynchronous mode, if a running container with a delegated bind mount crashes, then writes may be lost.
The cached configuration provides all the guarantees of the delegated configuration, and some additional guarantees around the visibility of writes performed by containers. As such, cached typically improves the performance of read-heavy workloads, at the cost of some temporary inconsistency between the host and the container.
We can use them simply as a volume option, for example:
- my-delegated-volume:/var/volume2:delegated
- my-cached-volume:/var/volume1:cached
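The same options can also be written with the long volume syntax of the Compose file format, which has a consistency key. The sketch below is an assumption about how the two entries might look inside a complete file; the service name, image and host paths are hypothetical, and bind mounts are used because the consistency options are documented for host directories shared into containers:

```yaml
# docker-compose.yml (illustrative; names and paths are assumptions)
version: "3.7"
services:
  app:
    image: nginx:alpine
    volumes:
      # Long syntax equivalent of "./volume2:/var/volume2:delegated"
      - type: bind
        source: ./volume2
        target: /var/volume2
        consistency: delegated
      # Long syntax equivalent of "./volume1:/var/volume1:cached"
      - type: bind
        source: ./volume1
        target: /var/volume1
        consistency: cached
```

The short form is quicker to type; the long form is easier to read once a mount carries several options.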
Differences and choice
That is all: I/O operations on both volumes should now be much faster, with less impact on overall host and Docker performance. The question is: which method should we use? Of course, it depends. The documentation is meant to be clear, but it doesn’t provide any real usage examples, so I decided to explain them.
First case: we have a database container (maybe MySQL, maybe Mongo, maybe PostgreSQL) and we store all its data on a volume. It can be a named volume, to keep the data across container restarts, or a host directory, because we want to specify the path manually; it doesn’t matter. Perhaps the host only reads this data, for example to back it up. The host makes no changes, so there is nothing to propagate to the container. The best option here is “delegated”: the container makes and sees all changes immediately, and it is not important if the host sees them with a delay.
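A minimal sketch of this first case, assuming PostgreSQL and a hypothetical ./pgdata directory on the host (the service name, password and host path are placeholders, not taken from a real setup):

```yaml
# docker-compose.yml (illustrative; names, password and paths are assumptions)
version: "3.7"
services:
  db:
    image: postgres:12
    environment:
      POSTGRES_PASSWORD: example   # placeholder, for illustration only
    volumes:
      # The container's view is authoritative; its writes may reach
      # the host with a delay, which is fine for occasional backups.
      - ./pgdata:/var/lib/postgresql/data:delegated
```

Keep the documentation’s warning in mind: with delegated, writes may be lost if the container crashes before they are flushed, so take backups from a state you know is complete.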
The second case is when we modify code frequently on the host and read these changes in our containers. In this scenario, the best option is the “cached” method. Why? Because the containers mostly read, and most of the writes come from the host. The difference between “cached” and “delegated” will not be huge, but it may be noticeable, and both are a big gain over the default method (which of course remains the safest).

Do you know any other performance tips for Docker? Please add a comment and let’s discuss them.
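For reference, the second case could look like the sketch below. It assumes a hypothetical Node.js project in the current directory; the service name, image, command and paths are illustrative only:

```yaml
# docker-compose.yml (illustrative; names, command and paths are assumptions)
version: "3.7"
services:
  web:
    image: node:12
    working_dir: /usr/src/app
    command: npm start
    volumes:
      # The host's view is authoritative; the container reads the code
      # and may see edits made on the host with a short delay.
      - .:/usr/src/app:cached
```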