Kubernetes: liveness & readiness probes with Nginx and PHP-FPM

Last time I wrote about linking PHP-FPM and Nginx on Kubernetes (and also Docker). It is a very common configuration, because PHP is a popular backend language and Nginx offers great performance with low resource usage. If we use such a mix, we should also think about proper health checks in our cluster. Without them, clients may "hit the wall" when one of the services stops working or when we deploy an important hotfix during normal operation. Fortunately, Kubernetes offers a built-in solution to quickly determine the state of our pods and the containers inside them. Thanks to this, we can periodically check their state and decide whether something requires attention or automatic action at the infrastructure level. Let's look at this in terms of the mentioned PHP and Nginx configuration.

At the beginning, I would like to point out that I will not write about the Startup Probe: this element is very dependent on our application, its startup requirements, and even additional external factors. Many PHP apps do not need it at all, because they require no startup or boot time. Instead, we will focus only on liveness and readiness probes, because they are completely sufficient for our needs. In terms of this specific mix, it can be a challenging topic, but many other similar engines and solutions work almost identically, so this simple tutorial can be reused for them.

Liveness Probe

First, we should talk about the liveness probe. This one is crucial because it determines whether a container can be considered healthy. If the probe keeps failing, the kubelet will restart the container automatically. What does healthy mean? It depends on the service we want to check, of course. For Nginx, it will be the web server status: is the config valid? Is the server responding? For PHP-FPM, it is about the interpreter status: a healthy one will be able to run even a super simple PHP script without issues.

In such a scenario, the first idea is to use commands to check the status, i.e. an exec probe. It looks fine, but it can cause problems, especially the zombie processes described by Datadog engineers in their scenario. Exec will also be slower in most cases and should be avoided if possible, so we can use a simple httpGet instead and just call, for example, a prepared Nginx endpoint and a very simple PHP script.

Here is an example of an Nginx endpoint:

  # lightweight endpoint handled by Nginx itself
  location /health {
    access_log off;
    return 200 'no content';
  }

And here is a PHP script, let's say called healthcheck.php, placed in the public directory:

<?php

declare(strict_types=1);

echo 'OK';
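For the httpGet probe to actually reach this script, Nginx must forward .php requests to PHP-FPM over FastCGI, as in the setup from the previous article. A minimal sketch, assuming PHP-FPM listens at 127.0.0.1:9000 (adjust the address or socket to your own setup):

```nginx
  location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass 127.0.0.1:9000;
  }
```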

Finally, the liveness configuration:

# Nginx
livenessProbe:
  httpGet:
    path: /health
    port: nginx-port
  initialDelaySeconds: 5
  timeoutSeconds: 1
  periodSeconds: 15
  failureThreshold: 3
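A small note on the probe above: port accepts a named port, so the container spec must declare a matching entry in its ports list, for example:

```yaml
ports:
  - name: nginx-port
    containerPort: 80
```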


# PHP-FPM
livenessProbe:
  httpGet:
    path: /healthcheck.php
    port: 80
  initialDelaySeconds: 15
  timeoutSeconds: 1
  periodSeconds: 15
  failureThreshold: 3

You can notice one important thing: if we run both containers in the same pod, checking PHP will also check Nginx. That is OK, and not redundant. If the web server has issues, PHP can also be restarted, which is not bad; but if PHP-FPM has an issue (for example, memory leaks) and we use a liveness probe only on Nginx, the system will restart only the Nginx container, which will not resolve anything. It is tricky because we use the web server as a proxy and PHP sits behind it. You could consider using exec to check only the PHP status, but of course the php CLI binary is not the same process as the PHP-FPM pool used by Nginx, so you would not be checking the proper part of the system.
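If you do want to probe the FPM pool directly, one option is PHP-FPM's built-in ping endpoint (ping.path in the pool configuration) queried over FastCGI. A hedged sketch, assuming PHP-FPM listens on 127.0.0.1:9000, the pool sets ping.path = /ping, and the cgi-fcgi binary (from the fcgi package) is installed in the image:

```yaml
# requires in the pool config (e.g. www.conf): ping.path = /ping
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - SCRIPT_NAME=/ping SCRIPT_FILENAME=/ping REQUEST_METHOD=GET cgi-fcgi -bind -connect 127.0.0.1:9000
  initialDelaySeconds: 15
  periodSeconds: 15
  failureThreshold: 3
```

Keep in mind this brings back the exec drawbacks mentioned earlier, so treat it as a fallback rather than the default.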

Readiness Probe

According to the documentation, the readiness probe is used to determine whether a pod is able to receive traffic, i.e. whether it is ready. What does this mean in practice? Let's say that in our example Nginx works fine and PHP-FPM works fine too, but for some reason our application cannot properly handle requests. Maybe it is a database connection issue, maybe Redis, maybe some issue with our application logic. Overall, it cannot handle requests and therefore should not receive any. If the readiness probe fails, the pod will be taken out of rotation and the load balancer will not send any traffic to it. It will still be running, but it will not handle requests.

So, what should we use to determine the app state? Here we can use an application-level health check, and it can even cover external dependencies like the mentioned database or caching system. If everything is tightly connected and required, we can determine whether the application is functional or not. A simple endpoint inside the app which performs these checks will be enough.
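A minimal sketch of such an endpoint, let's call it healthcheck-app.php. The hostnames, ports, and the check itself (plain TCP reachability of the database and Redis) are placeholders, intentionally simplistic, just to show the idea; real queries or pings would be better:

```php
<?php

declare(strict_types=1);

// Hypothetical application-level health check. Each check verifies that a
// dependency at least accepts TCP connections; replace the hosts and ports
// with your real configuration.
function dependencyIsUp(string $host, int $port): bool
{
    $socket = @fsockopen($host, $port, $errno, $errstr, 1);
    if ($socket === false) {
        return false;
    }
    fclose($socket);

    return true;
}

$checks = [
    'database' => fn (): bool => dependencyIsUp('db', 3306),
    'redis'    => fn (): bool => dependencyIsUp('redis', 6379),
];

$failures = [];
foreach ($checks as $name => $check) {
    if (!$check()) {
        $failures[] = $name;
    }
}

if ($failures === []) {
    http_response_code(200);
    echo 'OK';
} else {
    // 503 makes the readiness probe fail and takes the pod out of rotation
    http_response_code(503);
    echo 'FAIL: ' . implode(', ', $failures);
}
```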

Where should we put the readiness probe? It depends. In this situation it does not matter: in both scenarios we will use the httpGet method, so we can attach it to the Nginx or the PHP-FPM container without issue. Also, if just one readiness probe fails, the whole pod will be marked as unready, so we do not need to copy it to different places; in our scenario one probe will be completely fine. Of course, the code will be very similar to the liveness probe we used before:

# PHP-FPM
readinessProbe:
  httpGet:
    path: /healthcheck-app.php
    port: 80
  initialDelaySeconds: 20
  timeoutSeconds: 2
  periodSeconds: 30
  failureThreshold: 3

Things will look different with other systems and configurations. Let's say we have a Node.js app that does not require an additional web server, or we decide to use FrankenPHP instead of Nginx. In such a situation we need to attach the readiness probe directly to the container with the application.

Also, please note the differences in config between the liveness and readiness probes: both use the same failure threshold, because 3 attempts are enough to determine that something is wrong. The biggest difference is the periodSeconds element: a health check at the application level can be costly, because it requires executing scripts, checking dependencies, and so on. Remember that with replicas it is a different story: if we use a 5-second period and have 10 pods, the same script will be executed 120 times within a minute. With more and more pods it gets even worse; such internal traffic can put noticeable load on our nodes, and it does not execute any business logic for our clients. The key is to find the proper balance and avoid exec if possible.
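The arithmetic above can be sketched as a tiny helper (the function name is hypothetical, just for illustration):

```php
<?php

declare(strict_types=1);

// Rough probe-traffic estimate: each pod is probed every $periodSeconds,
// so the cluster runs (60 / period) * replicas checks per minute.
function probeChecksPerMinute(int $periodSeconds, int $replicas): int
{
    return intdiv(60, $periodSeconds) * $replicas;
}

echo probeChecksPerMinute(5, 10) . PHP_EOL;  // 120: the example from the text
echo probeChecksPerMinute(30, 10) . PHP_EOL; // 20 with the 30-second readiness period
```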