Gateway health probes

The health status for a standalone gateway is available at {zeebe-gateway}:8080/actuator/health

The following health indicators are enabled by default

Gateway Started - checks whether the gateway is running (i.e. not currently starting and not yet shut down)
Gateway Responsive - checks whether the gateway can handle a request within a given timeout
Gateway Cluster Awareness - checks whether the gateway is aware of other nodes in the cluster
Gateway Partition Leader Awareness - checks whether the gateway is aware of partition leaders in the cluster
Disk Space - checks that the free disk space is greater than 10 MB
Memory - checks that at least 10% of max memory (heap) are still available

Health indicators are set to sensible defaults. For specific use cases, it might be necessary to customize health probes.

Startup probe

The started probe is available at {zeebe-gateway}:8080/actuator/health/startup

In the default configuration this is merely an alias for the Gateway Started health indicator. Other configurations are possible (see below).

Liveness probe

The liveness probe is available at {zeebe-gateway}:8080/actuator/health/liveness

It is based on the health indicators mentioned above.

In the default configuration, the liveness probe is comprised of the following health indiactors:

Gateway Started - checks whether the gateway is running (i.e. not currently starting and not yet shut down)
Liveness Gateway Responsive - checks whether the gateway can handle a request within an ample timeout, but will only report a DOWN health status after the underlying health indicator is down for more than 10 minutes
Liveness Gateway Cluster Awareness - based on gateway cluster awareness, but will only report a DOWN health status after the underlying health indicator is down for more than 5 minutes
Liveness Gateway Partition Leader Awareness - based on gateway partition leader awareness, but will only report a DOWN health status after the underlying health indicator is down for more than 5 minutes
Liveness Disk Space - checks that the free disk space is greater than 1 MB
Liveness Memory - checks that at least 1% of max memory (heap) are still available

Note that health indicators with the liveness prefix are intended to be customized for the liveness probe. This allows defining tighter thresholds (e.g. for free memory 1% for liveness vs. 10% for health), as well as adding tolerance for short downtimes (e.g. gateway has no awareness of other nodes in the cluster for more than 5 minutes).

Customizing health probes

Global settings for all health indicators:

management.health.defaults.enabled=true - enables (default) or disables all health indicators
management.endpoint.health.show-details=always/never - toggles whether a summary or details (default) of the health indicators will be returned

Startup probe

Settings for started probe:

management.endpoint.health.group.startup.show-details=never - toggles whether a summary (default) or details of the startup probe will be returned
management.endpoint.health.group.startup.include=gatewayStarted - defines which health indicators are included in the startup probe

Liveness probe

Settings for liveness probe:

management.endpoint.health.group.liveness.show-details=never - toggles whether a summary (default) or details of the liveness probe will be returned
management.endpoint.health.group.liveness.include=gatewayStarted,livenessGatewayResponsive,livenessGatewayClusterAwareness,livenessGatewayPartitionLeaderAwareness,livenessDiskSpace,livenessMemory - defines which health indicators are included in the liveness probe

Note that the individual contributing health indicators of the liveness probe can be configured as well (see below).

Gateway started

Settings for gateway started health indicator:

management.health.gateway-started.enabled=true - enables (default) or disables this health indicator

Gateway responsive

Settings for gateway responsiveness health indicator:

management.health.gateway-responsive.enabled=true - enables (default) or disables this health indicator
management.health.gateway-responsive.requestTimeout=500ms - defines the timeout for the request; if the test completes before the timeout, the health status is UP, otherwise it is DOWN
management.health.liveness.gateway-responsive.requestTimeout=5s - defines the timeout for the request for liveness probe; if the request completes before the timeout, the health status is UP
management.health.liveness.gateway-responsive.maxdowntime=10m - - defines the maximum downtime before the liveness health indicator for responsiveness will flip

Gateway cluster awareness

Settings for gateway cluster awareness health indicator:

management.health.gateway-clusterawareness.enabled=true - enables (default) or disables this health indicator (and its liveness counterpart)
management.health.liveness.gateway-clusterawareness.maxdowntime=5m - defines the maximum downtime before the liveness health indicator for cluster awareness will flip. In other words: this health indicator will report DOWN after the gateway was unaware of other members in the cluster for more than 5 minutes

Gateway partition leader awareness

Settings for gateway partition leader awareness health indicator:

management.health.gateway-partitionleaderawareness.enabled=true - enables (default) or disables this health indicator (and its liveness counterpart)
management.health.liveness.gateway-partitionleaderawareness.maxdowntime=5m - defines the maximum downtime before the liveness health indicator for partition leader awareness will flip. In other words: this health indicator will report DOWN after the gateway was unaware of partition leaders for more than 5 minutes

Disk space

This is arguably the least critical health indicator given that the standalone gateway does not write to disk. The only exception may be the writing of log files, which depend on the log configuration.

Settings for disk space health indicator:

management.health.diskspace.enabled=true - enables (default) or disables this health indicator (and its liveness counterpart)
management.health.diskspace.threshold=10MB - defines the threshold for the required free disk space
management.health.diskspace.path=. - defines the path for which the free disk space is examined
management.health.liveness.diskspace.threshold=1MB - defines the threshold for the required free disk space for liveness
management.health.liveness.diskspace.path=. - defines the path for which the free disk space for liveness is examined

Memory

This health indicator examines free memory (heap).

Settings for memory health indicator:

management.health.memory.enabled=true - enables (default) or disables this health indicator (and its liveness counterpart)
management.health.memory.threshold=0.1 - defines the threshold for the required free memory. The default is 0.1 which is interpreted as 10% of max memory
management.health.liveness.memory.threshold=0.01 - defines the threshold for the required free memory for liveness. The default is 0.01 which is interpreted as 10 of max memory

Startup probe​

Liveness probe​

Customizing health probes​

Startup probe​

Liveness probe​

Gateway started​

Gateway responsive​

Gateway cluster awareness​

Gateway partition leader awareness​

Disk space​

Memory​

Startup probe

Liveness probe

Customizing health probes

Startup probe

Liveness probe

Gateway started

Gateway responsive

Gateway cluster awareness

Gateway partition leader awareness

Disk space

Memory