Gateway health probes
The health status for a standalone gateway is available at {zeebe-gateway}:8080/actuator/health
The following health indicators are enabled by default
- Gateway Started - checks whether the gateway is running (i.e. not currently starting and not yet shut down)
- Gateway Responsive - checks whether the gateway can handle a request within a given timeout
- Gateway Cluster Awareness - checks whether the gateway is aware of other nodes in the cluster
- Gateway Partition Leader Awareness - checks whether the gateway is aware of partition leaders in the cluster
- Disk Space - checks that the free disk space is greater than 10 MB
- Memory - checks that at least 10% of max memory (heap) are still available
Health indicators are set to sensible defaults. For specific use cases, it might be necessary to customize health probes.
Startup probe​
The started probe is available at {zeebe-gateway}:8080/actuator/health/startup
In the default configuration this is merely an alias for the Gateway Started health indicator. Other configurations are possible (see below).
Liveness probe​
The liveness probe is available at {zeebe-gateway}:8080/actuator/health/liveness
It is based on the health indicators mentioned above.
In the default configuration, the liveness probe is comprised of the following health indiactors:
- Gateway Started - checks whether the gateway is running (i.e. not currently starting and not yet shut down)
- Liveness Gateway Responsive - checks whether the gateway can handle a request within an ample timeout, but will only report a
DOWN
health status after the underlying health indicator is down for more than 10 minutes - Liveness Gateway Cluster Awareness - based on gateway cluster awareness, but will only report a
DOWN
health status after the underlying health indicator is down for more than 5 minutes - Liveness Gateway Partition Leader Awareness - based on gateway partition leader awareness, but will only report a
DOWN
health status after the underlying health indicator is down for more than 5 minutes - Liveness Disk Space - checks that the free disk space is greater than 1 MB
- Liveness Memory - checks that at least 1% of max memory (heap) are still available
Note that health indicators with the liveness prefix are intended to be customized for the liveness probe. This allows defining tighter thresholds (e.g. for free memory 1% for liveness vs. 10% for health), as well as adding tolerance for short downtimes (e.g. gateway has no awareness of other nodes in the cluster for more than 5 minutes).
Customizing health probes​
Global settings for all health indicators:
management.health.defaults.enabled=true
- enables (default) or disables all health indicatorsmanagement.endpoint.health.show-details=always/never
- toggles whether a summary or details (default) of the health indicators will be returned
Startup probe​
Settings for started probe:
management.endpoint.health.group.startup.show-details=never
- toggles whether a summary (default) or details of the startup probe will be returnedmanagement.endpoint.health.group.startup.include=gatewayStarted
- defines which health indicators are included in the startup probe
Liveness probe​
Settings for liveness probe:
management.endpoint.health.group.liveness.show-details=never
- toggles whether a summary (default) or details of the liveness probe will be returnedmanagement.endpoint.health.group.liveness.include=gatewayStarted,livenessGatewayResponsive,livenessGatewayClusterAwareness,livenessGatewayPartitionLeaderAwareness,livenessDiskSpace,livenessMemory
- defines which health indicators are included in the liveness probe
Note that the individual contributing health indicators of the liveness probe can be configured as well (see below).
Gateway started​
Settings for gateway started health indicator:
management.health.gateway-started.enabled=true
- enables (default) or disables this health indicator
Gateway responsive​
Settings for gateway responsiveness health indicator:
management.health.gateway-responsive.enabled=true
- enables (default) or disables this health indicatormanagement.health.gateway-responsive.requestTimeout=500ms
- defines the timeout for the request; if the test completes before the timeout, the health status isUP
, otherwise it isDOWN
management.health.liveness.gateway-responsive.requestTimeout=5s
- defines the timeout for the request for liveness probe; if the request completes before the timeout, the health status isUP
management.health.liveness.gateway-responsive.maxdowntime=10m
- - defines the maximum downtime before the liveness health indicator for responsiveness will flip
Gateway cluster awareness​
Settings for gateway cluster awareness health indicator:
management.health.gateway-clusterawareness.enabled=true
- enables (default) or disables this health indicator (and its liveness counterpart)management.health.liveness.gateway-clusterawareness.maxdowntime=5m
- defines the maximum downtime before the liveness health indicator for cluster awareness will flip. In other words: this health indicator will reportDOWN
after the gateway was unaware of other members in the cluster for more than 5 minutes
Gateway partition leader awareness​
Settings for gateway partition leader awareness health indicator:
management.health.gateway-partitionleaderawareness.enabled=true
- enables (default) or disables this health indicator (and its liveness counterpart)management.health.liveness.gateway-partitionleaderawareness.maxdowntime=5m
- defines the maximum downtime before the liveness health indicator for partition leader awareness will flip. In other words: this health indicator will reportDOWN
after the gateway was unaware of partition leaders for more than 5 minutes
Disk space​
This is arguably the least critical health indicator given that the standalone gateway does not write to disk. The only exception may be the writing of log files, which depend on the log configuration.
Settings for disk space health indicator:
management.health.diskspace.enabled=true
- enables (default) or disables this health indicator (and its liveness counterpart)management.health.diskspace.threshold=10MB
- defines the threshold for the required free disk spacemanagement.health.diskspace.path=.
- defines the path for which the free disk space is examinedmanagement.health.liveness.diskspace.threshold=1MB
- defines the threshold for the required free disk space for livenessmanagement.health.liveness.diskspace.path=.
- defines the path for which the free disk space for liveness is examined
Memory​
This health indicator examines free memory (heap).
Settings for memory health indicator:
management.health.memory.enabled=true
- enables (default) or disables this health indicator (and its liveness counterpart)management.health.memory.threshold=0.1
- defines the threshold for the required free memory. The default is 0.1 which is interpreted as 10% of max memorymanagement.health.liveness.memory.threshold=0.01
- defines the threshold for the required free memory for liveness. The default is 0.01 which is interpreted as 10 of max memory