Upgrade Zeebe
This section describes how to upgrade Zeebe to a new version.
Currently, there is a known issue that can corrupt data when upgrading to a new version. The issue affects reprocessing (i.e. rehydrating the data from the records on the log stream) and can be avoided by restoring the data from a snapshot. Please follow the recommended procedure to minimize the risk of losing data. This issue affects only users upgrading from a version lower than 0.24.4 to 0.24.4 or newer.
Rolling upgrade
Zeebe is designed to allow a rolling upgrade of a cluster. The brokers can be upgraded one after the other. The other brokers in the cluster continue processing until the whole upgrade is done.
- Upgrade the first broker and wait until it is ready again
- Continue with the next broker until all brokers are upgraded
- Upgrade the standalone gateways
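The steps above can be sketched as a small orchestration loop. This is a minimal illustration, not a supported tool: `upgrade` and `is_ready` are hypothetical hooks that the caller must supply (for example, a deployment script and a call to the broker's ready check), and the timeout values are arbitrary assumptions.

```python
import time
from typing import Callable, Iterable, List


def rolling_upgrade(
    brokers: Iterable[str],
    gateways: Iterable[str],
    upgrade: Callable[[str], None],
    is_ready: Callable[[str], bool],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> List[str]:
    """Upgrade brokers one at a time, then the standalone gateways.

    `upgrade` and `is_ready` are hypothetical caller-supplied hooks.
    Returns the node names in the order they were upgraded.
    """
    upgraded: List[str] = []
    for broker in brokers:
        upgrade(broker)
        # Wait until this broker is ready again before touching the next one,
        # so the remaining brokers can keep processing in the meantime.
        deadline = time.monotonic() + timeout_seconds
        while not is_ready(broker):
            if time.monotonic() > deadline:
                raise TimeoutError(
                    f"{broker} did not become ready; stopping the rollout"
                )
            time.sleep(poll_seconds)
        upgraded.append(broker)
    # Gateways come last, once every broker runs the new version.
    for gateway in gateways:
        upgrade(gateway)
        upgraded.append(gateway)
    return upgraded
```

Stopping at the first broker that fails to become ready keeps the rest of the cluster on the old version, which leaves room to investigate before continuing.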
If you are using the Helm charts, simply update your values file and change the image tag to the new version you wish to upgrade to, then follow the Helm upgrade guide.
If you are upgrading from a Zeebe version lower than 0.24.4, it is not recommended to perform a rolling upgrade. Please follow the recommended upgrade procedure instead.
Upgrade procedure for Zeebe < 0.24.4
The following procedure describes how to upgrade a Zeebe broker from a version lower than 0.24.4. If the cluster contains multiple brokers, these steps can be done for all brokers in parallel. Standalone gateways should be upgraded after all brokers in the cluster are upgraded to avoid mismatches in the protocol version.
This procedure results in downtime for the whole cluster.
Experimental: Detect reprocessing inconsistency
With Zeebe 0.24.5 and 0.25.1, a new experimental feature was introduced that detects inconsistencies in the log stream on upgrade, to mitigate the data corruption issue described at the top of this section.
We recommend enabling it on the first run after upgrading Zeebe from a version lower than 0.24.4 to version 0.24.4 or newer, as described in the upgrade procedure. You can enable it using the following environment variable:
ZEEBE_BROKER_EXPERIMENTAL_DETECTREPROCESSINGINCONSISTENCY="true"
After you have verified that the upgrade was successful, we recommend disabling it again by removing the environment variable and restarting your brokers.
Preparing the upgrade
- Stop the workflow processing
  - Close all job workers
  - Interrupt the incoming connections to avoid user commands
- Wait until a snapshot is created for all partitions
  - By default, a snapshot is created every 15 minutes
  - Verify that a snapshot is created by looking at the metric zeebe_snapshot_count on the leader and the followers
  - Note that no snapshot is created if no processing happened since the last snapshot
- Make a backup of the data folder
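The backup step can be as simple as copying the data folder to a timestamped location while the broker is idle. The following is a minimal sketch under that assumption; the function name, the timestamp naming scheme, and the backup location are illustrative choices and not part of Zeebe.

```python
import shutil
import time
from pathlib import Path


def backup_data_folder(data_dir: str, backup_root: str) -> Path:
    """Copy the broker's data folder into a new timestamped directory.

    Hypothetical helper: the paths are supplied by the caller and the
    naming scheme is an assumption made for this example.
    """
    source = Path(data_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"data folder not found: {source}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree refuses to overwrite an existing target, so an earlier
    # backup is never silently replaced.
    shutil.copytree(source, target)
    return target
```

Keeping the copy outside the broker's installation directory makes it easier to restore if the broker has to be rolled back to the previous version.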
Performing the upgrade
With inconsistency detection:
- Shut down the broker
- Replace the /bin and /lib folders with the versions of the new distribution
- Start up the broker with the experimental inconsistency detection enabled
- Verify the upgrade
- Restart the broker with experimental inconsistency detection disabled
Without inconsistency detection:
- Shut down the broker
- Replace the /bin and /lib folders with the versions of the new distribution
- Start up the broker
- Verify the upgrade
Verifying the upgrade
The upgrade is successful if the following conditions are met:
- the broker is ready (see Ready Check)
- the broker is healthy (see Health Check)
- all partitions are healthy (see the metric zeebe_health)
- the stream processors of the partition leaders are in the phase PROCESSING (see Partitions admin endpoint)
If the upgrade failed because of a known issue, a partition changes its status to unhealthy, and the log output may contain the following error message:
Sample Upgrade Error Message
Unexpected error on recovery happens.
io.zeebe.engine.processor.InconsistentReprocessingException: Reprocessing issue detected!
Restore the data from a backup and follow the recommended upgrade procedure. [cause:
"The key of the record on the log stream doesn't match to the record from reprocessing.",
log-stream-record: {"partitionId":1,"value":{"version":1,"bpmnProcessId":"parallel-tasks",
"workflowKey":2251799813685249,"parentElementInstanceKey":-1,"parentWorkflowInstanceKey":-1,
"bpmnElementType":"PARALLEL_GATEWAY","flowScopeKey":2251799813685251,
"elementId":"ExclusiveGateway_0tkgnd5","workflowInstanceKey":2251799813685251},
"key":2251799813685256,"sourceRecordPosition":4294997784,"valueType":"WORKFLOW_INSTANCE",
"timestamp":1601025180728,"recordType":"EVENT","intent":"ELEMENT_ACTIVATING",
"rejectionType":"NULL_VAL","rejectionReason":"","position":4294998112},
reprocessing-record: {key=2251799813685255, sourceRecordPosition=4294997784,
intent=WorkflowInstanceIntent:ELEMENT_ACTIVATING, recordType=EVENT}]
In this case, the broker should be rolled back to the previous version and the backup should be restored. Ensure that the upgrade was prepared correctly. If it is still unclear why the upgrade was not successful, please contact the Zeebe team and ask for guidance.
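To spot this failure quickly, the broker log can be scanned for the exception name from the sample message above. This is a minimal sketch: the function name is hypothetical, and it only assumes that the log is available as lines of text.

```python
from typing import Iterable, List

# Exception class name taken from the sample error message above.
ERROR_MARKER = "InconsistentReprocessingException"


def find_reprocessing_errors(log_lines: Iterable[str]) -> List[int]:
    """Return the 1-based line numbers that mention the reprocessing error."""
    return [
        number
        for number, line in enumerate(log_lines, start=1)
        if ERROR_MARKER in line
    ]
```

For a log file, the function can be called with an open file handle, since iterating over a text file yields its lines. An empty result does not prove the upgrade succeeded; the conditions listed above still need to be checked.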
Partitions admin endpoint
This endpoint allows querying the status of the partitions and performing operations to prepare an upgrade.
In version 0.23:
The endpoint is available under http://{zeebe-broker}:{zeebe.broker.network.monitoringApi.port}/partitions (default port: 9600).
It is enabled by default and cannot be disabled.
In version >= 0.24:
The endpoint is available under http://{zeebe-broker}:{zeebe.broker.network.monitoringApi.port}/actuator/partitions (default port: 9600).
It is enabled by default. It can be disabled in the configuration by setting:
management.endpoint.partitions.enabled=false
Query the partition status
The status of the partitions can be queried with a GET request:
/actuator/partitions
The response contains all partitions of the broker, keyed by partition ID.
Full Response
{
  "1": {
    "role": "LEADER",
    "snapshotId": "399-1-1601275126554-490-490",
    "processedPosition": 490,
    "processedPositionInSnapshot": 490,
    "streamProcessorPhase": "PROCESSING"
  }
}
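A response like the one above can be checked against the verification condition that the stream processors of all partition leaders are in the phase PROCESSING. The following is a minimal sketch that relies only on the fields shown in the sample response; the function name is hypothetical, and fetching the response body over HTTP is left to the caller.

```python
import json
from typing import List


def leaders_not_processing(response_body: str) -> List[str]:
    """Return the IDs of leader partitions whose stream processor
    is not in the PROCESSING phase.

    Hypothetical helper: expects the JSON body of the partitions
    endpoint, keyed by partition ID as in the sample response.
    """
    partitions = json.loads(response_body)
    return sorted(
        partition_id
        for partition_id, status in partitions.items()
        if status.get("role") == "LEADER"
        and status.get("streamProcessorPhase") != "PROCESSING"
    )
```

An empty list means every leader partition reported by this broker is processing. Because each broker only reports its own partitions, the check has to be repeated against every broker in the cluster.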