Sizing your environment
In order to define and size your environment for Camunda 8 appropriately, you need to understand the factors that influence hardware requirements. Then you can apply this knowledge to select the appropriate Camunda 8 SaaS hardware package or size your self-managed Kubernetes cluster.
This best practice targets Camunda 8 only! If you are looking at Camunda 7, please visit Sizing your Camunda 7 environment.
Understanding influencing factors​
Let's understand the important numbers.
Throughput​
Throughput defines, how many process instances can be executed in a certain timeframe.
It is typically easy to estimate the number of process instances per day you need to execute. If you only know the number of process instances per year, we recommend to divide this number by the 250 (average number of working days in a year).
But the hardware sizing depends more on the number of BPMN tasks in a process model. For example, you will have a much higher throughput for processes with one service task than for processes with 30 service tasks.
If you already know your future process model, you can use this to count the number of tasks for your process. For example, the following onboarding process contains five service tasks in a typical execution.
If you don't yet know the number of service tasks, we recommend to assume 10 service tasks as a rule of thumb.
The number of tasks per process allows you to calculate the required number of tasks per day (tasks/day) which can also be converted into tasks per second (tasks/s) (devide by 24 hours * 60 minutes * 60 seconds).
Example:
Indicator | Number | Calculation method | Comment |
---|---|---|---|
Onboarding instances per year | 5,000,000 | Business input | |
Process instances per business day | 20,000 | / 250 | average number of working days in a year |
Tasks per day | 100,000 | * 5 | Tasks in the process model as counted above |
Tasks per second | 1.16 | / (24*60*60) | Seconds per day |
In most cases, we define throughput per day, as this time frame is easier to understand. But in high-performance use cases you might need to define the throughput per second.
Peak loads​
In most scenarios, your load will be volatile and not constant. For example, your company might start 90% of their monthly process instances in the same day of the month. The ability to handle those peaks is the more crucial requirement and should drive your decision instead of looking at the average load.
In the above example, that one day with the peak load defines your overall throughput requirements.
Sometimes, looking at peaks might also mean, that you are not looking at all 24 hours of a day, but only 8 business hours, or probably the busiest 2 hours of a day, depending on your typical workload.
Latency and cycle time​
In some use cases, the cycle time of a process (or sometimes even the cycle time of single tasks) matter. For example, you want to provide a REST endpoint, that starts a process instance to calculate a score for a customer. This process needs to execute four service tasks, but the REST request should return a response synchronously, no later than 250 milliseconds after the request.
While the cycle time of service tasks depends very much on what you do in these tasks, the overhead of the workflow engine itself can be measured. In an experiment with Camunda 8 1.2.4, running all worker code in the same GCP zone as Camunda 8, we measured around 10ms processing time per process node and approximately 50 ms latency to process service tasks in remote workers. Hence, to execute 4 service tasks results in 240 ms workflow engine overhead.
The closer you push throughput to the limits, the more latency you will get. This is basically, because the different requests compete for hardware resources, especially disk write operations. As a consequence, whenever cycle time and latency matters to you, you should plan for hardware buffer to not utilize your cluster too much. This makes sure, your latency does not go up because of resource contention. A good rule of thumb is to multiply your average load by 20. This means, you cannot only accommodate unexpected peak loads, but also have more free resources on average, keeping latency down.
Indicator | Number | Calculation method | Comment |
---|---|---|---|
Onboarding instances per year | 5,000,000 | Business input, but irrelevant | |
Expected process instances on peak day | 150,000 | Business input | |
Process instances per second within business hours on peak day | 5.20 | / (8*60*60) | Only looking at seconds of the 8 business hours of a day |
Process instances per second including buffer | 104.16 | * 20 | Adding some buffer is recommended in critical high-performance or low-latency use cases |
Payload size​
Every process instance can hold a payload (known as process variables). The payload of all running process instances must be managed by the runtime workflow engine, and all data of running and ended process instances is also forwarded to Operate and Optimize.
The data you attach to a process instance (process variables) influences resource requirements. For example, it makes a big difference if you only add one or two strings (requiring around 1 KB of space) to your process instances, or a full JSON document containing 1 MB. Hence, the payload size is an important factor when looking at sizing.
There are a few general rules regarding payload size:
- The maximum variable size per process instance is limited, currently to roughly 3 MB.
- We don't recommend storing much data in your process context. See our best practice on handling data in processes.
- Every partition of the Zeebe installation can typically handle up to 1 GB of payload in total. Larger payloads can lead to slower processing. For example, if you run one million process instances with 4 KB of data each, you end up with 3.9 GB of data, and you should run at least four partitions. In reality, this typically means six partitions, as you want to run the number of partitions as a multiple of the replication factor, which by default is three.
The payload size also affects disk space requirements, as described in the next section.
Disk space​
The workflow engine itself will store data along every process instance, especially to keep the current state persistent. This is unavoidable. In case there are human tasks, data is also sent to Tasklist and kept there, until tasks are completed.
Furthermore, data is also sent Operate and Optimize, which store data in Elasticsearch. These tools keep historical audit data for some time. The total amount of disk space can be reduced by using data retention settings. We typically delete data in Operate after 30 to 90 days, but keep it in Optimize for a longer period of time to allow more analysis. A good rule of thumb is something between 6 and 18 months.
Elasticsearch needs enough memory available to load a large amount of this data into memory.
Assuming a typical payload of 15 process variables (simple strings, numbers or booleans) we measured the following approximations for disk space requirements using Camunda 8 SaaS 1.2.4. Please note, that these are not exact numbers, but they might give you an idea what to expect:
- Zeebe: 75 kb / PI
- Operate: 57 kb / PI
- Optimize: 21 kb / PI
- Tasklist: 21 kb / PI
- Sum: 174 kb / PI
Using your throughput and retention settings, you can now calculate the required disk space for your scenario. Example:
Indicator | Calculation method | Value | Comments |
---|---|---|---|
Process instances per day | 20,000 | ||
Runtime | |||
Typical process cycle time | * 5 days | 100,000 | How long is a process instance typically active? Determines the number of active process instances |
Disk space for Zeebe | * 75 kib | 7.15 GiB | (Converted into GB by / 1024 / 1024) |
Disk space for Tasklist | * 21 kib | 0.67 GiB | |
Operate | |||
PI in retention time | * 30 day | 600,000 | |
Disk space | * 57 kib | 32.62 GiB | |
Optimize | |||
PI in retention time | * 6 months | 3,600,000 | |
Disk space | * 21 kib | 72.10 GiB | |
Sum | 113.87 GiB |
Understanding sizing and scalability behavior​
Spinning up a Camunda 8 Cluster means you run multiple components that all need resources in the background, like the Zeebe broker, Elasticsearch (as the database for Operate, Tasklist, and Optimize), Operate, Tasklist, and Optimize. All those components need to be equiped with resources.
All components are clustered to provide high-availability, fault-tolerance and resiliency.
Zeebe scales horizontally by adding more cluster nodes (pods). This is limited by the number of partitions configured for a Zeebe cluster, as the work within one partition cannot be parallelized by design. Hence, you need to define enough partitions to utilize your hardware. The number of partitions cannot be changed after the cluster was initially provisioned (at least not yet), elastic scalability of partitions is not yet possible.
If you anticipate the load increasing over time, prepare by configuring more partitions than you currently need as a buffer. For example, you could multiply the number of partitions you need for your current load by four to add a buffer. This typically has just a small impact on performance.
Camunda 8 runs on Kubernetes. Every component is operated as a so-called pod, that gets resources assigned. These resources can be vertically scaled (=get more or less hardware resources assigned dynamically) within certain limits. Note that vertically scaling not always results in more throughput, as the various components have dependencies on each other. This is a complex topic and requires running experiments with benchmarks. In general, we recommend to start with the minimalistic hardware package as described below. If you have further requirements, you use this as a starting point to increase resources.
Note that Camunda licensing does not depend on the provisioned hardware resources, making it easy to size according to your needs.
Sizing your runtime environment​
First, calculate your requirements using the information provided above, taking the example calculations from above:
- Throughput: 20,000 process instances / day
- Disk space: 114 GB
Now you can select a hardware package that can cover these requirements. In this example this fits well into a cluster of size S.
Camunda 8 SaaS​
Camunda 8 defines three fixed hardware packages you can select from. The table below gives you an indication what requirements you can fulfill with these. If your requirements are above the mentioned numbers, please contact us to discuss a customized sizing.
* | S | M | L |
---|---|---|---|
Max Throughput Tasks/day | 5.9 M | 23 M | 43 M |
Max Throughput Tasks/second | 65 | 270 | 500 |
Max Throughput Process Instances/day | 0.5 M | 2.3 M | 4.3 M |
Max Total Number of Process Instances stored (in Elasticsearch in total) | 100 k | 5.4 M | 15 M |
Approx resources provisioned ** | 15 vCPU, 20 GB mem, 640 GB disk | 28 vCPU, 50 GB mem, 640 GB disk | 56 vCPU, 85 GB mem, 1320 GB disk |
* The numbers in the table where measured using Camunda 8 (version 8.0) and the benchmark project. It uses a ten task process. To calculate day-based metrics, an equal distribution over 24 hours is assumed.
** These are the resource limits configured in the Kubernetes cluster and are always subject to change.
You might wonder why the total number of process instances stored is that low. This is related to limited resources provided to Elasticsearch, yielding performance problems with too much data stored there. By increasing the available memory to Elasticsearch you can also increase that number. At the same time, even with this rather low number, you can always guarantee the throughput of the core workflow engine during peak loads, as this performance is not influenced. Also, you can always increase memory for Elasticsearch later on if it is required.
Camunda 8 self-managed​
Provisioning Camunda 8 onto your self-managed Kubernetes cluster might depend on various factors. For example, most customers already have own teams providing Elasticsearch for them as a service. However, the following example shows a possible configuration which is close to a cluster of size S in Camunda 8 SaaS, which can serve as a starting point for your own sizing. Such a cluster can serve 500,000 process instances per day and store up to 100,000 process instances in Elasticsearch (in-flight and history).
request | limit | ||
---|---|---|---|
Zeebe | |||
# brokers | 3 | ||
# partitions | 3 | ||
replication factor | 3 | ||
vCPU [cores] | 0.8 | 0.96 | |
Mem [GB] | 2 | 4 | |
Disk [GB] | 32 | 192 | |
gateway | embedded in broker | ||
Operate | |||
#importer | 1 | ||
vCPU [cores] | 0.3 | 1 | |
Mem [GB] limit | 0.2 | 1 | |
#webapp | 2 | ||
vCPU [cores] | 0.3 | 1 | |
Mem [GB] limit | 0.2 | 1 | |
Tasklist | |||
#importer | 1 | ||
vCPU [cores] | 0.3 | 1 | |
Mem [GB] limit | 0.2 | 1 | |
#webapp | 2 | ||
vCPU [cores] | 0.3 | 1 | |
Mem [GB] limit | 0.2 | 2 | |
Optimize | |||
#importer | 1 | ||
vCPU [cores] | 0.3 | 1 | |
Mem [GB] limit | 0.4 | 1 | |
#webapp | 2 | ||
vCPU [cores] | 0.3 | 1 | |
Mem [GB] limit | 0.4 | 1 | |
Elastic | |||
#statefulset | 1 | ||
vCPU [cores] | 1 | 2 | |
Mem [GB] limit | 3 | 6 | |
Disk [GB] request | 64 | 100 | |
Connectors | |||
# | 1 | ||
vCPU [cores] | 0.2 | 0.4 | |
Mem [GB] limit | 0.25 | 0.5 | |
Other (Worker, Analytics, ...) | |||
# | 1 | ||
vCPU [cores] | 0.4 | 0.4 | |
Mem [GB] limit | 0.45 | 0.45 |
Planning non-production environments​
All clusters can be used for development, testing, integration, Q&A, and production. In Camunda 8 SaaS, production and test environments are organized via separate organizations within Camunda 8 to ease the management of clusters, while also minimizing the risk to accidentally accessing a production cluster.
Note that functional unit tests that are written in Java and use zeebe-process-test, will use an in-memory broker in unit tests, so no development cluster is needed for this use case.
For typical integration or functional test environments, you can normally just deploy a small cluster, like the one shown above, even if your production environment is sized bigger. This is typically sufficient, as functional tests typically run much smaller workloads.
Load or performance tests ideally run on the same sizing configuration as your production instance to yield reliable results.
A typical customer set-up consists of:
- 1 Production cluster
- 1 Integration or pre-prod cluster (equal in size to your anticipated production cluster if you want to run load tests or benchmarks)
- 1 Test cluster
- Multiple developer clusters
Ideally, every active developer runs its own cluster, so that the workflow engine does not need to be shared amongst developers. Otherwise, clusters are not isolated, which can lead to errors if for example developer A deploys a new version of the same process as developer B. Typically, developer clusters can be deleted when they are no longer used, as no data needs to be kept, so you might not need one cluster per developer that works with Camunda 8 at some point in time. And using in-memory unit tests further reduces the contention on developer clusters.
However, some customers do share a Camunda 8 cluster amongst various developers for economic reasons. This can work well if everybody is aware of the problems that can arise.
Running experiments and benchmarks​
If you are in doubt about which package to choose, you can do a load test with a representative workload with the target hardware package. This will help you decide if the specific package can serve your needs.
This is recommended if you exceed the above numbers of three million process instances per day.
Take a look at the Camunda 8 benchmark project as a starting point for your own benchmarks.