The Benefits of Using Camunda Compared to Traditional IT Solutions

Camunda takes care of your business processes from end to end so your team can focus on solving business problems.

At Camunda, we frequently talk with organizations that run their business processes on traditional IT solutions. Their business process logic and the business rules driving pivotal points in the process flow are buried deep in .NET, Java, or Python programs, or in database elements like stored procedures. While this works, there's typically a lot of friction when it comes to making processes visible and understandable across the business, as well as considerable, time-consuming effort whenever changes need to be made.

Camunda is designed to make these and many other challenges far easier to overcome when it comes to business process orchestration and automation, not to mention adding numerous features you may never have thought of. Let's explore how a Camunda-based solution adds value to an organization, accelerates solution development, increases agility, reduces total cost of ownership (TCO), and shortens time to market (TTM) compared to a homegrown solution.

Improved business-IT collaboration

With Camunda, business teams get end-to-end visibility and control over the business process flow. What you see is what runs in the system!

Compare the BPMN flow below with the same logic living in a verbose document or a piece of code—the graphical model speaks for itself in terms of intuitiveness and clarity. With Camunda, the same BPMN model is used to design, implement, and operate the solution, and to create business user reports. This gives all stakeholders visibility into exactly what logic runs inside the system.

The need to keep separate documentation updated as the project evolves becomes obsolete, because the BPMN model doubles as live documentation.

BPMN as live documentation, compared with a list of functional user requirements and code for updating docs

Reduced development effort

Why reinvent the wheel? Camunda helps you reduce development effort and focus only on implementing your use-case-specific business logic.

BPMN provides out-of-the-box support for complex logic, like asynchronous waits on an incoming message, timer-based action triggers, clearly defined compensation activities to implement rollbacks in a distributed transaction, automated error handling with an option for human intervention, and open-standards-based business rules with DMN; all of this comes with the powerful scalability that Camunda provides.

Programming all of this on your own is redundant when Camunda provides it out of the box. Instead, leverage Camunda accelerators like connectors and reusable assets like blueprints.
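
To make this concrete, here's a minimal sketch of what the application code can shrink to once the orchestration logic lives in the model: starting a process instance with the Zeebe Java client. The gateway address, process ID, and variable names below are illustrative assumptions, not taken from a real project.

    import io.camunda.zeebe.client.ZeebeClient;
    import java.util.Map;

    public class StartOrderProcess {
        public static void main(String[] args) {
            // Connect to a (hypothetical) local Zeebe gateway.
            try (ZeebeClient client = ZeebeClient.newClientBuilder()
                    .gatewayAddress("localhost:26500")
                    .usePlaintext()
                    .build()) {
                Map<String, Object> variables = Map.of("orderId", "A-1001"); // illustrative variables
                // Start an instance of a deployed BPMN process; timers, retries,
                // compensation, and DMN rules are handled by the engine, not here.
                client.newCreateInstanceCommand()
                        .bpmnProcessId("order-process") // illustrative process ID
                        .latestVersion()
                        .variables(variables)
                        .send()
                        .join();
            }
        }
    }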

Calling a REST API, SOAP service, Lambda function, RPA system, SAP, a CRM, a database, or any other cloud service? We have you covered! Use the out-of-the-box connectors and accelerate your efforts! Here's a sample of how easy it is to make a REST call:

Making a REST call

Leverage the greatness of AI without additional integration effort—use the AI connectors to bring intelligent process flow execution into Camunda. AI becomes just another endpoint to be orchestrated, rather than something you must invest development effort to integrate.

AI as another endpoint in a process

Leverage our AI-based BPMN Copilot to convert your verbose use case documentation, or existing programs containing business logic, into BPMN. Save time and accelerate your implementation!

Create UI forms in a low-code fashion with the drag-and-drop Camunda Forms editor. Nontechnical business users can quickly build forms themselves, and as the cherry on top, you can create forms in Camunda and display them in your custom front-end applications!

Gain end-to-end visibility across the process model

Camunda enables true end-to-end visibility across your whole process model. For example:

  • Diagnostics and troubleshooting get easier, since you can pinpoint the exact status of the workflow or the point of failure without having to comb through log files from multiple systems, especially in the case of distributed systems or microservice architectures.
  • This enhances the efficiency of the operations and support teams, which need fewer FTEs, freeing up IT teams to focus on innovation.

Check out the screenshot below of a sample Operate dashboard showing the process variables and the point of failure.

Operate dashboard

Business reports and insights out of the box

With Optimize, Camunda helps you generate detailed reports automatically. Business users can create near-real-time, self-service reports using a wizard-based approach, without writing any queries.

Use Optimize’s wizard-based approach to:

  • Define your business KPIs and business audit reports
  • Identify opportunities to improve the business process flow
  • Create datasets that are ML-ready for further analysis
  • Set up alerts for KPI achievements, SLA breaches, and so on

Take a look at the sample Optimize dashboard below:

Optimize dashboard

Increased operational agility

Changes to your business flows become much easier to scope and faster to deliver when you use Camunda. For example:

  • Moving the business logic into BPMN and DMN cuts down on a lot of complexity, and maintenance over the longer term becomes easier.
  • With traditional systems, you could need a daunting amount of time to analyze the impact of the change in the code base and implement the code changes while keeping the code base maintainable.
  • With the business process depicted in the graphical BPMN notation, the impact analysis is short. Implementing the change could be as simple as reorganizing tasks in the BPMN, adding or removing tasks, modifying your service-task or custom connector implementations, or changing the rules in DMN tables. All of these are much easier and more straightforward.

Agile and IT teams can quickly respond to change requests to meet business needs, ensuring that your solution remains compliant with ever-changing regulatory requirements. This, of course, also enables your business to remain competitive in the market.

Disaster Recovery

Zeebe has built-in active-active replication, which means you get disaster recovery (DR) out of the box. This lays the foundation for a resilient and highly available engine. Additionally, we support multi-region setups, enabling active-active or active-passive modes of disaster recovery across different geographical regions to ensure resilience for mission-critical applications.

Innovate at any scale

Since its state is not backed by a central database, Camunda can scale almost linearly to meet growing throughput requirements as your business grows! Need more throughput? It can be as simple as adding more nodes and scaling the Zeebe cluster.

Our benchmarks have demonstrated that we can handle hundreds of process instances and thousands of task instances per second. Camunda can also handle both high-throughput and low-latency use cases. Take a look at the following Grafana snapshot from a Camunda benchmark run using a BPMN process with 11 service tasks.

Grafana

An open, composable, and flexible architecture

We use open standards for building process models and implementing business rules. Camunda strives to provide low-code features to accelerate solutioning while also being developer-friendly for implementing customized solutions.

By now, we all know that customizations to out-of-the-box features are often necessary. Camunda is flexible and composable, allowing you to replace out-of-the-box components with custom code where needed. Design standardized solutions without vendor lock-in!

We’ll take care of process orchestration so you can focus on your business problems

Rather than building all these capabilities from scratch, leverage our orchestration platform to help your IT teams focus on just solving your business problems. We'll take care of orchestrating your business process flows end to end, at scale!

The rest of the iceberg

What’s next? Camunda’s flexible, scalable, and intelligent orchestration and automation platform empowers you to create a composable and best-of-breed architecture. Operationalize AI to meet your current business requirements while staying comfortably poised to leverage other new technological disruptors as they become available in the future.

The platform provides a number of out-of-the-box components like RPA to integrate with legacy systems, Tasklist to enable human interaction, a DMN-based business rules engine, and Intelligent Document Processing.

Camunda is also flexible enough to be used in a plug-and-play fashion. Swap a given out-of-the-box component for your own solution: for instance, you could swap Camunda's RPA for your existing RPA solution, or swap Tasklist for your custom UI.

Similar to a building made of LEGO blocks, Camunda provides the flexibility to swap out the components that you want with other implementations to suit your needs. This lets you eliminate vendor lock-in and discover more value for your business!

Moving from a legacy solution to Camunda

Hopefully the value Camunda provides is easy to see, but of course it’s still a change if you’re already using something else. If you’re evaluating the effort to migrate from your programming-based solution to Camunda, be sure to:

  • Check out Camunda’s BPMN Copilot to quickly convert the business process logic in your existing code (Python, Java, C#, etc.) into BPMN. This saves a huge chunk of time when migrating your business process flow to Camunda.
  • Consider that you can refactor business-specific logic into service tasks or project/enterprise-specific connectors, promoting standardization and reuse.
  • Talk to us! We’re happy to explore the possibilities and benefits for your specific use case!

Performance Tuning in Camunda 8

Fine-tune your end-to-end solution in Camunda 8 with these performance tips and tests.

Performance tuning an end-to-end solution can be tricky—without clarity on what you’re fine-tuning for, it can go on and on as an infinite exercise. Before you begin, identify clear goals on the required performance targets.

It can be helpful to have clear goals on metrics like:

  • Process instance (PI) creations per second
  • Task completions per second
  • Latency requirements for job workers
  • PI completion latencies

Not every one of these metrics will be significant for every use case, but it’s definitely important to have clarity on what the requirements are for your use case in particular.

Prerequisites

But before we get much further in this guide, there are a few core Camunda 8 concepts you need to make sure you understand first.

It is, of course, also a prerequisite to have some tooling in place to monitor the various metrics that the Camunda platform already exposes. These metrics are key to observing and fine-tuning performance and are already visualized as part of our sample Grafana chart.

You can see an interactive example using Grafana here.

To help draw correlations between the various parts of your end-to-end solution, you’ll want to have a similar dashboard with metrics exposed from the Camunda platform as well as the code implementing the business logic.

Factors influencing performance

A number of factors influence your end-to-end solution’s performance, so let’s consider a few of them one by one.

Process model

The number of task instances (TI) to be executed per second influences your hardware requirements: the higher the TI/sec to be executed, the more hardware you need.

The TI/sec metric includes not only user tasks and service tasks, but also call activities. Call activities incur significant processing since they require the creation of another PI, so you have to count them when estimating TI/sec.

For example, a process model with five service tasks and five call activities will need more resources than a process model with just five service tasks. Similarly, a process model with 20 TIs (10 service tasks and 10 call activities) will have different resource consumption than another process model with 20 TIs (18 service tasks and two call activities). That difference can be significant, especially in high-throughput scenarios.

It’s important to run your benchmarks with a process model that is more or less close to your production workload to get an accurate estimate of your hardware requirements.

Message subscriptions

When designing your message subscriptions, keep in mind that all messages with the same correlation key route to the same partition. Too many messages (thousands!) with the same correlation key will overwhelm one specific partition. Instead, ensure that correlation keys are unique per PI.

For example, when a specific PI involves multiple message correlations, using the same correlation key for the message start event that creates the process instance and for all subsequent message catch events will avoid interpartition communication. This can be a performance booster. Otherwise, messages are forwarded from the partition receiving the message to the partition where the process instance lives.

Note that this could change beyond Camunda 8.6 if the logic to forward messages to the partitions changes.

You can gain the highest performance from messages that are not buffered, that is, messages with TTL=0. Buffered messages must expire at some point, and that expiration takes resources from the engine.
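
For illustration, here is a minimal sketch of publishing a non-buffered message with the Zeebe Java client: the correlation key is unique per process instance and the TTL is zero, so the message is either correlated immediately or dropped. The gateway address, message name, correlation key, and variables are illustrative assumptions.

    import io.camunda.zeebe.client.ZeebeClient;
    import java.time.Duration;
    import java.util.Map;

    public class PublishPaymentReceived {
        public static void main(String[] args) {
            try (ZeebeClient client = ZeebeClient.newClientBuilder()
                    .gatewayAddress("localhost:26500") // hypothetical gateway address
                    .usePlaintext()
                    .build()) {
                Map<String, Object> variables = Map.of("amount", 42); // illustrative payload
                client.newPublishMessageCommand()
                        .messageName("payment-received")   // illustrative message name
                        .correlationKey("order-A-1001")    // unique per process instance
                        .timeToLive(Duration.ZERO)         // no buffering: correlate now or drop
                        .variables(variables)
                        .send()
                        .join();
            }
        }
    }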

If message buffering is required, you can tweak the experimental ttlCheckerBatchLimit and ttlCheckerInterval values and test the performance differences.

Multi-instance call activities

All the child instances of a multi-instance call activity are created on the same partition as the root PI, and this can create issues. If the cardinality of the multi-instance is very high (we've seen scenarios with a cardinality of ~8,000), that specific partition is overloaded and ends up with backpressure.

If this happens only occasionally, the system will slow down and eventually catch up with the processing. Still, for such use cases, we recommend running benchmarks and sizing your environment using an empirical approach.

As an alternative, consider sending out multiple messages that in turn trigger PIs, which are based on a message start event.

Here is an example from the Camunda community hub, demonstrating the approach using messages instead of call activities in a high cardinality multi-instance case.

Job workers

Note that JobTimeout must be longer than the average time that the worker code takes to complete execution.

The values of MaxJobsActive and JobTimeout must be set according to the time taken to execute the code in each worker. With a very short JobTimeout and a high MaxJobsActive, jobs may get activated and sit queued at the job worker client for a long time, timing out before the worker even gets to process them. This results in slow job completion.

There is no one-size-fits-all recommendation for MaxJobsActive. You have to experiment with it and set it according to the dynamics in your environment, such as:

  • The actual time to run the worker code
  • The size of the data in relation to the memory allocated to the worker client pod
  • Job timeout

We recommend running a few benchmarks and identifying the optimal values using an empirical approach.
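
As a reference for where these knobs live, here is a minimal sketch of opening a worker with the Zeebe Java client and setting MaxJobsActive and the job timeout explicitly. The gateway address, job type, and values are illustrative starting points, not recommendations.

    import io.camunda.zeebe.client.ZeebeClient;
    import java.time.Duration;

    public class ChargePaymentWorker {
        public static void main(String[] args) {
            ZeebeClient client = ZeebeClient.newClientBuilder()
                    .gatewayAddress("localhost:26500") // hypothetical gateway address
                    .usePlaintext()
                    .build();
            client.newWorker()
                    .jobType("charge-payment")         // illustrative job type
                    .handler((jobClient, job) -> {
                        // ... business logic that takes ~200ms on average ...
                        jobClient.newCompleteCommand(job.getKey()).send().join();
                    })
                    .maxJobsActive(32)                 // tune based on your benchmarks
                    .timeout(Duration.ofSeconds(30))   // must exceed the worker's average execution time
                    .open();
            // Keep the client open for as long as jobs should be processed.
        }
    }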

The longer the jobs take to complete, the more Zeebe resources are required. For instance, running the same test case with a 20ms delay at the job workers consumes less hardware at Zeebe than running it with a 3,000ms delay at the job workers, with no other parameter change. So when mocking your APIs in tests, ensure that your performance tests have an equivalent delay in the mocked workers.

For high-performance use cases, Camunda’s guide on writing good workers can be helpful. For example, implementing your workers in a non-blocking way helps in high-performance scenarios.

In many cases, it’s a good idea to have metrics around each worker’s implementation. While the Java client provides a counter tracking the count of jobs activated and jobs handled, you could gather further metrics from individual clients (e.g., execution duration and other use-case-specific metrics). For latency-sensitive or high-throughput use cases, you can use these metrics for troubleshooting and evaluating performance.
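
One way to gather such per-worker metrics is to time the handler yourself, for example with Micrometer. This is a sketch only, under assumptions: the metric name and job type are made up, and the registry would normally come from your metrics stack rather than a SimpleMeterRegistry.

    import io.camunda.zeebe.client.ZeebeClient;
    import io.micrometer.core.instrument.MeterRegistry;
    import io.micrometer.core.instrument.Timer;
    import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

    public class TimedWorker {
        public static void main(String[] args) {
            MeterRegistry registry = new SimpleMeterRegistry(); // replace with your real registry
            Timer handlerTimer = Timer.builder("worker.handler.duration")
                    .tag("jobType", "charge-payment")            // illustrative job type
                    .register(registry);

            ZeebeClient client = ZeebeClient.newClientBuilder()
                    .gatewayAddress("localhost:26500")           // hypothetical gateway address
                    .usePlaintext()
                    .build();
            client.newWorker()
                    .jobType("charge-payment")
                    .handler((jobClient, job) -> {
                        Timer.Sample sample = Timer.start(registry);
                        try {
                            // ... business logic ...
                            jobClient.newCompleteCommand(job.getKey()).send().join();
                        } finally {
                            sample.stop(handlerTimer);           // records execution duration per job
                        }
                    })
                    .open();
        }
    }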

Check out Camunda’s documentation for more insight.

To ensure that no critical transactions are lost in case of errors or backpressure when calling the Camunda APIs, you need a robust error handling and retry strategy. Spring Zeebe has this built in.

Check out a sample of Spring Zeebe’s implementation.
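
If you're not using Spring Zeebe, you can implement a similar strategy with the plain Java client by failing the job with remaining retries and a backoff instead of letting errors go unhandled. A minimal sketch (gateway address, job type, and backoff value are illustrative):

    import io.camunda.zeebe.client.ZeebeClient;
    import java.time.Duration;

    public class ResilientWorker {
        public static void main(String[] args) {
            ZeebeClient client = ZeebeClient.newClientBuilder()
                    .gatewayAddress("localhost:26500") // hypothetical gateway address
                    .usePlaintext()
                    .build();
            client.newWorker()
                    .jobType("charge-payment")         // illustrative job type
                    .handler((jobClient, job) -> {
                        try {
                            // ... business logic ...
                            jobClient.newCompleteCommand(job.getKey()).send().join();
                        } catch (Exception e) {
                            // Hand the job back to Zeebe with one retry fewer and a backoff,
                            // so the transaction is not lost and is retried later. Once retries
                            // reach zero, an incident is raised for human intervention.
                            jobClient.newFailCommand(job.getKey())
                                    .retries(job.getRetries() - 1)
                                    .retryBackoff(Duration.ofSeconds(10))
                                    .errorMessage(e.getMessage())
                                    .send()
                                    .join();
                        }
                    })
                    .open();
        }
    }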

CPU type

The type of CPU used for the brokers influences the performance of Zeebe. So when running benchmarks or performance tests, perform them on the same CPU type as your production environment.

Our benchmarks have shown that the latest generations of CPUs deliver higher performance. For instance, we compared the performance of the same workload between N2D (second- and third-generation AMD) and C3D (fourth-generation AMD) machines on GCP. The latter needed only 58% of the CPUs to deliver the same throughput.

To compare cost benefits, run benchmarks, assess which machine types are cost-effective, and then choose the machine type accordingly. Also assess whether running the load on a slower but cheaper CPU type or a faster but slightly more expensive one gives you the better price-performance.

We found that with on-demand VMs, the C3D type has better price-performance. However, spot VM discounts on GCP are much higher for the N2D VMs, so the cost benefit is nullified in the case of spot VMs.

Disk type

Zeebe writes its state to disk, which means the disk speed influences processing latency. We recommend high-performance SSD disks with an absolute minimum of 1,000 IOPS. Each infrastructure provider offers a variety of disk type options; based on our observations, we've recommended storage classes for GCP, Azure, OpenShift, and AWS in our documentation.

However, for high-performance use cases, we recommend an empirical approach. Run benchmarks to compare the performance of the same test case with different disk types/IOPS variations to identify the optimal configuration.

Elasticsearch

The performance of Elasticsearch has an impact, especially with high-throughput use cases. If Zeebe records are not exported efficiently, Zeebe's logs grow bigger. While the growth itself is not a problem for Zeebe, it hinders log compaction, and if logs can't be compacted for a long time, performance suffers.

You can see this in long-running benchmarks, where the performance drops after a point.

To define the number of shards per index, each component (Zeebe's Elasticsearch exporter, Operate, Tasklist, and Optimize) provides environment variables you can leverage; refer to each component's documentation for the exact variable names.

You’ll have to properly configure sharding for all Elasticsearch indices (Zeebe, Operate, Tasklist, and Optimize). This ensures that Elasticsearch resources are efficiently utilized and that exporting keeps pace with processing in Zeebe.

To start with, set the number of shards equal to the number of Elasticsearch nodes, or a multiple of it (for example, three or six shards per index for a three-node cluster). There is no one-size-fits-all setup, unfortunately; you will need to identify the optimum number of shards based on the data size and the number of nodes in your Elasticsearch cluster.

The following Grafana screenshot showcases a few sample benchmark runs. They explore the effect of optimal sharding in Elasticsearch on the efficiency of exporting by varying only the number of shards for the same test configuration.

Grafana screenshot showcasing a few sample benchmark runs

The number of records not exported is a useful metric for identifying whether Elasticsearch is slow. Benchmarks demonstrate that when the exporters export efficiently, process instance execution time improves twofold.

Let’s take a look at another Grafana screenshot. This one demonstrates that PI execution time at the ninety-ninth percentile goes up with slower exporting, which could have a significant impact on time-sensitive processes.

Grafana screenshot demonstrating PI execution time

By default, Elasticsearch heap size allocation is a percentage of the available RAM. So consider setting a heap size explicitly or allocating more RAM to Elasticsearch when you observe slowness in ES.

For more insight into fine-tuning heap size configuration, check out Elasticsearch’s best practices.

How fast your data can be written to and read from the disk can affect how quickly Elasticsearch can index new documents and return query results. Therefore, the disk IOPS and disk space utilization both influence the performance of Elasticsearch. Our benchmarks show this clearly.

In the following screenshot, you can see the same test case repeated with different disk sizes. In this case, the 1 TB disk had more IOPS, and Elasticsearch's performance differed significantly, even though a 64 GB disk would have been sufficient to hold the data for that short load test.

Screenshot showing the same test case repeated with different disk sizes

High-performance use case

For high-performance use cases, a few additional parameters are worth experimenting with to find their optimal values.

Raft Flush

Our benchmarks demonstrate that the commit and write latencies drop significantly when the Raft flush is delayed or disabled. However, completely turning off the flush may not be suitable for all cases; only disable explicit flushing if you are willing to accept potential data loss in exchange for performance.

Before disabling it, try the delayed options. These provide a trade-off between safety and performance.

Here are some environment variables you can use to modify raft flush settings:

  • ZEEBE_BROKER_CLUSTER_RAFT_FLUSH_ENABLED. Set to false to turn off Raft flush entirely. Again, this is not recommended due to the risk of data loss.
  • ZEEBE_BROKER_CLUSTER_RAFT_FLUSH_DELAYTIME. Sets a delayed flush; you can test intervals below 30s.

Broker data LogSegmentSize

By default, the size of the data log segment files is 128MB. Our benchmarks have demonstrated that log segment sizes smaller than the default help decrease latency significantly.

You can use the following environment variable to define the log segment size: ZEEBE_BROKER_DATA_LOGSEGMENTSIZE.

Experiment with smaller log segment sizes for better performance.

CPUs per partition replica

Based on our benchmark observations, a ratio of the number of partition replicas per CPU in the range of 1.2–1.5 has delivered optimal performance. To understand this, consider the following example:

  • Number of vCPUs allocated to the brokers: 7
  • Number of brokers: 14
  • Number of partitions: 42
  • Replication factor: 3

Total number of partitions = Number of partitions x Replication factor = 42 x 3 = 126

Total number of CPUs = 7 x 14 = 98

Partition replica per CPU = Total number of partitions / Total number of CPUs = 126 / 98 = 1.286

This number is not set in stone; our benchmarks were based on N2D (second/third generation AMD) machines. You can use these suggestions as a starting point. Experiment further in test benchmark runs to find the optimal settings for your environment.

ioThreadCount and cpuThreadCount

Since there cannot be one-size-fits-all values for these parameters, experiment with them.

It’s advisable to have at least one CPU thread per processing (leader) partition on the broker. For instance, consider the following example:

  • Number of vCPUs per broker node: 7
  • Number of brokers: 14
  • Number of partitions: 42
  • Replication factor: 3

Total number of partitions = Number of partitions x Replication factor = 42 x 3 = 126

Total partitions per broker = 126 / 14 = 9

Total number of leaders per broker = 42 / 14 = 3

So each broker could have three processing (leader) partitions, each needing at least one CPU thread. The other six replica partitions also need some CPU time, so a cpuThreadCount of four or five makes sense. Then you can run benchmarks to identify the optimal value for this setting.
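
At the time of writing, these thread counts live under the broker's zeebe.broker.threads configuration (cpuThreadCount and ioThreadCount); the corresponding environment variables are ZEEBE_BROKER_THREADS_CPUTHREADCOUNT and ZEEBE_BROKER_THREADS_IOTHREADCOUNT.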

Refer to our benchmark test case template to explore other parameters for high-performance use cases.

Troubleshooting throughput issues

To get PI completion metrics in Grafana, set the following environment variable to `true`: ZEEBE_BROKER_EXECUTION_METRICS_EXPORTER_ENABLED.

If you’re observing backpressure, look at the latency metrics and check the disk speed for Zeebe. For example, on AWS, gp3 volumes provide a predictable baseline performance of 3,000 IOPS regardless of the volume size. SSD persistent disk volumes on GCP, on the other hand, deliver a consistent baseline IOPS performance, but it varies based on the volume size.

For disks whose performance varies with the volume size, consider allocating more disk space to ensure the required IOPS is delivered. This applies to Zeebe disks as well as Elasticsearch disks.

With slower disks, the commit latency and record-write latency would be higher. Optimal values as observed from our benchmarks are:

  • Commit latency: ~400ms
  • Record-write latency: ~25ms

While other factors also influence these latencies, keeping them in these optimal ranges is critical, and Zeebe's disk speed plays a key role in that.

If disk speed is not an issue, then consider modifying the raft flush delay and log segment size.

How is CPU utilization? Is the CPU throttled? If the allotted CPU is not fully used but there is backpressure, that indicates the hardware is not fully utilized even though there is more load to be processed. Consider adding a few more partitions to the node.

On the other hand, if there is CPU throttling and the allotted CPU is fairly well used, consider scaling out by adding nodes. This helps redistribute the partitions to the newly added nodes and provides more hardware resources for the partitions to consume.

Ensure that there is no throttling by allocating sufficient CPU. This will yield better performance and sometimes even reduce CPU usage: when the brokers are CPU throttled, the system as a whole has more work to do (due to timeouts, yields, etc.), which requires more CPU and in turn causes more throttling.

If you’re not hitting the required throughput, find out whether the gateway is keeping up. Does it have sufficient resources? We have observed cases where the gateway delivers better performance when more resources are allocated, so if there is no indication of issues at the engine level, consider increasing the gateway's resources.

The Zeebe Gateway uses one management thread by default. However, if the gateway doesn't exhaust its available resources while also not keeping up with the load, set this to a higher number via the zeebe.gateway.threads.managementThreads property. The corresponding environment variable is ZEEBE_GATEWAY_THREADS_MANAGEMENTTHREADS.

Consider checking all the proxies and load balancers in your infrastructure for limitations or short default timeouts. We once had a situation where an NGINX ingress controller was limiting the number of connections, which affected connectivity from the client pod to the gateway. We observed a lot of intermittent deadline-exceeded exceptions on the client, with clear indications that the client was unable to establish a connection to the gateway.

For instance, in the following error message from a Java Zeebe client, each UNAVAILABLE entry in the closed=[...] list refers to an unsuccessful connection attempt. The buffered_nanos property within the open=[...] list refers to how many nanoseconds a connection attempt to the destination (given in the remote_addr property) has been in progress. In this case, this indicates that the client is unable to connect to the gateway.

io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 9.999995499s. Name resolution delay 0.000000000 seconds. [closed=[UNAVAILABLE, UNAVAILABLE, UNAVAILABLE, UNAVAILABLE, UNAVAILABLE], open=[[buffered_nanos=446461400, buffered_nanos=736575000, remote_addr=zeebe-gateway-ingress-url:443]]]
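
When such errors show up, it helps to classify them on the client side rather than treating every failure the same way. Here's a sketch of how you might inspect the gRPC status of a failed command and decide whether it's worth retrying; the set of codes treated as transient is an illustrative choice, and it assumes the gRPC status is reachable from the thrown exception, as with the StatusRuntimeException above.

    import io.grpc.Status;

    public class GrpcStatusCheck {
        // Returns true if the failure looks transient and worth retrying with backoff,
        // e.g. DEADLINE_EXCEEDED or UNAVAILABLE as in the error message above.
        static boolean isTransient(Throwable t) {
            Status.Code code = Status.fromThrowable(t).getCode();
            return code == Status.Code.DEADLINE_EXCEEDED
                    || code == Status.Code.UNAVAILABLE
                    || code == Status.Code.RESOURCE_EXHAUSTED; // e.g. broker backpressure
        }
    }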

To understand why it’s important to ensure that the job stream is not closed unexpectedly when job streaming is enabled, check out our documentation.

Is your job completion not on par with job creation? Check the JobLifeTime metric. Is it more than the actual time taken to run the worker code on average?

Check whether the Camunda client application has sufficient resources (CPU, memory, etc.). Is there client-side backpressure? In other words, are the job workers signaling to the gateway that they are blocked when it tries to stream jobs to them? If so, consider scaling your client applications.

Isolate the performance of Zeebe with the same BPMN/DMN but with dummy/mock workers that have a completion delay matching the time it takes for the actual worker code to run. Always run this test to ensure that your cluster sizing is appropriate for the load.
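
A minimal sketch of such a mock worker, which simply sleeps for a delay matching your real worker's average execution time before completing the job (the gateway address, job type, delay variable, and default value are all illustrative):

    import io.camunda.zeebe.client.ZeebeClient;

    public class MockDelayWorker {
        public static void main(String[] args) {
            // Hypothetical env var; set it to the real worker's average execution time.
            long delayMs = Long.parseLong(System.getenv().getOrDefault("MOCK_DELAY_MS", "200"));
            ZeebeClient client = ZeebeClient.newClientBuilder()
                    .gatewayAddress("localhost:26500") // hypothetical gateway address
                    .usePlaintext()
                    .build();
            client.newWorker()
                    .jobType("charge-payment")         // illustrative job type
                    .handler((jobClient, job) -> {
                        Thread.sleep(delayMs);         // simulate the real worker's latency
                        jobClient.newCompleteCommand(job.getKey()).send().join();
                    })
                    .open();
        }
    }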

To identify performance bottlenecks in the Zeebe client implementation, isolate the dependencies in the job workers. For example, if there are connections to a database or other messaging systems, check whether the drop in performance is due to throttling on those endpoints.

Conclusion

We could go on and on about performance tuning with Camunda 8, but to avoid the “infinite exercise” mentioned at the start, this guide will close here. We hope it provides you with sufficient tips and metrics for fine-tuning your end-to-end solution with Camunda.

Don’t forget to check out the many resources mentioned throughout the article.
