Queue Processors – How to avoid Queue Processor Issues – 1

You are in serious trouble if you see the notification or error below!

Why do we say it’s a serious problem? Because even though you see the processors in Running status, they might not actually be running in the background, especially the processors that are based on the ‘Stream’ node.

Someone may tell you to remove the Kafka data folders and start the service, which might work. But if you do so, in some cases you will end up in even more serious trouble.

We will explain why we have stated it that way. There are three ways to look at the above problem. As we always say, there can be ‘N’ number of ways to solve a problem, but here we discuss the most optimal solutions.

Always ask Why? 

If someone suggests cleaning up the folder and doing a restart, ask why we have to do that. If you do, you will come back to a basic question: ‘What is the problem?’

What is the Problem?

You should not believe in a solution if you don’t know what the problem is, right? If you look at the above error, you can observe a file I/O issue (highlighted below):

kafka.log.Log.replaceSegments(Log.scala:1651) ~[kafka_2.11-1.1.0.4.jar:?]
    at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:535) ~[kafka_2.11-1.1.0.4.jar:?]
    at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:462) ~[kafka_2.11-1.1.0.4.jar:?]
    at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:461) ~[kafka_2.11-1.1.0.4.jar:?]
    at scala.collection.immutable.List.foreach(List.scala:392) ~[scala-library-2.11.12.jar:?]
    at kafka.log.Cleaner.doClean(LogCleaner.scala:461) ~[kafka_2.11-1.1.0.4.jar:?]
    at kafka.log.Cleaner.clean(LogCleaner.scala:438) ~[kafka_2.11-1.1.0.4.jar:?]
    at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:305) [kafka_2.11-1.1.0.4.jar:?]
    at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:291) [kafka_2.11-1.1.0.4.jar:?]
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) [kafka_2.11-1.1.0.4.jar:?]
    Suppressed: java.nio.file.FileSystemException: C:\PRPCPersonalEdition\tomcat\kafka-data\__consumer_offsets-0\00000000000000000000.log.cleaned -> C:\PRPCPersonalEdition\tomcat\kafka-data\__consumer_offsets-0\00000000000000000000.log.swap: The process cannot access the file because it is being used by another process.
        at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[?:1.8.0_121]
        at

Now you have key information on why a stream node fails to start up! Now, can you think of removing the kafka-data folder and restarting?
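To pull that key information out of a large log file quickly, a small script can scan for the file-access signatures seen in the stack trace above. This is a rough illustrative sketch, not a Pega or Kafka utility; the patterns are copied from the error text and may need adjusting for your environment.

```python
import re

# Signatures of the Windows file-lock failure from the stack trace above.
PATTERNS = [
    r"java\.nio\.file\.FileSystemException",
    r"The process cannot access the file because it is being used by another process",
]

def find_file_lock_errors(log_text):
    """Return the log lines that match any known file-lock signature."""
    hits = []
    for line in log_text.splitlines():
        if any(re.search(p, line) for p in PATTERNS):
            hits.append(line)
    return hits

# Example: scanning a captured log snippet
sample = (
    "at kafka.log.Cleaner.doClean(LogCleaner.scala:461)\n"
    "Suppressed: java.nio.file.FileSystemException: 00000000000000000000.log.cleaned "
    "-> 00000000000000000000.log.swap: The process cannot access the file "
    "because it is being used by another process.\n"
)
print(len(find_file_lock_errors(sample)))  # -> 1
```

You would typically point this at the Kafka log files under your instance's log directory rather than an inline string.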

We would say we still need to analyze more information before deciding. As it’s a file access issue, can we think of an instance restart?

Restart Instance

Restarting an instance may or may not solve the problem, depending on the processor threads and file locks. If it does solve the problem, then you are on the safer side.

Unlock or Folder Permissions 

Stop the instance, try to remove file locks if any exist, and check the folder permissions to ensure there are no recent issues with those folders. Once you have unlocked the files or fixed the permissions, try to start the instance. This can solve the problem, but it is not 100% guaranteed.

Kafka-Data Folder 

If the above steps did not solve the problem, then you have a serious issue, because there might be a lot of unprocessed data (queued messages) in the folder.

So, removing the folder can create a serious problem: data loss, which you will then have to verify. (We will publish another blog post on how to design for such scenarios.)

Single Node – If there is only one stream node, then you will have to take a backup before you clean up the folder and restart, as the data might get discarded.

Cluster Node – If you remove the folder when the node is in a cluster, then your chance of data loss is very low, as the master node contains a replica.

So, we have discussed three ways to solve this serious problem. Here are the risk factors for each method:

  1. Server Restart (Less Risk & Less Success Rate)
  2. Unlock File Read Issues & Restart (Moderate Risk & Moderate Success Rate)
  3. Remove Kafka Data Folder (High Risk & High Success Rate – take a backup to reduce the risk, depending on the cluster configuration)

P.S. – Please share your thoughts, correct us if we have stated something wrong, or share your experience with similar issues in the comments section.


I’m Kondal

Hello, I’m Kondala Rao, known as Kondal. With extensive IT experience spanning product development, solution consulting, and business conduct, my passion lies in hands-on experimentation with the latest features of the Pega Platform and other low-code/no-code platforms to benefit businesses. Whenever I get free time, this blog is a space where I share insights, tips, and tutorials to help you leverage these technologies effectively.

I believe that even if one person benefits from my insights, it enriches my purpose to serve better. Join me on this journey of exploration and learning, and let’s elevate our skills together.

Happy Reading!!!
