Queue Processors – How to avoid Queue Processor Issues – 1

You are in serious trouble if you see the notification or error below!

Why do we say it’s a serious problem? Because even though you see the processors in Running status, they might not actually be running in the background, especially the processors that are based on the ‘Stream’ node.

Someone may tell you to remove the Kafka data folders and start the service, which might work. But if you do so, in some cases you will end up in even more serious trouble.

We will explain why we have stated it that way. There are three ways to look at the above problem. As we always say, there can be ‘N’ number of ways to solve a problem, but here we discuss the most optimal solutions.

Always ask Why? 

If someone suggests cleaning up the folder and doing a restart, ask why we have to do that. If you do, you will come back to a basic question: ‘What is the problem?’

What is the Problem?

You should not believe in a solution if you don’t know what the problem is, right? If you look at the above error, you can observe a file I/O issue (highlighted below):

kafka.log.Log.replaceSegments(Log.scala:1651) ~[kafka_2.11-1.1.0.4.jar:?]
    at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:535) ~[kafka_2.11-1.1.0.4.jar:?]
    at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:462) ~[kafka_2.11-1.1.0.4.jar:?]
    at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:461) ~[kafka_2.11-1.1.0.4.jar:?]
    at scala.collection.immutable.List.foreach(List.scala:392) ~[scala-library-2.11.12.jar:?]
    at kafka.log.Cleaner.doClean(LogCleaner.scala:461) ~[kafka_2.11-1.1.0.4.jar:?]
    at kafka.log.Cleaner.clean(LogCleaner.scala:438) ~[kafka_2.11-1.1.0.4.jar:?]
    at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:305) [kafka_2.11-1.1.0.4.jar:?]
    at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:291) [kafka_2.11-1.1.0.4.jar:?]
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) [kafka_2.11-1.1.0.4.jar:?]
    Suppressed: java.nio.file.FileSystemException: C:\PRPCPersonalEdition\tomcat\kafka-data\__consumer_offsets-0\00000000000000000000.log.cleaned -> C:\PRPCPersonalEdition\tomcat\kafka-data\__consumer_offsets-0\00000000000000000000.log.swap: The process cannot access the file because it is being used by another process.
        at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[?:1.8.0_121]
        at

Now you have key information on why a stream node fails to start up! Now, can you think of removing the kafka-data folder and restarting?
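To pull that key information out of a large log file quickly, a small script can scan for the file-access signatures seen in the stack trace above. This is a rough illustrative sketch, not a Pega or Kafka utility; the patterns are copied from the error text and may need adjusting for your environment.

```python
import re

# Signatures of the Windows file-lock failure from the stack trace above.
PATTERNS = [
    r"java\.nio\.file\.FileSystemException",
    r"The process cannot access the file because it is being used by another process",
]

def find_file_lock_errors(log_text):
    """Return the log lines that match any known file-lock signature."""
    hits = []
    for line in log_text.splitlines():
        if any(re.search(p, line) for p in PATTERNS):
            hits.append(line)
    return hits

# Example: scanning a captured log snippet
sample = (
    "at kafka.log.Cleaner.doClean(LogCleaner.scala:461)\n"
    "Suppressed: java.nio.file.FileSystemException: 00000000000000000000.log.cleaned "
    "-> 00000000000000000000.log.swap: The process cannot access the file "
    "because it is being used by another process.\n"
)
print(len(find_file_lock_errors(sample)))  # -> 1
```

You would typically point this at the Kafka log files under your instance's log directory rather than an inline string.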

We would say we still need to analyze more information before deciding. As it’s a file access issue, can we think of an instance restart?

Restart Instance

Restarting an instance may or may not solve the problem, depending on the processor threads and file locks. If it does solve the problem, then you are on the safer side.

Unlock or Folder Permissions 

Stop the instance, try to remove file locks if any exist, and check the folder permissions to ensure there are no recent issues with those folders. Once you have unlocked the files or fixed the permissions, try to start the instance. This can solve the problem, but it is not 100% guaranteed.

Kafka-Data Folder 

If the above steps did not solve the problem, then you have a serious issue, because there might be a lot of unprocessed data (queued messages) in the folder.

So, removing the folder can create a serious problem: data loss, which you will then have to verify. (We will publish another blog post on how to design for such scenarios.)

Single Node – If there is only one stream node, then you will have to take a backup before you clean up the folder and restart, as the data might get discarded.

Cluster Node – If you remove the folder when the node is in a cluster, then your chance of data loss is very low, as the master node contains a replica.

So, we have discussed three ways to solve this serious problem. Here are the risk factors for each method:

  1. Server Restart (Less Risk & Less Success Rate)
  2. Unlock File Read Issues & Restart (Moderate Risk & Moderate Success Rate)
  3. Remove Kafka Data Folder (High Risk & High Success Rate – take a backup to reduce the risk, depending on the cluster configuration)

P.S. – Please share your thoughts, correct us if we have stated something wrong, or share your experience with similar issues in the comments section.


I’m Kondal

Hello, I’m Kondala Rao, known as Kondal. With extensive IT experience spanning product development, solution consulting, and business conduct, my passion lies in hands-on experimentation with the latest features of the Pega Platform and other low-code/no-code platforms to benefit businesses. Whenever I get free time, this blog is a space where I share insights, tips, and tutorials to help you leverage these technologies effectively.

I believe that even if one person benefits from my insights, it enriches my purpose to serve better. Join me on this journey of exploration and learning, and let’s elevate our skills together.

Happy Reading!!!
