Skip to content

Conversation

Nadahar
Copy link
Contributor

@Nadahar Nadahar commented Sep 27, 2025

This is an attempt to fix #19397.

I've outsourced the consumption of the process output to a separate thread to prevent the "executing thread" for being blocked if the output isn't closed as it should.

In addition, I've added a couple of "emergency timeouts" that should hopefully prevent the thread from blocking for too long if things go wrong for some reason.

@lsiepel We'll need to do some testing to see how this works.

@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 27, 2025

If this turns out to work well, I think it's tempting to try to get OutputConsumptionUtil into Core. I made a similar class in the Exec binding that would use the one in Core too, and I'm sure there are other places in OH that would need one too.

If we have one implementation that is "clean" and works, it's better to reuse that than having everybody "reinvent the wheel".

Signed-off-by: Ravi Nadahar <nadahar@rediffmail.com>
@Nadahar Nadahar force-pushed the network-threaded-consumer branch from d7b177c to 471cd0b Compare September 27, 2025 19:42
@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 27, 2025

Here is the JAR in case anybody wants to test it without having to build.

org.openhab.binding.network-5.1.0-SNAPSHOT.jar.zip

@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 27, 2025

I've been running with 12 network Things for a little while now, all the "network threads" are still idling/parked. They work very shortly when polling a device, then goes back to idling.

But, it might not mean anything, I don't think this bug exists on Windows in any case.

edit: Running for 3 hours, still no problems.

@reyhard
Copy link

reyhard commented Sep 28, 2025

I've tried this commit - I've built it from scratch though - your jar was complaining about missing speedtest? - and it is still happening on my main Openhab instance on Rapsberry Pi.

Its not happening on my Windows Openhab build, but over there I just have couple of things and items for testing only. I'm also using Windows ping.

I have feeling that it depends on speed of execution, if whole process is fast, then there is no problem but if whole cycle takes too long, then we situation like:
refresh() → getValue() → blocks waiting for future → future needs thread from same pool → no threads available

So maybe reason why its not happening on Windows is simple - it is fast enough to complete it?

@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 28, 2025

your jar was complaining about missing speedtest?

Yes, I had a problem with feature resolution when building it, so there is probably some dependency that has changed version between some builds. To rectify this, I don't know any other way than to build all add-ons - which takes a couple of hours or so, that I can't really use the computer for anything else (because it is extremely laggy while building).

and it is still happening on my main Openhab instance on Rapsberry Pi

You are positive that you got the "right version" of running? I've done all I can think of with the Process handling, so if that didn't help, I'm afraid I don't quite understand what's going on.

I have feeling that it depends on speed of execution, if whole process is fast, then there is no problem but if whole cycle takes too long, then we situation like: refresh() → getValue() → blocks waiting for future → future needs thread from same pool → no threads available

That would absolutely make sense - the problem is that I can't find anything that could cause this in the code. The pool in question is only used for scheduling tasks, not for completing anything. So, to consume all threads in the pool, the scheduled tasks don't return in time for the next run - which should be close to impossible with a "normal" refresh interval.

This is also why I've been focusing on the Process handling. Also, the changes we did that caused this, didn't really modify the fundamentals of how this works, so there's no reason (that I can see) how we could have made a new "trap" here.

So maybe reason why its not happening on Windows is simple - it is fast enough to complete it?

If it's caused by a problem with the Process handling, the reason it works on Windows is quite obvious: Windows always closes the pipes when shutting down the process, while at least macOS, and potentially Linux, don't do that under all circumstances. Ironically, the way the code was before, could lead to a deadlock on Windows, because the Process handling relied on the pipe cache being large enough to store the whole Process output. If that buffer was filled before the process ended, it would hang, and it seems like Windows has a much smaller buffer for the pipes than Linux does - or that there are differences in the size of the buffer in the JVM implementations for the different platforms. I'm not really sure where the buffer limitation lies, but I know that if the buffer is filled, the process will hang the way it was. Therefore, we now drain the pipe while the process is running, instead of after it is finished.

The way we did it was to drain the pipe until it was closed while the process was running, and only when the pipe was closed, continue finishing the task. But, I know that there are situations (although I thought they were quite rare) where the pipe doesn't close as it should, which would leave the whole thing essentially waiting forever for the pipe to close. To address this, I've now moved the reading of the pipe to a separate thread, so the "task thread" still shouldn't be caught up even if the pipe isn't closed. However, the "task thread" still needs to retrieve the output to analyze it, so it could still potentially hang when trying to get() the results. That's why I added a 5 seconds timeout on that retrieval, which should normally take no time at all. With the timeout in place, it shouldn't be possible for the "task thread" to become stuck, yet it seems like that's exactly what's still happening.

Could you make a new thread dump with it "hanging" with this PR running? There must be something I'm missing that is happening here, I just need to figure out what it is.

@lsiepel
Copy link
Contributor

lsiepel commented Sep 28, 2025

your jar was complaining about missing speedtest?

Yes, I had a problem with feature resolution when building it, so there is probably some dependency that has changed version between some builds. To rectify this, I don't know any other way than to build all add-ons - which takes a couple of hours or so, that I can't really use the computer for anything else (because it is extremely laggy while building).

You can build from the specific addon folder itself and it is done in less then a minute. It only builds that jar. I remember you have an advanced build setup, so you are probably aware and something else prevents this.

@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 28, 2025

You can build from the specific addon folder itself and it is done in less then a minute. It only builds that jar. I remember you have an advanced build setup, so you are probably aware and something else prevents this.

Yes, I normally do this, obviously, but sometimes something goes wrong with the feature resolution, and something is f. up with the dependencies. When that happens, I get a build error during feature verification, after the JAR itself has been created, and that's what happened yesterday. The only way I know to resolve that is to build them all, because then all the feature resolution stuff is redone.

I've done that now, so here is a new version (I see that this version is larger, so something was definitely missing from the other one):

org.openhab.binding.network-5.1.0-SNAPSHOT.jar.zip

@reyhard
Copy link

reyhard commented Sep 28, 2025

Here is dump using your .jar
https://mega.nz/file/x9NG3L4b#3wMA0IoguCI0afdqHtIo6BKFkKtupbUHZlbloHUJM1E

"OH-networkBinding-1" Id=35889 WAITING on java.util.concurrent.CompletableFuture$Signaller@2b696f07
	at java.base@21.0.8/jdk.internal.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.CompletableFuture$Signaller@2b696f07
	at java.base@21.0.8/java.util.concurrent.locks.LockSupport.park(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture$Signaller.block(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ForkJoinPool.unmanagedBlock(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ForkJoinPool.managedBlock(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture.waitingGet(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture.get(Unknown Source)
	at org.openhab.binding.network.internal.PresenceDetection.getValue(PresenceDetection.java:317)
	at org.openhab.binding.network.internal.PresenceDetection.refresh(PresenceDetection.java:608)
	at org.openhab.binding.network.internal.handler.NetworkHandler$$Lambda/0x00000001022b7678.run(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base@21.0.8/java.lang.Thread.runWith(Unknown Source)
	at java.base@21.0.8/java.lang.Thread.run(Unknown Source)



"OH-binding-network-DHCPPacketListener" Id=35890 RUNNABLE (in native)
	at java.base@21.0.8/sun.nio.ch.DatagramChannelImpl.receive0(Native Method)
	at java.base@21.0.8/sun.nio.ch.DatagramChannelImpl.receiveIntoNativeBuffer(Unknown Source)
	at java.base@21.0.8/sun.nio.ch.DatagramChannelImpl.receive(Unknown Source)
	at java.base@21.0.8/sun.nio.ch.DatagramChannelImpl.tryBlockingReceive(Unknown Source)
	at java.base@21.0.8/sun.nio.ch.DatagramChannelImpl.blockingReceive(Unknown Source)
	at java.base@21.0.8/sun.nio.ch.DatagramSocketAdaptor.receive(Unknown Source)
	at java.base@21.0.8/java.net.DatagramSocket.receive(Unknown Source)
	at org.openhab.binding.network.internal.dhcp.DHCPPacketListenerServer.run(DHCPPacketListenerServer.java:102)



"OH-networkBinding-2" Id=35891 WAITING on java.util.concurrent.CompletableFuture$Signaller@2082c147
	at java.base@21.0.8/jdk.internal.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.CompletableFuture$Signaller@2082c147
	at java.base@21.0.8/java.util.concurrent.locks.LockSupport.park(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture$Signaller.block(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ForkJoinPool.unmanagedBlock(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ForkJoinPool.managedBlock(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture.waitingGet(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture.get(Unknown Source)
	at org.openhab.binding.network.internal.PresenceDetection.getValue(PresenceDetection.java:317)
	at org.openhab.binding.network.internal.PresenceDetection.refresh(PresenceDetection.java:608)
	at org.openhab.binding.network.internal.handler.NetworkHandler$$Lambda/0x00000001022b7678.run(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base@21.0.8/java.lang.Thread.runWith(Unknown Source)
	at java.base@21.0.8/java.lang.Thread.run(Unknown Source)



"OH-networkBinding-3" Id=35892 WAITING on java.util.concurrent.CompletableFuture$Signaller@228e285b
	at java.base@21.0.8/jdk.internal.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.CompletableFuture$Signaller@228e285b
	at java.base@21.0.8/java.util.concurrent.locks.LockSupport.park(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture$Signaller.block(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ForkJoinPool.unmanagedBlock(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ForkJoinPool.managedBlock(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture.waitingGet(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture.get(Unknown Source)
	at org.openhab.binding.network.internal.PresenceDetection.getValue(PresenceDetection.java:317)
	at org.openhab.binding.network.internal.PresenceDetection.refresh(PresenceDetection.java:608)
	at org.openhab.binding.network.internal.handler.NetworkHandler$$Lambda/0x00000001022b7678.run(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base@21.0.8/java.lang.Thread.runWith(Unknown Source)
	at java.base@21.0.8/java.lang.Thread.run(Unknown Source)



"OH-networkBinding-4" Id=35893 WAITING on java.util.concurrent.CompletableFuture$Signaller@31b541dd
	at java.base@21.0.8/jdk.internal.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.CompletableFuture$Signaller@31b541dd
	at java.base@21.0.8/java.util.concurrent.locks.LockSupport.park(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture$Signaller.block(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ForkJoinPool.unmanagedBlock(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ForkJoinPool.managedBlock(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture.waitingGet(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture.get(Unknown Source)
	at org.openhab.binding.network.internal.PresenceDetection.getValue(PresenceDetection.java:317)
	at org.openhab.binding.network.internal.PresenceDetection.refresh(PresenceDetection.java:608)
	at org.openhab.binding.network.internal.handler.NetworkHandler$$Lambda/0x00000001022b7678.run(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base@21.0.8/java.lang.Thread.runWith(Unknown Source)
	at java.base@21.0.8/java.lang.Thread.run(Unknown Source)



"OH-networkBinding-5" Id=35894 WAITING on java.util.concurrent.CompletableFuture$Signaller@514401bb
	at java.base@21.0.8/jdk.internal.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.CompletableFuture$Signaller@514401bb
	at java.base@21.0.8/java.util.concurrent.locks.LockSupport.park(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture$Signaller.block(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ForkJoinPool.unmanagedBlock(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ForkJoinPool.managedBlock(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture.waitingGet(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.CompletableFuture.get(Unknown Source)
	at org.openhab.binding.network.internal.PresenceDetection.getValue(PresenceDetection.java:317)
	at org.openhab.binding.network.internal.PresenceDetection.refresh(PresenceDetection.java:608)
	at org.openhab.binding.network.internal.handler.NetworkHandler$$Lambda/0x00000001022b7678.run(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base@21.0.8/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base@21.0.8/java.lang.Thread.runWith(Unknown Source)
	at java.base@21.0.8/java.lang.Thread.run(Unknown Source)

I also started adding some extra debugs on my end to observe what is going, not sure if it helps somehow

getValue2025-09-28 12:03:50.237 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'ARPING 192.168.1.153'

2025-09-28 12:03:50.237 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:50.237 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:50.237 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:50.238 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:50.238 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:50.238 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:50.238 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: ''

2025-09-28 12:03:50.238 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: '--- 192.168.1.153 statistics ---'

2025-09-28 12:03:50.238 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: '7 packets transmitted, 0 packets received, 100% unanswered (0 extra)'

2025-09-28 12:03:50.238 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: ''

2025-09-28 12:03:50.239 [TRACE] [g.network.internal.PresenceDetection] - [DIAG] nativeArpPing completed for 192.168.1.153 on interface eth0, result: PingResult{success=false, responseTime=null, executionTime=5090.06938ms}

2025-09-28 12:03:50.239 [TRACE] [g.network.internal.PresenceDetection] - [DIAG] Completed async detection task for 192.168.1.153 on thread: OH-networkBinding-5

2025-09-28 12:03:50.239 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] CompletableFuture.allOf().join() completed for 192.168.1.153 on thread: OH-networkBinding-3

2025-09-28 12:03:50.240 [DEBUG] [g.network.internal.PresenceDetection] - All 1 detection futures for 192.168.1.153 have completed

2025-09-28 12:03:50.240 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.141 resolved to other address /192.168.1.141, (re-)setup presence detection

2025-09-28 12:03:50.240 [TRACE] [g.network.internal.PresenceDetection] - 192.168.1.153 is unreachable, invalidating cached destination value

2025-09-28 12:03:50.240 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.141, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:50.240 [DEBUG] [g.network.internal.PresenceDetection] - Sending listener final result: PresenceDetectionValue [hostAddress=192.168.1.153, latency=-1.0ms, reachableDetectionTypes=[], reachableTcpPorts=[]]

2025-09-28 12:03:50.240 [TRACE] [g.network.internal.PresenceDetection] - [DIAG] Starting async detection task for 192.168.1.131 on thread: OH-networkBinding-5

2025-09-28 12:03:50.240 [TRACE] [g.network.internal.PresenceDetection] - Perform ARP ping presence detection for 192.168.1.131 on interface: eth0

2025-09-28 12:03:50.241 [TRACE] [g.network.internal.PresenceDetection] - [DIAG] Calling nativeArpPing for 192.168.1.131 on interface eth0

2025-09-28 12:03:50.244 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] CompletableFuture.supplyAsync lambda started for 192.168.1.131 on thread: OH-networkBinding-3

2025-09-28 12:03:50.245 [DEBUG] [g.network.internal.PresenceDetection] - Waiting for 1 detection futures for 192.168.1.131 to complete

2025-09-28 12:03:50.245 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] About to call CompletableFuture.allOf().join() for 192.168.1.131 on thread: OH-networkBinding-3

==> /var/log/openhab/events.log <==

2025-09-28 12:03:49.744 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Energy_Daily_Electricity' changed from 1.706 to 1.705

2025-09-28 12:03:50.242 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'network:pingdevice:mibox_01' changed from UNKNOWN to ONLINE

2025-09-28 12:03:51.016 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Miflora_Plant_06_Temperature' changed from 16.6 °C to 16.8 °C

2025-09-28 12:03:51.693 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'MotionSensor_06_Motion' changed from ON to OFF

2025-09-28 12:03:52.302 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Miflora_Plant_05_Conductivity' changed from 239 to 238

2025-09-28 12:03:53.714 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Energy_Daily_Electricity' changed from 1.705 to 1.709

==> /var/log/openhab/openhab.log <==

2025-09-28 12:03:54.337 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'ARPING 192.168.1.130'

2025-09-28 12:03:54.337 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:54.337 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:54.337 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:54.337 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:54.338 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:54.338 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:54.338 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: ''

2025-09-28 12:03:54.338 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: '--- 192.168.1.130 statistics ---'

2025-09-28 12:03:54.338 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: '7 packets transmitted, 0 packets received, 100% unanswered (0 extra)'

2025-09-28 12:03:54.338 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: ''

2025-09-28 12:03:54.339 [TRACE] [g.network.internal.PresenceDetection] - [DIAG] nativeArpPing completed for 192.168.1.130 on interface eth0, result: PingResult{success=false, responseTime=null, executionTime=5110.901217ms}

2025-09-28 12:03:54.339 [TRACE] [g.network.internal.PresenceDetection] - [DIAG] Completed async detection task for 192.168.1.130 on thread: OH-networkBinding-4

2025-09-28 12:03:54.339 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] CompletableFuture.allOf().join() completed for 192.168.1.130 on thread: OH-networkBinding-1

2025-09-28 12:03:54.339 [DEBUG] [g.network.internal.PresenceDetection] - All 1 detection futures for 192.168.1.130 have completed

2025-09-28 12:03:54.339 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.130 resolved to other address /192.168.1.130, (re-)setup presence detection

2025-09-28 12:03:54.340 [TRACE] [g.network.internal.PresenceDetection] - 192.168.1.130 is unreachable, invalidating cached destination value

2025-09-28 12:03:54.340 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.130, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.340 [DEBUG] [g.network.internal.PresenceDetection] - Sending listener final result: PresenceDetectionValue [hostAddress=192.168.1.130, latency=-1.0ms, reachableDetectionTypes=[], reachableTcpPorts=[]]

2025-09-28 12:03:54.340 [TRACE] [g.network.internal.PresenceDetection] - [DIAG] Starting async detection task for 192.168.1.141 on thread: OH-networkBinding-4

2025-09-28 12:03:54.340 [TRACE] [g.network.internal.PresenceDetection] - Perform TCP presence detection for 192.168.1.141 on port: 5555

2025-09-28 12:03:54.342 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.141 resolved to other address /192.168.1.141, (re-)setup presence detection

2025-09-28 12:03:54.342 [TRACE] [g.network.internal.PresenceDetection] - [DIAG] Calling servicePing for 192.168.1.141:5555 with timeout PT4S

2025-09-28 12:03:54.344 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] CompletableFuture.supplyAsync lambda started for 192.168.1.141 on thread: OH-networkBinding-1

2025-09-28 12:03:54.345 [DEBUG] [g.network.internal.PresenceDetection] - Waiting for 1 detection futures for 192.168.1.141 to complete

2025-09-28 12:03:54.345 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] About to call CompletableFuture.allOf().join() for 192.168.1.141 on thread: OH-networkBinding-1

2025-09-28 12:03:54.356 [TRACE] [g.network.internal.PresenceDetection] - [DIAG] servicePing completed for 192.168.1.141:5555, success: true

2025-09-28 12:03:54.356 [TRACE] [g.network.internal.PresenceDetection] - Getting latency from ping result PingResult{success=true, responseTime=null, executionTime=13.067586ms} using latency mode false

2025-09-28 12:03:54.356 [DEBUG] [g.network.internal.PresenceDetection] - Sending listener partial result: PresenceDetectionValue [hostAddress=192.168.1.141, latency=13.067586ms, reachableDetectionTypes=[TCP_CONNECTION], reachableTcpPorts=[5555]]

2025-09-28 12:03:54.357 [TRACE] [g.network.internal.PresenceDetection] - [DIAG] Completed async detection task for 192.168.1.141 on thread: OH-networkBinding-4

2025-09-28 12:03:54.358 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] CompletableFuture.allOf().join() completed for 192.168.1.141 on thread: OH-networkBinding-1

2025-09-28 12:03:54.358 [DEBUG] [g.network.internal.PresenceDetection] - All 1 detection futures for 192.168.1.141 have completed

2025-09-28 12:03:54.358 [DEBUG] [g.network.internal.PresenceDetection] - Sending listener final result: PresenceDetectionValue [hostAddress=192.168.1.141, latency=13.067586ms, reachableDetectionTypes=[TCP_CONNECTION], reachableTcpPorts=[5555]]

2025-09-28 12:03:54.359 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.131 resolved to other address /192.168.1.131, (re-)setup presence detection

2025-09-28 12:03:54.359 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.131, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.360 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.142 resolved to other address /192.168.1.142, (re-)setup presence detection

2025-09-28 12:03:54.361 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.142, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.362 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.126 resolved to other address /192.168.1.126, (re-)setup presence detection

2025-09-28 12:03:54.363 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] future.get() completed for 192.168.1.141 on thread: OH-networkBinding-2

2025-09-28 12:03:54.363 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.126, port: 3389, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.363 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] refresh() completed for 192.168.1.141 on thread: OH-networkBinding-2

2025-09-28 12:03:54.363 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: B4:2E:99:E3:67:C4, hostname: 192.168.1.126, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.364 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.145 resolved to other address /192.168.1.145, (re-)setup presence detection

2025-09-28 12:03:54.364 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.0.1, port: 80, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.364 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.146 resolved to other address /192.168.1.146, (re-)setup presence detection

2025-09-28 12:03:54.364 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.145, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.365 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.146, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.365 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.231 resolved to other address /192.168.1.231, (re-)setup presence detection

2025-09-28 12:03:54.365 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.136 resolved to other address /192.168.1.136, (re-)setup presence detection

2025-09-28 12:03:54.365 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.231, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.366 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.136, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.366 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.216 resolved to other address /192.168.1.216, (re-)setup presence detection

2025-09-28 12:03:54.366 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.217 resolved to other address /192.168.1.217, (re-)setup presence detection

2025-09-28 12:03:54.366 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.232 resolved to other address /192.168.1.232, (re-)setup presence detection

2025-09-28 12:03:54.367 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.217, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.367 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.216, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.367 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.232, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.367 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.233 resolved to other address /192.168.1.233, (re-)setup presence detection

2025-09-28 12:03:54.368 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.121 resolved to other address /192.168.1.121, (re-)setup presence detection

2025-09-28 12:03:54.368 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.233, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.368 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.121, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.368 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.155 resolved to other address /192.168.1.155, (re-)setup presence detection

2025-09-28 12:03:54.369 [TRACE] [g.network.internal.PresenceDetection] - Hostname 2a02:a45a:5746:0:7edd:90ff:fef0:c2eb resolved to other address /2a02:a45a:5746:0:7edd:90ff:fef0:c2eb, (re-)setup presence detection

2025-09-28 12:03:54.369 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.153 resolved to other address /192.168.1.153, (re-)setup presence detection

2025-09-28 12:03:54.369 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.155, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.369 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 2a02:a45a:5746:0:7edd:90ff:fef0:c2eb, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.369 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.153, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.370 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.141 resolved to other address /192.168.1.141, (re-)setup presence detection

2025-09-28 12:03:54.370 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.131 resolved to other address /192.168.1.131, (re-)setup presence detection

2025-09-28 12:03:54.370 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.130 resolved to other address /192.168.1.130, (re-)setup presence detection

2025-09-28 12:03:54.370 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.141, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.370 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.130, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.370 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.131, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.371 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] refresh() called for 192.168.1.126 on thread: OH-networkBinding-1

2025-09-28 12:03:54.371 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] refresh() called for 192.168.1.126 on thread: OH-networkBinding-2

2025-09-28 12:03:54.371 [DEBUG] [g.network.internal.PresenceDetection] - Refreshing 192.168.1.126 reachability state

2025-09-28 12:03:54.371 [DEBUG] [g.network.internal.PresenceDetection] - Refreshing 192.168.1.126 reachability state

2025-09-28 12:03:54.372 [TRACE] [g.network.internal.PresenceDetection] - Hostname 192.168.1.142 resolved to other address /192.168.1.142, (re-)setup presence detection

2025-09-28 12:03:54.372 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] getValue() called for 192.168.1.126 on thread: OH-networkBinding-2

2025-09-28 12:03:54.372 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] getValue() called for 192.168.1.126 on thread: OH-networkBinding-1

2025-09-28 12:03:54.372 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] performPresenceDetection() started for 192.168.1.126 on thread: OH-networkBinding-1

2025-09-28 12:03:54.372 [TRACE] [twork.internal.WakeOnLanPacketSender] - initialized WOL Packet Sender (mac: , hostname: 192.168.1.142, port: null, networkInterfaceNames: [eth0])

2025-09-28 12:03:54.372 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] performPresenceDetection() started for 192.168.1.126 on thread: OH-networkBinding-2

2025-09-28 12:03:54.372 [TRACE] [g.network.internal.PresenceDetection] - Performing 1 presence detection checks for 192.168.1.126

2025-09-28 12:03:54.372 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] refresh() called for 192.168.0.1 on thread: OH-networkBinding-4

2025-09-28 12:03:54.372 [TRACE] [g.network.internal.PresenceDetection] - Performing 1 presence detection checks for 192.168.1.126

2025-09-28 12:03:54.373 [DEBUG] [g.network.internal.PresenceDetection] - Refreshing 192.168.0.1 reachability state

2025-09-28 12:03:54.373 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] Blocking on future.get() for 192.168.1.126 on thread: OH-networkBinding-1

2025-09-28 12:03:54.373 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] Blocking on future.get() for 192.168.1.126 on thread: OH-networkBinding-2

2025-09-28 12:03:54.373 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] getValue() called for 192.168.0.1 on thread: OH-networkBinding-4

2025-09-28 12:03:54.373 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] performPresenceDetection() started for 192.168.0.1 on thread: OH-networkBinding-4

2025-09-28 12:03:54.374 [TRACE] [g.network.internal.PresenceDetection] - Performing 1 presence detection checks for 192.168.0.1

2025-09-28 12:03:54.374 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] Blocking on future.get() for 192.168.0.1 on thread: OH-networkBinding-4

==> /var/log/openhab/events.log <==

2025-09-28 12:03:54.227 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'LightSwitch_01_LinkQuality' changed from 127 to 134

2025-09-28 12:03:54.247 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Termometr3_Humidity' changed from 59.6 to 59.5

2025-09-28 12:03:54.344 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'network:pingdevice:reader_01' changed from UNKNOWN to ONLINE

2025-09-28 12:03:54.360 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'network:pingdevice:reader_02' changed from UNKNOWN to ONLINE

2025-09-28 12:03:54.362 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'network:pingdevice:symfonisk_01' changed from UNKNOWN to ONLINE

==> /var/log/openhab/openhab.log <==

2025-09-28 12:03:55.349 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'ARPING 192.168.1.131'

2025-09-28 12:03:55.349 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:55.349 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:55.349 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:55.350 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:55.350 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:55.350 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: 'Timeout'

2025-09-28 12:03:55.350 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: ''

2025-09-28 12:03:55.350 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: '--- 192.168.1.131 statistics ---'

2025-09-28 12:03:55.350 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: '7 packets transmitted, 0 packets received, 100% unanswered (0 extra)'

2025-09-28 12:03:55.350 [TRACE] [.network.internal.utils.NetworkUtils] - Network [arping output]: ''

2025-09-28 12:03:55.351 [TRACE] [g.network.internal.PresenceDetection] - [DIAG] nativeArpPing completed for 192.168.1.131 on interface eth0, result: PingResult{success=false, responseTime=null, executionTime=5107.681444ms}

2025-09-28 12:03:55.351 [TRACE] [g.network.internal.PresenceDetection] - [DIAG] Completed async detection task for 192.168.1.131 on thread: OH-networkBinding-5

2025-09-28 12:03:55.351 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] refresh() called for 192.168.1.145 on thread: OH-networkBinding-5

2025-09-28 12:03:55.351 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] CompletableFuture.allOf().join() completed for 192.168.1.131 on thread: OH-networkBinding-3

2025-09-28 12:03:55.351 [DEBUG] [g.network.internal.PresenceDetection] - Refreshing 192.168.1.145 reachability state

2025-09-28 12:03:55.351 [DEBUG] [g.network.internal.PresenceDetection] - All 1 detection futures for 192.168.1.131 have completed

2025-09-28 12:03:55.352 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] getValue() called for 192.168.1.145 on thread: OH-networkBinding-5

2025-09-28 12:03:55.352 [TRACE] [g.network.internal.PresenceDetection] - 192.168.1.131 is unreachable, invalidating cached destination value

2025-09-28 12:03:55.352 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] performPresenceDetection() started for 192.168.1.145 on thread: OH-networkBinding-5

2025-09-28 12:03:55.352 [DEBUG] [g.network.internal.PresenceDetection] - Sending listener final result: PresenceDetectionValue [hostAddress=192.168.1.131, latency=-1.0ms, reachableDetectionTypes=[], reachableTcpPorts=[]]

2025-09-28 12:03:55.352 [TRACE] [g.network.internal.PresenceDetection] - Performing 1 presence detection checks for 192.168.1.145

2025-09-28 12:03:55.352 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] Blocking on future.get() for 192.168.1.145 on thread: OH-networkBinding-5

2025-09-28 12:03:55.356 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] refresh() called for 192.168.1.146 on thread: OH-networkBinding-3

2025-09-28 12:03:55.357 [DEBUG] [g.network.internal.PresenceDetection] - Refreshing 192.168.1.146 reachability state

2025-09-28 12:03:55.357 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] getValue() called for 192.168.1.146 on thread: OH-networkBinding-3

2025-09-28 12:03:55.357 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] performPresenceDetection() started for 192.168.1.146 on thread: OH-networkBinding-3

2025-09-28 12:03:55.357 [TRACE] [g.network.internal.PresenceDetection] - Performing 1 presence detection checks for 192.168.1.146

2025-09-28 12:03:55.357 [DEBUG] [g.network.internal.PresenceDetection] - [DIAG] Blocking on future.get() for 192.168.1.146 on thread: OH-networkBinding-3

@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 28, 2025

Argh, I'm sorry to say, but you were absolutely right in your previous analysis. The same pool is used to resolve the Futures, it was just done in a way that eluded my analysis.

This is clearly the problem then, I'll have to see how it used to be done to see what a potential solution could be (assuming that we don't want to create yet another thread pool).

@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 28, 2025

Ok, I see now that the binding used to create a single threaded executor service (basically a complicated way to create a thread) for each "presence detection task" that was used to resolve the Futures. So, we went a step too far in trying to "save" threads here.

I see no reason to create another single threaded executor, but I think we need a dedicated pool to resolve the Futures. They simply can't be handled by the same pool that does the scheduling, or something like this can potentially always happen. I'm pretty sure that most of the time, a single thread will be enough to do the resolution for all tasks, but it should be dynamic to grow as needed.

@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 28, 2025

I just completed setting this up on a Linux (Fedora) VM to see if I could reproduce the problem. I've configured 9 things that does pings, have 5 threads in the pool, and still it doesn't happen. It might have to do with whether these happen to be schedules to ping at the same time or not, and that's probably somewhat random.

The problem is now clear, so I don't mean to say that it shouldn't be fixed, I was just hoping that I could reproduce it so I could verify that I fixed it - but that doesn't seem to be so easy.

edit: I just managed to get it to happen on the Linux VM 👍 It seems like some "luck" is needed as well.

@reyhard
Copy link

reyhard commented Sep 28, 2025

I managed to repro it also on Windows

  1. Start Openhab with start_debug.bat
  2. Start compiling some code with all cores on 100% - I guess in your case it you could try compiling Openhab addons :D

@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 28, 2025

I'm working on a fix, hopefully shouldn't be too long.

This avoids thread starvation leading to Futures never resolving

Signed-off-by: Ravi Nadahar <nadahar@rediffmail.com>
@Nadahar Nadahar force-pushed the network-threaded-consumer branch from e1bf0e5 to 570bbca Compare September 28, 2025 18:40
@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 28, 2025

The build seems to fail because there are problems with openhab.org again. Building locally also fails for the same reason.

Failed to load schema with public ID null, system ID https://openhab.org/schemas/thing-description-1.0.0.xsd: schema_reference.4: Failed to read schema document 'https://openhab.org/schemas/thing-description-1.0.0.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.: Unknown host openhab.org

It seems like there are some DNS trouble, I don't think there is much I can do to make the build work in the meanwhile: https://community.openhab.org/t/openhab-org-temporary-blocked-because-of-certificate-issue/166513

@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 28, 2025

Meanwhile, Eclipse can build it just fine without the schema, so my testing works both locally and on the VM without issue now.

@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 28, 2025

I added a random factor (up to 5 seconds) to the initial delay for the poll tasks, so that they aren't all scheduled to run at the exact same time. It can mean that the Things can take a few seconds to come online after startup, but I think it's better to have them spread out a little bit. Any viewpoints?

@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 28, 2025

Now the DNS and certificate trouble has been resolved, and building is possible, here is the latest version:

org.openhab.binding.network-5.1.0-SNAPSHOT.jar.zip

@reyhard
Copy link

reyhard commented Sep 29, 2025

Seems to work now! 🎉
Running it now for ~3h and pinging seems to run smoothly

@Nadahar
Copy link
Contributor Author

Nadahar commented Sep 29, 2025

@lsiepel I think this is good now. More testing would always be good, but the problem has been identified and addressed. I still think the changes to the Process I did initially is beneficial, even though this wasn't what caused the issue in this case, it is a potential trap, although it should be a quite rare occurrence.

But, having already made the changes to avoid this rare problem, I see no reason to not include it.

This should probably be backported as well, since #17972 was.

…binding/network/internal/discovery/NetworkDiscoveryService.java

Signed-off-by: lsiepel <leosiepel@gmail.com>
Copy link
Contributor

@lsiepel lsiepel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into these issues deeply. LGTM

@Nadahar
Copy link
Contributor Author

Nadahar commented Oct 3, 2025

I tried to accept your suggestion, but it's already outdated. Did you already apply it?

edit: Sorry, just saw 58fdbf6 now.

@lsiepel
Copy link
Contributor

lsiepel commented Oct 3, 2025

I tried to accept your suggestion, but it's already outdated. Did you already apply it?

edit: Sorry, just saw 58fdbf6 now.

Ultra minor, so applied it myself. waiting for build now.

@lsiepel lsiepel merged commit 9b5d188 into openhab:main Oct 3, 2025
2 checks passed
@lsiepel lsiepel added the bug An unexpected problem or unintended behavior of an add-on label Oct 3, 2025
@lsiepel lsiepel added this to the 5.1 milestone Oct 3, 2025
@lsiepel lsiepel changed the title [network] Use a threaded consumer for Process output consumption [network] Fix cases where state no longer updates Oct 3, 2025
@lsiepel lsiepel changed the title [network] Fix cases where state no longer updates [network] Fix case where state no longer updates Oct 3, 2025
@Nadahar
Copy link
Contributor Author

Nadahar commented Oct 3, 2025

@lsiepel I think a backport should be done, but it would maybe be nice with a little bit more testing first, now that it's merged?

@Nadahar Nadahar deleted the network-threaded-consumer branch October 3, 2025 15:20
@lsiepel
Copy link
Contributor

lsiepel commented Oct 3, 2025

Backport to 5.0.x fails due to conflicts.

@Nadahar
Copy link
Contributor Author

Nadahar commented Oct 3, 2025

Backport to 5.0.x fails due to conflicts.

I can take care of that, the question is if we want to give it a few days to see if anything new pops up..?

@lsiepel
Copy link
Contributor

lsiepel commented Oct 3, 2025

Yes, let's postpone a backport close to the next patch release. No date is set yet. Hopefully 5.1.0 m2 is first.

@Nadahar
Copy link
Contributor Author

Nadahar commented Oct 3, 2025

No wonder that there are conflicts, it seems that even though it was intended to be, #17972 was never backported. This PR builds on the changes from #17972, so it won't work. I'll create a "backport PR" where I include them both.

Nadahar added a commit to Nadahar/openhab-addons that referenced this pull request Oct 3, 2025
…nhab#19398)

* Use a threaded consumer for Process output consumption

Signed-off-by: Ravi Nadahar <nadahar@rediffmail.com>
Nadahar added a commit to Nadahar/openhab-addons that referenced this pull request Oct 3, 2025
…nhab#19398)

* Use a threaded consumer for Process output consumption

Signed-off-by: Ravi Nadahar <nadahar@rediffmail.com>
DrRSatzteil pushed a commit to DrRSatzteil/openhab-addons that referenced this pull request Oct 11, 2025
…nhab#19398)

* Use a threaded consumer for Process output consumption

Signed-off-by: Ravi Nadahar <nadahar@rediffmail.com>
lsiepel added a commit that referenced this pull request Oct 11, 2025
* [network] Fix discovery performance causing a slow openHAB start (#17972)

* Fix performance
* Improvements
* Add logging
* Improve logging
* Update bundles/org.openhab.binding.network/src/main/java/org/openhab/binding/network/internal/utils/NetworkUtils.java
* Improve thread handling
* Improve shutdown
* thread cleanup
* Improve per thread allocation
* Stop on finishing all interfaces
* Change Arping to make use of completeablefeature
* Seperate presence detection from lifecycle
* Improve logging and filtering
* Create seperate ScheduledExecutorService
* Fix review comment
* Improve network address checks
* Improve interrupthandling
* Revert threadlocal
* Refactor Presencedetection
* Make LatencyParser accept Windows' <1ms syntax for minimal latency
* Remove misleading reference to non-existing NetworkHandlerBuilder
* Handle thread-safety of NetworkDiscoveryService.executorService
* Fix network interface exclusion
* Make PresenceDetectionValue thread-safe
* Partial refactoring of PresenceDetection

Fixes:
  - Synchronization of lastSeen
  - Joining of async tasks
  - Minor logging improvements
Addition:
  - Create setIcmpPingMethod()

* Partial refactoring of NetworkDiscoveryService

Fixes:
  - Correct the number of addresses in a /24 subnet.
  - Restructure task creation so that one less thread is needed per scanned address, and so that startScan() is guaranteed to return quickly.
  - Create independent threads for posting discovery results, as this can take a long time and might not finish before the executor shuts down.

* Make NetworkHandler thread-safe
* Make SpeedTestHandler thread-safe
* Make sure that process output is consumed
* Implement tool-specific latency parsing
* Fix NetworkDiscoveryService configuration and make the thread pool size configurable
* i18n
* Fix test

Signed-off-by: Leo Siepel <leosiepel@gmail.com>
Co-authored-by: Wouter Born <github@maindrain.net>
Co-authored-by: Ravi Nadahar <nadahar@rediffmail.com>

* [network] Use a threaded consumer for Process output consumption (#19398)

* Use a threaded consumer for Process output consumption

Signed-off-by: Ravi Nadahar <nadahar@rediffmail.com>

---------

Signed-off-by: Leo Siepel <leosiepel@gmail.com>
Signed-off-by: Ravi Nadahar <nadahar@rediffmail.com>
Co-authored-by: lsiepel <leosiepel@gmail.com>
Co-authored-by: Wouter Born <github@maindrain.net>
Co-authored-by: Ravi Nadahar <nadahar@rediffmail.com>
@lsiepel lsiepel added backported A PR that has been cherry-picked to a patch release branch backported4 Backported patches to the 4.x branch. Used for constructing release notes. labels Oct 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backported A PR that has been cherry-picked to a patch release branch backported4 Backported patches to the 4.x branch. Used for constructing release notes. bug An unexpected problem or unintended behavior of an add-on

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[network] Pinging fails when there are more than 5 things

3 participants