Sometimes the internet connection drops out for an hour or more. When it comes back up, the Multitech deviceshq.com console shows that the gateways have an internet connection again, but they do not reappear in the TTN console.
It normally takes a reboot or a complete power cycle of the gateway to get it connected again.
Could anybody explain what might be happening?
@kersing does it retry several times at different intervals and then give up and won’t try again?
No, it continuously retries. At least it should. However there is an obscure issue I’ve been trying to solve for 6 months now that results in this behavior. As I can’t reproduce it myself I’m working with community members to get to the bottom of it. (I’m pushing new sources with additional debugging while typing this)
A similar rare, intermittent issue occurs with the “Semtech UDP” packet forwarder built into the Conduit. A workaround in that packet forwarder is to set an “autoquit_threshold” in global_conf.json under gateway_conf. For example:
{
  […]
  "gateway_conf": {
    "autoquit_threshold": 5,
    "server_address": "router.eu.thethings.network",
    "serv_port_up": 1700,
    "serv_port_down": 1700,
    […]
  }
}
After 5 failures the packet forwarder will exit and the watchdog included in the AEP OS will start another instance of the packet forwarder a few seconds later.
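If you would rather script the change than edit the file by hand, something like the following could work. It is only a minimal sketch: the function name set_autoquit_threshold is made up for this example, and it assumes your global_conf.json is strict JSON with no comments (Python's json module will reject a file that contains them).

```python
import json
from pathlib import Path


def set_autoquit_threshold(conf_path, threshold=5):
    """Set gateway_conf.autoquit_threshold in a packet forwarder config file.

    Hypothetical helper: assumes the file is strict JSON without comments.
    Returns the previous value, or None if the key was not set before.
    """
    path = Path(conf_path)
    conf = json.loads(path.read_text())

    # Create the gateway_conf section if it is missing, keep other keys as-is.
    gateway_conf = conf.setdefault("gateway_conf", {})
    previous = gateway_conf.get("autoquit_threshold")
    gateway_conf["autoquit_threshold"] = int(threshold)

    path.write_text(json.dumps(conf, indent=2) + "\n")
    return previous
```

Call it with the path to your own global_conf.json and restart the packet forwarder afterwards so the new value is read.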
If the connectivity problem is only related to the PPP connection, you could add a line that restarts the packet forwarder to /etc/ppp/ip-up. For the built-in packet forwarder, this could be done by adding the line "/etc/init.d/lora-packet-forwarder restart" to the bottom of the file.
Hi @Paul_Stewart, I run a variety of unattended systems on RPi/Linux including a number of TTN gateways and a number of edge-computing “things”.
I maintain availability using systemd to restart software and watchdog to restart the OS.
I have not used the packet forwarder from @kersing, but if the software exits with a non-zero status on encountering an abnormal situation, uses a PID file, and has an option to regularly touch a file in /var/run when operating normally, then it’s very easy to detect hangs and exits and automate restarts.
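To illustrate the "touch a file" approach, here is a rough sketch of the check a monitor script could run. The function name and the idea of a heartbeat file are assumptions for the example; whether a given packet forwarder can actually write such a file is something you would need to verify.

```python
import os
import time


def heartbeat_is_stale(path, max_age_seconds, now=None):
    """Return True if the heartbeat file is missing or older than max_age_seconds.

    Hypothetical helper: 'path' is whatever file the monitored process
    touches while it is healthy (for example something under /var/run).
    """
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # No heartbeat file at all is treated the same as a hung process.
        return True
    return (now - mtime) > max_age_seconds
```

A cron job or systemd timer could call this every minute and restart the service when it returns True.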
When the PPP connection is restored, the IP address may have changed.
This breaks the UDP socket connection between the gateway and network server.
The packet forwarder has a keepalive_interval setting that makes it send packets to the network server periodically, so that downlinks can be sent at any time. But this keepalive is only half configured without an autoquit_threshold to bring down the socket or exit the process.
Using the autoquit_threshold setting, as Peter mentioned, allows the process to exit when the network server is unavailable. When the process is restarted, a new socket is created on the changed interface.
This is also available in the Kersing packet forwarder, as it is based on the Semtech code.
This can be tested over Ethernet by changing the IP address of the gateway.
Without the autoquit_threshold the packet forwarder will keep receiving packets and send them over the broken socket.
When autoquit_threshold is specified, the process quits, the socket is reset, and the connection is restored automatically, provided there is an angel or monitor process to restart the packet forwarder.
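The idea behind the threshold can be pictured roughly like this. This is only an illustration of the counting logic as I understand it, not the forwarder's actual code; the class name KeepaliveMonitor is invented, and treating a threshold of 0 as "never quit" is an assumption.

```python
class KeepaliveMonitor:
    """Counts consecutive unacknowledged keepalives and signals when to quit."""

    def __init__(self, autoquit_threshold):
        # Assumption: a threshold of 0 means the feature is disabled.
        self.autoquit_threshold = autoquit_threshold
        self.missed = 0

    def record(self, acknowledged):
        """Record one keepalive result; return True if the process should exit."""
        if acknowledged:
            # Any acknowledgement resets the run of failures.
            self.missed = 0
        else:
            self.missed += 1
        return self.autoquit_threshold > 0 and self.missed >= self.autoquit_threshold
```

With a threshold of 5, the fifth keepalive in a row that goes unanswered makes record() return True, at which point the process would exit and leave it to the monitor to start a fresh instance with a new socket.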
I still have disconnections of my Multitech AEP gateways from the TTN server, and I have to reboot the gateway each time it occurs. I don’t know what the root cause is; it is probably not even a connectivity issue.
Below are the last packet forwarder logs. The disconnection occurred at this time or a bit before. There were no logs after this until I rebooted 3 hours later.
Does anyone have an idea of what the issue could be and/or how to recover from it automatically?
Regards,
Sylvain
admin@mtcdt:~# tail -f /var/log/lora-pkt-fwd.log
12:09:42 INFO: Flush output after statistic is disabled
12:09:42 INFO: Flush after each line of output is disabled
12:09:42 INFO: Watchdog is disabled
12:09:42 INFO: Contact email configured to ""
12:09:42 INFO: Description configured to ""
12:09:42 INFO: [Transports] Initializing protocol for 1 servers
12:09:58 ERROR: [TTN] Connection to server "bridge.eu.thethings.network" failed, retry in 30 seconds
12:11:02 ERROR: [TTN] Connection to server "bridge.eu.thethings.network" failed, retry in 60 seconds
12:11:50 ERROR: [TTN] Connection to server "bridge.eu.thethings.network" failed, retry in 30 seconds
12:12:33 INFO: [TTN] server "bridge.eu.thethings.network" connected
^C
admin@mtcdt:~# date
Mon Feb 19 14:54:14 CET 2018
Actually, I saw the same symptoms today at roughly the same time on my 3 TTN gateways. They are located at different places, so this is not an issue with local connectivity but rather with a root cause on the TTN server side. How can I gather information on server faults? Anyway, whatever the server issue, the gateways should have recovered from this…
Once again, connectivity of one of my TTN gateways was lost 18 hours ago. But this time, unlike the last issue, I still have logs going (see below). There is no message regarding the connection to TTN, though.
Do you think it is another symptom of the same problem?