Monitoring Gateway status

gnsbglora · June 19, 2024, 1:58pm

Hello,

We have a couple of gateways running on Raspberry Pis using Basics Station with SX1301, SX1302 and SX1303 concentrator boards. Basics Station isn’t very talkative about the connection status unless you parse the log files, so connection issues sometimes go unnoticed for a while.
I’m trying to write a Python script that monitors the connected/disconnected status of the gateways through the REST API. I can get a list of the gateways, but it only contains the gateway IDs, EUIs, and the created-at and modified-at dates.
I’m looking for at least the connected/disconnected status that is visible in the console’s gateway list. For some reason the request only seems to work against the eu1 server; the nam1 server produces an error message.
Any Ideas?

#!/usr/bin/python3 -u

import requests

api_key = "API Key"

base_url = "https://eu1.cloud.thethings.network/api/v3/gateways"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Accept": "application/json",
}

try:
    # Send GET request to retrieve gateway list
    response = requests.get(base_url, headers=headers, timeout=10)
    response.raise_for_status()  # Raise exception for non-2xx status codes
    # Parse JSON response
    data = response.json()
    print(data)
    print("")
    # Check if the response contains the gateway list
    if 'gateways' in data:
        gateways = data['gateways']

        # Iterate through gateways and print details
        for gateway in gateways:
            print(f"Gateway ID: {gateway['ids']['gateway_id']}")
            print(f"Description: {gateway.get('description', 'N/A')}")
            print("-" * 20)
    else:
        print("No gateways found in the response.")
except requests.exceptions.RequestException as e:
    print(f"Error retrieving gateway information: {e}")
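As far as I know, on TTN the Identity Server that serves this gateway list is reached via eu1 for all clusters, which would explain the nam1 error; treat that as an assumption to verify. The list also carries only identifiers by default. Below is a minimal sketch, assuming The Things Stack list endpoint accepts a comma-separated `field_mask` query parameter plus `limit`/`page`, and that `name` and `description` are valid gateway field names; `list_params`, `summarize` and `fetch_gateways` are hypothetical helpers. It uses the standard library's `urllib` so it runs without extra packages. A field mask is not expected to add live connection state, which (again, an assumption) lives in the Gateway Server rather than in the registry.

```python
import json
import urllib.parse
import urllib.request

# Assumption: on TTN the Identity Server (gateway registry) is reached via eu1,
# whichever cluster the gateways are actually connected to.
LIST_URL = "https://eu1.cloud.thethings.network/api/v3/gateways"


def list_params(fields=("name", "description"), limit=100, page=1):
    """Query parameters for one page of the gateway list (hypothetical helper).

    Assumes the API honours `field_mask`, `limit` and `page`.
    """
    return {"field_mask": ",".join(fields), "limit": limit, "page": page}


def summarize(data):
    """Turn a decoded list response into (gateway_id, description) tuples."""
    return [
        (gw["ids"]["gateway_id"], gw.get("description", "N/A"))
        for gw in data.get("gateways", [])
    ]


def fetch_gateways(api_key):
    """GET one page of gateways and summarise it (makes a network call)."""
    url = LIST_URL + "?" + urllib.parse.urlencode(list_params())
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarize(json.load(response))
```

Usage would be something like `for gw_id, desc in fetch_gateways("NNSXS.XXXXXXXXXX"): print(gw_id, desc)`, with your own API key.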

Jeff-UK · June 19, 2024, 3:32pm

A little searching of the Forum will show that best practice for monitoring often involves a ‘canary’ node reasonably close to a GW installation, so that the node can run reliably at SF7, possibly with reduced TX power, and still be heard by the GW. SF7 means a short time on air, and low power helps avoid ‘near/far signalling problems’ while minimising the loss of GW capacity. This has an advantage over simple ‘am I connected’ checks in that it also tests the RF front end and demodulation subsystems of the GW, not just the IP connectivity. (A gateway might appear fully operational if you only look at the connection status, but might have been rendered deaf to LoRa signals by a nearby lightning strike or, in older installations, by creeping corrosion of the RF system, etc.) Just a thought!

You then check for the canary message every x minutes and set a threshold for missed messages before flagging an alarm to go and investigate further… If several GWs are arrayed in a general area, then a single canary can often help ‘monitor’ a number of GWs that have been positioned for redundancy, overlapping coverage or dead-zone coverage.
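The check-every-x-minutes logic can be sketched as below. It is a minimal illustration, assuming the canary uplinks on a fixed period and that you record when it was last heard (for example from your MQTT client); `missed_uplinks`, `canary_alarm` and the default threshold of 3 are hypothetical choices, not anything TTN provides.

```python
from datetime import datetime, timedelta, timezone


def missed_uplinks(last_seen, now, interval):
    """Whole canary intervals that have elapsed since the canary was last heard."""
    if now <= last_seen:
        return 0
    # timedelta // timedelta gives an int: the count of complete intervals.
    return (now - last_seen) // interval


def canary_alarm(last_seen, now, interval, threshold=3):
    """True once `threshold` or more expected uplinks have been missed."""
    return missed_uplinks(last_seen, now, interval) >= threshold


if __name__ == "__main__":
    period = timedelta(minutes=5)  # assumed canary uplink period
    heard = datetime(2024, 6, 19, 12, 0, tzinfo=timezone.utc)
    check = heard + timedelta(minutes=16)
    print(missed_uplinks(heard, check, period), canary_alarm(heard, check, period))
    # prints: 3 True
```

In practice you might add some slack to the interval, since device clocks drift and a single packet can be lost to a collision without anything being wrong.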

gnsbglora · June 19, 2024, 3:58pm

I did see the canary node suggestion, and that’s what we’re using right now. We have an MQTT client listening to some clients and reporting which gateways their messages came through. However, the alarm would only go off if ALL gateways in range are down. That’s why I was looking into the API: to get the status information from the source without a workaround. It would be cool to be able to subscribe to a gateway over MQTT, but that doesn’t seem to be possible either. Since the API is there, why not use it? The question is how.

Jeff-UK · June 19, 2024, 4:20pm

You appear to be looking at the GW status list; the suggestion with the canary is to look at the metadata of the canary message. If your target GW is not in the list of those handling the message, then you have a possible problem. This is viable for monitoring even in high GW deployment density scenarios and doesn’t rely on

As long as a GW in range hears the node, you will get a message which you can then parse for the target GW. You are then not left wondering whether the node or the GW has died, unless yours is the only GW in the area hearing the node. If you have other nodes in the area delivering through that GW, where it is the only one in range, you know the GW is OK and can be confident the canary is the problem and go fix it; if no other nodes’ messages are getting through and the canary message is missing, then there’s a good chance it is the GW… It all comes down to probabilities and statistics, and you can balance message rates (canary and ‘real’ nodes) etc. to get the warning latency and problem probability you find acceptable…
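Parsing a canary uplink for the target GW can look like the sketch below. It assumes the uplink JSON as delivered by The Things Stack over MQTT or a webhook, where each entry in `uplink_message.rx_metadata` carries `gateway_ids.gateway_id`; the helper names and the gateway IDs in the sample are made up.

```python
import json


def gateways_in_uplink(message):
    """Gateway IDs listed in the rx_metadata of one decoded TTS uplink message."""
    rx = message.get("uplink_message", {}).get("rx_metadata", [])
    return {m["gateway_ids"]["gateway_id"] for m in rx if "gateway_ids" in m}


def missing_gateways(message, expected):
    """Expected gateways that did NOT report this uplink."""
    return set(expected) - gateways_in_uplink(message)


if __name__ == "__main__":
    # Illustrative payload trimmed to the fields used here; IDs are hypothetical.
    raw = '''{"end_device_ids": {"device_id": "canary-1"},
              "uplink_message": {"rx_metadata": [
                  {"gateway_ids": {"gateway_id": "gw-north"}, "rssi": -71},
                  {"gateway_ids": {"gateway_id": "gw-south"}, "rssi": -98}]}}'''
    print(sorted(missing_gateways(json.loads(raw), {"gw-north", "gw-south", "gw-east"})))
    # prints: ['gw-east']
```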

Another point is that hitting the API for ALL GWs in the system is a longer list/bigger task (especially if done repeatedly and quickly) than just asking the server for the data associated with one canary node. Indeed, if you treat the canary as ‘just another node’ (*), the TTS integrations’ flow will supply the data stream anyhow as part of business as normal… which means you take pity on the community compute resources.

(*) I often designate a standard sensor (e.g. a T&H sensor) deployed normally as my chosen ‘canary’ for a given deployment, rather than having a dedicated device just for the task.

descartes · June 19, 2024, 4:51pm

Just to help with focus, there is no suitable API to query a gateway’s status in the manner that you would hope would work. The connected status on the console is not updated in real time and is not considered anything more than a convenience.

There are ways to get stats, and the big-boy versions of TTI do have a NOC, which you may want to consider as TTN doesn’t come with an SLA; @rish1 can advise further on that, particularly as TTN is not meant for real-life use.

The very best way is the canary route as the list of gateways that hear that canary can be monitored, even if they aren’t all yours, and you can then construct a metric for how vulnerable a device is - like is it heard by more than one gateway …
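One way to turn that into a metric is to count, per device, how many distinct gateways heard each uplink and flag devices that have dropped to a single gateway. This is a sketch under the same assumption about the TTS `rx_metadata` layout; taking the minimum count is just one hypothetical choice of metric (an average or a percentile would also work).

```python
from collections import defaultdict


def gateway_counts(uplinks):
    """Map device_id -> number of distinct gateways that heard each of its uplinks."""
    counts = defaultdict(list)
    for msg in uplinks:
        device = msg["end_device_ids"]["device_id"]
        rx = msg.get("uplink_message", {}).get("rx_metadata", [])
        gateways = {m["gateway_ids"]["gateway_id"] for m in rx if "gateway_ids" in m}
        counts[device].append(len(gateways))
    return dict(counts)


def single_gateway_devices(uplinks):
    """Devices that, at least once, were heard by one gateway or fewer."""
    return sorted(dev for dev, c in gateway_counts(uplinks).items() if min(c) <= 1)
```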

gnsbglora · June 19, 2024, 7:36pm

Well, thanks for looking into this issue. I’ll keep going with monitoring the metadata of the clients. We’re in an area where our gateways are the only ones in a 50km area (SW Florida). Not all clients see all gateways. It would have been nice to find something more accurate/definitive than that. Maybe it’s something for a future feature in the API or MQTT access.
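Since not every device reaches every gateway, one option is to invert the canary idea: remember the last time each of your gateways appeared in the metadata of any device’s uplink and flag gateways that have been silent too long. A minimal sketch, assuming you feed it every uplink your MQTT client receives together with its receive time; `GatewayWatch` and the 30-minute limit are hypothetical.

```python
from datetime import datetime, timedelta, timezone


class GatewayWatch:
    """Track when each of our gateways last appeared in any uplink's rx_metadata."""

    def __init__(self, gateway_ids, max_silence, start):
        self.max_silence = max_silence
        # Start every gateway's clock at `start` so nothing alarms immediately.
        self.last_seen = {gw: start for gw in gateway_ids}

    def record(self, message, received_at):
        """Update last-seen times from one decoded TTS uplink message."""
        for meta in message.get("uplink_message", {}).get("rx_metadata", []):
            gw = meta.get("gateway_ids", {}).get("gateway_id")
            if gw in self.last_seen:
                # max() keeps the newest time even if messages arrive out of order.
                self.last_seen[gw] = max(self.last_seen[gw], received_at)

    def silent(self, now):
        """Sorted list of gateways not heard within max_silence."""
        return sorted(
            gw for gw, seen in self.last_seen.items() if now - seen > self.max_silence
        )


if __name__ == "__main__":
    t0 = datetime(2024, 6, 19, 12, 0, tzinfo=timezone.utc)
    watch = GatewayWatch(["gw-a", "gw-b"], timedelta(minutes=30), start=t0)
    uplink = {"uplink_message": {"rx_metadata": [{"gateway_ids": {"gateway_id": "gw-a"}}]}}
    watch.record(uplink, t0 + timedelta(minutes=20))
    print(watch.silent(t0 + timedelta(minutes=45)))  # prints: ['gw-b']
```

The silence limit would need to be longer than the typical gap between uplinks from the devices that only reach a given gateway.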

kersing · June 19, 2024, 8:25pm

It is part of the added value in the commercial offering so don’t hold your breath…

descartes · June 19, 2024, 9:11pm

The gateway being online is nothing, it just means it is connected to the internet, it can’t tell you that the antenna or the concentrator card is OK.

The absolute definitive “it’s working” measure is …

… the designated canary device. It uplinks; everything from there to the LNS has to be working for you to get an update. That tells you everything, so much more than just some ping from the gateway.

gnsbglora · June 19, 2024, 11:17pm

Sure, and we’re doing that for the monitoring. What got me looking at the gateway connect/disconnect status is that most gateways are boxes without a screen, installed on a pole or in other hard-to-access spots. When they’re turned on, you get a light for power and maybe a light for being ready for service. That ready light has to be driven by something. I can control the light from Linux in software, but there seems to be no easy way to determine the connection status before I climb down the ladder and turn on my phone or laptop. We’re working on sensing, automating and controlling the things in the network, but it seems to be difficult to sense and control the network components themselves. I always find it surprising how hard it is to find the information I need.
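For the ready light itself, Linux exposes many on-board LEDs through the LED class in sysfs, where writing to the `trigger` and `brightness` files controls them. The sketch below assumes such an LED exists; the name `led0` is a hypothetical placeholder (names differ between Raspberry Pi models and kernels), writing usually needs root, and how you decide `connected` (log parsing, an API call, a canary) is left to you.

```python
from pathlib import Path

# Hypothetical LED directory; check `ls /sys/class/leds` on your own board.
LED_DIR = Path("/sys/class/leds/led0")


def set_ready_led(connected, led_dir=LED_DIR):
    """Drive a Linux LED-class device: on when connected, off otherwise."""
    # Detach any kernel trigger (e.g. SD-card activity) so our value sticks.
    (led_dir / "trigger").write_text("none")
    (led_dir / "brightness").write_text("1" if connected else "0")
```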

descartes · June 20, 2024, 8:25am

Carry a canary device and a mobile phone - the console works just fine on the small screen!

gnsbglora · June 22, 2024, 11:05pm

Here is an example of how to get the gateway connection status through the API:

#!/usr/bin/python3 -u

import requests
import json

api_key = "NNSXS.XXXXXXXXXX"

base_url = "https://nam1.cloud.thethings.network/api/v3/gs/gateways/eui-xxxxxxxxxxx/connection/stats"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Accept": "application/json",
}

try:
    response = requests.get(base_url, headers=headers, timeout=10)
    response.raise_for_status()

    data = response.json()
    print("Raw Data")
    print(data)
    print("")

    if 'connected_at' in data:
        connection_time = data['connected_at']
        connection_state = "connected"
    elif 'disconnected_at' in data:
        connection_time = data['disconnected_at']
        connection_state = "disconnected"
    else:
        connection_time = None
        connection_state = "unknown"

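    # Use .get() so a response without these fields does not raise KeyError
    protocol = data.get('protocol', 'unknown')
    ip_address = data.get('gateway_remote_address', {}).get('ip', 'unknown')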

    print(f"Connection State: {connection_state}")
    print(f"Connection Time: {connection_time}") 
    print(f"Protocol: {protocol}")
    print(f"IP Address: {ip_address}")

except requests.exceptions.RequestException as e:
    print(f"Error retrieving gateway information: {e}")
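The script reports “connected” whenever `connected_at` is present. If the Gateway Server response can carry both `connected_at` and `disconnected_at` (an assumption worth checking against your own raw output; a disconnected gateway may instead produce an error response), comparing the two is more robust. The sketch below interprets an already-decoded stats dict; `parse_ts` and `connection_state` are hypothetical helpers, and the timestamp trimming assumes RFC 3339 strings that may have more than six fractional digits.

```python
import re
from datetime import datetime

# RFC 3339 timestamps may carry nanoseconds, which datetime.fromisoformat()
# on older Pythons rejects, so trim the fraction to microseconds first.
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$")


def parse_ts(value):
    """Parse an RFC 3339 timestamp string into an aware datetime."""
    match = _TS.match(value)
    if not match:
        raise ValueError(f"unrecognised timestamp: {value!r}")
    base, frac, tz = match.groups()
    frac = (frac or "0")[:6].ljust(6, "0")
    return datetime.fromisoformat(f"{base}.{frac}{'+00:00' if tz == 'Z' else tz}")


def connection_state(stats):
    """Return (state, timestamp string) from a decoded connection/stats dict."""
    up, down = stats.get("connected_at"), stats.get("disconnected_at")
    if up and down:
        # Assumption: both may be present; whichever event is later wins.
        if parse_ts(up) > parse_ts(down):
            return "connected", up
        return "disconnected", down
    if up:
        return "connected", up
    if down:
        return "disconnected", down
    return "unknown", None
```

Inside the script, `connection_state(data)` could replace the `if`/`elif` chain.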

descartes · June 23, 2024, 7:52am

Potentially, but:

and the developers have told us several times in the past that of the gazillion things that are updated when a packet comes in, the gateway connected status is way down the list, so YMMV.

And the API query is JAL (just another load) on the LNS, whereas if you read your incoming uplinks you get a heads-up straight away. Now, if you are going to say, as above, that you are the only gateway for miles around: apart from that not being verifiable (there could be hundreds, just not visible to you, shared spectrum & all that), if you are hoping this status check will save you the cost of having some redundancy, then you may need to get out a used envelope & pencil. If you can bear the loss of uplinks while you go and see what has happened to your gateway, then you’re golden. But if the time it takes to fix it or deploy a new one is longer than you can cope with, now is the time to put in a second gateway.
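The envelope-and-pencil sum is simple: uplinks lost is roughly the number of devices times uplinks per hour times the hours until the gateway is back. A sketch with entirely hypothetical numbers (40 devices reporting every 15 minutes and a 48-hour repair):

```python
def uplinks_lost(devices, uplink_interval_min, hours_to_restore):
    """Rough count of uplinks dropped while the only gateway in range is down."""
    uplinks_per_device = hours_to_restore * 60 / uplink_interval_min
    return int(devices * uplinks_per_device)  # rounded down


if __name__ == "__main__":
    # Hypothetical deployment: 40 devices, one uplink every 15 min, 48 h to fix.
    print(uplinks_lost(40, 15, 48))  # prints: 7680
```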

system · June 26, 2024, 4:25pm

This topic was automatically closed 24 hours after the last reply. New replies are no longer allowed.