Problem - Are NOC outages the new norm?

There has been a lot of conversation lately about Gateways appearing offline, along with comments about stability issues with the NOC API. I’ve noticed a lot more of this in recent weeks here in Australia, though it’s not limited to “here”.

For example (at the time of writing, 6 days offline)
[EDIT - now online - though you get the idea…]
http://noc.thethingsnetwork.org:8085/api/v2/gateways/core-electronics-sentrius-rg191

Our community page often shows around half of the Gateways offline: https://www.thethingsnetwork.org/community/newcastle-lakemac/

Every week or so they come good, though they then slip back into an offline state for days on end.

Does anyone have insights worth sharing into what’s going on?

I have seen similar behavior: gateways have been very unstable in the last few weeks. The majority of the gateways that have been unstable are running through the Meshed router. @Maj, is there anything strange going on with TTN in Australia? I don’t think I’m the only one seeing this. I’m not blaming anyone; I would just like to know how to fix it.

@htdvisser, is there something we are missing in the gateway config, e.g. some reconnect flag we should set?

	{
		"gateway_conf": {
			"server_address": "router.au.thethings.network",
			"serv_port_up": 1700,
			"serv_port_down": 1700,
			"servers": [ {
				"server_address": "router.au.thethings.network",
				"serv_port_up": 1700,
				"serv_port_down": 1700,
				"serv_enabled": true
			} ]
		}
	}
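
Before hunting for a missing flag, it can help to rule out a plain inconsistency in the file itself. The following is a minimal sketch in Python, not a statement of how the packet forwarder interprets the file: `check_gateway_conf` is a hypothetical helper, the config is the fragment above wrapped in an outer object so it parses as JSON, and treating a missing `serv_enabled` as "disabled" is an assumption made only for this check.

```python
import json

# Config fragment from the post, wrapped in an outer object so it is valid JSON.
CONFIG_TEXT = """
{
  "gateway_conf": {
    "server_address": "router.au.thethings.network",
    "serv_port_up": 1700,
    "serv_port_down": 1700,
    "servers": [
      {
        "server_address": "router.au.thethings.network",
        "serv_port_up": 1700,
        "serv_port_down": 1700,
        "serv_enabled": true
      }
    ]
  }
}
"""


def check_gateway_conf(text):
    """Return a list of problems found in the config (hypothetical helper)."""
    conf = json.loads(text)["gateway_conf"]
    problems = []
    servers = conf.get("servers", [])
    # Assumption: an entry without serv_enabled is treated as disabled here.
    enabled = [s for s in servers if s.get("serv_enabled", False)]
    if servers and not enabled:
        problems.append("no entry in 'servers' has serv_enabled set to true")
    for server in enabled:
        for key in ("server_address", "serv_port_up", "serv_port_down"):
            if key not in server:
                problems.append(f"enabled server is missing '{key}'")
    top = conf.get("server_address")
    if top and enabled and all(s.get("server_address") != top for s in enabled):
        problems.append("top-level server_address matches no enabled server")
    return problems


if __name__ == "__main__":
    print(check_gateway_conf(CONFIG_TEXT) or "no obvious problems found")
```

An empty result only means the file is self-consistent; it says nothing about whether the router or the NOC is reachable.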

Are the gateways just appearing offline in the console or have they stopped routing packets as well? I know there have been some NOC issues at the global level recently, which affects the “last seen” time in the console.

We (Meshed) monitor all the gateways we’ve deployed and we know within minutes if there’s a widespread outage, either real or perceived (i.e. a Console/NOC issue). There have been some occasional outages that affect multiple gateways, but they typically last less than 30 mins. The last one that ran for several hours was about a month ago.

The first place to look is the gateway packet forwarder log. Is the gateway still online and forwarding packets? Post output here if unsure. Then we can work out if it’s related to a particular gateway, or more widespread.
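
The check described above, comparing what the console reports with what the gateway itself is doing, can be sketched as a small Python function. This is only an illustration: `diagnose`, its labels, and the 30-minute staleness threshold are all assumptions made for the example, and the two timestamps are expected to be read by hand (or by your own tooling) from the console "last seen" value and from the packet forwarder log.

```python
from datetime import datetime, timedelta, timezone


def diagnose(console_last_seen, last_forwarded, now, max_age=timedelta(minutes=30)):
    """Classify a gateway from two timestamps (hypothetical helper).

    console_last_seen: "last seen" time shown by the console/NOC, or None.
    last_forwarded:    time of the latest packet in the forwarder log, or None.
    max_age:           how old a timestamp may be before it counts as stale
                       (30 minutes is an arbitrary choice for this sketch).
    """
    console_fresh = console_last_seen is not None and now - console_last_seen <= max_age
    forwarding = last_forwarded is not None and now - last_forwarded <= max_age
    if forwarding and console_fresh:
        return "ok"
    if forwarding:
        # Gateway is passing traffic; only the reported status is out of date.
        return "console-stale"
    if console_fresh:
        # Console says online but the log shows no recent traffic.
        return "check-log"
    return "offline"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Example: console last saw the gateway 6 days ago, log shows a packet 2 minutes ago.
    print(diagnose(now - timedelta(days=6), now - timedelta(minutes=2), now))
```

A "console-stale" result points at a reporting problem rather than at the gateway, which is the distinction this thread is trying to draw.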

Thanks for sharing that @Maj. I don’t think this is a Meshed or any specific hardware issue, as you said it’s happening at a global level.

Great to hear you’re keeping an eye on those devices though!

At Digital Catapult in the UK we have, 2 or 3 times in the last year, seen a “not connected” status for large numbers of gateways on the console, yet on checking and testing, the gateways were functioning correctly and passing traffic through. It is the status on the console that is wrong, not the gateways themselves.

I think this supports your view that it is not a Meshed issue at all but a more global one.

Cheers @mark-stanley, and yep, similar findings to us (no data loss, just that the Gateways are being reported as offline).

It’ll be interesting to hear if others have experienced this as well.
