Just to be clear @htdvisser, as I understand it you or someone close to you took a conscious decision to remove the traffic tab and you didn’t think to tell anyone as you were doing it??
So instead @Jeff-UK, @kersing and I spent a few minutes checking, cross-checking and spinning up browsers to make sure it’s not Safari/Chrome/Firefox on Win10/Win7/iOS/macOS/Linux, CLI, API etc etc, and I posted on Slack. BTW, not everyone uses the forum, so I’ll expand on your answer two hours after the fact.
I can respect that you have to take tactical decisions that we are not in a position to have reversed as we (on here at least) aren’t paying the bills. Or indeed even consult us.
But if TTN is going to continue to be the huge PoC for v3 that demonstrates to the corporate world you can run a large installed base, please communicate with us so we don’t end up spinning our wheels looking for answers. Because if you do it to us, how big do you have to be before you give a customer a heads up on a short term critical change?
I think it would be fair to say many of us can figure out even the most cryptic of posts that we’d be happy to expand upon - so a terse “v2 gateway traffic tab is going to have to go, too much pressure on the NOC” would have raised eyebrows but at least we’d know not to go hunting for answers.
Sorry, but I can’t buy that advice. It’s unfair to make such a change without prior warning. I agree with @descartes that we are not in a position to have it reversed, as we (on here at least) aren’t paying the bills. But such a practice could quickly crop up in TTI too, and that is the last thing I want. You know, bad habits at home, bad habits everywhere.
Again, as @Jeff-UK wrote: "Ok, appreciate the explanation BUT this causes HUGE problems for many." You have taken this step at least half a year too early.
If the Mapper scrapes directly from the NOC that might be an issue, as I’ve also noticed over the last couple of hours that the NOC URL doesn’t respond and the connection times out - either for all V2 gateways or when selecting known individual GWs - same result: time out… Combine that with the changes above and the fact that the individual GW page no longer shows a last connected item, and there is now no direct way to check the online status of V2 GWs.
I don’t think TTI are turning off the NOC just yet. Judging by Hylke’s quote above, it seems that the NOC will continue to exist (with all its problems), but it just won’t cause the gateway page to show “disconnected” when the NOC is down.
If that’s the case then I agree with it, since telling people their gateway is down when it’s actually up causes confusion every time the NOC crashes.
But like everyone else here, we absolutely need some way of telling whether gateways are connected or not. As gateway owners we don’t know if gateways are routing packets so we need the backend to tell us. I can’t think how to do this without the NOC, or running some code on the gateways themselves.
We’re indeed not turning off the NOC, nor are we removing live gateway traffic or gateway statuses from the Console. The only recent change is that the console hides NOC functionality when the NOC is down.
When this happens, you can use ttnctl to get the status of your gateway directly from the v2 Routers:
ttnctl gateways status your-gateway-id --router-id ttn-router-eu
I just restarted a bunch of servers, hopefully that improves the situation for now, but I can’t promise that it won’t go down again.
I understand not showing information that isn’t available. However, people are now confused by suddenly missing tabs and fields. Would it be possible to keep those in place and display a message instead of the values? That makes for a more consistent user interface.
Your first post-implementation message was just far too subtle for all of us.
The second is much clearer but is an egregious breach of good UI design - as Jak suggests, just put a message saying that data isn’t available rather than have us refresh our browsers to see if we’ve won the NOC lottery.
At best, show the last known data with the timestamp the info was last available.
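To make the suggestion concrete, here is a minimal sketch of the fallback logic in Python. It is not how the Console is implemented; the function name, its parameters and the message wording are all made up for illustration. The idea is only that the field is always rendered, and when the NOC is down it either shows the last known value with its timestamp or says that no data is available.

```python
from datetime import datetime, timezone
from typing import Optional


def status_label(noc_up: bool,
                 connected: Optional[bool] = None,
                 last_known: Optional[bool] = None,
                 last_known_at: Optional[datetime] = None) -> str:
    """Return the text for a gateway status field (hypothetical helper).

    The field never disappears: with the NOC down it falls back to the
    last known state plus timestamp, or to an explicit "no data" message.
    """
    if noc_up and connected is not None:
        # Live data is available, show it as usual.
        return "connected" if connected else "not connected"
    if last_known is not None and last_known_at is not None:
        # NOC is down but we cached an earlier answer: show it with its age.
        state = "connected" if last_known else "not connected"
        stamp = last_known_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        return f"NOC unavailable (last known: {state} at {stamp})"
    # NOC is down and nothing was cached: say so instead of hiding the field.
    return "NOC unavailable, no data to display"
```

With something like this, a user sees a message in the usual place and knows the problem is not their gateway or their browser.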
Could the offending server processes be set to restart at midnight UTC so we have some info at some point?
user@descaaa6:~# ttnctl gateways list
ID Activated Frequency Plan Coordinates
1 eui-58awtffffe8017ec false EU_863_870 (0.000000, 0.000000, 0)
plus some more ...
user@descaaa6:~# ttnctl gateways info eui-58awtffffe8017ec
INFO Found gateway
Gateway ID: eui-58awtffffe8017ec
Frequency Plan: EU_863_870
Router: ttn-router-eu
Auto Update: on
Owner: descartes
Owner Public: yes
Location Public: no
Status Public: no
Brand: The Things Network
Model: Indoor Gateway
Placement: indoor
AntennaModel: Built in
Description: TTN Indoor Gateway on-the-go
Access Key: ttn-account-v2.CWuW2lg1-kvawoNwtfffe8017ecwyjoxxbnww
Collaborators:
- Username: descartes
Rights: gateway:settings, gateway:collaborators, gateway:status, gateway:delete, gateway:location, gateway:owner, gateway:messages
user@descaaa6:~# ttnctl gateways status eui-58awtffffe8017ec --router-id ttn-router-eu
INFO Discovering Router...
INFO Connecting with Router...
INFO Connected to Router
FATAL Could not get status of gateway. GatewayID=eui-58awtffffe8017ec error=unavailable: connection error: desc = "transport: Error while dialing dial tcp 52.169.76.255:1901: i/o timeout"
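For anyone who wants to script this check rather than run it by hand, here is a rough Python sketch that wraps the same `ttnctl gateways status` command shown above. It assumes `ttnctl` is on the PATH and already logged in, and it treats a non-zero exit code or any line starting with `FATAL` (as in the output above) as a failure. How ttnctl actually sets its exit code is an assumption on my part, which is why both conditions are checked. The function names are my own.

```python
import subprocess
from typing import Callable, List, Tuple

Runner = Callable[[List[str]], Tuple[int, str]]


def run_command(argv: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit code, combined stdout and stderr)."""
    try:
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # ttnctl missing or hanging: report it in the same style as a failure.
        return 1, f"FATAL {exc}"
    return proc.returncode, proc.stdout


def check_gateway(gateway_id: str,
                  router_id: str = "ttn-router-eu",
                  runner: Runner = run_command) -> Tuple[bool, str]:
    """Ask the v2 Router for a gateway's status via ttnctl.

    Returns (ok, detail). On failure, detail is the last FATAL line
    (or the last output line); on success it is the raw output.
    """
    argv = ["ttnctl", "gateways", "status", gateway_id, "--router-id", router_id]
    code, output = runner(argv)
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    fatal = [ln for ln in lines if ln.startswith("FATAL")]
    if code != 0 or fatal:
        detail = fatal[-1] if fatal else (lines[-1] if lines else "no output")
        return False, detail
    return True, output
```

Calling `check_gateway("your-gateway-id")` from cron and alerting when `ok` is false would at least separate "the Router cannot be reached" from "my gateway is down". The `runner` parameter is only there so the logic can be tested without a real ttnctl.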
Thanks Hylke, I checked over breakfast earlier and saw ‘Last Seen’, the Traffic tab and the GW Overview Connected/Not Connected columns all back.
Are there other items we should be aware of that get hidden in these circumstances (to save us trawling or having to respond to user cries for help)? We can get them documented and flagged on the forum so users know not to panic!
So to clarify what you are saying is if NOC goes down the various pages will automagically remove the page elements related to NOC - as called out - last seen, traffic tab and for GW overview page the con/not con column? When NOC back up they automagically re-appear?
As Jac says, this is an inconsistent UI/experience, so it would be good if the respective elements remained in place but instead called out something like ‘sorry there is a noc issue’ so the user knows it’s not their GW or their browser or whatever. E.g. the Traffic tab could still be shown but with a line saying ‘noc is down, no current data available for display’ or some such, and the GW overview lines could substitute ‘noc down’ for connected/not connected.
As mentioned on other threads and in #ops last night, the NOC also stopped responding to browsers with a connection timed out error, suggesting the NOC was down/not responding… Can that be scripted and checked with a simple watchdog, and then if it is down for more than say 10 mins (to allow for some off time for maintenance/updates) automagically trigger a server/NOC restart? That would save you the hassle of having to go beat it with a stick as and when needed - I appreciate this may not be the fun part of your job or a priority, therefore automating makes sense. Doing this automatically would also stop the flood of forum or #ops posts when one of the pages shows that the NOC has thrown a wobbly again. (Note: though data is coming through on the console, I checked the NOC URL (noc.thethingsnetwork.org:8085/api/v2/gateways) earlier for the overview of gateways and for some individual known-good GWs, and I am still getting the connection time out problem.)
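Something along these lines is what I have in mind for the watchdog, as a rough Python sketch. The 10 minute grace period is the figure suggested above. Everything else is an assumption: I have put `http://` in front of the NOC URL, the class and function names are made up, and the `restart` callable is a placeholder for whatever actually restarts the NOC on TTI's side.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional

# Assumption: plain HTTP on the NOC URL quoted in this thread.
NOC_URL = "http://noc.thethingsnetwork.org:8085/api/v2/gateways"


def noc_responds(url: str = NOC_URL, timeout: float = 10.0) -> bool:
    """Return True if the NOC answers an HTTP GET with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers connection refused, DNS failure, timeouts and HTTP errors.
        return False


class Watchdog:
    """Trigger a restart once the probe has failed for longer than the grace period."""

    def __init__(self, probe: Callable[[], bool], restart: Callable[[], None],
                 grace_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.probe = probe
        self.restart = restart          # hypothetical hook, e.g. a service restart
        self.grace_seconds = grace_seconds
        self.clock = clock
        self.down_since: Optional[float] = None

    def tick(self) -> bool:
        """Probe once; return True if this call triggered a restart."""
        now = self.clock()
        if self.probe():
            self.down_since = None      # healthy again, forget the outage
            return False
        if self.down_since is None:
            self.down_since = now       # first failed probe starts the timer
            return False
        if now - self.down_since > self.grace_seconds:
            self.restart()
            self.down_since = None      # fresh grace period after restarting
            return True
        return False
```

Calling `tick()` once a minute from a loop or a cron job would be enough. The grace period means a short maintenance window does not trigger a restart, and resetting the timer after a restart stops it from restarting on every tick while the NOC comes back up.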
As Andrew says, we absolutely have to have a way of monitoring GW status. I would say ~1/4-1/3rd of my personal deployed GWs do not carry traffic for me regularly (hourly/daily) - some may see my traffic within a given month but that is no use for status monitoring via data received (they are deployed for community benefit). These days I try to follow the best practice of locating a canary (often a chosen functional data gathering node) close to a GW so the full path to the backend can be monitored and GW status verified, but I only adopted that approach after some time on TTN and after a few harsh lessons - and of course those early canaryless deployments are all on V2!
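For the canary approach, the check itself can be very small. Here is a sketch in Python that assumes you already record, per gateway, the time of the last canary uplink that arrived via that gateway (for example from the uplink metadata in your application). How you collect that is outside this sketch, and the function name and gateway IDs are made up.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_gateways(last_canary_uplink: Dict[str, datetime],
                   now: datetime,
                   max_silence: timedelta) -> List[str]:
    """Return the IDs of gateways whose canary has been silent for too long."""
    return sorted(gw for gw, seen in last_canary_uplink.items()
                  if now - seen > max_silence)


# Example with made-up gateway IDs: a canary that reports hourly,
# flagged after three hours of silence.
now = datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "gw-village-hall": now - timedelta(minutes=40),
    "gw-church-tower": now - timedelta(hours=5),
}
suspect = stale_gateways(last_seen, now, timedelta(hours=3))
```

Here `suspect` contains only `gw-church-tower`. A silent canary does not prove the gateway is down (the node itself may have failed), but it tells you where to look first without relying on the NOC.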
Maybe that was an awkward formulation on my side - I agree with the observations. TTN has caused a lot of problems with this step. I’m worried about what the next such step will be.
I see application traffic for my devices so suspect it’s your end… is node joined and transmitting ok, do you see activity in your gw log and traffic page etc.
Looking deeper, it depends on when you were checking - there was a significant outage of V2 earlier, which was recovered but then seemed to go into decline again.