This seems to be the problem: https://github.com/TheThingsNetwork/lorawan-stack/issues/1730 … and I think it will be fixed ASAP.
Bad news: neither of my two TTIGs has connected for 2 days.
They blink fast green and sometimes green/red.
Disconnecting power for more than 10 minutes doesn't help.
The debug output says:
[AIO:INFO] cups has no cert configured - running server auth and client auth with token
[AIO:ERRO] [-1] HTTP connect failed: UNKNOWN ERROR CODE (0052)
[AIO:DEBU] [-1] HTTP connection shutdown…
[CUP:ERRO] CUPS connect failed - URI: https://mh.sm.tc:7007
And here are the corresponding debug lines from an earlier successful boot:
[AIO:INFO] cups has no cert configured - running server auth and client auth with token
[CUP:VERB] Retrieving update-info from CUPS https://rjs.sm.tc:9191…
[AIO:DEBU] [2] HTTP connection shutdown…
[CUP:INFO] Interaction with CUPS done (no updates) - next regular check in 1d
[TCE:INFO] Starting TC engine
Both URIs (https://mh.sm.tc:7007 and https://rjs.sm.tc:9191) are reachable from my Firefox browser
and show the following message:
{"error":"Invalid or missing input Expecting value: line 1 column 1 (char 0)"}
Both TTIGs are running:
[SYS:DEBU] Station Version 2.0.0(minihub/debug)
[SYS:DEBU] Version Commit e17c5af
[SYS:DEBU] Station Build 2018-12-06 09:30:37
[SYS:DEBU] Firmware Version 2.0.0
[SYS:DEBU] FW Flavor ID semtech0
[SYS:DEBU] Model minihub
This TTIG was reset by pressing the reset button in config mode.
The other TTIG was left as it was.
Both seem to behave the same.
Who can help?
Hi all, the main reason for not connecting was a misconfigured DNS server on my side. But the general problem when the TTIG is disconnected by a WLAN or power interruption is the same as before, and it has even got worse since the timeout was raised from 60 to 600 seconds. I have to disconnect the TTIG for 10 minutes; after that the connection is OK for sometimes half a day, sometimes only half an hour. The thing is absolutely unreliable. I've moved my second TTIG to another location, where it connects through a Sophos firewall to a fibre line (an absolutely reliable company network), but the problems are the same … my next try will be a connection via mobile phone tethering, which worked some months ago. It's an epic fail; I've been trying for months to get this damned thing working.
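Since a misconfigured DNS server turned out to be behind the "HTTP connect failed" errors, a quick check from a computer that uses the same DNS server as the TTIG can help rule this out. A minimal sketch using only the Python standard library; the helper name `check_endpoint` is hypothetical, and a successful plain TCP connect only shows that name resolution and routing work, not that the TLS/CUPS exchange itself will succeed:

```python
import socket


def check_endpoint(host: str, port: int, timeout_s: float = 5.0) -> str:
    """Resolve host via the system resolver, then try a plain TCP connect."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Name resolution itself failed: the DNS problem described above.
        return f"DNS lookup failed: {exc}"
    addrs = ", ".join(sorted({info[4][0] for info in infos}))
    try:
        # create_connection tries each resolved address in turn.
        with socket.create_connection((host, port), timeout=timeout_s):
            return f"TCP connect OK ({addrs})"
    except OSError as exc:
        return f"resolved to {addrs} but TCP connect failed: {exc}"
```

For the two CUPS URIs from the logs above, call `check_endpoint("mh.sm.tc", 7007)` and `check_endpoint("rjs.sm.tc", 9191)` and compare the results with a known-good DNS server.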
Since it runs on an ESP processor, and I know a thing or two about those,
you may want to check the following for a very stable WiFi connection:
- Use a fixed channel for the WiFi network used by this gateway.
- Test with B/G only set in the access point (or use a B/G-only one) to see if the gateway will accept it.
- Try to use an AP dedicated to ESP-based units.
My indoor gateway has also been as unstable as a drunk on roller skates for the last few days.
For example, we had to power-cycle everything in the house, so the gateway came up before the cable modem had a connection. This was apparently enough to keep it from connecting to the TTN backend. It is now powered off, and I'll see later this evening whether it can connect again.
It would be nice to know what is wrong here and whether there is something we as users can do to make it reliable again.
Sorry for my bad English … it's hard work for an old man to write his thoughts down here 8-;))
The problem seems to be the following: an interrupted connection (for example by a WLAN or power outage) is not terminated correctly on the server side and persists … then the TTIG opens a new one, which confuses the server (see bei's wild guess in post #61, about 9 months ago). After a timeout of 600 s (previously 60 s) the server kills this dead connection together with its ID, so the new connection can't work correctly because that ID is missing. This is my interpretation of bei's analysis; I don't know if it is correct. I've been hoping for 9 months that someone on the server side will fix this issue.

In the meantime, for reliable operation I will buy some other gateways: in our ongoing user group at the Bürgernetz Dillingen we will buy an LG308 or perhaps a Kerlink … no more TTIGs, I'm very frustrated. My first plan was to build a RasPi + 880 gateway, but I decided to use a TTIG. I thought: built by TTI, it must be reliable, and we could concentrate on node development … hahaha … the damned thing should be able to handle broken connections, but it can't. And closed-source software doesn't make things easier …
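The interpretation above can be turned into a toy model. This is a hypothetical sketch of the suspected race, not TTN's server code: the class, method names and the "gateway ID → connection ID" table are assumptions made for illustration. When the old, half-open connection finally times out, the buggy variant removes the gateway's entry even though a newer connection has already taken it over:

```python
class GatewayRegistry:
    """Toy model: gateway ID -> ID of the connection that currently serves it."""

    def __init__(self, check_owner: bool) -> None:
        # check_owner=False models the suspected bug, check_owner=True one possible fix.
        self.check_owner = check_owner
        self.routes: dict[str, int] = {}

    def on_connect(self, gateway_id: str, conn_id: int) -> None:
        # A (re)connecting gateway takes over the route for its gateway ID.
        self.routes[gateway_id] = conn_id

    def on_idle_timeout(self, gateway_id: str, conn_id: int) -> None:
        # Called when a dead, half-open connection finally times out (600 s in this thread).
        if self.check_owner and self.routes.get(gateway_id) != conn_id:
            return  # a newer connection already owns this gateway ID: leave it alone
        self.routes.pop(gateway_id, None)

    def is_routable(self, gateway_id: str, conn_id: int) -> bool:
        return self.routes.get(gateway_id) == conn_id


def reconnect_scenario(check_owner: bool, wait_for_timeout: bool) -> bool:
    """Connection 1 dies uncleanly, the gateway reconnects as connection 2. Is 2 usable afterwards?"""
    reg = GatewayRegistry(check_owner)
    reg.on_connect("eui-ttig", 1)
    if wait_for_timeout:
        reg.on_idle_timeout("eui-ttig", 1)  # gateway stays off >10 min: old connection expires first
        reg.on_connect("eui-ttig", 2)
    else:
        reg.on_connect("eui-ttig", 2)       # quick reconnect while connection 1 is still registered
        reg.on_idle_timeout("eui-ttig", 1)  # ...and its later timeout removes the gateway's entry
    return reg.is_routable("eui-ttig", 2)
```

In this model, with `check_owner=False` only the "wait more than 10 minutes" path leaves the new connection usable, which matches the workaround reported in this thread; checking which connection owns the entry before deleting it removes the race.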
Well, with the WiFi tips I gave, you could minimize the connection interruptions.
My indoor gateway did appear to be working fine for a while, but like I said, the last few days were really horrible regarding stability.
Or maybe I'm just noticing every hiccup right now because I'm testing a lot with it.
The broken-connection problem has persisted for months on my side; the only connection that worked for more than one day was tethering via a mobile phone, very funny …
@TD-er: Many thanks for your good tips; maybe they help to make it a little more reliable, but the problem itself persists. Using a dedicated AP for the TTIG indeed seemed to solve the problem for some hours, but not permanently … if the TTIG didn't receive and forward any packet within the timeout period, the connection also terminated properly and the problem disappeared.
By forcing the access point to B/G only, you may also increase the stability of the WiFi connection, as it increases the sensitivity of the WiFi radio (it also allows stable operation in a very noisy environment).
Fully agreed, using a dedicated AP in B/G-only mode could be an improvement, thanks … I'll give it a try.
The better solution, I think, is the LG308 connected via cable. We will try it Thursday evening …
we will see which new bugs come with the new gateway 8-;))
Hello Everyone,
As you may have noticed over the last few days, we have been updating some of our backend components in preparation for V3. As part of this, we've deployed a fix to handle unclean disconnections of TCP connections. This should improve the stability of gateways during ISP resets or power cycles.
We're testing server-side WebSocket pings, which will stop the server from disconnecting gateways every x seconds, and we hope to deploy that soon.
Please comment on this thread with debug information if you still see any issues.
Thanks,
Krishna
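The post above does not say how the fix works. For illustration only: one generic way for a server to notice connections that vanished without a proper close (a gateway losing power, an ISP reset) is TCP keepalive. A minimal Python sketch, assuming a Linux host where `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are available; the helper name and the values are hypothetical, not TTN's configuration:

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle_s: int = 60,
                         interval_s: int = 10, probes: int = 5) -> None:
    """Ask the OS to probe an idle TCP connection and drop it if the peer is gone."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs are platform-specific; set them only where Python exposes them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of silence before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the kernel reports the connection as dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

With these example values a vanished peer is detected roughly `idle_s + interval_s * probes` = 110 s after the last traffic, after which the server can release whatever state belonged to that connection.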
Hello Krishna,
many thanks for the information about the unclean disconnections.
My critical TTIG, which used to fail to reconnect after a power failure or WLAN break, has been online since yesterday at 13:00. After an outage today at 12:15 (I think WLAN) it reconnected at 13:00 without my intervention. I think your fix may have been successful. I'll watch this behaviour over the next few days and will report here.
2020/02/21 7:30 … by chance I just saw my TTIG blinking red/green … meaning it had lost the WLAN connection and was re-establishing it. After briefly blinking fast green, the LED turned solid green … meaning it had connected to CUPS and was configured correctly. Looking at the TTN Console, there was an outage of about 10 min. Perfect … I think and hope the issue I described above is solved. Many thanks … the timeout of 600 s instead of the initial 60 s isn't a big problem for me and my apps.
My remaining problem seems to be, as TD-er mentioned above, WLAN disconnects, which I'll now try to minimize. I'll set up a dedicated AP for the TTIG, operating in B/G-only mode.
My second TTIG, connected to a corporate WLAN that is switched off every night, has run stably for the last two days … also a big improvement. This device was set up for testing purposes only and will be switched off within the next 2 weeks.
Hello Everyone,
Our latest update seems to have gone well and we got some positive feedback on the Forum/Slack. So we're going to deploy another update in which the server supports WebSocket pings. This will keep TTIG connections alive even when there's no upstream data. The ping interval is set to 30 s.
This will be deployed within the next hour in the US-West cluster and, based on its performance, soon in the EU cluster.
Regards,
Krishna
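Why a 30 s ping helps can be shown with a toy idle-timeout model: the server drops a connection once nothing (neither an uplink nor a ping/pong frame) has arrived for `idle_timeout_s`. The 600 s idle timeout is taken from earlier in this thread and is an assumption about how the server behaved; the function names are hypothetical:

```python
def idle_drop_time(frame_times_s, idle_timeout_s, horizon_s):
    """Return when an idle timeout would close the connection, or None if it survives to horizon_s.

    frame_times_s: arrival times (seconds after connect) of any frame that counts as
    activity, e.g. uplinks or WebSocket ping/pong frames.
    """
    last_activity = 0.0
    for t in sorted(frame_times_s):
        if t > horizon_s:
            break
        if t - last_activity > idle_timeout_s:
            return last_activity + idle_timeout_s
        last_activity = t
    if horizon_s - last_activity > idle_timeout_s:
        return last_activity + idle_timeout_s
    return None


def ping_schedule(interval_s, horizon_s):
    """Times at which a keepalive ping is exchanged, every interval_s seconds."""
    return [k * interval_s for k in range(1, int(horizon_s // interval_s) + 1)]
```

For example, `idle_drop_time([], 600, 3600)` returns `600.0`: a quiet gateway would be dropped after ten minutes. With `ping_schedule(30, 3600)` as the frame list the result is `None`, because the gap between frames never exceeds 30 s.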
Hey, thanks for the confirmation. With the newer release, even this 10-minute wait will not be necessary. Let's see how the update goes.
Hello Krishna,
Within the last 6 days there was only one outage on my “critical TTIG” (maybe it lost the WLAN connection), lasting some hours. Your latest update seems to be a big improvement. Is the update in the US-West cluster OK? When will it be deployed in the EU cluster?
regards
Franz
Hey Franz,
The updates are now rolled out to the EU cluster as well.
Regards,
Krishna
Hello Krishna,
my TTIGs work like a charm … I’m really impressed
many thanks for this solution 8-;))
cu
Franz
Yes @KrishnaIyerEaswaran2, it works fine now.
But the TTN Mapper integration is still missing (read: the gateway is not shown on the map when using the app).
Yes, location is set correctly and visible for the public
BR,
Jeroen
After I noticed the new firmware rollout post here, I reinstalled my TTIG today.
Now it delivers the time of day in its metadata, great!
But: it's about 1 second late compared to my MatchX1701 gateway, which uses GPS time.
Or is the MatchX1701 wrong?
I need to analyze this further and will keep you posted here.
{
  "gateways": [
    {
      "gtw_id": "eui-40d63cfffeMATCHX",
      "timestamp": 64530420,
      "time": "2020-02-28T18:20:49.389245Z",
      "channel": 5,
      "rssi": -69,
      "snr": 8,
      "latitude": 52.53737,
      "longitude": 13.41779,
      "altitude": 60
    },
    {
      "gtw_id": "eui-58a0cbfffeXXTTIG",
      "timestamp": 568394828,
      "time": "2020-02-28T18:20:48.352178096Z",
      "channel": 0,
      "rssi": -78,
      "snr": 8.25
    }
  ]
}
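To put a number on "about 1 second", the two `time` fields above can be subtracted. Python's `datetime` keeps only microseconds, so this minimal sketch parses the fraction separately as an exact `Decimal`; the helper name `parse_utc_ns` is hypothetical and only handles UTC timestamps ending in `Z`:

```python
from datetime import datetime, timezone
from decimal import Decimal


def parse_utc_ns(ts: str) -> Decimal:
    """Parse an RFC 3339 UTC timestamp ending in 'Z' into Unix seconds, keeping all fractional digits."""
    if not ts.endswith("Z"):
        raise ValueError(f"expected a UTC timestamp ending in 'Z': {ts!r}")
    whole, _, frac = ts[:-1].partition(".")
    base = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # Add the fraction as an exact Decimal so nanoseconds are not rounded away.
    return Decimal(int(base.timestamp())) + Decimal("0." + (frac or "0"))


matchx = parse_utc_ns("2020-02-28T18:20:49.389245Z")
ttig = parse_utc_ns("2020-02-28T18:20:48.352178096Z")
print(matchx - ttig)  # 1.037066904
```

So the TTIG reports the same uplink about 1.037 s earlier than the MatchX gateway.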
‘GPS time’ is normally understood as time counted from the GPS epoch at the start of 1980 (6 January 1980). Since then, 18 leap seconds have been added to UTC, so there is now an 18-second difference between the time scale the GPS network uses and UTC, with GPS time ahead.
I suspect what you mean is ‘time obtained from a GPS’. The time taken from the GPS (and then used by the gateway) can be different from UTC time.
It's often assumed that GPS receivers output UTC time; this is not always the case.
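The relation between a GPS seconds count and UTC can be written down directly. A minimal sketch, assuming the current GPS-UTC offset of 18 s (in force since 1 January 2017) is hard-coded rather than read from a leap-second table; the helper names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# GPS epoch: 1980-01-06 00:00:00 UTC. GPS time counts continuously and ignores leap seconds.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
# GPS - UTC offset valid since 2017-01-01; a real converter would use a leap-second table.
GPS_MINUS_UTC_S = 18


def gps_seconds_to_utc(gps_seconds: float) -> datetime:
    # datetime arithmetic ignores leap seconds, so subtract them explicitly.
    return GPS_EPOCH + timedelta(seconds=gps_seconds - GPS_MINUS_UTC_S)


def utc_to_gps_seconds(utc: datetime) -> float:
    return (utc - GPS_EPOCH).total_seconds() + GPS_MINUS_UTC_S
```

A receiver that wrongly reported raw GPS time as UTC would therefore be off by 18 s, not by the roughly 1 s seen in the metadata above.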
I know all that, but it does not explain why we see a difference of around 1 second here.
The MatchX gateway uses the time obtained from GPS to feed the PPS input line of Semtech's concentrator chip. The packet concentrator code (Semtech) takes absolute time from the NMEA sentences of the same GPS signal. To me it looks like this code has a bug: if an NMEA record arrives near the top of a second, it is associated with the next PPS pulse instead of the previous one. This advances the absolute time by 1 second.
If this is the root cause here, the “new” TTIG time will probably be correct.
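Why a pairing mistake shows up as exactly one second: absolute packet time can be derived from the concentrator's free-running microsecond counter (the `timestamp` field in the metadata above) by anchoring it to a PPS edge whose UTC second is known from NMEA. A minimal sketch of that arithmetic, assuming a 32-bit, 1 MHz counter; the function name and numbers are hypothetical, and this is not Semtech's code:

```python
COUNTER_WRAP = 2**32  # assumed width of the concentrator's 1 MHz microsecond counter


def absolute_time(pkt_counter_us: int, pps_counter_us: int, utc_at_pps_s: float) -> float:
    """UTC of a packet = UTC label of a PPS edge + counter ticks elapsed since that edge."""
    elapsed_us = (pkt_counter_us - pps_counter_us) % COUNTER_WRAP  # tolerates counter wrap-around
    return utc_at_pps_s + elapsed_us / 1e6


# UTC second 100 belongs to the PPS edge captured at counter value 4_000_000.
correct = absolute_time(4_389_245, 4_000_000, 100.0)
# Same UTC label, but attached to the edge one pulse earlier (counter 3_000_000).
mispaired = absolute_time(4_389_245, 3_000_000, 100.0)
```

Here `correct` is about 100.389245 and `mispaired` about 101.389245: whichever packet is timestamped, attaching an NMEA label to a neighbouring PPS edge shifts the result by exactly one pulse, i.e. one second.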