I’m seeing strange downlink (DL) behavior: the number of confirmed DL attempts needed before one succeeds depends on the interval at which the node sends its test “Hello!” messages. With TX_INTERVAL=60 (seconds), it takes about 7 DL attempts on average before the node actually receives the DL. With TX_INTERVAL=15, the success rate is close to 100%: almost every attempt is received and confirmed.
Any ideas where to look for a solution?
If it’s a case of the node listening at the wrong time, how do I fix it?
My setup: SX1276 LoRa module + Arduino Pro Mini + LMIC library <—OTAA—> The Things Indoor Gateway
I can confirm this behaviour. I put it down to the way the TTIG disconnects from the TTN servers to save resources, which leads to a store-and-forward timing issue before the DL message arrives. I often see the issue when using ADR and confirmed messages with the transmit interval (duty cycle) set to 300 seconds or more: the node never settles on the correct SF. At 30 seconds or less, it does. I am still hopeful that TTI will issue an update for the TTIG that fixes the flashing green light issue, i.e. the disconnect of its session to the TTN server. For now I use sketches that start with an interval of less than 30 seconds and then move to a longer interval. Simon
Indeed, it’s quite likely that downlinks might arrive too late (or might not even be delegated to the TTIG by the backend?) when a TTIG is not receiving a lot of traffic:
This affects the downlinks for confirmed uplinks and ADR (sent 1 or 2 seconds after the uplink), and is less likely to affect the very first OTAA Join Accept (sent 5 or 6 seconds after the Join Request). Given its indoor use, a TTIG is quite likely not receiving many packets from other users.
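To make that timing concrete, here is a minimal sketch in C, assuming the LoRaWAN 1.0.x default delays (RX1 at 1 s and RX2 at 2 s after the end of an uplink, 5 s and 6 s for a Join Accept). The helper `first_reachable_window` and the scheduling margin are hypothetical: they only show that a downlink which reaches the gateway late (for example while it is reconnecting) can make only a later window, or none at all. In practice the network server picks the window when it schedules the downlink; this just checks which window starts are still ahead.

```c
#include <stdbool.h>
#include <stdint.h>

/* LoRaWAN 1.0.x default delays in ms, measured from the end of the uplink. */
#define RECEIVE_DELAY1_MS      1000
#define RECEIVE_DELAY2_MS      2000
#define JOIN_ACCEPT_DELAY1_MS  5000
#define JOIN_ACCEPT_DELAY2_MS  6000

typedef enum { WINDOW_RX1, WINDOW_RX2, WINDOW_MISSED } rx_window_t;

/* Hypothetical helper: given when the uplink ended and when the downlink
 * reaches the gateway (both in ms on the gateway clock), return the first
 * receive window whose start is still ahead. `margin_ms` is an ASSUMED lead
 * time the gateway needs to schedule a transmission. */
rx_window_t first_reachable_window(int64_t uplink_end_ms,
                                   int64_t dl_arrival_ms,
                                   bool is_join_accept,
                                   int64_t margin_ms)
{
    int64_t d1 = is_join_accept ? JOIN_ACCEPT_DELAY1_MS : RECEIVE_DELAY1_MS;
    int64_t d2 = is_join_accept ? JOIN_ACCEPT_DELAY2_MS : RECEIVE_DELAY2_MS;

    if (dl_arrival_ms + margin_ms <= uplink_end_ms + d1)
        return WINDOW_RX1;
    if (dl_arrival_ms + margin_ms <= uplink_end_ms + d2)
        return WINDOW_RX2;
    return WINDOW_MISSED; /* too late: most gateways simply drop it */
}
```

With a 50 ms margin, a downlink arriving 1.2 s after the uplink can only make RX2, one arriving after 2.5 s misses both data windows, while a Join Accept arriving at the same moment would still make its first window, which matches the observation that joins are less affected.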
A summary of some related problems discussed in other topics:
Some report it is not buffering the uplink:
…while others see it being sent, but much later, with uplinks that were actually sent only once showing up as “retry”:
Also, it seems it might cause uplinks to go missing if it’s already reconnecting:
As also mentioned by @Vaelid above, this seems to affect ADR:
I ran my setup with a 45-second interval for some time and got a ~100% success rate, so the server connection issue looks very likely.
Is there a way to widen the RX window with LMIC library?
Do you see the downlink in the TTIG Traffic page in TTN Console? Is there any way you can check if the TTIG accepts/transmits it? (Most gateways will simply drop downlinks that are too late.)
(If you think it was transmitted, see LMIC_setClockError. But please keep this thread on topic; it’s not about LMIC.)
Yes, all DLs are visible in the console, none missing, even those that are never received on the node side. A small number of ULs are lost, which is within the acceptable range for LoRa.
About LMIC: I have played around with that max clock error, and it’s now at 1%; I could not find any other value that makes the results more consistent. Sorry for the off-topic question, I’ll start another thread.
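As a rough way to see what a 1% clock error means, the sketch below (plain C, not LMIC’s own calculation) converts a percentage error into worst-case drift over the receive delay, and then into LoRa symbols (Ts = 2^SF / BW). The function names are hypothetical; LMIC widens its windows internally in its own way.

```c
/* Symbol time in microseconds for LoRa: Ts = 2^SF / BW. */
double symbol_time_us(int sf, double bw_hz)
{
    return (double)(1L << sf) * 1e6 / bw_hz;
}

/* Worst-case timing drift (ms) accumulated over `delay_ms` for a clock
 * that is off by `error_percent` percent. */
double clock_drift_ms(double delay_ms, double error_percent)
{
    return delay_ms * error_percent / 100.0;
}

/* The same drift expressed in LoRa symbols at the given SF and bandwidth. */
double drift_in_symbols(double delay_ms, double error_percent,
                        int sf, double bw_hz)
{
    return clock_drift_ms(delay_ms, error_percent) * 1000.0
           / symbol_time_us(sf, bw_hz);
}
```

At 1% this gives 10 ms over the 1 s RX1 delay and 20 ms (about 4.9 symbols at SF9/125 kHz) over the 2 s RX2 delay, which is the kind of margin the receive window has to absorb.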
If this is the cause (which for the moment must be considered unproven), the way to combat it is probably not to alter the behavior of real nodes, but rather to make something like a dummy node set to minimum power, with a resistor in place of an antenna, which periodically sends a packet that is valid LoRa but invalid LoRaWAN; the gateway will pass it on, but the backend servers will quickly ignore it. First, though, start with an actual node with those power-reducing measures and verify whether it improves the performance of the other nodes. The point is that if you need something to work around a bug and keep the gateway connected, you only need one such thing; you shouldn’t need to have all your nodes chattering away at a high rate, consuming battery and bandwidth.
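One way to get a frame that is valid LoRa but that a LoRaWAN backend will not treat as device data is to use the Proprietary message type (MType 0b111 in the MHDR, so the first byte is 0xE0). The sketch below only builds the bytes. The helper, the ‘KA’ marker and the assumption that the TTN backend quietly drops such frames are all illustrative assumptions, not something confirmed in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* MHDR layout (LoRaWAN 1.0.x): MType in bits 7..5, RFU in bits 4..2,
 * Major in bits 1..0. MType 0b111 is "Proprietary". */
#define MTYPE_PROPRIETARY 0x7u
#define MAJOR_LORAWAN_R1  0x0u

static uint8_t make_mhdr(uint8_t mtype, uint8_t major)
{
    return (uint8_t)(((mtype & 0x7u) << 5) | (major & 0x3u));
}

/* Hypothetical helper: fill `buf` with a short keep-alive frame and return
 * its length, or 0 if `cap` is too small.
 * ASSUMPTION: the backend drops proprietary frames it has no handler for,
 * so the gateway forwards it but no device state changes. */
size_t build_keepalive_frame(uint8_t *buf, size_t cap, uint8_t seq)
{
    const uint8_t marker[] = { 'K', 'A' }; /* arbitrary, illustrative */
    size_t len = 1 + sizeof marker + 1;

    if (cap < len)
        return 0;
    buf[0] = make_mhdr(MTYPE_PROPRIETARY, MAJOR_LORAWAN_R1);
    memcpy(&buf[1], marker, sizeof marker);
    buf[3] = seq;
    return len;
}
```

Sending it would need a raw radio transmit path; a normal LoRaWAN uplink call such as LMIC’s `LMIC_setTxData2` adds its own LoRaWAN header and MIC.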
Of course, the only real solution is for the firmware source of the TTIG to be released so that the community can fix it. The fact that this did not happen at the time of product release is grotesquely in conflict with everything TTN would appear to stand for. That’s an issue regardless of whether gateway bugs are to blame for this particular situation.
That’s somewhat dubious: this would simply not work in a network with a 1-second RX1 delay, so if it is happening, it is probably unforeseen behavior of the components used rather than an intentional feature.
Taking into account that a GW reconnects almost immediately, and that a TLS handshake consumes a lot of resources server-side, it seems to be a pretty questionable optimization.
Are you sure they have access to the sources (developed by TrackNet and/or Gemtek)?
But it seems this issue can be resolved on the server side.
2019-05-30 13:41:06.522 [AIO:DEBU] ssl_tls.c:6546 MBEDTLS[1]: mbedtls_ssl_read_record() returned -30848 (-0x7880)
2019-05-30 13:41:06.526 [AIO:ERRO] Recv failed: SSL - The peer notified us that the connection is going to be closed
2019-05-30 13:41:06.532 [AIO:DEBU] [2] WS connection shutdown...
2019-05-30 13:41:06.543 [TCE:VERB] Connection to MUXS closed in state 4
2019-05-30 13:41:06.551 [TCE:INFO] MUXS reconnect backoff 1s (retry 0)
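The log above shows the peer closing the connection at 13:41:06 and a 1 s reconnect backoff. Here is a small sketch of how that gap could be checked against a downlink deadline: it parses the time-of-day part of the log timestamps and treats the gateway as offline from the close until the backoff plus a handshake time has passed. The handshake duration is not in the log, so `assumed_handshake_ms` is a hypothetical parameter, as are the function names.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse the "HH:MM:SS.mmm" part of a Basic Station log timestamp into
 * milliseconds since midnight (assumes three-digit milliseconds).
 * Returns -1 if the text does not match. */
long log_time_ms(const char *hhmmss)
{
    int h, m, s, ms;
    if (sscanf(hhmmss, "%d:%d:%d.%d", &h, &m, &s, &ms) != 4)
        return -1;
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

/* Is moment `t_ms` inside the window in which the gateway has no
 * connection? The gap runs from `closed_ms` until the backoff expires plus
 * an ASSUMED reconnect/TLS handshake time (not shown in the log). */
bool in_offline_gap(long t_ms, long closed_ms, long backoff_ms,
                    long assumed_handshake_ms)
{
    return t_ms >= closed_ms &&
           t_ms < closed_ms + backoff_ms + assumed_handshake_ms;
}
```

For example, with the close at 13:41:06.543, a 1 s backoff and an assumed 1 s handshake, an RX window 1.5 s after the close falls inside the gap, while one 2.5 s after it does not.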
So, to summarise: it is an issue, and I see three possible solutions:
1. Keep the WebSocket connections permanently open. This consumes resources and needs action by TTI on the TTN servers.
2. Keep sending traffic from a node, say every 20 seconds, with a short payload. This seems wasteful in airtime and TTN server resources, but it is the only option open to gateway / node owners, and it is easy to set up with a spare node.
3. TTI changes the server software to take the WebSocket disconnects into account when handling downlinks and interacting with the TTIG. This needs action by TTI.
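To put a number on the airtime cost of option 2, here is a sketch of the LoRa time-on-air formula from the Semtech SX1276 datasheet. The example assumes a 14-byte PHY payload (13 bytes of LoRaWAN overhead, i.e. MHDR, FHDR without options, FPort and MIC, plus a 1-byte payload) sent at SF7/125 kHz, coding rate 4/5, 8 preamble symbols, explicit header and CRC on. The function names are hypothetical.

```c
#include <stdbool.h>

/* LoRa time on air (ms), following the SX1276 datasheet formula.
 * cr is 1..4 for coding rates 4/5..4/8. */
double lora_airtime_ms(int payload_len, int sf, double bw_hz, int cr,
                       int preamble_syms, bool explicit_header,
                       bool crc_on, bool low_dr_opt)
{
    double t_sym_ms = (double)(1L << sf) * 1000.0 / bw_hz;
    int ih = explicit_header ? 0 : 1;
    int de = low_dr_opt ? 1 : 0;
    int crc = crc_on ? 1 : 0;

    /* max(ceil(num / den), 0) * (cr + 4), done in integer arithmetic */
    int num = 8 * payload_len - 4 * sf + 28 + 16 * crc - 20 * ih;
    int den = 4 * (sf - 2 * de);
    int blocks = (num > 0) ? (num + den - 1) / den : 0;
    int payload_syms = 8 + blocks * (cr + 4);

    double t_preamble = (preamble_syms + 4.25) * t_sym_ms;
    return t_preamble + payload_syms * t_sym_ms;
}

/* Total airtime per day (seconds) for one packet every `interval_s`. */
double airtime_per_day_s(double airtime_ms, int interval_s)
{
    return (86400.0 / interval_s) * airtime_ms / 1000.0;
}
```

With those settings a packet takes about 46 ms, so one every 20 seconds adds up to roughly 200 seconds of airtime per day, well above the 30 seconds per node per day of TTN’s Fair Access Policy; even a dummy node sending every 2 minutes comes to about 33 seconds.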
I hope TTI looks seriously at this issue and resolves it, as it causes problems with ADR, acknowledgements and packet loss. Having all TTIG owners send dummy traffic to keep their gateways connected is very wasteful for TTN.
Related to the above: when will there be a software update for the TTIG?
During Semtech’s Basicstation (workshop) presentation held at The Things Conference last January, it was mentioned that Basicstation itself is open source, but the implementation of the ‘Platform Layer’ and ‘Radio Layer’ building blocks for the ESP8266 (used in the TTIG) is closed source.
This makes availability of an open TTIG project on Github less likely.
Is there a way to widen the RX window with LMIC library?
Well, I guess the answer is yes and no… Search the radio part of the code for the register LORARegSymbTimeoutLsb. It defines the RX1 and RX2 timeout as a number of symbols. For instance, in RX2 receiving at SF9 (125 kHz), if you set this value to 80 symbols, the LoRa module will time out after about 330 ms (80 × 4.096 ms) if no preamble is received. If you set a higher value, the LoRa module will search longer for a preamble, and would also capture a transmission that started somewhat late.
But remember that you can’t shift RX2: it needs to start 2 seconds after TX_ready, so the only thing you can do is extend the timeout. The maximum you can set in that 8-bit register is 255 symbols.
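The arithmetic behind this is simple: the timeout is SymbTimeout × Ts, with Ts = 2^SF / BW. Below is a sketch in C with hypothetical helper names, assuming 125 kHz bandwidth in the example. It clamps to 255 because only the LSB register is considered here; the SX1276 datasheet describes SymbTimeout as a 10-bit field whose two upper bits live in RegModemConfig2, which this sketch does not touch.

```c
/* Largest value that fits in the 8-bit LORARegSymbTimeoutLsb register. */
#define SYMB_TIMEOUT_LSB_MAX 255

/* Preamble-search timeout (ms) for a given number of symbols. */
double symb_timeout_ms(int symbols, int sf, double bw_hz)
{
    return symbols * (double)(1L << sf) * 1000.0 / bw_hz;
}

/* Smallest symbol count whose timeout covers `wanted_ms`, clamped to what
 * fits in the LSB register alone. */
int symbols_for_timeout(double wanted_ms, int sf, double bw_hz)
{
    double t_sym_ms = (double)(1L << sf) * 1000.0 / bw_hz;
    int n = (int)(wanted_ms / t_sym_ms);

    if (n * t_sym_ms < wanted_ms)
        n++; /* round up so the window is not shorter than requested */
    if (n > SYMB_TIMEOUT_LSB_MAX)
        n = SYMB_TIMEOUT_LSB_MAX;
    return n;
}
```

At SF9 that gives 327.68 ms for 80 symbols, 123 symbols to cover 500 ms, and at most about 1.04 s with 255 symbols.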
Is there anything new on this subject? I came across this problem with my TTI Gateway. If my node sends an uplink every 10 minutes, that interval is evidently too long and many values are lost. Now I have additionally installed a dummy node that sends every 2 minutes, and all values are transmitted without loss, even at 10- or 15-minute intervals. But that doesn’t seem like a really good solution to me, since you are supposed to keep traffic low.
During the conference I got confirmation that work on new firmware is ongoing; however, progress is very slow. TTN does not own the sources, so they can’t do the work themselves, and they can’t release the sources to let the community improve things. That’s a shame, as I know there are several avid ESP developers in the community who should be able to help out.