TTIG does not support downlinks when not frequently receiving uplinks?

didzis · November 22, 2019, 11:04pm

I’m experiencing strange downlink (DL) behavior: the number of confirmed DL attempts needed before one succeeds changes depending on the interval at which the node sends its test “Hello!” messages. With TX_INTERVAL=60 (seconds), it takes about 7 DL attempts on average before the node actually receives the DL. With TX_INTERVAL=15, the success rate is close to 100%: almost every attempt is received and confirmed.

Any ideas, where to look for a solution?

If it’s a case of the node listening at the wrong time, how can I fix it?

My setup: SX1276 LoRa module + Arduino Pro Mini + LMIC library <—OTAA—> The Things Indoor Gateway

Vaelid · November 23, 2019, 7:48am

I can confirm this behaviour. I put it down to the way the TTIG disconnects from the TTN servers to save resources: it then runs into a timing issue, storing and forwarding the uplink before it can get the DL message. I often see the issue when using ADR and confirmed messages with the duty cycle set to 300 seconds or more: the node never settles on the correct SF. At 30 seconds or less, it does. I am still hopeful that TTI will issue a new update for the TTIG to fix the flashing green light issue, i.e. the session to the TTN server being disconnected. For now, I try to use sketches that start with a duty cycle of less than 30 seconds and then move to a greater interval. Simon
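Simon's workaround (a short interval at first, then a longer one) can be sketched as a small scheduling helper. This is a minimal sketch under stated assumptions: `next_tx_interval`, the 10 warm-up uplinks and the 20 s warm-up interval are hypothetical choices, not values from an existing sketch. In the LMIC ttn-otaa example, the returned value would take the place of the fixed `TX_INTERVAL` when scheduling the next `do_send`.

```python
# Hypothetical warm-up schedule: send the first uplinks quickly so that
# ADR / confirmed-uplink exchanges happen while the TTIG is likely still
# connected, then fall back to the normal, slower interval.
WARMUP_UPLINKS = 10       # assumption: number of "fast" uplinks after join
WARMUP_INTERVAL_S = 20    # assumption: below the ~30 s mentioned above
NORMAL_INTERVAL_S = 300   # the 300 s duty cycle mentioned above


def next_tx_interval(uplinks_sent):
    """Seconds to wait before the next uplink, given how many were sent."""
    if uplinks_sent < WARMUP_UPLINKS:
        return WARMUP_INTERVAL_S
    return NORMAL_INTERVAL_S


print([next_tx_interval(n) for n in range(8, 12)])  # [20, 20, 300, 300]
```

Whether 10 fast uplinks are enough for ADR to converge is not established in this thread; the counter just makes the idea concrete.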

arjanvanb · November 23, 2019, 9:11am

Indeed, it’s quite likely that downlinks might arrive too late (or might not even be delegated to the TTIG by the backend?) when a TTIG is not receiving a lot of traffic:

…to which @tchenier already responded:

This affects confirmed uplinks and ADR (whose downlinks are due after 1 or 2 seconds), and is less likely to affect the very first OTAA Join Accept (due after 5 or 6 seconds). Given its indoor use, isn’t it quite likely that the TTIG is not receiving many packets from other users?

A summary of some related problems, discussed (and somewhat hidden) in other topics:

Some report it is not buffering the uplink:

TTIG packetlosses

the packet is simply lost… shouldn’t that gateway buffer the packet it receives, connect to websocket, and send it?
…but others see it being sent, but much later, showing “retry” for uplinks that are actually only sent once:

New gateway: The Things Indoor Gateway

I noticed that it takes ~2 seconds from the moment the packet is received by the TTIG until it starts a new WS connection, and another second until the connection is established and the packet gets sent. If the packet is received by multiple gateways, deduplication doesn’t work anymore and the packet shows up as “retry” in the TTN console.
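A rough timing budget shows why a ~3 s reconnect delay (2 s to start the WebSocket connection plus 1 s to establish it, as quoted above) would rule out the data RX windows while leaving the Join Accept windows reachable. This is a back-of-the-envelope sketch: the window delays are the LoRaWAN 1.0.x defaults (RX1 after 1 s, RX2 after 2 s, Join Accept after 5 or 6 s), and `backend_s` and `gw_lead_s` are hypothetical parameters for server processing time and for how early the gateway needs the downlink; their real values are unknown here.

```python
# Receive window openings, in seconds after the end of the uplink
# (LoRaWAN 1.0.x defaults; assumed here to match the TTN V2 setup).
RX_WINDOWS_S = {
    "RX1 (data)": 1.0,
    "RX2 (data)": 2.0,
    "RX1 (join accept)": 5.0,
    "RX2 (join accept)": 6.0,
}


def reachable_windows(forward_delay_s, backend_s=0.0, gw_lead_s=0.0):
    """Return the receive windows a downlink could still make.

    forward_delay_s: time from reception at the gateway until the uplink
        reaches the backend (e.g. the reconnect delay).
    backend_s: hypothetical network server processing time.
    gw_lead_s: hypothetical time the downlink must be at the gateway
        before the window opens (return trip included).
    """
    ready_at = forward_delay_s + backend_s + gw_lead_s
    return [name for name, opens_at in RX_WINDOWS_S.items() if ready_at <= opens_at]


# Gateway already connected: every window is still reachable.
print(reachable_windows(0.2))
# ~3 s reconnect before forwarding: only the Join Accept windows remain.
print(reachable_windows(3.0))
```

Any non-zero `backend_s` or `gw_lead_s` only makes this worse, which matches the observation that Join Accepts are less affected than ADR and confirmed-uplink downlinks.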
Also, it seems it might cause uplinks to go missing if it’s already reconnecting:

The Things Indoor Gateway - TTIG part 1

Is there any chance the TTIG aggressive disconnect/reconnect logic could be relaxed a little?

I have a single TTIG that I use for testing various end nodes. One node in particular reports once an hour. It’s the only LoRa device around and there are no other gateways in reach. The problem is that if the TTIG is in the middle of reconnecting, it seems to drop or not hear the node. This happens quite frequently.
Like also mentioned by @Vaelid above, this seems to affect ADR:

TTIG and ADR

I can only conclude that the TTIG doesn’t do well with ADR if it doesn’t get a packet from anything within 30 seconds. This is because it seems to do a store and forward of the received packet while it reconnects to TTN and thus will miss the downlink slot.

I don’t know if there’s any GitHub project where this might be/have been reported on? (I don’t see it on, e.g., The Things Network · GitHub, The Things Products · GitHub, The Things Industries · GitHub, or maybe it is part of GitHub - lorabasics/basicstation: LoRa Basics™ Station - The LoRaWAN Gateway Software ?)

didzis · November 23, 2019, 10:16am

I ran my setup with 45 second intervals for some time and got ~100% success rate, the server connection issue looks very likely.
Is there a way to widen the RX window with LMIC library?

arjanvanb · November 23, 2019, 10:27am

Do you see the downlink in the TTIG Traffic page in TTN Console? Is there any way you can check if the TTIG accepts/transmits it? (Most gateways will simply drop downlinks that are too late.)

(If you think it was transmitted, see LMIC_setClockError. But please keep this thread on topic; it’s not about LMIC.)

didzis · November 23, 2019, 11:10am

Yes, all DLs are visible in the console, none missing, even those that are never received on the node side. A small number of ULs are being lost, which is within the acceptable range for LoRa.

About LMIC: I have played around with that MAX ERROR setting, and it is now at 1%; I could not find any other value that makes the results more consistent. Sorry for going off-topic, I’ll start another thread.
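For reference, the clock error is passed to LMIC as a fraction of `MAX_CLOCK_ERROR`; in the Arduino LMIC port this constant is, as far as I know, 65536 (representing 100 %), so `LMIC_setClockError(MAX_CLOCK_ERROR * 1 / 100)` requests 1 %. The sketch below is plain arithmetic, not LMIC code, and the helper names are hypothetical. It also suggests why this setting cannot rescue a downlink that is seconds late: it compensates for the node's own clock drift, which at 1 % amounts to milliseconds over a 1 or 2 s RX delay.

```python
# Assumed to match the Arduino LMIC port's lmic.h (65536 == 100 % error).
MAX_CLOCK_ERROR = 65536


def lmic_clock_error_arg(percent):
    """Integer value that LMIC_setClockError() would receive for `percent`."""
    return int(MAX_CLOCK_ERROR * percent // 100)


def max_drift_ms(rx_delay_s, percent):
    """Worst-case drift of the node's clock over the RX delay, in ms."""
    return rx_delay_s * 1000 * percent / 100


print(lmic_clock_error_arg(1))  # 655, same as MAX_CLOCK_ERROR * 1 / 100 in C
print(max_drift_ms(1, 1))       # 10.0 ms at RX1
print(max_drift_ms(2, 1))       # 20.0 ms at RX2
```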

cslorabox · November 23, 2019, 2:43pm

If this is the cause (which must for the moment be considered unproven), the way to combat it is probably not to alter the behavior of real nodes, but rather to do something like make a dummy node set to minimum power, with a resistor in place of an antenna, which periodically sends a packet that is valid LoRa but invalid LoRaWAN, and so will be passed on by the gateway but quickly ignored by the backend servers. Start, though, with an actual node using those power-reducing measures, and verify whether it improves the performance of the other nodes. The point is that if you need something to work around a bug and keep the gateway connected, you only need one such thing; you shouldn’t need to have all your nodes chattering away at a high rate, consuming battery and bandwidth.

Of course the only real solution is for the firmware source of the TTIG to be released so that the community can fix it. The fact that this did not happen at the time of product release is grotesquely in conflict with everything TTN would appear to stand for. That’s an issue regardless if gateway bugs are to blame for this particular situation, or not.

That’s somewhat dubious: this would simply not work in a network with a 1-second RX1 delay, so if it is happening, it is probably unforeseen behavior of the components used rather than an intentional feature.

didzis · November 23, 2019, 3:28pm

I changed TX_INTERVAL to 180 seconds and suddenly all works fine. Mystery!

Maybe all this fuss was just because of the default interval in LMIC’s example code, 60 seconds, which by some coincidence conflicts with TTN?

hobo · November 23, 2019, 4:46pm

Taking into account that a GW reconnects almost immediately, and that a TLS handshake consumes a lot of resources server-side… it seems to be a pretty questionable optimization.

Are you sure they have access to the sources (developed by TrackNet and/or Gemtek)?

But it seems this issue can be resolved at server side.

    2019-05-30 13:41:06.522 [AIO:DEBU] ssl_tls.c:6546 MBEDTLS[1]: mbedtls_ssl_read_record() returned -30848 (-0x7880)
    2019-05-30 13:41:06.526 [AIO:ERRO] Recv failed: SSL - The peer notified us that the connection is going to be closed
    2019-05-30 13:41:06.532 [AIO:DEBU] [2] WS connection shutdown...
    2019-05-30 13:41:06.543 [TCE:VERB] Connection to MUXS closed in state 4
    2019-05-30 13:41:06.551 [TCE:INFO] MUXS reconnect backoff 1s (retry 0)

(logs taken from Hacking the TTI Indoor Gateway - Tinkerman)

So it seems to be the server that closes the connection.
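The error code in the first log line supports this reading: -30848 is -0x7880, which mbed TLS defines as `MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY`, i.e. the other end sent a TLS close_notify alert. Strictly, that only shows the close was initiated on the far side of the TLS connection (which might also be a proxy in front of the server). A small decoding sketch; the helper names are hypothetical, and the constant is copied from mbed TLS's `ssl.h`:

```python
# Value of MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY in mbed TLS (ssl.h).
MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY = -0x7880


def format_like_log(err):
    """Render an mbed TLS error code the way it appears in the log above."""
    return f"{err} (-0x{-err:04X})"


def peer_closed(err):
    """True if the error means the remote side sent a TLS close_notify."""
    return err == MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY


err = -30848  # from "mbedtls_ssl_read_record() returned -30848 (-0x7880)"
print(format_like_log(err))  # -30848 (-0x7880)
print(peer_closed(err))      # True
```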

Vaelid · November 25, 2019, 9:14am

So, to summarise: it is an issue, and I see three possible solutions.

1. Keep the WebSocket connections permanently open. This consumes resources and requires action by TTI on the TTN servers.
2. Keep sending traffic through the gateway, say every 20 seconds, with a short payload. This seems wasteful in air time and in TTN server resources, but it is the only option open to gateway/node owners, and it is easy to set up with a spare node.
3. TTI changes the server software to take the WebSocket disconnect into account when handling downlinks and their interaction with the TTIG. This also requires action by TTI.
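To put a number on the air-time cost of option 2, here is the standard Semtech SX127x time-on-air formula applied to a minimal keep-alive frame. Assumptions (not from the posts above): EU868 uplink at SF7/125 kHz, coding rate 4/5, 8-symbol preamble, explicit header, CRC on, and a 1-byte application payload plus 13 bytes of LoRaWAN overhead (MHDR, FHDR without FOpts, FPort, MIC). The function names are hypothetical.

```python
import math

LORAWAN_OVERHEAD = 13  # MHDR 1 + FHDR 7 (no FOpts) + FPort 1 + MIC 4 bytes


def lora_airtime_ms(payload_len, sf=7, bw_hz=125_000, cr=1,
                    preamble=8, explicit_header=True, crc=True):
    """Time on air of one LoRa packet (Semtech SX127x formula), in ms."""
    t_sym = (2 ** sf) / bw_hz * 1000   # symbol duration in ms
    de = 1 if t_sym > 16 else 0        # low data rate optimisation (SF11/12 @ 125 kHz)
    ih = 0 if explicit_header else 1
    num = 8 * payload_len - 4 * sf + 28 + 16 * int(crc) - 20 * ih
    payload_symbols = 8 + max(math.ceil(num / (4 * (sf - 2 * de))) * (cr + 4), 0)
    return (preamble + 4.25 + payload_symbols) * t_sym


def daily_airtime_s(interval_s, airtime_ms):
    """Total air time per day when sending one packet every interval_s."""
    return 86_400 / interval_s * airtime_ms / 1000


keepalive = lora_airtime_ms(LORAWAN_OVERHEAD + 1)
print(round(keepalive, 1))                    # 46.3 ms per packet at SF7
print(round(daily_airtime_s(20, keepalive)))  # 200 s per day at one packet every 20 s
```

That is a duty cycle of about 0.23 % for a single dummy node, within the 1 % EU868 limit, but roughly 200 s of air time per day, far more than the 30 s per node per day allowed by TTN's Fair Access Policy for a regular node.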

I hope TTI looks seriously at this issue and resolves it, as it causes problems with ADR, acknowledgements and packet loss. Having all TTIG owners send dummy traffic to keep their gateways connected is very wasteful for TTN.

Related to the above: when will there be a software update for the TTIG?

bluejedi · November 25, 2019, 9:55am

During Semtech’s Basicstation (workshop) presentation held at The Things Conference last January, it was mentioned that Basicstation itself is open source, but the implementation of the ‘Platform Layer’ and ‘Radio Layer’ building blocks for ESP8266 (used on the TTIG) are closed source.
This makes availability of an open TTIG project on Github less likely.

rharte · November 28, 2019, 12:21am

@didzis

Is there a way to widen the RX window with LMIC library?

Well I guess the answer is yes and no… Search in the radio part of the code for the register LORARegSymbTimeoutLsb. It defines the timeout in RX1 and RX2 in terms of number of symbols. For instance in RX2, receiving SF9, if you set this value to 80 symbols, it will be about 300 ms for the LoRa module to timeout if no preamble is received. If you set a higher value, the LoRa module will search longer for a preamble, and would also capture a transmission that may have started somewhat late.
But remember you can’t shift RX2, it needs to start at 2 seconds after TX_ready, so the only thing you can do is extend the timeout. And the maximum would be 255 symbols.
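The symbol-to-time conversion described above is easy to check: one LoRa symbol lasts 2^SF / BW, so at SF9/125 kHz (the RX2 data rate mentioned above) a symbol is 4.096 ms. A sketch with hypothetical helper names:

```python
def symbol_time_ms(sf, bw_hz=125_000):
    """Duration of one LoRa symbol in milliseconds (2**SF / BW)."""
    return (2 ** sf) / bw_hz * 1000


def symb_timeout_ms(symbols, sf, bw_hz=125_000):
    """How long the radio keeps searching for a preamble before timing out."""
    return symbols * symbol_time_ms(sf, bw_hz)


for n in (80, 255, 1023):
    print(n, "symbols at SF9:", round(symb_timeout_ms(n, 9), 1), "ms")
```

This gives roughly 328 ms for 80 symbols, 1.04 s for 255 and 4.19 s for 1023. A longer timeout only helps if the downlink's preamble starts while the radio is still searching; the window opening itself stays fixed.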

cslorabox · November 28, 2019, 5:24am

Actually the maximum is 1023 but the upper two bits are in a different register.

rharte · November 28, 2019, 8:21am

Aha, I see, they are mapped in RegModemConfig2
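To use more than 255 symbols, the 10-bit SymbTimeout value has to be split across the two registers. Per the SX1276 datasheet, bits 7:0 go into RegSymbTimeoutLsb (0x1F) and bits 9:8 into bits 1:0 of RegModemConfig2 (0x1E), whose upper bits hold the spreading factor, TxContinuousMode and RxPayloadCrcOn and must be preserved. This sketches the bit arithmetic only; the helper is hypothetical, and writing the registers would go through the LMIC radio code's own register-write routine.

```python
REG_MODEM_CONFIG_2 = 0x1E     # SX1276 LoRa mode: SymbTimeout(9:8) in bits 1:0
REG_SYMB_TIMEOUT_LSB = 0x1F   # SX1276 LoRa mode: SymbTimeout(7:0)


def symb_timeout_registers(symbols, modem_config2):
    """Split a 10-bit symbol timeout into (RegModemConfig2, RegSymbTimeoutLsb).

    modem_config2 is the current register value; its upper six bits are kept.
    """
    if not 0 <= symbols <= 0x3FF:
        raise ValueError("SymbTimeout is a 10-bit field (0..1023)")
    config2 = (modem_config2 & 0xFC) | (symbols >> 8)
    lsb = symbols & 0xFF
    return config2, lsb


# Hypothetical starting value: SF9 with RxPayloadCrcOn set gives 0x94.
for n in (80, 1023):
    config2, lsb = symb_timeout_registers(n, 0x94)
    print(f"{n:4d} symbols -> 0x{REG_MODEM_CONFIG_2:02X}=0x{config2:02X}, "
          f"0x{REG_SYMB_TIMEOUT_LSB:02X}=0x{lsb:02X}")
```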

Never noticed it and never needed it. But thank you for your addition!

arjanvanb · December 4, 2019, 4:52pm

Indeed, but for, e.g., the closed source NOC, integrations and TTN Console in V2, issues could be reported on Issues · TheThingsArchive/ttn · GitHub

@KrishnaIyerEaswaran2 (@laurens, @rish1), is there any place where possible bugs for the TTIG can be reported?

Nobyte · February 1, 2020, 5:34pm

Is there anything new on the subject? I came across this problem with my TTI Gateway. If my node sends an uplink every 10 minutes, that is obviously too long and many values are lost. Now I have additionally installed a dummy node that sends every 2 minutes, and all values are transmitted without loss, even at 10 or 15 minute intervals. But that doesn’t seem like a really good solution to me, since traffic should actually be kept low.

kersing · February 1, 2020, 6:27pm

During the conference I got confirmation that work is ongoing on new firmware; however, progress is very slow. TTN does not own the sources, so they can’t do the work themselves, and they can’t release the sources to allow the community to improve things. That is a shame, as I know there are several avid ESP developers in the community who should be able to help out.

Nobyte · February 1, 2020, 6:32pm

Too bad. Thanks anyway for the answer and the information.

arjanvanb · February 2, 2020, 11:03am

Do you happen to know if this implies that it might not be the server that is causing the WebSocket connection to be dropped?

Or, @KrishnaIyerEaswaran2, could you tell us who’s dropping the connection?

kersing · February 2, 2020, 11:08am

Sorry, no. I would have to speculate so let’s wait for an authoritative answer.