How reliable are downlinks?

At the time V2 of the TTN stack was created those decisions made perfect sense. Things have moved on from there and now the stack shows its age. Don’t forget this stack has been running for years.

I had extensive discussions with some of the team and they felt that dictating a node to stay awake and ‘burning’ battery was not their decision to make, so they followed the then-current best practice.

That assumption was 100% valid at the time the stack was created, and there are still tutorials on how to set up an RPi based gateway linking to the old code base.
So the keyword is ‘current’: had they known a JIT queue was being added they might have designed things differently, but the JIT queue was not available at that time, and there still is no reliable way to determine which UDP gateways provide a JIT queue and which don’t.
Not everyone updates the software, so you can’t assume all gateways will have JIT by now. I too have a gateway deployed at a remote site that needs to be updated sometime. It is my first remote gateway and remote access didn’t seem that important at the time. Maybe it isn’t, as the RPi is still running like a charm almost 4 years and numerous power interruptions later.

It is not part of the spec. It was implemented that way because the old gateways only stored one packet until the time to transmit; if anything else needed to be transmitted by the gateway before the queued packet, that packet was silently discarded, resulting in lost downlinks. Not good for OTAA with a 5/6 second interval between request and reply.
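
For illustration, the behaviour being described is roughly this (a hypothetical sketch, not the real packet-forwarder code; names and sizes are made up):

```cpp
#include <cstddef>
#include <cstdint>

// Old single-slot behaviour: exactly one buffered downlink, and a newer
// PULL_RESP simply overwrites it, silently dropping whatever was waiting.
struct Downlink {
    uint32_t tx_time_us;    // concentrator counter value to transmit at
    uint8_t  payload[256];  // raw PHY payload
    size_t   len;
    bool     pending;
};

static Downlink slot;       // the single "queue"

void on_pull_resp(const Downlink& dl) {
    // No queue and no error reported back: the earlier downlink
    // (e.g. a pending Join Accept) is simply lost.
    slot = dl;
    slot.pending = true;
}

// A JIT queue instead keeps several downlinks sorted by tx_time_us and only
// rejects one when its transmit window overlaps another or has already passed.
```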


I say ‘tempted’ because I’m a bit too pragmatic to look to create a full spec-compliant stack in all its glory - rather the core very well architected and documented, with very clear use cases / functionality levels that others can build on if they need other bits or their own regional setup.

My original thought last summer, when I was struggling with the TinyThing (Arduino) version, was to take LMIC apart into component elements spread over a lot more files (I hate endless scrolling), better signposted, and create some sort of build system so the many, many #defines that switch in different regional code could be sanitised.

I realise this would be anathema to some, and I’ve 101 other things to do at present, so don’t worry, these are only musings.

Given how infrequent uplinks are, a node that is awake for two seconds rather than one isn’t really burning that much more power - from profiling it, the processor’s awake current typically isn’t that significant compared to the energy to transmit and receive.
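
As a rough back-of-the-envelope illustration (all numbers are assumptions, not measurements from any particular board):

```cpp
// Illustrative energy budget with assumed figures - substitute your own
// measurements. The point: one extra awake second is small next to the TX.
constexpr float mcu_awake_mA  = 5.0f;    // assumed MCU run current
constexpr float radio_tx_mA   = 120.0f;  // assumed SX127x-class TX current
constexpr float tx_seconds    = 1.5f;    // e.g. a long, slow uplink
constexpr float extra_awake_s = 1.0f;    // staying awake one extra second

constexpr float tx_charge_mAs    = radio_tx_mA * tx_seconds;      // ~180 mAs
constexpr float extra_charge_mAs = mcu_awake_mA * extra_awake_s;  // ~5 mAs
// The extra second costs a few percent of what each transmission costs anyway.
```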

But developers who are seriously concerned about power usage sleep until the RX window. It’s really not that hard to do; you just have to actually understand what is going on in your node.
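
The idea, as a hedged sketch (the helper functions are hypothetical stand-ins for whatever your MCU/radio layer provides, not a real LMIC API):

```cpp
#include <cstdint>

// Hypothetical HAL hooks - declared here only so the sketch is complete.
void deep_sleep_ms(uint32_t ms);
void radio_configure_rx1();
void busy_wait_until_ms(uint32_t ms);
void radio_open_rx_window();

void after_tx_done(uint32_t tx_done_ms) {
    const uint32_t rx1_delay_ms   = 1000;  // RECEIVE_DELAY1 (5000 ms after a Join Request)
    const uint32_t wake_margin_ms = 100;   // time to wake up and re-arm the radio

    deep_sleep_ms(rx1_delay_ms - wake_margin_ms);   // sleep through most of the gap
    radio_configure_rx1();                          // set frequency/SF for RX1
    busy_wait_until_ms(tx_done_ms + rx1_delay_ms);  // then hit the window precisely
    radio_open_rx_window();
}
```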

This is only one of many aspects of the lack of any “gateway capabilities” interchange. E.g. “gateway is receive only”, etc.

I suspect in practice there’s some correlation with the change in UDP protocol format, though the commit history is opaque enough that figuring out the exact sequence of changes is more than a few clicks away.

Anyway, the upshot is that mobile backhaul works pretty well on private networks designed around the idea of using it, and sees widespread usage since it’s often the only thing readily available on sites where it would be nice to put a gateway. If it’s not working on TTN (and there’s no actual proof it’s to blame for the asker’s issues in this case), that’s a result of less-than-optimal design decisions.

You need to go back 4 years. The only options at that time were RFM95 or equivalent modules with LMIC on controllers that required the timing to be relaxed by 10% to be able to hit the RX slots at all, and Microchip modules; for the latter, sleep is not an option because the controller is not running the stack and doesn’t know the timings. And you should not forget the target audience of TTN: most users are not hardcore embedded developers.
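
For reference, in the Arduino LMIC ports that relaxation is usually applied with LMIC_setClockError(). A minimal fragment (the rest of the sketch - pin map, event handler, etc. - is omitted, and 10% is the figure mentioned above, not a recommendation):

```cpp
#include <lmic.h>

void setup_lmic() {
    os_init();
    LMIC_reset();
    // Widen the RX window timing to tolerate an inaccurate clock.
    // 10% was the kind of value some boards needed to hit RX1/RX2 at all;
    // well-characterised hardware can use 1% or none.
    LMIC_setClockError(MAX_CLOCK_ERROR * 10 / 100);
}
```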

My three 3G connected gateways (MultiTech, Dragino LG308 and TTOG) are proof it is working without any issues. So I think there are other issues or a combination of factors coming into play for the asker.

I would add to the list of successful 3G deployments, with various GW types: many Laird RG186, many RAK Pilots, an iMST Lite GW, a Dragino LPS8, several of my Owl RPi0W or RPi 2/3 based GWs, several of my TTIGs, a TTN ‘Kickstarter’ and a Multitech AEP, all running fine at various test sites or on permanent deployments. Yes, I have seen occasional issues; these have been on the edge of reception areas where I suspect switching between cell towers, and one was troublesome near a major motorway network where I suspect the sheer volume of cell traffic/hand-offs and likely some passing ‘priority’ users (emergency services or key government staff :wink: ). I think it also depends on the backhaul used by the associated cell tower and RAN, as I saw a situation a couple of years back where a site was poor, then when I went back 10-12 months later it was just fine. A chance discussion with mobile engineers some weeks later indicated a major backhaul update to a direct fibre connection six months earlier, which may have helped.

The same feather with the same firmware seems to work very well when I park in the same place on-site to test that gateway. The test where it wasn’t very good was when I was parked in a different place, although still nearby. So I don’t think it is the feather or the firmware.

For those saying their 3G gateways work fine: how much are you using downlinks? TTN sending downlinks to a gateway just in time must work most of the time, or surely there would be more complaints. But whether those messages find their way to the gateway in time is a different matter. I have no idea how good the 3G link performance is or how I’d even measure it, but it seems the most likely culprit to me, given the adequate performance of the same feather on-site.

Using OTAA for test nodes and trackers/coverage testers at trial sites means all use downlinks at some point, also for ADR adjustments over time. Several have been tested or are on permanent deployments where they are using e.g. Laird RS186 T&H sensors, and whilst most are using LPP with unconfirmed payloads, a few have the Laird confirmed-payload option, triggering typically 1-8 downlinks per day (to stay inside the TTN FUP of 10 downlinks max).

That tells me the problem wasn’t the GW/downlinks per se, but likely that you were in crap reception areas! Often, even inside a nominal coverage area, you can find local nulls (e.g. due to higher-gain GW antenna TX patterns - these tend to be great for reaching towards the horizon, but introduce dead spots at near-to-medium range! :frowning: ). For that reason I invariably deploy more omnidirectional 2-3dBi antennas vs more exotic units claiming >>5 or 6dBi gain - that’s my limit - or directional units; a lesson learnt working with the emergency services, who want/need even coverage patterns. Or you could have been shielded by buildings or terrain, even if you think you have LOS.

One GW I have deployed gets great coverage for miles around apart from the direction of a local retail park, where a large supermarket with a metallised outer shell (and of course significant metal content in store - tins, shelving etc.) creates a massive RF shadow beyond it. Similarly, another suffers where it covers an industrial/distribution park, where again the warehousing buildings and associated stock create bad RF shadowing. The resilience of LoRa means the associated reflections and multipath mean reception can still happen compared to legacy radios (FSK, GFSK etc.), but even so LoRa isn’t perfect in such environments…


When assuming the same data rate and no special antennas: would uplinks and downlinks have the same characteristics? Like say that the gateway is installed quite high, and the device is much closer to the ground, would an SF7BW125 uplink and downlink have the same reception quality? I wonder if RF shadows and things like Fresnel zones could affect one direction more than the opposite direction.

Aside: for EU868, RX2 always uses SF9BW125 (or SF12 for an OTAA Join Accept), with a higher TX power to make up for the faster data rate. An EU868 SF7 or SF8 uplink would often/always yield RX1 with the same SF for the downlink, but SF9 and up would often/always yield RX2 with SF9 instead. So, for SF9 and up, the uplink and downlink are not symmetrical in EU868. (For SF9 the downlink is actually better, I think, given it’s using the same SF with a higher TX power.) Other regions may have specific downlink settings too, not matching the uplink settings.
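
To make the EU868 asymmetry concrete, an illustrative helper (not taken from any real stack; it encodes TTN’s defaults as I understand them - RX1 data-rate offset 0, RX2 fixed at SF9BW125 on 869.525 MHz):

```cpp
struct DownlinkParams {
    int   sf;        // spreading factor
    float freq_mhz;  // downlink frequency
    int   tx_dbm;    // allowed TX power (ERP)
};

// RX1: mirrors the uplink (same channel, same SF with the default RX1
// data-rate offset of 0); limited to ~14 dBm ERP in the uplink sub-band.
DownlinkParams eu868_rx1(int uplink_sf, float uplink_freq_mhz) {
    return { uplink_sf, uplink_freq_mhz, 14 };
}

// RX2: fixed at 869.525 MHz, where up to 27 dBm ERP is allowed; SF9 for data,
// but SF12 (the LoRaWAN 1.0.x default) for an OTAA Join Accept, since the
// network's RX2 setting only takes effect after the join completes.
DownlinkParams eu868_rx2(bool join_accept) {
    return { join_accept ? 12 : 9, 869.525f, 27 };
}
```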

The only reason for needing that kind of time allowance is if there are aspects of the firmware timing behavior which are not well understood. If clocks were actually that wrong, debug UARTs wouldn’t work: asynchronous serial only tolerates a few percent of baud-rate error, nowhere near 10%.

There are a lot of people in the LMIC realm making guesses about timing, and not doing what they need to in order to actually measure it, which is to blip a GPIO on the node and watch that on a scope or analyzer with the gateway transmit LED on the other channel.

But even running with such an assumption, you can sleep for 90% of the delay to the RX window.

My three 3G connected gateways (MultiTech, Dragino LG308 and TTOG) are proof it is working without any issues.

With such a backhaul you cannot extrapolate from one situation to another with any validity at all, or even from day to day in the same setting. Even wired Internet can at times introduce delay which breaks the RX1 window, or, for the longer windows, the hold-until-last-instant server behavior.

So I think there are other issues or a combination of factors coming into play for the asker.

It would not be my first suspicion either - as I’ve said repeatedly, the asker’s real problem is lack of visibility into what is going on.

But it can be an issue.

As a matter of physics in isolation, yes.

And if you were using the same hardware on both ends, perhaps practically.

But as a matter of implementation on a node vs a gateway, no. The radios are not the same.

Typically radio systems are more sensitive to noise sources at the receiver - things like power supply noise, etc. I had an OOK IoT gadget that exhibited a sharply curtailed receive range when operated from a portable charger battery - one can imagine the boost converter was not quiet. LoRa is relatively good at rejecting amplitude type noise, but not perfect.

And interference sources can be external, too. Setting up some experiments to coordinate uplink timing, I was able to consistently demonstrate that a node some tens of meters away from the gateway was able to reduce the reception of distant nodes on other channels.


That’s not a surprise, just another example of the classic ‘near/far’ problem. The antenna must be capable of receiving all the desired frequencies (channels) in the target band, if not be a wideband antenna; it’s then down to how much front-end headroom, discrimination and cross-channel rejection capability there is within a given concentrator board/RF subsystem design - none of which will be ideal/perfect, even in a LoRa system, which classically performs better than legacy radios.

What does help in that scenario is the use of differing spreading factors on the different channels, as that increases rejection capability and diminishes cross-interference by anything from 16-22dB, but even then, if the far signal is low and the near signal is close to swamping the front end, you may get problems. For that reason, whilst I often put a ‘heartbeat’ node close to a GW install so I can easily monitor if it is receiving OK (and it also helps check whether the data chain back to the NS is fully functional), I usually place it at least 30m away from the GW, and if I can get a good location anywhere from 100-300m away :slight_smile: (If placement is not ideal (<<30m) I would then dial down the heartbeat’s TX power to reduce the risk of interference/swamping the front end.)

Agreed where the PSU source is noisy - I have seen that too. I typically run nodes off batteries vs switching PSUs, though occasionally I use USB power banks for field tests if not too concerned about absolute sensitivity. Using a USB power bank as the GW source when trialling short-term I find not an issue, as the GW itself helps filter noise from the PSU and also typically masks it with much more of its own self-generated noise! :slight_smile:

I wasn’t being very clear. At my house I have the GSM GW for testing while I’m writing the firmware. I am usually a bit too close to this GW while testing at home - diagonally opposite corners of a double garage. But I sometimes put the GW outside, and sometimes test from inside the house, making the distance much further, and neither of those things has an effect. Downlinks have never been reliable via this GW, and even joining can take forever, although moving to MCCI LMIC seemed to improve that a bit for some reason. I cannot think why. It now takes 6-10 mins to join rather than 40-60 mins, and I can usually get a downlink on 1 in 4 or 6 attempts rather than what felt more like 1 in 10 or 20.

The tests when I talk about going on-site use a different GW. The site is about a 10 minute drive from my house so there is no chance I’m using the GSM GW then. I agree the LoS must have been affected when I was in the car park, I can’t think what else it could be. I have no idea where this GW is, other than somewhere on the site. The first test I want to do next semester when hopefully the COVID lockdown has eased a bit is test the node at the pump controller location to see how it performs in place.

The node is a Feather with a piece of wire soldered into a via as the antenna. If this is put into a weather-shielded metal control box with a bunch of contactors etc., is that likely to act as some sort of RF shield, or be noisy, and stop the node from working?

Your advice works for people with both the skills and the equipment. 90% of TTN users do not own a gateway and are ‘reduced’ to other ways of debugging. (Given that there are > 111,000 developers and just < 11,000 gateways according to the TTN home page.)

Right, but after running with 3G connected gateways for over 12 months I can confidently state that 3G backhaul is entirely possible, even with downlinks. Of course there are issues at times, but 99% of the time everything works as expected for my gateways using that telco.
Other sites and other telcos may perform differently. If you search the forum you should find messages concerning Vodafone in Italy, where gateway performance degraded over time due to traffic prioritization in their network. However, those cases are not the norm in my experience, nor from what I’ve heard from other gateway owners using 3G.

An antenna within a metal box? Maybe google ‘Faraday cage’ and reconsider?

Plenty. I’ve been using one of those gateways at workshops where we use OTAA nodes that are started within coverage area of that gateway. Wouldn’t work without downlinks.

LOL, exactly what I was thinking.

The Adafruit product page says “Simple wire antenna or spot for uFL connector”, which I assume means we could solder on whatever a uFL connector is and attach an antenna that can be mounted outside the metal box.

Or maybe we sit it on top of the box with a little umbrella!

I wouldn’t mind taking the 3G GW back to the sponsor and having them test it. I would also like to test the node in place ASAP, and not just from across the road. But it’s now between semesters, and AU is worried about a 2nd wave of COVID given what’s happening in Victoria, so I’m just going to have to sit on my hands for a few more weeks before I can think about going to see them in person rather than just sitting in my car across the road like I have been.

But that improvement may tell you that it’s not (all) to blame on the GSM gateway?

A gateway will just transmit at the time the network tells it to. If the downlink arrives at the gateway too late to be transmitted (due to network latency), then it will not transmit it at all; it will not transmit it with some delay, but discard it. So, if changes in the device make downlinks work better, then at least part of the problem is not in the gateway.
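
A sketch of that “too late means drop, never delay” rule (illustrative only, not actual gateway code):

```cpp
#include <cstdint>

// Returns false when the downlink must be discarded because it arrived too
// late (backhaul latency ate into the lead time the concentrator needs).
bool schedule_downlink(uint32_t now_us, uint32_t tx_time_us, uint32_t min_lead_us) {
    // Signed difference handles the concentrator counter wrapping around.
    if (static_cast<int32_t>(tx_time_us - now_us) < static_cast<int32_t>(min_lead_us)) {
        // The device's RX window will have closed anyway, so transmitting
        // late would only waste airtime: drop the frame instead.
        return false;
    }
    // Otherwise queue it to go out at exactly tx_time_us.
    return true;
}
```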

I’d investigate whether the uplinks from the different versions of LMIC are different - for example, whether there are any differences in the join procedure of the two LMICs. Maybe they start OTAA with a different SF, maybe also making TTN make a different choice for RX1 or RX2? Or, if you are in US915 or AU915, maybe MCCI supports the initial ADR Request with some network configuration? (I don’t know if those settings affect downlinks. For EU868, LoRaWAN 1.0.2 includes some details in the Join Accept, most importantly also configuring RX2.)

If you cannot find differences in the uplink SF or downlinks, then I feel the improvement you saw indeed tells you that failing downlinks are not (all) to blame on the gateway. So even though the ethernet gateway seems to give you better results, you’d still have some investigation to do.

Aside: for Basic Station, does anyone know if TTN leaves the choice for RX1 or RX2 to the gateway?

Station will choose either RX1 or RX2 transmission parameters, whereby RX1 is preferred. If the frame arrives too late or the transmission spot is blocked, it will try RX2. To force a decision, the LNS can omit either the RX1 or the RX2 parameters.

Given one example message I found, it seems TTN indeed forces a single choice.

As we’re seeing in this thread, trying to debug a node without physical and administrative access to a gateway is an exercise in banging one’s head against a wall.

Apart from community groups that can share resources, buying a concentrator card is pretty much the “price of playing” if one wants to do much beyond build a copy of known good software/hardware.

It would be nice if that were not the case, but realistically, it is. The asker of this thread has now spent far more on time being frustrated and driving to where another gateway is than a concentrator would cost, even at the most minimal wage.

I have to wonder if those numbers really capture the situation; they may include those who registered out of a tentative interest. Actual active accounts versus gateways would be more interesting, but even that doesn’t show who is doing actual development.

However, for the node RX timing problem specifically, one actually can blip a GPIO on both transmit and receive, and then use either a digital scope or a cheap USB logic analyzer to measure the time in between. It’s more complete and illustrative when the node is compared to the gateway, but one could also compare the node’s RX to its own TX and the spec. The catch is that you need to find something that indicates that the packet has actually finished transmitting, or else externally calculate the packet duration and include that in the expected delay measured.
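
A hedged sketch of that technique for an Arduino + MCCI LMIC node (EV_TXSTART and EV_RXSTART exist in recent MCCI versions - check your lmic.h; the pin number is arbitrary). As noted above, the TX-to-RX gap seen on the analyzer still includes the packet airtime, which has to be subtracted.

```cpp
#include <Arduino.h>
#include <lmic.h>

const int DEBUG_PIN = 5;     // any spare GPIO wired to the logic analyzer

static void blip() {         // short pulse the analyzer can trigger on
    digitalWrite(DEBUG_PIN, HIGH);
    digitalWrite(DEBUG_PIN, LOW);
}

void onEvent(ev_t ev) {      // the usual arduino-lmic event hook
    if (ev == EV_TXSTART || ev == EV_RXSTART) {
        blip();
    }
    // ... handle EV_JOINED, EV_TXCOMPLETE, etc. as in any LMIC sketch ...
}

// In setup(): pinMode(DEBUG_PIN, OUTPUT);
```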


Sure, if you have them and can use them. But there are many, many modules with marketing materials designed to seduce, requiring only a USB cable and the ability to download software.

I talked to Wienke Giezeman at the Reading conference last autumn about this - his feeling is that there are sufficient materials to inform. I disagree, and I think it leaves some people who could champion LoRaWAN somewhat disillusioned and misinformed.


A USB-based logic analyzer suitable for measuring node timing costs around $10 and works with sigrok/pulseview open source software.