Adjusting Link Dead Check on Device

Hi The Things forum!

I'm using:
LoRaWAN MAC Version: 1.0.2 (revB)
MCCI LoRaWAN LMIC Library, version 4.1.
A device that sends one unconfirmed uplink with data hourly.

Problem:
The time it takes for an end device to realise that its uplink messages are not being received is too long. There are two constants involved: LINK_CHECK_DEAD = 32 and LINK_CHECK_UNJOIN = 56.
I'm wondering if I could decrease these values so that the device detects sooner that data is not getting through.
That way, the if (LMIC.adrAckReq > LINK_CHECK_DEAD) condition would trigger earlier and the device would increase its SF etc. at an earlier stage. Is this something you would recommend, or is there a smarter way?

Thank you!

This logic is currently implemented:

    if( LMIC.adrAckReq > LINK_CHECK_DEAD ) {
        // We haven't heard from NWK for some time although we
        // asked for a response for some time - assume we're disconnected. Lower DR one notch.
        EV(devCond, ERR, (e_.reason = EV::devCond_t::LINK_DEAD,
                            e_.eui    = MAIN::CDEV->getEui(),
                            e_.info   = LMIC.adrAckReq));
        dr_t newDr = decDR((dr_t)LMIC.datarate);
        // newDr must be feasible; there must be at least
        // one channel that supports the new datarate. If not, stay
        // at current datarate (which finalizes things).
        if (! LMICbandplan_isDataRateFeasible(newDr)) {
            LMICOS_logEventUint32("LINK_CHECK_DEAD, new DR not feasible", (newDr << 8) | LMIC.datarate);
            newDr = LMIC.datarate;
        }
        if( newDr == (dr_t)LMIC.datarate) {
            // We are already at the minimum datarate
            // if the link is already marked dead, we need to join.
#if !defined(DISABLE_JOIN)
            if ( LMIC.adrAckReq > LINK_CHECK_UNJOIN ) {
                LMIC.opmode |= OP_UNJOIN;
            }
#endif // !defined(DISABLE_JOIN)
        } else if (newDr == LORAWAN_DR0) {
            // the spec says: the ADRACKReq shall not be set if
            // the device uses its lowest available data rate.
            // (1.0.3, 4.3.1.1, line 458)
            // We let the count continue to increase.
        } else {
            // we successfully lowered the data rate...
            // reset so that we'll lower again after the next
            // 32 uplinks.
            setAdrAckCount(LINK_CHECK_CONT);
        }
        // Decrease DataRate and restore fullpower.
        setDrTxpow(DRCHG_NOADRACK, newDr, pow2dBm(0));

        // be careful only to report EV_LINK_DEAD once.
        u2_t old_opmode = LMIC.opmode;
        LMIC.opmode = old_opmode | OP_LINKDEAD;
        if (LMIC.opmode != old_opmode)
            reportEventNoUpdate(EV_LINK_DEAD); // update?
    }

Why would the device fall out of range of a gateway?

Can you not test this?

Those are the values you can play around with to remediate the symptoms. However, be aware that a node can only join a limited number of times before the DevNonces start colliding (i.e. a random nonce has been used before). So the best solution is to check why there is no gateway receiving the transmissions and remediate that, for instance by installing a new gateway nearby.

Aren't there several situations where a device can fall out of range without the gateway being aware of that? For example, if someone joins a device at a location with a good connection and then places it in a new location where the previous ADR settings are no longer applicable. Then the gateway will be unaware that the device needs ADR adjustments, because no uplinks are able to reach it. In this case the device itself must recognise that its uplinks are not being received and adjust its settings.

I have a really good test setup for this, though it takes some time, and I find the LMIC logic not super straightforward. I'm trying to work out when the library sets the request for a downlink response (the ADRACKReq bit) and whether I can control when and how often that happens. Perhaps there is a better solution to the problem than decreasing LINK_CHECK_DEAD.

Thank you! I will set up a test with LINK_CHECK_DEAD decreased and LINK_CHECK_UNJOIN left unchanged, so that the settings are adjusted earlier but the rejoin logic is untouched. :+1:

First of all, gateways are dumb in LoRaWAN. The network server is the entity where all the intelligence is. That is why adding a gateway can easily be done and is transparent for end devices. There will just be an additional path for the data to travel to the network server.

Secondly, if an end device is moved out of RF range for its current ADR settings, the network server doesn't know. Missing uplinks could just as well be caused by the device being powered down. And even if the network server did know the ADR settings were incorrect, it can't do anything about it, because it needs the end device to send an uplink before it can respond. (Even for class C, because it needs to know which gateway to use for a downlink.)
In LoRaWAN the network can tell the device to switch to a lower spreading factor or use less power when transmitting; the device itself will need to increase power or switch to a higher spreading factor if there is no response to its link check requests. And for that you found the right controls to change.

Don't use ACKs. You are only allowed 10 of them each day on TTN, and even fewer on commercial networks. LMIC should never use ACKs without explicit instructions to do so from the end device firmware author.
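
For what it's worth, a minimal sketch of queuing an unconfirmed uplink with the MCCI LMIC API; the payload buffer and the do_send job name are placeholders from the usual examples, not anything this thread prescribes:

    #include <lmic.h>
    #include <hal/hal.h>

    static uint8_t payload[4];              // example payload buffer

    void do_send(osjob_t* j) {
        (void)j;
        if (LMIC.opmode & OP_TXRXPEND)
            return;                         // a transmission is still pending, skip this round
        // Last argument 0 = unconfirmed: no ACK is requested, so no downlink
        // is consumed just to acknowledge routine data.
        LMIC_setTxData2(1, payload, sizeof(payload), 0);
    }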

Is it possible to adjust when a "link check request" is sent? If so, I might not have to change the limit of missed requests, just how often the end device asks for one. I'm aware of the 10 messages a day limit; the use case is 24 unconfirmed uplinks per day, but it is really important that the uplinks go through and that the device adjusts its settings if more than 10% of the uplinks don't get through.

Apologies if I'm not always using the correct terminology :slight_smile:

Ideally they stop doing this with immediate effect!

But overall, what you are trying to achieve is all baked in; the problem is just this:

There is nothing in the ISM radio band that can ensure transmissions are received - it's shared, and anything can happen, including large vehicles blocking line of sight. The metric TTI recommends is that you should be able to cope with 10% packet loss. In a "normal" radio environment, whatever that is, but mostly not in the middle of an urban area or industrial park, 1-2% is typical.

Hammering the uplink through is very bad for many reasons. Asking for a return receipt is very bad for many reasons.

The best strategy is to include overlapping payloads - so that the one sent for the second hour has a copy of the first hour - which can be made smaller by using a delta - the difference. Or even a moving window of three - current and previous two.
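
A hedged sketch of that idea in plain C; the field widths, byte order, and function name are arbitrary choices for illustration, not a format from this thread:

    #include <stdint.h>
    #include <stddef.h>

    // Pack the current hourly reading in full (32 bit) plus deltas to the
    // previous two readings (16 bit each), 8 bytes total. If one uplink is lost,
    // the next one still carries enough to reconstruct the missing hour.
    static size_t encode_readings(uint8_t buf[8], uint32_t now,
                                  uint32_t prev1, uint32_t prev2) {
        uint16_t d1 = (uint16_t)(now - prev1);      // assumes hourly change fits in 16 bits
        uint16_t d2 = (uint16_t)(now - prev2);
        buf[0] = (uint8_t)(now >> 24);
        buf[1] = (uint8_t)(now >> 16);
        buf[2] = (uint8_t)(now >> 8);
        buf[3] = (uint8_t)(now);
        buf[4] = (uint8_t)(d1 >> 8);
        buf[5] = (uint8_t)(d1);
        buf[6] = (uint8_t)(d2 >> 8);
        buf[7] = (uint8_t)(d2);
        return 8;
    }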

Alternatively you can send a request for missing data - so if your application receives the data for hour 5 but hasn't had hour 4, which is very easy for it to detect, it can send a downlink to request a resend of hour 4.
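
A hedged sketch of the device side of that, using LMIC's downlink fields; the port numbers, the one-byte "hours back" command, and the history buffer are all invented for illustration (merge the logic into your existing onEvent handler):

    #include <lmic.h>
    #include <hal/hal.h>

    #define HISTORY 4
    static uint32_t history[HISTORY];       // last few hourly readings, index 0 = most recent

    void onEvent(ev_t ev) {
        if (ev == EV_TXCOMPLETE && LMIC.dataLen == 1) {
            // A one-byte downlink arrived in this uplink's RX window;
            // treat it as "resend the reading from N hours back".
            uint8_t hours_back = LMIC.frame[LMIC.dataBeg];
            if (hours_back < HISTORY && !(LMIC.opmode & OP_TXRXPEND)) {
                uint32_t v = history[hours_back];
                uint8_t buf[5] = { hours_back, (uint8_t)(v >> 24), (uint8_t)(v >> 16),
                                   (uint8_t)(v >> 8), (uint8_t)v };
                LMIC_setTxData2(2, buf, sizeof(buf), 0);    // unconfirmed resend on port 2
            }
        }
    }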

Another alternative is to send the data more than once with an appropriate random gap between uplinks.
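
A hedged sketch of that with the LMIC job scheduler; the 30-90 s window and the use of rand() as a jitter source are arbitrary (and since both uplinks carry the same reading, the application has to deduplicate):

    #include <lmic.h>
    #include <hal/hal.h>
    #include <stdlib.h>

    static osjob_t repeatjob;
    static uint8_t last_payload[8];
    static uint8_t last_len;

    static void do_repeat(osjob_t* j) {
        (void)j;
        if (!(LMIC.opmode & OP_TXRXPEND))
            LMIC_setTxData2(1, last_payload, last_len, 0);  // same data, unconfirmed
    }

    // Call this right after queuing the original uplink.
    static void schedule_repeat(void) {
        ostime_t delay = sec2osticks(30 + rand() % 61);     // 30-90 s random gap
        os_setTimedCallback(&repeatjob, os_getTime() + delay, do_repeat);
    }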

Ideally your application can profile the at-risk devices and set settings accordingly - this way you can make best use of a single small payload with replay for the majority, and slightly longer payloads for those prone to drop-out.

And as well developed as LMIC is, it isn't as feature complete as other stacks, but even with those I wouldn't use the LinkCheck mechanism to ensure complete data - I'd roll my own. I would use LinkCheck to verify local connectivity, having super-glued the person moving the devices to the floor and the devices to their location. And I'd keep a rolling log of data, so if the super glue fails, or a friend moves the devices or turns off a gateway - after they have been fired from the company - you can ask the device to replay a range of data. I would NOT rely on any stack to do anything useful or sensible with LinkCheck; I'd issue it at an appropriate interval from my own code and then follow the well-formed guidance of TR007, which suggests increasing the SF by a couple of points, retrying, and then after a couple more attempts, trying a rejoin.
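
As a hedged illustration of that application-level approach (the thresholds are invented, any downlink at all is treated as proof of connectivity, and the 14 dBm assumes EU868):

    #include <lmic.h>
    #include <hal/hal.h>

    // Invented thresholds, loosely following the TR007 idea of raising the SF a
    // couple of times before falling back to a rejoin.
    #define QUIET_BEFORE_SF_STEP   6    // uplinks with no downlink before raising SF
    #define QUIET_BEFORE_REJOIN   18    // uplinks with no downlink before rejoining

    static uint16_t quiet_uplinks;

    // Call from onEvent() on EV_TXCOMPLETE.
    static void check_link_health(void) {
        if (LMIC.txrxFlags & (TXRX_DNW1 | TXRX_DNW2)) {
            quiet_uplinks = 0;                      // heard the network, link is alive
            return;
        }
        if (++quiet_uplinks >= QUIET_BEFORE_REJOIN) {
            LMIC_unjoin();                          // drop the session and start over
            LMIC_startJoining();
            quiet_uplinks = 0;
        } else if (quiet_uplinks % QUIET_BEFORE_SF_STEP == 0
                   && LMIC.datarate > LORAWAN_DR0) {
            // One data-rate step down = one SF step up, at full power.
            LMIC_setDrTxpow(LMIC.datarate - 1, 14);
        }
    }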

All of this is at the paranoid end of LoRaWAN robustness. Mostly gateways & devices just get on with their tasks for years. There is a school of thought that says if you HAVE to have every hourly reading, then this is not the correct technology. I'd disagree: I'd have a replay system or overlapping data and remove the saboteurs from the organisation.

As with all these situations, knowing what you are measuring would help enormously so we can comment more specifically.

Hi again,

Sorry for the late reply. The use case is water metering, with a node that sends hourly uplinks with near-real-time data, so saving data locally on the device would not make much sense here. Superglue was not the solution I went for either; instead I adjusted the ADR backoff parameters.

X = the number of spreading-factor steps the device is away from SF12.
k = the number of uplinks it takes for the node to unjoin the network.
k = -LINK_CHECK_INIT + (1 + LINK_CHECK_DEAD)*X + (LINK_CHECK_UNJOIN - LINK_CHECK_DEAD)

With the default settings, the number of uplinks it took for the node to unjoin was, in the worst case:
64 + (1 + 32)*5 + 24 = 253 uplinks = 253 h = 10 days and 13 h

Suggested fix:

LINK_CHECK_INIT = -16
LINK_CHECK_DEAD = 1
LINK_CHECK_UNJOIN = 8

1 day to 1 day 10 h to unjoin, depending on the SF the device had at the last downlink.
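
For reference, those constants are declared together in the library source (lmic.h in the copies I have seen); a sketch of what the patched values would look like there, with the comments paraphrased rather than quoted, so check your own copy of the 4.1 headers before editing:

    enum {
        // ... other LINK_CHECK_* members left at their defaults ...
        LINK_CHECK_DEAD   =   1,    // unanswered ADRACKReq uplinks before lowering DR
        LINK_CHECK_UNJOIN =   8,    // unanswered uplinks at minimum DR before unjoining
        LINK_CHECK_INIT   = -16     // counter start value after a downlink is heard
    };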

Will of course also inform my friends to have patience and not to move the devices closer to the gateway and then back, but sometimes a friend just closes a big door, or it simply starts to rain.

Sincerely!

What you need is TotalRecall™, the firmware extension of choice. However, it's not been ported to LMIC (not so hard), and it needs somewhere to store the data (for LMIC-based devices, that could be a mess).

Then you can ask the device to replay the data. But with something with a smallish payload like water metering, you could include the last two readings as well as the current - then you can afford to lose two.

But with metering applications there is a cunning plan: send an incremental value! Ta Dah!
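
A hedged sketch of what the application gains from that; the function name and the cumulative-index framing are made up for illustration:

    #include <stdint.h>

    // With a cumulative meter index in every uplink, a lost hour costs timing
    // resolution but no consumption data: usage over any gap is just the
    // difference between the two readings that did arrive.
    uint32_t usage_since(uint32_t last_received_index, uint32_t new_index) {
        return new_index - last_received_index;     // unsigned wrap-around handled for free
    }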