Does a node ever need to rejoin after OTAA?

Hi,

after joining with OTAA, are there use cases in which the node needs to rejoin? Does the backend server revoke the DevAddr and session keys in some cases?

If so, can this be detected via the downlink response given by the backend?

Thanks.
Casper

A re-join is usually only done when a device resets.

Another situation would be when the frame counters reach their maximum value of 65,536 or 4,294,967,296 (depending on whether you use 16-bit or 32-bit counters).

The backend (TTN) does not automatically revoke keys, but in the future it might be possible for the application (you) to “manually” revoke a device session. Communicating this with the device is the responsibility of the application (you).

Clear, thanks!

…or the lack of such response.

When rolling out a lot of nodes we will (also) use the lack of some specific response (like not receiving an ACK or downlink for, say, 48 hours) to start a join again. We want this to allow for switching networks.

1 Like

Thanks!

Is this rejoin the same as the one described in v1.1 (type 0, type 1, type 2)? If an end device power cycles, does it do a new join request or a rejoin request? If it is a rejoin, which type of rejoin will it do?

Typically a normal Join procedure. When I wrote that message 4 years ago, there was no v1.1 yet, and thus, no ReJoin procedure :wink:

Thank you! What is the purpose of the rejoin mechanism then? I skimmed over the specifications, but wasn’t able to understand. When would each type of rejoin request be used?

The ReJoin procedure is mostly useful for roaming. When a device leaves the service area of its network operator, and enters the service area of another operator, the device can send a ReJoin request to “ask” the home network operator if it should start a roaming session with the other network operator. The home network operator can also send MAC commands to the device telling it to (periodically) send such ReJoin requests.

In practice, we don’t see this being used yet, and we also don’t have guidelines for it yet.

Since the Network Server can tell a device (through MAC commands) if, when and how often it should send ReJoin requests, I think that's sufficient for most devices. If I were developing devices I wouldn't implement custom behavior for this, and would only use regular Join requests.

3 Likes

I think it may also be good to come back to the initial question asked in this thread, since we’ve learned quite a bit since 2016.

Here’s a quick summary of what Joins do:

  • Every Join request contains a unique DevNonce to keep the join procedure secure
  • There can be 64k (65536) different DevNonces for the same AppKey/NwkKey
  • Assuming that the root keys don’t change, a node can therefore send 64k Join requests in its lifetime
  • In LoRaWAN versions prior to 1.0.4, the DevNonce was random, so the probability of picking an unused one decreased over time. From LoRaWAN 1.0.4 onwards, the DevNonce is a counter; this requires a bit of persistent memory on the device to keep track of the counter, but avoids that increasing Join difficulty (see the sketch after this list).
  • When a Join is accepted, a new Session is started
  • When the network receives the first message with the new Session, the old one is discarded
  • Every Uplink and Downlink message in a Session uses a unique Frame Counter (FCnt)
  • Frame Counters are 32 bits wide (allowing for 4G, i.e. 4,294,967,296, messages in a Session). Older versions of LoRaWAN used 16-bit Frame Counters (allowing for 64k, i.e. 65,536, messages in a Session)
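
To illustrate that counter-based DevNonce, here's a minimal C sketch. It's not taken from any particular LoRaWAN stack; the non-volatile storage here is just a stand-in for whatever EEPROM/flash API your platform provides, and the function names are made up for illustration:

```c
#include <stdint.h>

/* Stand-in for the device's non-volatile memory; on real hardware this
 * would be an EEPROM or flash location so that the counter survives
 * resets and power cycles. */
static uint16_t nvm_dev_nonce = 0;

static uint16_t nvm_read_dev_nonce(void)        { return nvm_dev_nonce; }
static void     nvm_write_dev_nonce(uint16_t v) { nvm_dev_nonce = v; }

/* Return the DevNonce to use for the next Join request.  In LoRaWAN
 * 1.0.4 the DevNonce is a counter that is incremented for every Join
 * request and must never be reused with the same root keys. */
uint16_t next_dev_nonce(void)
{
    uint16_t nonce = nvm_read_dev_nonce();
    nvm_write_dev_nonce((uint16_t)(nonce + 1u));
    return nonce;
}
```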

There are no real rules for when a device should transmit Join requests, but generally speaking we see devices that Join:

  • When the device doesn’t have a Session
    • when it is activated for the first time
    • when it loses its Session on reset
  • When the device thinks it has lost connection to the network
    • after following the usual reconnection steps (TODO: link to guideline)
  • When the application tells it to
    • always good to be able to send a downlink to the device to reset it
  • Periodically
    • for example, resetting (and thus re-joining) once every week

Since most devices send Join requests when they reset, it is EXTREMELY important to avoid synchronization by always using backoff and jitter in the implementation of the Join mechanism of devices.

5 Likes

Can you explain what exactly is meant by the above?

Synchronization of devices happens if end devices respond to a large-scale external event. Some examples of synchronized events that we’ve experienced are:

  • Hundreds of end devices that are connected to the same power source (could be in a train, ship, building) and the power is switched off and on again
  • Hundreds of end devices that are connected to the same gateway, and the firmware of the gateway needs to be updated
  • Hundreds of thousands of end devices that are connected to The Things Network, and we have a database failover

Many end devices respond to these events, but if they respond in the wrong way, things can go terribly wrong.


Let’s take an example device that starts in Join mode when it powers on, and reverts to Join mode after being disconnected from the network. There are 100s of such devices in a field, and one gateway that covers this field.

The power source for the devices is switched on, and the gateway immediately receives the noise of 100s of simultaneous Join requests. LoRa gateways can deal quite well with noise, but this is just too much, and the gateway can’t make any sense of it. No Join requests are decoded, so no Join requests are forwarded to the network and no Join requests are accepted.

Exactly, or approximately 10 seconds later (the devices either have pretty accurate clocks, or they’re all equally inaccurate), the gateway again receives the noise of 100s of simultaneous Join requests, and still can’t make anything of it. This continues every 10 seconds after that, and the entire site stays offline.

Not great.

This situation can be improved by using jitter. Instead of sending a Join request every 10 seconds, the devices send a Join request 10 seconds after the previous one, plus or minus a random duration of 0-20% of this 10 seconds. This jitter percentage needs to be truly random, because if your devices all use the same pseudorandom number generator, they will still be synchronized, as they will all pick the same “random” number.

With these improved devices, the Join requests will no longer all be sent at exactly the same time, and the gateway will have a better chance of decoding the Join requests.

Much better. Especially if the initial Join request was also sent after a random delay.

But what if you have another site with 1000s of these devices instead of your site with 100s of them? Then the 10 seconds between Join messages may not be enough. This is where backoff comes in. Instead of having a delay of 10s±20%, you increase the delay after each attempt, so you do the second attempt after 20s±20%, the third after 30s±20%, and you keep increasing the delay until you have, say, 1h±20% between Join requests.
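
As a rough sketch of how such a schedule could look in C (this isn't from any particular stack; `join_retry_delay_ms` and `random_u32` are made-up names, and the placeholder random source should be replaced with a hardware RNG or a properly seeded PRNG on a real device):

```c
#include <stdint.h>
#include <stdlib.h>

/* Placeholder random source so the sketch compiles on a PC.  A real
 * device should use a hardware RNG, or at least a PRNG seeded with a
 * per-device value, so devices don't all pick the same "random" jitter. */
static uint32_t random_u32(void)
{
    return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
}

/* Delay in milliseconds before Join attempt number `attempt`
 * (attempt 1 = first retry).  The base delay grows by 10 s per attempt
 * and is capped at 1 hour; +/-20% jitter is applied on top. */
uint32_t join_retry_delay_ms(uint32_t attempt)
{
    const uint32_t step_ms = 10u * 1000u;        /* 10 s per attempt */
    const uint32_t max_ms  = 60u * 60u * 1000u;  /* cap at 1 hour    */

    uint32_t base;
    if (attempt >= max_ms / step_ms) {
        base = max_ms;                 /* backoff capped at 1 hour   */
    } else {
        base = attempt * step_ms;      /* 10 s, 20 s, 30 s, ...      */
    }

    /* Jitter uniformly distributed over [-20%, +20%] of the base delay. */
    uint32_t span   = (base / 5u) * 2u;              /* 40% of base  */
    uint32_t jitter = random_u32() % (span + 1u);    /* 0 .. span    */

    return base - (span / 2u) + jitter;
}
```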

An implementation like this prevents persistent failures of sites and the network as a whole and helps speed up recovery after outages.

7 Likes

Power off/on related synchronized events can also be caused by power outages in geographic regions (e.g. districts, cities).

One usually has no control over other LoRaWAN applications in an area and (depending on the application) the RF signals usually reach a larger area than where they are needed.
For many locations there is no guarantee that there will not be many end devices in the area and the number of devices may change/increase over time. Therefore, in theory, each gateway and each end device is prone to such large-scale external events.

So the ‘backoff and jitter’ strategy should actually be implemented in each LoRaWAN end device that performs OTAA joins.

Does randomizing the delays have any impact on how spreading factors are/should be changed during join retries?

Will a ‘jitter and backoff’ strategy cause unnecessary join delays for devices in areas with only a limited number of devices?

A ‘backoff and jitter’ strategy will probably need to be implemented in LoRaWAN libraries like LMIC and LoRaMac-node, because retries of failed joins are automatically performed by those libraries as part of a join request.

1 Like

One way to avoid the pseudorandom generator producing the same results on all devices is to seed it with a unique number. The DevEUI (maybe combined with the AppEUI) comes to mind.
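
A minimal sketch of that idea in C, assuming the DevEUI is available as a byte array (the value shown is just an illustrative placeholder, and the function name is made up):

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative placeholder value; on a real device the DevEUI would be
 * read from provisioning data, EEPROM or a secure element. */
static const uint8_t dev_eui[8] = { 0x00, 0x01, 0x02, 0x03,
                                    0x04, 0x05, 0x06, 0x07 };

/* Fold the DevEUI into a 32-bit seed so that every device starts its
 * pseudorandom sequence at a different point.  The AppEUI/JoinEUI could
 * be mixed in the same way for additional variation. */
static void seed_prng_from_dev_eui(void)
{
    uint32_t seed = 0;
    for (unsigned i = 0; i < sizeof dev_eui; i++) {
        seed = (seed * 31u) ^ dev_eui[i];
    }
    srand(seed);
}
```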

1 Like

This makes sense. Thank you!

Ideally, a node knows what data rate it was communicating at and can adjust its rejoin strategy appropriately: at a lower SF it can try more often.

Not if you use a random ±20% and 8 channels for the lower SFs. This would be a good candidate for stochastic modelling.

This already happens in current LMIC (and probably LoRaMac-node) implementations but the intervals are predefined and of fixed length. AFAIK no randomization is applied.

Knowing that retry intervals in current (LMIC) implementations are already automatically incremented (by LMIC), but that spreading factors are also automatically increased during retries, I was wondering whether it suffices to only add randomization to the length of the retry intervals.

I know at least one LoRaWAN library implementation that works the opposite way: it tries joining using the highest SF first and gradually decreases it if that fails. IIRC it is still unclear whether the latter conforms to the LoRaWAN specification or not.

So it actually depends on the implemented algorithm.
Practical guidance for implementing ‘jitter and backoff’ would therefore be useful.

Not for a power loss, unless the user saves this info, which I sort of doubt; I know I haven't.

I meant after a power cycle (reset), when the device has not stored the keys received from a previous join and does a fresh new join after the restart.