Hello,
as I have recently begun to play with LoRaWAN-related projects, I came across the Fair Use Policy described here on this forum and on many other sites. The way I understand it, it expects each node to perform OTAA activation only once in its lifetime and never repeat it again. Okay, sounds like a good deal, since the activation data can be stored in the node's flash memory.
But when I check out some library examples, both LMIC and LoRaMAC, they all consist of a couple of phases, where the first one is usually OTA activation and the others are transmission, reception and idle mode. So, while working on the device, it is expected to be reset many times, thus restarting the OTAA all over again. Isn't this in direct contrast with the FUP? And if I'm a developer, I might restart my board hundreds of times. So what to make of it? How do you work on your new projects/devices?
A similar question arises when the node uses a deep-sleep mode. After wakeup, the system usually resets, and thus repeats the OTAA anew. For example, in the MCCI-LMIC examples, we usually have some init code like this:
// LMIC init
os_init();
// Reset the MAC state. Session and pending data transfers will be discarded.
LMIC_reset();
// Start job (sending automatically starts OTAA too)
do_send(&sendjob);
and then we process the LoRa events in the loop by calling os_runloop_once(). But if there are no events, the device should go to sleep for hours. Now, after the wakeup, should it initialize again, or continue processing the loop?
That is actually very uncommon. ESPs suffer from that, but most ARM-based controllers and 8-bitters are perfectly capable of retaining state while in deep sleep.
No. Best practice is to limit joins. The limit for a node is actually 64K joins, as at that point all nonces are consumed, and when using random nonces you will hit the wall far earlier. (The node needs to be freshly registered to reset that state.)
So while developing repeated joins are expected and allowed. When deployed limit them. And it helps to have your own gateway so during development you don’t consume gateway airtime from other gateways for the joins.
which is what I do. Assuming nobody in my part of town uses it, I guess I can OTAA as much as I want? So no harm will come if my app resets and sends join-requests many times daily?
Well, my ARM-based nRF52832 surely resets when it wakes up from the System OFF deep sleep. Actually, the only way around it is to activate the retained RAM sections and store some variables there.
Ok, I get your point. So can I wrap up like this: once my device is in the field and does its thing, measures something and then transmits it, the best course of action would be to activate only once, retain the state during deep sleep and keep on transmitting once it is up again?
The second-best course of action would be to make it sleep for some hours and, if the state cannot be retained, to restart the whole activation, transmit once and go back to sleep. In this case the number of OTAA attempts would be limited to 64K over the device's lifetime.
Or save the state of the join (many try when getting started, few succeed).
But if you try your second option you will end up with multiple join failures, and the device will be stuck until it happens to hit a DevNonce that hasn't been used. Think welding the petrol cap shut on a car: it will run out and then you'll have to do something radical to get it working again.
Joining on every uplink does not scale as it involves too many downlinks.
Join on every uplink will burn power. There is no way power use for memory retention on your controller comes even close to the energy expenditure of a join cycle.
Why would it fail if I have my DevEUI, AppEUI and AppKey safe and sound, burnt into my flash? If they worked once, they should work every time, or am I missing something here?
Read the standard which actually mandates state saving.
When developing, that is often not an option, because you update the firmware and values might shift around a bit in structures (and, as a result, in EEPROM). So during development you discard and restart. In the field that is not an option.
Yes, a big chunk of the LoRaWAN spec relates to the Join process: the device sends the EUIs and a DevNonce and, if that DevNonce hasn't been used before, it gets back session details that are encrypted with the AppKey. So if you don't save the AppSKey and NwkSKey and the FCnt and some other essential details, you will have to re-join. But devices that restart frequently end up with the same random seed, so the DevNonce they generate can already exist. And other details.
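To make that concrete, here is a minimal, self-contained sketch of the kind of session blob that has to survive a reset. The struct layout and names are illustrative, not MCCI LMIC's actual internals (LMIC does expose LMIC_setSession() plus LMIC.seqnoUp/LMIC.seqnoDn for handing a saved session back, but how you serialize it is up to you):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical session blob: the fields a LoRaWAN stack must keep
 * across deep sleep to avoid a fresh OTAA join. Names are
 * illustrative, not MCCI LMIC's internal layout. */
typedef struct {
    uint32_t netid;
    uint32_t devaddr;
    uint8_t  nwkskey[16];  /* network session key  */
    uint8_t  appskey[16];  /* application session key */
    uint32_t fcnt_up;      /* uplink frame counter   */
    uint32_t fcnt_down;    /* downlink frame counter */
    uint16_t devnonce;     /* last DevNonce used for joining */
} session_t;

/* Stand-in for retained storage: on an nRF52 this could be a retained
 * RAM section, on an ESP32 the RTC RAM, elsewhere EEPROM or flash. */
static uint8_t retained_storage[sizeof(session_t)];

void save_session(const session_t *s) {
    memcpy(retained_storage, s, sizeof(*s));
}

void restore_session(session_t *s) {
    memcpy(s, retained_storage, sizeof(*s));
}
```

After restore_session(), an LMIC-based sketch would hand the keys back with LMIC_setSession() and restore the frame counters, instead of letting do_send() trigger a fresh join.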
BTW, we answer the questions because we have some modicum of experience. You really really are best just using deep sleep that saves the RAM - preferably on a common device that you can get support from the community - once you know how to fly, you can make your own rocket ship!
What could be a fair implementation is to apply the duty cycle of 30 seconds/day to each individual uplink as well, which amounts to a ratio of 1:2880. So if the stack spends 0.1 seconds sending my uplink (for example), my application should wait 288 seconds before sending the next packet.
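As a quick sanity check of that ratio (86400 seconds per day divided by 30 seconds of allowed airtime gives 2880), a tiny helper that turns measured airtime into a back-off could look like this; the function name and millisecond units are just illustrative:

```c
#include <stdint.h>

/* 30 s of airtime per 86400-second day => off/on ratio of 2880:1 */
#define FUP_RATIO 2880u

/* Given the airtime of the last uplink in milliseconds, return the
 * minimum wait before the next uplink, in the same units. */
uint32_t fup_wait_ms(uint32_t airtime_ms) {
    return airtime_ms * FUP_RATIO;
}
```

fup_wait_ms(100) yields 288000 ms, i.e. the 288 seconds from the example above; a slow 1.5 s SF12 uplink would already demand a 72-minute wait.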
As I understand the reply, it is recognised as a useful feature, but not a top priority for the mcci lmic.
@MERS_Pero I may be wrong here in which case my colleagues will correct but key point is
The fact is that a great many - historically most - COTS devices and modules do not do this, and sadly too few home brew developments.
IIRC it is not mandatory but recommended for LoRaWAN 1.0/1.01/1.02 or 1.03 - I believe MCCI as called out earlier targets 1.03?
However for 1.04 as stated
The key section in 1.04 L2 spec is under 6.2.5:
The Join-Request frame contains the JoinEUI and DevEUI of the end-device followed by a nonce of 2 octets (DevNonce).

DevNonce is a counter starting at 0 when the end-device is initially powered up and incremented with every Join-Request. A DevNonce value SHALL never be reused for a given JoinEUI value. If the end-device can be power-cycled, then DevNonce SHALL be persistent (e.g., stored in a non-volatile memory). Resetting DevNonce without changing JoinEUI will cause the Join Server to discard the Join-Requests of the end-device. For each end-device, the Join Server keeps track of the last DevNonce value used by the end-device and ignores Join-Requests if DevNonce is not incremented.
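The counter behaviour mandated there is easy to model. A minimal sketch of a spec-compliant DevNonce source (the variable standing in for non-volatile storage would be EEPROM, flash or battery-backed RAM on real hardware):

```c
#include <stdint.h>
#include <stdbool.h>

/* Stand-in for non-volatile storage of the next DevNonce. */
static uint16_t nv_devnonce = 0;
static bool exhausted = false;

/* Returns the next DevNonce to use in a Join-Request, or false once
 * all 65536 values for this JoinEUI have been consumed, per the 1.04
 * rule that a DevNonce SHALL never be reused. */
bool next_devnonce(uint16_t *out) {
    if (exhausted) return false;
    *out = nv_devnonce;
    if (nv_devnonce == UINT16_MAX)
        exhausted = true;   /* re-register the device to reset this */
    else
        nv_devnonce++;      /* persist before transmitting the join */
    return true;
}
```

Note the comment on ordering: the incremented value should be written to non-volatile memory before the Join-Request goes out, otherwise a power loss in between reuses a nonce.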
I don't think this requirement existed in 1.03 or earlier…
Swap batteries in many well-known branded products and they will initiate a new join if using OTAA, as they do not retain full state and associated variables (if any)… 'cause they're not 1.04…
Linear counting may not have been mandated, but join nonce re-use was still prohibited and networks were expected to retain at least some amount of history record to block it.
Classic behavior with a bad port of LMIC to a new platform is that the random number generator it uses hasn't been correctly ported, so it doesn't actually work and always yields the same answer. Since that nonce has already been used, the join silently fails. LMIC increments the nonce by one and tries again. The result is that each time state is lost, the node takes longer and longer to count to an unused value and successfully join.
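A toy model of that failure mode (a broken RNG always "randomly" returning the same value, a server rejecting every previously seen nonce) shows why the join time grows with every state loss:

```c
#include <stdint.h>
#include <stdbool.h>

/* Server-side view: which DevNonces have already been used.
 * 65536 possible 16-bit values, one flag each. */
static bool used[65536];

/* Simulate one post-reset join sequence: the broken RNG always picks
 * `seed`; the LMIC-style recovery is to bump the nonce by one after
 * each rejection. Returns how many Join-Requests were sent before
 * one was accepted. */
uint32_t attempts_to_join(uint16_t seed) {
    uint16_t nonce = seed;     /* wraps at 65535 -> 0 */
    uint32_t attempts = 1;
    while (used[nonce]) {      /* rejected: nonce seen before */
        nonce++;
        attempts++;
    }
    used[nonce] = true;        /* accepted */
    return attempts;
}
```

Each simulated reset costs one more attempt than the last: the first join from a fixed seed succeeds immediately, the second needs 2 requests, the tenth needs 10, and each wasted request is a full airtime-hungry join exchange.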
Anyway, as several have observed up the thread, regularly joining in deployment is unworkable, since even in the ideal case it requires at least one wasteful transmission and reception (and often several) before any useful data can be sent - like doubling to quadrupling the power cost of sending a data packet up.
With the linear counting requirement in the new spec, the already inadvisable idea of making a node that does not retain state, now becomes effectively impossible, since at least the join nonce must be retained.
Indeed, and that was a key part of the system security, as a new DevNonce was required to prevent replay attacks based on re-issuing old join-request messages… that is well understood. Many retry mechanisms have been used across the industry over the years; as you call out, walking the value by increments of 1 is common and used in LMIC, but after many rejoins that can lead to a very long time to establish a new valid rejoin. Mandatory (effectively) persistence only came in recently, but as repeatedly called out it is best practice, even if in earlier days it was not the only practice!
Again, to quote from 1.03 this time:
For each end-device, the network server keeps track of a certain number of DevNonce values used by the end-device in the past, and ignores join requests with any of these DevNonce values from that end-device.

Note: This mechanism prevents replay attacks by sending previously recorded join-request messages with the intention of disconnecting the respective end-device from the network. Any time the network server processes a Join-Request and generates a Join-accept frame, it shall maintain both the old security context (keys and counters, if any) and the new one until it receives the first successful uplink frame using the new context, after which the old context can be safely removed. This provides defense against an adversary replaying an earlier Join-request using a DevNonce that falls outside the finite list of values tracked by the network server.
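The server-side tracking described there can be sketched in a few lines; the window depth and ring-buffer data structure are illustrative, since the spec only says "a certain number" of past values are kept:

```c
#include <stdint.h>
#include <stdbool.h>

#define WINDOW 16  /* illustrative: 1.03 leaves the depth unspecified */

/* Ring buffer of the most recent DevNonces seen for one device. */
static uint16_t seen[WINDOW];
static int seen_count = 0;
static int seen_head = 0;

/* Returns true and records the nonce if this Join-Request should be
 * accepted; false if the DevNonce is in the tracked history (replay). */
bool accept_join(uint16_t devnonce) {
    for (int i = 0; i < seen_count; i++)
        if (seen[i] == devnonce)
            return false;               /* replayed Join-Request */
    seen[seen_head] = devnonce;         /* record, evicting oldest */
    seen_head = (seen_head + 1) % WINDOW;
    if (seen_count < WINDOW) seen_count++;
    return true;
}
```

This also illustrates the gap the quote's Note is working around: a nonce old enough to have fallen out of the window would be accepted again, which is exactly what the 1.04 "track only the highest value" rule closes off.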
The issue we now face is the fact that "the Internet" has a VERY long memory and there are lots of how-tos and tutorials from bygone years that simply don't take this into consideration. Not many of these will have provision for a non-volatile retention mechanism - something we will just have to live with (alongside telling folk not to attempt a 1.04 or later build if using such under-resourced h/w solutions)… Hell, we still have enough problems telling folk to avoid old SCPF/DCPF tutorials like the plague!
I don’t use precise maths, but I do know the payload sizes so I can build in a wait time after an uplink so I’m not hogging the airwaves. And in a similar fashion, curb the number of uplinks or increase the wait time before the next one. It’s not millisecond precision but it keeps things sane.
Thanks for clearing that up. I don't plan to restart frequently, just every couple of minutes in that case. I also notice some version incompatibilities, as DevNonce is defined one time as:
a unique, random, 2-byte value generated by the end device. The Network Server uses the DevNonce of each end-device to keep track of their join requests…
and another time as
a 2-byte counter, starting at 0 when the device is initially powered up and incremented with every Join-request .
But I see that people are discussing it already here. In any case, I see that the DevNonce, DevAddr, NwkSKey and AppSKey should all be stored if I don't want to OTAA every time my board wakes up.
This implies that every individual DevEUI has a finite amount of join attempts in its lifetime, that amount being 65,536. I guess that's enough, but it's still annoying to be obliged to store the counter in non-volatile memory. Especially since during development I expect to burn through 10,000 of those attempts.
Nope. First of all, you can delete the device from the LoRaWAN provider's administration and add it again. All counters on their side will be reset at that point, which means you can reset the counter as well.
Also, there are 4 variables involved: the DevEUI, the AppEUI, the AppKey and the nonce. Change any of the first 3 and you can restart the nonce at 0.
So your statement should read:
The combination of a DevEUI, AppEUI and AppKey has a finite amount of join attempts in its lifetime (if not reset by some means, which not every LoRaWAN provider might offer).
I’ve been developing LoRaWAN applications for over 5 years and I am not even close to that amount on all devices combined. So you might need to rethink your development strategy if you think you need over 10k for a single device.