Are you seeing a TOO_EARLY downlink error?

If so, it’s because it got mentioned elsewhere and now you’re seeing it all the time, because that’s the way your head works: the mass of neurones in your cranium is a huge pattern recognition system with some elements highly tuned - it’s the reticular activating system deep in the core of your human brain. Like when you buy a Porsche in Racing Green and suddenly you keep spotting them all over the place.

The best fix is to not do downlinks. The second best fix is to stop watching the console quite so much.

The error occurs when a downlink is scheduled on a gateway running the UDP packet forwarder. The packet forwarder code uses a 32-bit counter/timer - if it rolls over just at the point of a downlink being processed, the code can’t do the maths and returns an error. I’m told that Basic Station doesn’t have this long-standing, documented issue as it uses 48 bits, so it must be better.

The inside source at TTI Towers that patiently explained this to me also says that there is no significant change in TOO_EARLY metrics.

I’m beginning to suspect that LoRaWAN is actually a huge MMO and we all go on quests to discover “The Truth” - which only leads to another quest …

That should not happen too often, because 32 bits is quite a few milliseconds (about 7 weeks).
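For reference, a quick check of where the 7 weeks figure comes from, assuming a millisecond tick - and what it becomes with the microsecond tick that the analysis further down the thread works with:

#include <stdio.h>

int main(void)
{
    /* Rollover period of a 32-bit counter under two tick-size assumptions. */
    double ticks = 4294967296.0;                                  /* 2^32 */
    printf("1 ms ticks: %.1f days\n", ticks / 1000.0 / 86400.0);  /* ~49.7 days, about 7 weeks */
    printf("1 us ticks: %.1f min\n",  ticks / 1e6 / 60.0);        /* ~71.6 minutes */
    return 0;
}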

What happens far more often is that the UDP packet arrives too late at the gateway, and because the gateway logic doesn’t expect it that late it reckons the downlink packet is scheduled a long way (about 7 weeks) in the future and refuses to keep it queued for that amount of time.

With V3 having RX1 at 5 seconds instead of the previous 1 second, there is more time for the communication to happen and the downlink packet can be sent to the gateway with a larger margin for communication delays. These days the default margin is 530 ms, which should be plenty; if needed you can increase it (never mind the class C help text in the console, it applies to class A as well). However, if you need a larger value you should seriously consider looking for another backhaul.

This post followed from a message I sent Nick yesterday.

I have noticed this on gateways that have as good and solid a backhaul as you can get.

We never ever saw these until recently. Now every time you open the application live data there is at least one logged. How do you get these events forwarded to you? I would like to write them to a DB to see how often they occur.

I have limited the device (a Milesight EM300) as much as I can from generating downlinks (MAC settings).

One reason I can think of is that maybe TTI made a change that now shows this information where previously it was not presented?

Keep in mind a solid backhaul doesn’t mean there can’t be delays causing this to happen. I’ve seen these messages as well for gateways where the turn around time to the community servers is < 50ms.

Given that you see such a message every time you look at the application, you either have a large number of devices or there are too many downlinks. Maybe contact Milesight to get them to fix the firmware? (I’ve been involved in an issue in the past where a gentle nudge from TTI, being a LoRa Alliance member, helped in convincing the vendor to fix their firmware.)

Maybe 5 devices is already high?

My one application is open a lot of the time, as I want to see what the behaviour of 2 of the devices is. So I notice different messages very quickly.

Unfortunately, to a certain extent MAC commands cause downlinks, and fine-tuning them is a pain that takes lots of time. If you had a large scale deployment this would be near impossible.

Maybe they only started pushing these messages recently.

These not arriving at the device is going to make people more insistent on pushing the confirmed setting.

The packet forwarder will have behaved the same all along (if you didn’t change the gateway firmware). So the messages would have been dropped previously as well.

Which does not solve the issue and introduces new issues with some (most) devices.

It was early evening when I had the attention of a diligent TTI staffer who was kind enough to answer my help request, here’s more detail for the unconvinced:

The packet forwarder uses a 32-bit counter for its timing. This rolls over relatively frequently. If the counter is on 4294961000 and the NS has to schedule a transmission at what will effectively be 99880, what is it to do if the code can’t cope? Some code might cope; older code can’t - it doesn’t understand the rollover, so the gateway sends back a TOO_EARLY event. See:
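To make that concrete, here is a minimal sketch (units assumed to be the forwarder’s microsecond ticks) of how that rollover looks to code doing plain 32-bit arithmetic - the wrap-aware difference is a sensible ~106 ms lead time, while the naive “current minus target” view makes the transmission look over an hour away:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t now    = 4294961000u;  /* concentrator counter just before it wraps */
    uint32_t target = 99880u;       /* requested TX time, just after the wrap */

    /* Unsigned subtraction wraps modulo 2^32, so the answer depends entirely
     * on which way round the code asks the question. */
    uint32_t lead  = target - now;  /* 106176 us: the real lead time, ~106 ms */
    uint32_t naive = now - target;  /* 4294861120 us: looks like ~71.5 minutes */

    printf("wrap-aware lead time: %u us\n", lead);
    printf("naive difference:     %u us\n", naive);
    return 0;
}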

One of the many changes from v2 to v3 is the considerable amount of extra information available to us via the console and the CLI logging - we can capture 2,000 lines and scroll back. So we have increased opportunities to access this information. This led to many questions in the early days as people moved over from v2 to v3. And as I suggest above, once you see something out of the ordinary, it keeps clicking in your head, and humans have a tendency to over-focus on “the thing”.

And, having had a glimpse in Paris of just one layout of the dashboard, I’ve every reason to believe that metrics on intricate details such as the Too Early response are being monitored, so I have no reason to disbelieve TTI when they say that there hasn’t been an increase in that metric.

It can’t be a backhaul latency issue - that’s a request arriving too late, whereas this packet arrives in a timely fashion and is rejected because the code isn’t up to snuff.

Turning on confirmed downlinks won’t make a blind bit of difference to getting the broadcast done and will actually pollute your airwaves. One of the most spectacular ways to shoot yourself in the foot is to use a confirmed downlink - because if the device fails to ack it, you end up in a loop until you reset the session (no, you can’t clear the queue - it’s not in the queue, it’s in the sending box) - and if you reset the session then you have to rely on the firmware developers having implemented some level of link checking, such as the LinkCheck MAC command.

I have a friend that manages the building of bridges over active motorways. He uses something called engineering & project management. Software developers have managers that use something called “time to market”. We can be our own worst enemies.

The suggestion from TTI was that Basic Station is much smarter, something that resolved @LasseRegin’s issue with no uplinks for reasons not explored but potentially to do with TCP rather than UDP as the transport, but was conflated with a single reported Too Early console entry.

We shouldn’t need to ‘fine tune them’ - if a device is certified, to a reasonable degree, with the right LW version & RP, it should just work. And there is a whole forum of people around to ask for assistance. We don’t have a device list with observations, comments, tips & a rating; if we did, we could ‘encourage’ vendors to resolve these issues. But for larger scale deployments, never believe the vendor - 5 iffy devices is one thing, a box of 1,000 is a whole new level of angst.

If a device is inspiring the LNS to generate frequent MAC downlinks - I see it often enough when either trying a device or developing one - then there is a configuration setting at either end that needs to be resolved. If the device is “OK” then there is now a CLI command to cease all MAC downlinks.

I know that the LNS can appear to be somewhat obsessional about sending MAC commands - sometimes I see a one-to-one uplink to MAC-command-downlink ratio - and spending time decoding/debugging why MAC acks aren’t happening, or what the mismatch is, often gets sidelined in favour of the actual job of delivering. But what I know best is that I have clients who haven’t the least clue what is going on in some situations, which takes time to explain, AND then the entire situation is reversed when someone at TTI has to patiently explain to me what nuance I’ve missed and the jolly good reason it’s like that. This is one of those moments - I asked a question and within a couple of minutes had the material to see what the detail was. Now you have it too.

I stopped watching, I will apply my filter :see_no_evil:

TTI are more than happy to receive detailed logs with a summary of the potential issue. But as I didn’t have anything like that and their dashboard wasn’t reporting an issue, there wasn’t much to go on.

The JSON download from the console (top right) is a tour-de-force of detail and you can click it every so often to capture what’s on the browser whenever you want. Post-processing is a bit of a handful, but the tools are around to solve that.

I totally disagree. However I probably don’t know what I am talking about having worked on a packet forwarder for months on end.

Right let’s look at the code to see when it considers a packet late:

Keep in mind both packet->count_us and time_us are of type uint32_t so no negative values will occur.
That means the window for packet being late is:

TX_START_DELAY + TX_MARGIN_DELAY + TX_JIT_DELAY

= 1500 + 1000 + 30000 = 32500 microseconds = 32.5 milliseconds.
That means a packet is considered late if its transmit time is no more than 32.5 milliseconds ahead of the current time. If the scheduled time is even one microsecond before the current time, the calculation “packet->count_us - time_us” wraps around and yields a huge number which well exceeds 32500, so that case is not caught by this check at all.
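As a compilable sketch of that check (using the constants quoted above, not a verbatim copy of the jit.c source):

#include <stdbool.h>
#include <stdint.h>

#define TX_START_DELAY    1500  /* us, as quoted above */
#define TX_MARGIN_DELAY   1000  /* us */
#define TX_JIT_DELAY     30000  /* us */

/* A packet is flagged late only when its TX time is no more than 32.5 ms
 * ahead of the current counter; a TX time that is already in the past makes
 * the unsigned difference wrap to a huge value, so it sails past this test
 * and is left for the "too early" check instead. */
static bool packet_is_late(uint32_t tx_count_us, uint32_t time_us)
{
    return (tx_count_us - time_us) <= (TX_START_DELAY + TX_MARGIN_DELAY + TX_JIT_DELAY);
}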

Now we’ll check when a packet is early:

(packet->count_us - time_us) > TX_MAX_ADVANCE_DELAY

with

#define TX_MAX_ADVANCE_DELAY ((JIT_NUM_BEACON_IN_QUEUE + 1) * 128 * 1E6)

TX_MAX_ADVANCE_DELAY resolves as (3+1) * 128 * 1E6 = 512000000

Keep in mind we’re still dealing with unsigned maths.

Let’s assume we get a packet with packet->count_us = 3957466499. The packet arrives at the jit function to be queued with time_us = 3957466499 + 500 ms = 3957466499 + 500000 = 3957966499

(packet->count_us - time_us) > TX_MAX_ADVANCE_DELAY

3957466499 - 3957966499 = (according to my computer, I don’t do unsigned maths where numbers roll over) 4294467296 which exceeds 512000000.

So as a result of a packet being just 500ms late at the gateway it gets marked as JIT_ERROR_TOO_EARLY
(It will also be reported as early if the packet reaches the queue just one microsecond after its scheduled transmit time, due to the unsigned maths.)
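Running those numbers as a compilable sketch (again using the constant quoted above, not the verbatim jit.c source) shows the misclassification:

#include <stdint.h>
#include <stdio.h>

#define TX_MAX_ADVANCE_DELAY 512000000u  /* (3+1) * 128 * 1E6, as above */

int main(void)
{
    uint32_t count_us = 3957466499u;         /* requested TX time */
    uint32_t time_us  = count_us + 500000u;  /* packet reaches the queue 500 ms after that */

    uint32_t diff = count_us - time_us;      /* wraps to 4294467296 */
    printf("diff = %u -> %s\n", diff,
           diff > TX_MAX_ADVANCE_DELAY ? "JIT_ERROR_TOO_EARLY" : "accepted");
    return 0;
}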

Oh Lord, a rollover situation I hadn’t considered - and it’s µs as well, which just makes it even more sensitive. :scream:

Yes, I just edited my message to point out that a packet being 1 microsecond late will be reported wrongly.

Thanks for doing the analysis, Jac. This confirms what I had logically expected and what my instincts told me on the mods thread, even though I don’t get into the code details - the numbers behind this are enlightening, and given the often-used default of 530 ms (IIRC?) they stack up closely against the 500 ms used above.

So long as we are all aware of it and don’t now panic when the message occasionally pops up, hopefully we can close this and move on… now it’s on the radar, will the code be ‘fixed’? :thinking:

I might reconsider fixing it in the MP forwarder now that the message appears in the console; however, it is just a cosmetic issue - the net result is the same whether the packet is reported as late or too early: it won’t be transmitted.
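For what it’s worth, the usual rollover-tolerant way to phrase such checks is to reinterpret the wrapped 32-bit difference as signed, so a packet that is slightly past due reads as a small negative number instead of a near-2^32 positive one. A minimal sketch of the idea (not the actual MP forwarder or jit.c code):

#include <stdbool.h>
#include <stdint.h>

#define TX_MAX_ADVANCE_DELAY 512000000  /* us, as above */
#define TX_LATE_WINDOW           32500  /* us: TX_START_DELAY + TX_MARGIN_DELAY + TX_JIT_DELAY */

/* Casting the wrapped difference to int32_t maps "just in the past" to a
 * small negative delta, so late and early can be told apart correctly. */
static bool tx_is_too_late(uint32_t tx_count_us, uint32_t time_us)
{
    int32_t delta = (int32_t)(tx_count_us - time_us);
    return delta <= TX_LATE_WINDOW;            /* includes TX times already in the past */
}

static bool tx_is_too_early(uint32_t tx_count_us, uint32_t time_us)
{
    int32_t delta = (int32_t)(tx_count_us - time_us);
    return delta > TX_MAX_ADVANCE_DELAY;       /* genuinely more than ~8.5 minutes ahead */
}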

For people looking to solve it, use a packet forwarder that uses TCP, not UDP. BasicStation and MP forwarder using the ‘ttn’ transport are two of the options. TTN prefers BasicStation…

Which, sadly, doesn’t yet have a wide enough set of install instructions - so Balena or …

The RX timestamp - is that when the packet arrives where? At the server or the gateway? (I am not aware that the gateway would have an accurate timestamp.)

How did you calculate this?

Is this not a concern when downlinks are not sent? Doesn’t the server have to recompute whether a downlink is necessary?

To quote Adrian: “this TODO can drink beer in a couple of years” - it was in the first commit to the code base for the jitqueue in June 2016.

Mostly this appears to be impacting the MAC commands, as we do see rather a lot of them nowadays. But it could impact a user downlink - at which point there should be a mechanism in the firmware to ack that downlink on the next normal uplink. I used to use a bit flag, but I’m moving to echoing the last four bits of the downlink counter when it’s not a MAC command, so the server can compare and check that a user downlink has been received. Although the housekeeping uplinks usually confirm that settings are as expected. Digital twin for the win.
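As an aside, here’s a hypothetical sketch of that ack scheme - the names and byte layout are invented for illustration, not taken from any real firmware:

#include <stdint.h>

/* The device remembers the low four bits of the frame counter of the last
 * application (non-MAC) downlink it received and echoes them in the lower
 * nibble of a status byte on its next routine uplink, so the server side can
 * compare against the FCnt of the downlink it queued. */
static uint8_t last_app_fcnt_lsb;

void on_app_downlink(uint16_t fcnt_down)
{
    last_app_fcnt_lsb = (uint8_t)(fcnt_down & 0x0Fu);
}

uint8_t build_status_byte(uint8_t flags)
{
    return (uint8_t)((flags << 4) | last_app_fcnt_lsb);  /* flags in the upper nibble */
}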

If it’s a MAC command then that will inevitably happen as the LNS won’t get back an ack in the next uplink. As for user downlinks, let’s go to the source, Luke …

Yes, all my comments were only referring to MAC commands.

I excluded the once-every-6-weeks downlink I need to send - it does not arrive, so I have to resend it and repeat until it gets there. Now I am trying every minute and it still does not get there. LOST SOMEWHERE IN SA :japanese_ogre: :see_no_evil:

Nothing has changed in the code base of late & the metrics are the same, so perhaps you are exacerbating the problem with too many downlinks? Or just hyper-focusing with some hyper-ventilation?

Do you have a simple device with something like a TTIG that you can send a downlink to? Then turn off the TTIG and use a UDP-based gateway to compare and contrast.

As for the code, any downlink error response appears to me to be logged, but with some really strange comment text (starting with //, not found much elsewhere in the repo) that refers to the potential of letting the LNS know it didn’t work.