Multiple new devices trying to OTAA

floodnetwork · August 4, 2017, 3:49pm

We seem to be seeing problems with joins when starting a batch of devices “simultaneously”.

When testing a new build of devices we start them at intervals of maybe a minute. We can see the activation messages appear on the Data tab, but only one or two seem to progress past this point. I have a feeling this is due to the activation of so many devices at the same time.

I was under the impression that the gateway can’t send many downlink messages because it’s under the same duty cycle limits as the nodes. If a gateway can’t transmit a Join Ack then what happens to this ack? Does it sit in a downlink queue waiting for the next Join Request or is it discarded?

Ben

kersing · August 4, 2017, 4:31pm

The same issue pops up at workshops. As you assume, you are probably running out of airtime on the gateway, which has to adhere to the same limits a node is subjected to. If I recall correctly, the console might list a message in the device data stating that there is no gateway available to send the response.

If there is no gateway available to transmit a packet (join response or downlink) it is discarded. For joins a new join request will be sent by the node for which a new reply will be generated by the back-end.

You could add gateways at your location to increase the available downlink capacity. TTN will choose the gateway with best link and available airtime to transmit a packet.
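To make “best link and available airtime” concrete, here is a minimal Go sketch. It is not TTN’s code: the `Gateway` type, its `SNR` and `Free` fields, and `selectGateway` are hypothetical, and the real back-end may rank gateways differently. It only shows why an extra gateway helps: when the best-placed one is out of airtime, another candidate can still carry the reply.

```go
package main

import (
	"errors"
	"fmt"
)

// Gateway is a hypothetical summary of one gateway that heard the uplink.
type Gateway struct {
	ID   string
	SNR  float64 // link quality of the received uplink in dB (higher is better)
	Free bool    // true if the gateway has airtime left and is idle at the required time
}

// errNoGateway mirrors the message the console shows when nobody can send the reply.
var errNoGateway = errors.New("no gateways available for downlink")

// selectGateway returns the gateway with the best SNR among those free to transmit.
func selectGateway(candidates []Gateway) (Gateway, error) {
	var best Gateway
	found := false
	for _, gw := range candidates {
		if !gw.Free {
			continue // skip gateways that cannot transmit at the required moment
		}
		if !found || gw.SNR > best.SNR {
			best, found = gw, true
		}
	}
	if !found {
		return Gateway{}, errNoGateway
	}
	return best, nil
}

func main() {
	gws := []Gateway{
		{ID: "gw-near", SNR: 9.5, Free: false}, // best link, but out of airtime
		{ID: "gw-far", SNR: 2.0, Free: true},
	}
	gw, err := selectGateway(gws)
	fmt.Println(gw.ID, err) // gw-far <nil>

	_, err = selectGateway([]Gateway{{ID: "gw-only", SNR: 5, Free: false}})
	fmt.Println(err) // no gateways available for downlink
}
```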

arjanvanb · August 4, 2017, 7:52pm

Indeed: TTN Console will show “Activation not valid: no gateways available for downlink”.

floodnetwork · August 14, 2017, 4:19pm

Thanks @arjanvanb and @kersing - I appreciate the help!

hoonppark · August 22, 2017, 9:25am

@kersing, I have experienced the same problem. Only two devices can successfully join and the third device always fails.

It seems like there are only two DownlinkOptions available at any given time.
Looking at the source code blocks below, it appears that when ‘len(downlinkOptions)’ is 0, no ResponseTemplate is set, and the join then fails with ‘No gateways available for downlink’.

In the ‘~/Go/src/github.com/TheThingsNetwork/ttn/core/broker/activation.go’ source code file, line #85 shows:

// Select best DownlinkOption
if len(downlinkOptions) > 0 {
	deduplicatedActivationRequest.ResponseTemplate = &pb.DeviceActivationResponse{
		DownlinkOption: selectBestDownlink(downlinkOptions),
	}
}

In the ‘/home/hoon/Go/src/github.com/TheThingsNetwork/ttn/core/handler/activation.go’ source code file, line #103:

if activation.ResponseTemplate == nil || activation.ResponseTemplate.DownlinkOption == nil {
	return nil, errors.NewErrInvalidArgument("Activation", "No gateways available for downlink")
}

If this problem is caused by the duty cycle enforced on the gateway, then when hundreds of devices try to join the network through the same gateway, it will take a long time for all of them to join.

For regions like Korea that use LBT (Listen-Before-Talk), there is no duty cycle limit. I think gateways in the Korean region should not have this problem and should allow more than 2 devices (maybe 100 devices) to join the network at the same time. However, I’m not really sure this is possible even with LBT, because I do not know how many downlink messages can be queued in one gateway.

hoonppark · August 22, 2017, 2:13pm

Let’s say there are 100 (OTAA) devices that are sending and receiving LoRa packets via one gateway.

Question 1. It will take some time for all 100 devices to join the network, right?

When these 100 (OTAA) devices are turned on for the first time, they will try to join the LoRa network. It seems like a duty cycle is enforced on the gateway, so only two devices can join the network at the same time. This means the other 98 devices will fail to join and retry over and over again. Eventually, all 100 devices will join the network.

Question 2. In a region that uses LBT, can 100 devices join the network at the same time?

In a region using LBT, no duty cycle is enforced. I guess this means that if 100 devices try to join the network at the same time, all 100 should be able to join. But I’m not sure whether that can actually happen.

Question 3. How many downlink messages can a gateway hold in its downlink queue?

Let’s say all 100 devices have joined the network and, theoretically speaking, they are all transmitting “confirmed” uplink messages at the same time. When a network server schedules downlink messages for all 100 devices through one gateway to acknowledge each device’s uplink, can one gateway hold 100 downlink messages destined for 100 different devices? How long will it take for a gateway to transmit these 100 messages to 100 devices?

Question 4. Can a TTN Router component hold 100 downlink messages destined for one gateway? And what is the maximum number of downlink messages one Router component can hold in its downlink queue?

htdvisser · August 24, 2017, 6:14am

A 1. Yes. If many devices join at exactly the same time there will be many collisions, and the gateway will be unavailable for downlink if it has any kind of duty cycle.
A 2. No, they will “hear” another device talking and should wait before sending the join request.
A 3. Those downlink messages have to be scheduled at exact times. If the gateway is not available at that time, messages will simply be dropped instead of queued.
A 4. Sure, as many as you can fit in memory, but again, conflicting downlinks will just be dropped.

hoonppark · August 25, 2017, 9:59am

@htdvisser, thanks for the answers.

I’d like to ask some additional questions if you don’t mind.

My questions are as follows:
(1) It seems like only 2 devices can JOIN the TTN server at the same time. Is it true?
(2) If it is true, why? Is it because of RX1 and RX2 (only 2 receive time slots) on the device side?
(3) If it is not true, how many devices can JOIN the TTN network server simultaneously?

htdvisser · August 25, 2017, 10:27am

This is purely because of the limitations of the gateway. The gateway simply can’t transmit more than 1 packet at the same time, so if multiple joins happen at exactly the same time, the gateway can only answer two of them (one in RX1 and one in RX2).

The network server can handle as many joins at the same time as your server’s CPU can handle.

Please keep in mind that the LoRaWAN specification also asks you to avoid this kind of situation.

Uplink frames that can be triggered by an external event causing synchronization across a large number of devices can trigger a catastrophic, self-persisting, radio network overload situation.
An example is typically the JoinRequest of a group of end-devices. The whole group of end-devices will start broadcasting JoinRequest uplinks and will only stop when receiving a JoinResponse from the network.
Uplink transmissions backoff shall be random and follow a different sequence for every device, for example using a pseudo-random generator seeded with the DevAddr

hoonppark · August 25, 2017, 1:54pm

@htdvisser, thank you for the answers.

I thought a gateway could transmit multiple downlink packets via multiple channels at the same time. When you say “The gateway simply can’t transmit more than 1 packet at the same time”, do you mean 1 packet at a time per channel? Or 1 packet at a time no matter how many channels are available on the gateway at a given moment?

If it is “1 packet transmission at a time” even when multiple channels are available on a gateway,
the same limitation must also apply to confirmed messages, when the TTN server sends ACK downlink messages.

If I had 100 devices with one gateway in the area, the same problem could happen. Could the same problem be avoided by calculating the random transmission time for each device?

kersing · August 25, 2017, 2:23pm

The current generation gateways have just one transmitter, so one transmission at a time no matter how many channels are available.
And the limit applies to all transmissions, OTAA responses, confirmed messages and downlink messages.

hoonppark · August 29, 2017, 7:37am

One transmitter would be fine if a gateway could handle multiple JOIN ACCEPT (JOIN Confirm) messages almost simultaneously by putting all the downlink messages in a queue.

My question is:
When there is one JOIN REQUEST, a JOIN ACCEPT message will be sent to the device after 5 seconds (JOIN_ACCEPT_DELAY1). During this 5-second waiting period, can the gateway transmit other downlink messages, such as an uplink confirmation or another JOIN ACCEPT whose own 5-second waiting period has expired?

Or is the first scheduled JOIN ACCEPT message overwritten by other downlink messages that need to be transmitted immediately?

arjanvanb · August 29, 2017, 11:57am

No. The node needs to know exactly when, and at which channel, it can expect a response: nodes are only listening at that specific time to that specific channel. So, the backend or the gateway cannot just choose some random time or channel at which to transmit the Join Accept. Instead, the backend can only choose between RX1 and RX2. (Same applies to confirmations and downlinks.)

Yes, if the gateway’s duty cycle allows for that.

hoonppark · August 29, 2017, 3:19pm

So, when a gateway’s RX1 and RX2 are both occupied transmitting JOIN ACK downlink messages, will the 3rd JOIN REQUEST fail and cause the error message “Activation not valid: no gateways available for downlink”?

And, the reason for the error message “Activation not valid: no gateways available for downlink” is because the gateway’s RX1 and RX2 are occupied simultaneously and there is no way to send down extra downlink messages. So, the TTN Router had to make the 3rd JOIN REQUEST message fail in this situation and display the error message “Activation not valid: no gateways available for downlink”. Do I understand it correctly?

Since this limitation (only 2 devices can join the network simultaneously) is due to a limitation of all LoRa gateways, all other Network Servers, such as Actility Server, must have the same issue, right?

I wonder if there is any workaround…

arjanvanb · August 29, 2017, 4:01pm

Yes. (But there is no guarantee that the first two Join Requests will succeed; see below.)

Yes, if multiple nodes are trying to join at exactly the same time. But apart from that, the gateway also has a maximum duty cycle, just like any other LoRaWAN device.† This implies that after transmitting in one sub-band (not limited to Join Accepts; this could also be a “regular” confirmation or downlink of data), a gateway has to keep quiet for some time before it can use that same sub-band again. So even when nodes do not try to join at exactly the same time, but slightly after each other, you might see that very same message.

As for terminology: I guess you understand that a gateway does not have only a single RX1 and a single RX2; all is relative to the time some uplink was received. RX1 and RX2 are related to a specific node, but the exact time slots related to that might already be occupied for a specific gateway, or be forbidden by the gateway’s duty cycle.

Yes. And you cannot rely on 2 nodes being able to join at the same time anyhow:

A gateway might only receive one or even none of your Join Requests to start with, as simultaneous transmissions on the same channel/SF will collide. (Also affected by other nodes sending uplinks.)
It might receive none of the Join Requests if it was transmitting for some other node while your nodes sent their Join Requests. (Current gateways cannot listen while transmitting.)
If the network does not prioritize Join Accepts over uplink confirmations or downlinks, then when many nodes require such confirmations or downlinks, the network might not be able to send Join Accepts either…

All you can do is follow some LoRaWAN basics:

Randomize, like htdvisser wrote above.
Only join when needed, so: keep using the “session” keys as long as possible.
Limit confirmed uplinks and downlinks. (Here TTN might do better than Actility Server due to TTN’s Fair Access Policy?)

† In regions with duty cycles; I just read your remark “regions like Korea that is using LBT (Listen-Before-Talk), there is no duty cycle limit”.

hoonppark · August 30, 2017, 6:02am

I have several additional questions regarding the use of a sub-band (or a channel).

Let’s use Korea as a target region for this discussion.
Korea is using LBT (Listen-Before-Talk), not a duty cycle policy.

In Korea, all devices must use one of the following three default channels (sub-bands) when they transmit a JOIN REQUEST.

(1) 922.1MHz (DataRate 0 to DataRate 5)
(2) 922.3MHz (DataRate 0 to DataRate 5)
(3) 922.5MHz (DataRate 0 to DataRate 5)

I could see devices were sending JOIN REQUESTs using one of these 3 default channels with a spreading factor 12.

In this situation, my questions are as follows:

Q1. Theoretically speaking, if 2 devices have transmitted JOIN REQUESTs simultaneously on the default channels 922.1MHz and 922.3MHz with spreading factor 12,
can a 3rd device send a JOIN REQUEST on the first default channel 922.1MHz with a different spreading factor (such as SF11)? I think it should be possible.

Q2. Is this 3rd JOIN REQUEST going to fail because RX1 and RX2 on the gateway are already occupied by the first 2 devices’ JOIN REQUESTs, even though the 3rd device uses a different SF (SF11)?

Q3. Are RX1 and RX2 on a gateway unavailable until the JOIN ACCEPT message is sent down to a joining device?

Q4. If a gateway assigns RX1 and RX2 for each device, I’m not sure why the 3rd device fails. (This RX1 and RX2 story is still not clear to me.)

Q5. My understanding is that a gateway is a dumb device which just transmits a downlink message as soon as it receives it from a Network Server.
In TTN’s case, TTN’s Network Server or Router schedules a downlink message and sends it when the gateway has to transmit it to a device. Is that right?

kersing · August 30, 2017, 6:52am

You keep asking the same question, not new/additional ones. I would suggest you download the LoRaWAN specification and read it, especially the parts regarding the receive windows. (RX1 and RX2)

A1. Yes multiple devices can send join requests at the same time using different frequencies and spreading factors.

A2. One of the 3 join requests is going to fail because the gateway is only able to send a response to two nodes.

A3. RX1 and RX2 are NOT on the gateway. RX1 and RX2 are on the node and refer to the time windows in which the node listens for a response. The gateway ‘just’ sends packets at the predetermined time; it does not know whether a packet is targeted at the node’s RX1 or RX2 window.

A4. See A3, the gateway does not know about RX1 and RX2. It just gets a packet from the back-end that it needs to send at a certain time. As there is only one transmitter only one packet can be transmitted at that time, so if multiple nodes are listening for a response at the same time window only one can get a downlink packet.

A5. Nearly correct. The gateway gets the data a short time before it needs to be transmitted, each packet contains a time stamp telling the gateway when to transmit it.

In a (final) attempt to explain the problem:

If node A transmits a join request at moment T, it expects a response at T + 5 seconds ± 100 milliseconds (= RX1) or at T + 6 seconds ± 100 milliseconds (= RX2).
If node B transmits a join request at moment U, it expects a response at U + 5 seconds ± 100 ms or at U + 6 seconds ± 100 ms.
If node C transmits a join request at moment V, it expects a response at V + 5 seconds ± 100 ms or at V + 6 seconds ± 100 ms.

Now if T equals U equals V (all nodes transmit at the same time) TTN can instruct the gateway to send a response at T + 5 seconds and at T + 6 seconds. There is no third option as none of the nodes will be listening for a response at a third time interval. This means only two nodes can receive a join accept message.

htdvisser · August 30, 2017, 7:18am

Let’s start by clarifying the RX1/RX2 story. After an uplink message, there are two opportunities for sending a downlink message back to the device (RX windows). So at the first window, the device listens for a very short time. If no transmission is detected, the device stops listening and waits for the second window. If a transmission is detected, the device starts receiving it and does not use the second window.
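The node-side behaviour described above, as a Go sketch. `Radio` and `fakeRadio` are hypothetical stand-ins for a real radio driver; the point is only that RX2 is opened when nothing was detected in RX1.

```go
package main

import "fmt"

// Radio is a hypothetical stand-in for the node's LoRa radio driver.
type Radio interface {
	// Listen opens a short receive window and reports whether a transmission
	// was detected; if so, it returns the received frame.
	Listen(window string) (frame []byte, detected bool)
}

// receiveReply follows the RX1-then-RX2 behaviour: RX2 is only opened when
// nothing was detected in RX1.
func receiveReply(r Radio) (frame []byte, window string) {
	if f, ok := r.Listen("RX1"); ok {
		return f, "RX1" // got a transmission in RX1: RX2 is not used
	}
	if f, ok := r.Listen("RX2"); ok {
		return f, "RX2"
	}
	return nil, "" // no reply at all: the node will retry later
}

// fakeRadio "detects" a transmission only in the windows listed in replies.
type fakeRadio struct{ replies map[string][]byte }

func (f fakeRadio) Listen(window string) ([]byte, bool) {
	b, ok := f.replies[window]
	return b, ok
}

func main() {
	_, w := receiveReply(fakeRadio{replies: map[string][]byte{"RX2": {0x20}}})
	fmt.Println(w) // RX2
}
```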

Now, when the gateway receives two uplink messages at the same time, it will be able to answer one in the RX1 window and the other in the RX2 window.

However, when a third uplink would also be received at exactly the same time, there is no way to send the corresponding downlink, because the transmitter of the gateway is already busy at the time of the device’s downlink windows.

Even when that third uplink is received a little bit later, the gateway’s transmitter would still be busy with the other transmissions.

So to answer your questions:

Yes, it can send a join request, and it may be received by the gateway, as multi-channel gateways can indeed receive multiple SFs on the same channel at the same time.
The third join-accept is not going to be sent because the gateway’s transmitter will already be busy with the other join-accept
The gateway is unavailable as long as it’s transmitting (plus DutyCycle or tOffAir if applicable). If a downlink transmission can be fit in the “white space” in the picture above, then we can send it just fine.
See above
It transmits at an exact time, there is no queueing until there is a “free spot”. The only reason we queue on the network side, is because some gateways don’t have enough memory to fit more than one scheduled message.

hoonppark · September 1, 2017, 2:59pm

@htdvisser and @kersing, thank you very much for taking the time to provide detailed explanations. Both of your answers helped me understand it much more clearly.