Are any of you with the loop/chronic reboot problem (not the occasional wifi disconnection) located in the United States?
Yes. I am.
I thought, onehorse, that you seemed to have the occasional Wi-Fi disconnection problem, and not the loop/chronic reboot problem I was trying to allude to: one that may cause crashes every 60 seconds and doesn't seem to allow the device to work well even for 10 minutes? The former seems like a problem that may be fixed with a firmware update. The latter, more severe, problem seems like it may be a problem with the LoRa gateway board… and some have wondered whether there may have been a batch of boards with a high rate of defects in the manufacturing or assembly process…
Quite right: my gateway works fine for hours at a time, so this is perhaps a different problem from never being able to activate the gateway or run it stably for more than a few seconds.
I assumed my problem was also a reboot problem, since when it stops working, the second LED is invariably flashing rapidly, indicating a failure to connect. Why would it disconnect once connected, except upon some kind of restart/reboot event?
"Why would it disconnect once connected except upon some kind of restart/reboot event?"
My sense is that it's not that rare for client devices to at least briefly lose connection to an access point, and a major aspect of the problem you are seeing, which one hopes will be cured via a firmware update, is that once the connection is lost it isn't re-established quickly.
I have been running my MTCAP-915 gateway for 48 hours straight with no trouble, but this is on Ethernet. To be fair, I suppose I need to swap it with the TTN gateway ;>
Yeah, I'm optimistic that your TTN device will be very stable on Ethernet… Mine are. And I'd guess the MTCAP device will do fine on Wi-Fi. Do let us know the results if you try it.
I'm located in the Netherlands and the device can't be activated. The status is still "not connected".
Thanks for the info. If you haven't yet had a chance to do much troubleshooting, here are some questions you might ask…
Is it stable in that condition, or is it rebooting repeatedly? Is it connected to the Internet via Ethernet? What is the state of the LEDs? One on and the second slowly blinking? There's a helpful list of reasons associated with different LED patterns here:
Do you know its IP address? Can you browse to that IP address with /info appended? What does it say? Is there any kind of firewall that might be blocking UDP traffic?
One of my gateways also has the same problem: it's stuck in a loop.
But two other gateways from another pledge, which I configured on the same day, at the same time and place and under the same conditions, run without any problems.
OK, whatever caused the outage has been resolved and I have the TTN gateway working again, now connected to Ethernet. I will report its behavior over the next day or two to see if it has changed.
Are you able and willing to swap the power supply and LoRa board between the working and the non-working gateway?
And take some detailed pictures of both circuit boards, with the chip IDs readable?
Peeking into the firmware code, I finally understand the weird values in "HTTP: Got 1232 bytes": the URLs printed in the logs are not the ones actually used, as ?filter=ttn is appended. And indeed, https://account.thethingsnetwork.org/api/v2/frequency-plans/EU_863_870?filter=ttn is much smaller than https://account.thethingsnetwork.org/api/v2/frequency-plans/EU_863_870
I'm also trying to enable debug logging and add much, much more logging, but then the UART output really starts losing many bytes, making it very hard to decipher my new logging. I tried to double DEBUG_PRINT_BUFFER_SIZE, but then the compiler fails to allocate space. To be continued…
The good news: I do see responses from the LoRa card, like LORA: recv_rpl: 0x23 0x35 0x1 0x0 0x0 0x0 0x0 0x59 0xd.
The bad news: the very first step in the configuration already fails:
status *= configureRXChain(0, appGWActivationData.configuration_sx1301.rfchain[0].enable,
                           appGWActivationData.configuration_sx1301.rfchain[0].freq);
Do we already have the "programming", i.e. the command set details, for the LoRa card?
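Until someone has the official command set, part of the frame layout can be inferred from the logs in this thread. In 0x23 0x31 0x1 0x0 0x0 0x55 0xd, the byte 0x55 is exactly 0x23 + 0x31 + 0x01, and the RX chain 0 command carries 867500000 Hz as 0xe0 0xff 0xb4 0x33 (32-bit little-endian). The sketch below encodes that inference. It is an assumption checked only against a few command frames in these logs, not the firmware's code or a vendor specification, and the helper names build_frame and rx_chain_payload are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame layout INFERRED from the command frames in the logs (an assumption):
   0x23 | command | length (16-bit little-endian) | payload | checksum | 0x0d
   where checksum = low byte of the sum of all preceding bytes. */
#define FRAME_START 0x23
#define FRAME_END   0x0d

/* Hypothetical helper: writes one frame into out, which must hold at least
   len + 6 bytes, and returns the number of bytes written. */
size_t build_frame(uint8_t command, const uint8_t *payload, uint16_t len,
                   uint8_t *out)
{
    size_t n = 0;
    uint8_t sum = 0;

    out[n++] = FRAME_START;
    out[n++] = command;
    out[n++] = (uint8_t)(len & 0xff);   /* length, low byte first */
    out[n++] = (uint8_t)(len >> 8);
    for (uint16_t i = 0; i < len; i++)
        out[n++] = payload[i];

    /* Checksum over everything written so far, truncated to 8 bits. */
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + out[i]);
    out[n++] = sum;
    out[n++] = FRAME_END;
    return n;
}

/* Hypothetical helper for the RX chain command (0x34) payload as it appears
   in the logs: chain index, enable flag, frequency in Hz (little-endian). */
void rx_chain_payload(uint8_t chain, uint8_t enable, uint32_t freq_hz,
                      uint8_t out[6])
{
    out[0] = chain;
    out[1] = enable;
    for (int i = 0; i < 4; i++)
        out[2 + i] = (uint8_t)(freq_hz >> (8 * i));
}
```

For example, rx_chain_payload(0, 1, 867500000, p) followed by build_frame(0x34, p, 6, f) reproduces the 12-byte send_cmd for RX chain 0 from the logs, ending in checksum 0x24 and terminator 0x0d.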
UPDATE: one of the gateways that was working well is now also in a boot loop. Sometimes it receives sensor packets, but most of the packets the sensors send are lost due to the reboot cycle. For this gateway, I had switched on automatic updates in the console. I changed the power supply, but that made no difference.
For the gateway that still has no problems, I have now switched off automatic updates in the console.
So, in my situation, only one of the three gateways is usable.
Increasing the value of DRV_USART_BAUD_RATE_IDX0 to, say, 921600 helps make the logs a bit more readable, though still not perfect. These problems might also be caused by my UART-to-USB cable. And of course, the same setting must be used when capturing the output, like in:
LOG="ttn-gateway-$(date +%Y%m%d-%H%M%S).log"; pio serialports monitor --raw -b 921600 | while IFS= read -r l; do echo "[$(date +"%F %T")] $l" | tee -a "$LOG"; done
or when using a Raspberry Pi to capture the output.
This allows changing the log level to SYS_ERROR_DEBUG in:
…and enabling some disabled logging in the original code, and adding many more calls to SYS_DEBUG(SYS_ERROR_DEBUG, ...).
I added the following at the end of sendCommand in src/app_lora.c to know when things go wrong (not sure why the code uses status *= ..., which will still try all subsequent configuration steps if a previous one has already failed):
SYS_DEBUG(SYS_ERROR_WARNING, "LORA: sendCommand %s\r\n", gotresponse ? "OK" : "ERROR");
…yielding:
CNFG: Configuring LoRa module
LORA: Changing state from 2 to 4
LORA: Starting reconfiguration
LORA: send_cmd: 0x23 0x31 0x1 0x0 0x0 0x55 0xd
LORA: recv_rpl: 0x23 0x31 0x1 0x0 0x0 0x55 0xd
LORA: sendCommand OK
LORA: send_cmd: 0x23 0x3a 0x1 0x0 0x0 0x5e 0xd
LORA: recv_rpl: 0x23 0x3a 0x10 0x0 0x1 0x1 0x4c 0x47 0x38 0x35 0x30 0x31 0x36 0x30 0x31 0x37 0x38 0x32 0x4 0x1 0xd
LORA: sendCommand OK
LORA: version: 01
LORA: configureRXChain(0, ...)
RF: 0,1,867500000
LORA: send_cmd: 0x23 0x34 0x6 0x0 0x0 0x1 0xe0 0xff 0xb4 0x33 0x24 0xd
LORA: recv_rpl: 0xd
LORA: sendCommand ERROR
LORA: configureRXChain(1, ...)
RF: 1,1,868500000
LORA: send_cmd: 0x23 0x34 0x6 0x0 0x1 0x1 0x20 0x42 0xc4 0x33 0xb8 0xd
LORA: recv_rpl: 0x23 0x34 0x1 0x0 0x0 0x58 0xd
LORA: sendCommand OK
LORA: configureIFChainX(0, ...)
Above, the call to configureRXChain(1, ...) actually consistently succeeds. Or maybe the if(appData.rx_uart_buffer[1] == command) in the call to configureRXChain(1, ...) is seeing the response to the earlier call to configureRXChain(0, ...), as that response somehow arrives too late? The if(TIMEOUT(4)) is not being hit. After the above lines the log is a mess again, but I see both OKs and ERRORs.
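On the status *= ... question: assuming status and each step's result are 0/1 (bool), multiplying gives the same final value as a logical AND, but * does not short-circuit, so every later configuration call still runs, and still talks to the LoRa module, after an earlier one has failed. A minimal host-side sketch, where the hypothetical step() stands in for calls like configureRXChain(...):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sequence of configuration steps. */
typedef struct {
    const bool *outcomes; /* result each step will report (3 entries) */
    size_t calls;         /* how many steps actually executed */
} Steps;

static bool step(Steps *s, size_t i)
{
    s->calls++;
    return s->outcomes[i];
}

/* Same pattern as the firmware: '*=' evaluates every step, even after
   an earlier one returned false. */
bool configure_multiply(Steps *s)
{
    bool status = true;
    status *= step(s, 0);
    status *= step(s, 1);
    status *= step(s, 2);
    return status;
}

/* Short-circuit alternative: '&&' skips the remaining steps once
   status has become false. */
bool configure_and(Steps *s)
{
    bool status = true;
    status = status && step(s, 0);
    status = status && step(s, 1);
    status = status && step(s, 2);
    return status;
}
```

With outcomes {false, true, true}, both functions return false, but configure_multiply runs all three steps while configure_and runs only the first. Running every step can still be handy for diagnostics, as it produces log output for each command.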
I need to run now; just posting my early findings in case they help someone right now. I will try to trigger the default configuration later, to ensure the config that is fetched from the internet is okay. To be continued…
Would it make sense, after the gateway that stopped working properly has been powered up, for you to press the reset button for five seconds so that it needs to be reactivated? And then reactivate it, choosing not to have automatic updates?
Yes!
Seeing LORA: recv_rpl: 0xd above, which is just a carriage return (the byte that terminates every frame), I wondered what would happen if I always first read any pending RX data from the LoRa module's UART before sending new commands.
Guess what: on two occasions there was indeed such a stray 0x0d pending in the receive buffer when the firmware was about to send a new command. Even better: discarding those makes the LoRa configuration complete without any error. Next, the activation simply succeeds, and it even downloads new firmware, which obviously no longer includes my fixes. That firmware is not booted (yet), though: as the SD card with my own firmware is still in the gateway, my own firmware is loaded again after the reboot. So, using my own firmware, the gateway is now activated and happily forwarding packets…! And the debug logs no longer show garbage either.
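Why would one stray byte make configureRXChain(0, ...) fail while configureRXChain(1, ...) succeeds? In the logs, both RX chain commands use the same command byte 0x34, and the firmware only checks rx_uart_buffer[1] == command. The toy model below is host-side C, not the Harmony UART driver; all names are hypothetical, every reply is simplified to a bare acknowledgement, and the module is assumed to answer immediately. It shows that one pending 0x0d puts every reply one command behind, so a reply is only accepted when the previous command happened to use the same command byte, and that flushing before each send realigns them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy receive buffer: no wrap-around, just large enough for this demo. */
#define RX_CAP 64
typedef struct {
    uint8_t buf[RX_CAP];
    size_t head; /* next byte to read */
    size_t tail; /* next free slot */
} RxFifo;

static void rx_push(RxFifo *f, const uint8_t *bytes, size_t n)
{
    for (size_t i = 0; i < n && f->tail < RX_CAP; i++)
        f->buf[f->tail++] = bytes[i];
}

/* Read up to and including the next 0x0d; returns the bytes copied. */
static size_t rx_read_frame(RxFifo *f, uint8_t *out, size_t cap)
{
    size_t n = 0;
    while (f->head < f->tail && n < cap) {
        uint8_t b = f->buf[f->head++];
        out[n++] = b;
        if (b == 0x0d)
            break;
    }
    return n;
}

/* Discard everything pending, like flushing the UART before sending. */
static void rx_flush(RxFifo *f)
{
    f->head = f->tail;
}

/* Hypothetical send: optionally flush, let the "module" queue its
   acknowledgement, then read one frame and apply the firmware-style
   check reply[1] == command. */
bool send_command(RxFifo *f, uint8_t command, bool flush_first)
{
    if (flush_first)
        rx_flush(f);
    uint8_t ack[] = {0x23, command, 0x01, 0x00, 0x00, 0x00, 0x0d};
    ack[5] = (uint8_t)(0x23 + command + 0x01); /* inferred checksum */
    rx_push(f, ack, sizeof ack);

    uint8_t reply[16];
    size_t n = rx_read_frame(f, reply, sizeof reply);
    return n > 1 && reply[1] == command;
}
```

Starting with one stray 0x0d and no flushing, the sequence 0x34, 0x34, 0x35 yields ERROR, OK, ERROR: the first command reads only the 0x0d, the second reads the first command's 0x34 acknowledgement, and the third reads a 0x34 acknowledgement while expecting 0x35. With flushing, all three succeed.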
To be continued, but: in my case the reboot loop during activation is apparently totally fixable by just using new firmware.
And I'm not even alarmed by this weird double occurrence of frame counter 9, on two frequencies…
(Well, I am…)
@arjanvanb Can you share the image you created? I'm curious to see if this fixes the reboots I get every so many hours.
Here goes, based on today's develop branch: https://drive.google.com/open?id=15UMxp7voWhCAHY0_xvZDkWDPHf74pPuX
I have created a PR, so if you can wait a few days for TTP/TWTG to validate it: just wait.
If you don't want to wait:
- I don't have the factory firmware for you, so: no way back!
- No need to wipe any existing configuration.
- Unpack the /update folder to the root of a FAT32-formatted SD card.
- Remove power, insert the SD card, attach power.
- If you're using a serial cable for logging: set the baud rate of your monitor to 921600.
- Leave the SD card in your gateway to avoid any downloaded firmware from overwriting it.
- After the updated firmware has been downloaded, with the SD card still in place, you'll see:
  FIRM: Starting download
  FIRM: available bytes: 79
  FIRM: (Downloaded FOTA key) 69 AE B7 78 1F 49 4E 7F BC B6 C7 CD 9C 59 4F 5D FA AA 3D 81 D4 9C 56 90 A6 83 81 98 FF 18 88 6A
  FIRM: (Stored FOTA key) 69 AE B7 78 1F 49 4E 7F BC B6 C7 CD 9C 59 4F 5D FA AA 3D 81 D4 9C 56 90 A6 83 81 98 FF 18 88 6A
  FIRM: Firmware is already downloaded
  MAIN: No new firmware available
…which makes me think that even if you remove the SD card after that, it won't overwrite the firmware until something new is released. But I did not test that.
This basically adds a bit more logging, plus:
// flushUart removes any pending bytes from the receive buffer.
void flushUart(DRV_HANDLE handle)
{
    bool flushing = false;
    uint8_t buffer[1];
    while(DRV_USART_Read(handle, buffer, 1) > 0)
    {
        if(!flushing)
        {
            SYS_DEBUG(SYS_ERROR_DEBUG, "LORA: flushing: ");
            flushing = true;
        }
        SYS_DEBUG(SYS_ERROR_DEBUG, "%02x ", buffer[0]);
    }
    if(flushing)
        SYS_DEBUG(SYS_ERROR_DEBUG, "\r\n");
}
(Trying to adhere to the existing code style… Also, it would be nice to use, e.g., DRV_USART_ReceiverBufferIsEmpty, but the tooling refuses to find that?)
The above is then invoked at the start of:
bool sendCommand(uint8_t command, uint8_t* payload, uint16_t len)
{
    flushUart(appData.USARTHandle);
    bool gotresponse = false;
    ...
    SYS_DEBUG(SYS_ERROR_DEBUG, "LORA: sendCommand %s\r\n", gotresponse ? "OK" : "ERROR");
    return gotresponse;
}
I now get LORA: flushing: 0d (an empty reply) only once.
However, the mystery continues: I doubt getting two lines as a reply is normal, and I don't understand how the code could ever print the following!? (See the follow-up post for an explanation.)
RF: 1,1,868500000
LORA: send_cmd: 23 34 06 00 01 01 20 42 c4 33 b8 0d
LORA: recv_rpl: 23 34 07 00 00 01 01 80 e5 f9 ff be 0d
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK
Okay, the code in my firmware uses "%02x" to print the hexadecimal values, which is different from the "%#x" in the code on GitHub. But how could it ever (consistently!) print two subsequent lines of LORA: recv_rpl for this very command…!?
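The format string only changes how each received byte is shown, so by itself it cannot produce an extra recv_rpl line. For reference, a small sketch of how the two formats differ on a standard (hosted) C library; the wrapper names fmt_patched and fmt_alt are hypothetical. "%02x" always prints two zero-padded digits, while "%#x" prefixes 0x only to non-zero values and does not pad. The older logs in this thread show zero bytes as 0x0, so the embedded printf, or the exact format used there, may not behave like a hosted C library.

```c
#include <stdio.h>

/* Format one byte the way the patched firmware logs it: "%02x". */
void fmt_patched(unsigned value, char out[8])
{
    snprintf(out, 8, "%02x", value);
}

/* Format one byte with "%#x": standard C adds "0x" only for non-zero
   values, and there is no zero padding. */
void fmt_alt(unsigned value, char out[8])
{
    snprintf(out, 8, "%#x", value);
}
```

On a hosted C library, the terminating byte 0x0d comes out as 0d with "%02x" and as 0xd with "%#x", while a zero byte becomes 00 and 0 respectively.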
My logging (this is the complete log; see the next post for an explanation):
CNFG: Configuring LoRa module
LORA: Changing state from 2 to 4
LORA: Starting reconfiguration
LORA: send_cmd: 23 31 01 00 00 55 0d
LORA: recv_rpl: 23 31 01 00 00 55 0d
LORA: sendCommand OK
LORA: send_cmd: 23 3a 01 00 00 5e 0d
LORA: recv_rpl: 23 3a 10 00 01 01 4c 47 38 35 30 31 36 30 31 37 38 32 04 01 0d
LORA: sendCommand OK
LORA: version: 01
LORA: configureRXChain(0, ...)
RF: 0,1,867500000
LORA: flushing: 0d
LORA: send_cmd: 23 34 06 00 00 01 e0 ff b4 33 24 0d
LORA: recv_rpl: 23 34 01 00 00 58 0d
LORA: sendCommand OK
LORA: configureRXChain(1, ...)
RF: 1,1,868500000
LORA: send_cmd: 23 34 06 00 01 01 20 42 c4 33 b8 0d
LORA: recv_rpl: 23 34 07 00 00 01 01 80 e5 f9 ff be 0d
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK
LORA: configureIFChainX(1, ...)
IF: 1,1,1,-200000
LORA: send_cmd: 23 35 07 00 01 01 01 c0 f2 fc ff 0f 0d
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK
LORA: configureIFChainX(2, ...)
IF: 2,1,1,0
LORA: send_cmd: 23 35 07 00 02 01 01 00 00 00 00 63 0d
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK
LORA: configureIFChainX(3, ...)
IF: 3,1,0,-400000
LORA: send_cmd: 23 35 07 00 03 01 00 80 e5 f9 ff c0 0d
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK
LORA: configureIFChainX(4, ...)
IF: 4,1,0,-200000
LORA: send_cmd: 23 35 07 00 04 01 00 c0 f2 fc ff 11 0d
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK
LORA: configureIFChainX(5, ...)
IF: 5,1,0,0
LORA: send_cmd: 23 35 07 00 05 01 00 00 00 00 00 65 0d
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK
LORA: configureIFChainX(6, ...)
IF: 6,1,0,200000
LORA: send_cmd: 23 35 07 00 06 01 00 40 0d 03 00 b6 0d
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK
LORA: configureIFChainX(7, ...)
IF: 7,1,0,400000
LORA: send_cmd: 23 35 07 00 07 01 00 80 1a 06 00 07 0d
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK
LORA: configureIFChain8(...)
IF8: 1,1,-200000,250000,7
LORA: send_cmd: 23 36 08 00 01 01 c0 f2 fc ff 02 02 14 0d
LORA: recv_rpl: 23 36 01 00 00 5a 0d
LORA: sendCommand OK
LORA: configureIFChain9(...)
IF9: 1,1,300000,125000,50000
LORA: send_cmd: 23 37 0b 00 01 01 e0 93 04 00 03 50 c3 00 00 f4 0d
LORA: recv_rpl: 23 37 01 00 00 5b 0d
LORA: sendCommand OK
LORA: send_cmd: 23 40 01 00 34 98 0d
LORA: recv_rpl: 23 40 01 00 00 64 0d
LORA: sendCommand OK
LORA: send_cmd: 23 31 01 00 00 55 0d
LORA: recv_rpl: 23 31 01 00 00 55 0d
LORA: sendCommand OK
LORA: send_cmd: 23 30 01 00 00 54 0d
MON: SYS Stack size: 2870
MON: heap usage: 152KB (233KB), free: 187KB
LORA: recv_rpl: 23 30 01 00 00 54 0d
LORA: sendCommand OK
LORA: configLora OK
LORA: Configuration succeeded
LORA: Starting operation