TTN GATEWAY central

Today I received my 2nd TTN gateway. Unlike my 1st gateway this one has been operating flawlessly for at least 6 hours now (whereas the 1st one never succeeded in maintaining a stable connection with the TTN network for longer than 20 minutes before descending into the reboot loop).

I have been receiving messages from my TTN node and a SODAQ One without any problems during this period.

This 2nd gateway is operating under virtually the same conditions (i.e. location, setup, even using the same wall socket :wink: ) as the malfunctioning 1st gateway. The one difference is that I did not install the beta firmware on the 2nd gateway, which I had installed (in vain) on the 1st.

This strengthens my belief that the 1st gateway is suffering from some fault in assembly or component.

It strikes me as curious that the gateways do not have a serial number?! This would have been helpful in identifying a possible bad batch of malfunctioning gateways.

I talked to them and they told me that they had 20 gateways running to reproduce the failure and have not been successful so far. It seems it is time to send them the faulty ones, I think.

@TheThingsNetwork just let me know and I can bring my gateway to your office in Amsterdam when going to work. :slight_smile:

Same for me. If I can help by delivering my faulty TTN-gateway to your office, I would be happy to do so.

Why don’t they ship the 20 working ones to us then?

In this forum there are 56 different users reporting that their TTN gateway is stuck in the reboot loop. If you add mine, which has the same problem, we are at least 57 identified users with a non-working gateway!

greetings from Luxembourg

Maybe swap them out with people who come and bring their bootlooping gw for testing?

Different types of reboot problems have been reported. As long as it is not clear what exactly is causing this, a link with the environment cannot be completely ruled out. The gateway could decide to reboot after the response to a network request times out. It is therefore important that problematic gateways are brought into the lab for additional testing.

I’m activating my new TTN gateway. I have followed the steps, but my gateway restarts after the activation step.

LED1 and LED2 are on, LED3 blinks slowly and then turns solid, but after a few seconds LED3 switches from on to off to on, and then the gateway restarts. (It’s easier to see in the video: https://youtu.be/Rb-KZeLbR-0 )

I have tried a factory reset of the gateway, but I end up at the same step every time.

Any advice on how to fix this?

Thx

Hi all.

I was replying to Arjan’s post about reboot reason 0x13 on the FAQ page, but I found out that was not the right page, so I am moving it here.


Hi Arjan,

I have this same issue, so I did a small investigation too, and I think it is indeed a bitwise OR.
As I saw a file called “wdt_p32mz2048efm144.h” in the GitHub source, I assumed the controller used in the gateway is the PIC32MZ2048EFM144. The datasheet can be found here: https://www.mouser.com/ds/2/268/60001320D-967555.pdf.

Figure 6.1 in this document shows that all the reset sources are combined with an OR.
Section 16 tells us more about the WDT. There I read that the software on the controller must reset the watchdog timer within the period set by the WDT postscaler value, called WDTPS. On GitHub the file TTN_Gateway_v1/system_init.c contains a value called WDTPS, so I am pretty sure this is the controller used on the board.

I will look further into this tomorrow to find out why the WDT is triggered, but it looks like a software hiccup on the controller and not something with the LoRa module.
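
To make the bitwise OR idea concrete, here is a minimal sketch that splits a reported reboot reason into individual reset flags. It assumes the reported value is the raw reset control register of the controller, and the bit positions below are my reading of the PIC32MZ EF datasheet rather than anything taken from the gateway firmware, so treat both as assumptions. The names RESET_FLAGS and decode_reset_reason are made up for this example.

```python
# Hypothetical decoder. ASSUMPTIONS: the reboot reason is the raw reset
# control register value, and the bit positions match the PIC32MZ EF
# datasheet as I read it. Neither is confirmed by the gateway firmware.
RESET_FLAGS = {
    0x01: "POR (power-on reset)",
    0x02: "BOR (brown-out reset)",
    0x04: "IDLE (wake from idle)",
    0x08: "SLEEP (wake from sleep)",
    0x10: "WDTO (watchdog timer time-out)",
    0x20: "DMTO (deadman timer time-out)",
    0x40: "SWR (software reset)",
    0x80: "EXTR (external reset pin)",
}


def decode_reset_reason(value):
    """Return the names of all flags set in value, lowest bit first."""
    return [name for mask, name in sorted(RESET_FLAGS.items()) if value & mask]


for flag in decode_reset_reason(0x13):
    print(flag)
```

Under those assumptions 0x13 is 0x10 | 0x02 | 0x01, so the watchdog time-out flag is set together with the power-on and brown-out flags, which would fit the WDT theory.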


Edit:
So looking into the gateway I can confirm the controller is the PIC32MZ2048EFM144.

I found another interesting document about the wdt: http://ww1.microchip.com/downloads/en/DeviceDoc/61114E.pdf

So, according to the above document, the #pragma config WDTPS = PS8192 in the firmware’s system_init.c would be a postscaler ratio of 1:8192, which sets the watchdog timeout to 8.192 seconds. Now we only need to find out why the processor is being held up for more than 8.192 seconds.
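
The arithmetic behind that timeout can be sketched as follows. This assumes the watchdog has a nominal base period of 1 ms at a 1:1 postscaler, which is how I read the Microchip WDT document; the real period depends on the tolerance of the low-power oscillator. The function name wdt_timeout_seconds is made up for this example.

```python
import re

# ASSUMPTION: nominal watchdog period at a 1:1 postscaler, in milliseconds.
BASE_PERIOD_MS = 1.0


def wdt_timeout_seconds(wdtps_setting):
    """Convert a WDTPS config value such as 'PS8192' into a nominal timeout."""
    match = re.fullmatch(r"PS(\d+)", wdtps_setting)
    if match is None:
        raise ValueError("unexpected WDTPS setting: " + wdtps_setting)
    postscaler = int(match.group(1))
    return postscaler * BASE_PERIOD_MS / 1000.0


print(wdt_timeout_seconds("PS8192"))
```

With PS8192 this gives a nominal timeout of about 8.192 seconds, so any code path that goes longer than that without resetting the watchdog would trigger a reboot.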

According to my UART log, the last thing the gateway is doing is inside the static void restart_lora_configuration(void) function in firmware/src/app_lora.c


Edit 2:

Ok, so I found out the WDT timeout is reset every time “MON: SYS Stack size: xxxx” is printed to the UART log. This is done in the same function.

So there seem to be 2 issues.

1: LoRa configuration is failing. The PIC32MZ2048EFM144 and the LoRa module can communicate though, because the controller can retrieve the version from the LoRa module.

2: After 3 configuration tries, the software will perform a hardware reset of the LoRa module. The WDT timeout seems to be tied to whether the LoRa module is reset successfully.
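
A toy model of how these two issues could combine: if the watchdog is only reset between steps, a single step that blocks for longer than the timeout causes a reboot. The step names and durations below are invented purely for illustration and are not measurements from the gateway; first_watchdog_expiry is a hypothetical helper, not firmware code.

```python
WDT_TIMEOUT_S = 8.192  # nominal timeout discussed above


def first_watchdog_expiry(steps, timeout_s=WDT_TIMEOUT_S):
    """Return the name of the step during which the watchdog would expire.

    steps is a list of (name, duration_s, resets_watchdog_after) tuples.
    Returns None if the watchdog never expires.
    """
    since_reset = 0.0
    for name, duration_s, resets_watchdog_after in steps:
        since_reset += duration_s
        if since_reset > timeout_s:
            return name
        if resets_watchdog_after:
            since_reset = 0.0
    return None


# HYPOTHETICAL durations: three failed configuration tries, each followed
# by a watchdog reset, then a module reset that blocks for too long.
example_steps = [
    ("config try 1", 2.0, True),
    ("config try 2", 2.0, True),
    ("config try 3", 2.0, True),
    ("hardware reset of module", 10.0, False),
]

print(first_watchdog_expiry(example_steps))
```

In this made-up sequence the watchdog expires during the hardware reset step, which is the kind of behaviour the UART log seems to point at.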

OK, looks like my gateway is officially STUFFED!!! NOW!! After a few days powered off, I powered it on today and only the first (Power) LED is solid, with the second LED flashing rapidly, and it’s been like this for hours!?

I’m rapidly losing faith in this hardware and the team’s ability to communicate/resolve this in a timely manner.

What does it mean when your gateway’s connected to the “Network” but “Broker Connection” is false? (as per the info page)

ah yep my gw is playing reboot roulette during activation too.

You can add to the GitHub issue if you have more details, or just +1 the issue.

Today, I was sent new firmware that creates some more logging, and I returned the resulting logs to TTP just now.

Yep, I was given new FW too and I’ve uploaded the logs to the GH issue.

…but it seems the new firmware does not really log much more, if anything, if that was its intention. Peeking into the code (especially in the configure... functions in src/app_lora.c) shows some disabled logging that would be very, very helpful, I’d say. Maybe it’s time to compile my own version, some day…

This may not help many here, since it relates to firewall rules rather than any gateway rebooting problem.

I had successfully activated a TTN gateway at home and then took it to another location where a device running m0n0wall firewall software generally limited traffic except to specified permitted ports or IP addresses.

I did the five second reset button push, after first powering on, to clear the previous, locked activation.

When I then tried to activate the device, the process stalled with one steady light plus one blinking light (i.e., checking for Internet) on the gateway. That was fixed when I opened outgoing access via the LAN to all UDP ports. When I tried to just add UDP access to destination port 1700, that didn’t seem to fix the problem for purposes of initial device activation.

At that point, the first three LEDs became steadily lit, with the fourth one blinking (i.e., gateway not successfully connecting to router). That was fixed when I opened up TCP access from the LAN to destination port 1883 in order to permit MQTT communication, as per the instructions here: http://www.hivemq.com/blog/mqtt-security-fundamentals-securing-mqtt-systems.
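
For anyone behind a similar firewall, a quick way to check from a laptop on the same LAN whether outgoing TCP to port 1883 is allowed is a plain connection attempt. This is only a sketch: tcp_port_open is a name made up for this example, and the host you test against (whatever MQTT broker your gateway is supposed to reach) is something you have to fill in yourself.

```python
import socket


def tcp_port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling tcp_port_open with your broker host and port 1883 returns False if the firewall drops or rejects the connection. Note that this does not tell you anything about the UDP rules (e.g. port 1700), since UDP has no connection handshake to test.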

Then the activation completed successfully, and I was able to send traffic via the gateway using a TTN node.

I’m no longer at that location, but wonder if I could have further limited the outgoing UDP traffic after the activation was complete.

I just want to relate my experiences over the last few days with gateways. I am in the process of testing some end nodes and using two gateways, the TTN-915 and a Multitech MTCAP-915.

It was very easy to set up the TTN gateway by following the instructions on the website. It worked the first time and in less than an hour I had data showing up in my console. As a dummy Arduino user this was very gratifying. Setting up the MTCAP took a lot more work and I had to have someone at Multitech help me figure out how to connect to my laptop via wifi and help from a colleague on how to set up the packet forwarder, etc.

Both work well except:

The TTN gateway attempts to re-boot at least once every day, sometimes multiple times in a day as documented above. In these cases the console data stops updating. This was frustrating at first since I didn’t know if I had reached my message limit on TTN or my edge node had stopped working, and when debugging edge nodes this matters. Now when I don’t get the data I expect I check the TTN gateway which is usually waiting for a power cycle to get it back up and running.
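
One way to spot these outages after the fact is to look for unusually long gaps between the timestamps of received messages. The sketch below assumes you have already exported those timestamps from wherever you store your data; find_gaps and the sample times are made up for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive messages are too far apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# HYPOTHETICAL data: a node sending every 10 minutes, with one long silence.
received = [
    datetime(2018, 1, 1, 12, 0),
    datetime(2018, 1, 1, 12, 10),
    datetime(2018, 1, 1, 12, 20),
    datetime(2018, 1, 1, 14, 0),
    datetime(2018, 1, 1, 14, 10),
]

for start, end in find_gaps(received, max_gap=timedelta(minutes=30)):
    print("no data between", start, "and", end)
```

A gap by itself does not say whether the node or the gateway was down, but if several nodes go silent over the same period, the gateway is the more likely suspect.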

The MTCAP has been running continuously for 24 hours now, sending data to the SEMTECH console and then being forwarded on to Cayenne. I was pleasantly surprised to find a complete record of the sensor data on both sites while the edge node, and the gateway, continued to run overnight.

My hope is that the TTN gateway can be fixed via a firmware upgrade soon since there is no reason it can’t run continuously and reliably also. But for customer network applications I think I will stick with the MTCAP for the moment since the costs are comparable and now that I know how to set it up, the ease of use is also.

Is your TTN gateway connected to the Internet via WiFi? If so, could you try switching to an Ethernet connection?