Using ports other than 1 with LMIC

sunbutncat · February 13, 2021, 8:05pm

Hi,

I’m wondering if anyone has had any issues with secondary LoRaWAN ports not coming across TTN correctly. In my current setup, I’m using LMIC to broadcast short status messages on Port 2 that consist of 8 flags in the form of a bitmap. Here is what the code looks like:

void send_status() {
  uint8_t txBuffer[1];
  LoraEncoder encoder(txBuffer);

  encoder.writeBitmap(_BME280_found,
                      gps_available(),
                      _ssd1306_found,
                      _axp192_found,
                      _ifWebOpen,
                      _ifSetMiner,
                      _ifAuthenticated,
                      _ifLaunched);

  // Battery / solar voltage
  //encoder.writeUint8(0);

  boolean confirmed = false;
  ttn_cnt(_count);
  Serial.println("Sending balloon status message.");
  ttn_send(txBuffer, sizeof(txBuffer), 2, confirmed);
  _count++;
}

where ttn_send is written like this:

void ttn_send(uint8_t * data, uint8_t data_size, uint8_t port, bool confirmed){
    // Check if there is not a current TX/RX job running
    if (LMIC.opmode & OP_TXRXPEND) {
        _ttn_callback(EV_PENDING);
        return;
    }

    // Prepare upstream data transmission at the next possible time.
    // Parameters are port, data, length, confirmed
    LMIC_setTxData2(port, data, data_size, confirmed ? 1 : 0);

    _ttn_callback(EV_QUEUED);
}
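For anyone decoding this on the TTN side: the status payload is a single byte holding the eight flags. Below is a minimal sketch of the packing and its inverse. `pack_flags` and `unpack_flags` are hypothetical helpers, and the bit order (first flag in the most significant bit) is an assumption about `writeBitmap()` that is worth checking against the decoder shipped with your copy of the library.

```cpp
#include <array>
#include <cstdint>

// Packs eight flags into one byte, first flag in the most significant bit.
// ASSUMPTION: this mirrors the bit order of LoraEncoder::writeBitmap(); check
// the decoder that ships with your version of the library before relying on it.
uint8_t pack_flags(const std::array<bool, 8>& flags) {
  uint8_t bitmap = 0;
  for (int i = 0; i < 8; ++i) {
    if (flags[i]) bitmap |= static_cast<uint8_t>(1u << (7 - i));
  }
  return bitmap;
}

// Inverse of pack_flags(): flag i is read back from bit (7 - i).
std::array<bool, 8> unpack_flags(uint8_t bitmap) {
  std::array<bool, 8> flags{};
  for (int i = 0; i < 8; ++i) {
    flags[i] = (bitmap >> (7 - i)) & 1u;
  }
  return flags;
}
```

Under that assumption, a device where only `_BME280_found` and `gps_available()` are true would send `0xC0`.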

About one minute after startup, I start broadcasting observation messages on Port 1 that contain temperature, pressure, humidity, GPS coordinates and the same bitmap byte normally used in the status messages. These messages come across just fine, and as you can see I’m using the LMIC library the same way:

void send_observation() {
  //LoraMessage message;

  uint8_t txBuffer[31];
  LoraEncoder encoder(txBuffer);

  encoder.writeLatLng( gps_latitude(), gps_longitude() );
  encoder.writeUint16( gps_altitude() );
  encoder.writeRawFloat( gps_hdop() );
  encoder.writeUint16( (uint16_t)_elevation_now ); // In meters
  encoder.writeUint16( (uint16_t)(_pressure_now*10.0) ); // Convert hPa to tenths of hPa (decapascals)
  encoder.writeTemperature(_temperature_now);
  encoder.writeHumidity(_humidity_now);
  encoder.writeBytes( _launch_id, 8 ); // Add 8 bytes for the launch_id
  encoder.writeBitmap(_BME280_found,
                      gps_available(),
                      _ssd1306_found,
                      _axp192_found,
                      _ifWebOpen,
                      _ifSetMiner,
                      _ifAuthenticated,
                      _ifLaunched);

  Serial.println("Built message successfully");

// LORAWAN_CONFIRMED_EVERY is defined in a configuration file as 0
#if LORAWAN_CONFIRMED_EVERY > 0
  bool confirmed = (_count % LORAWAN_CONFIRMED_EVERY == 0);
#else
  bool confirmed = false;
#endif

  ttn_cnt(_count);
  Serial.println("Sending balloon observation message.");
  ttn_send(txBuffer, sizeof(txBuffer), 1, confirmed);

  _count++;
}
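Because `txBuffer` is a fixed-size array, it’s worth checking that 31 bytes is exactly what `send_observation()` writes, since writing past the end would be undefined behavior. A sketch of a compile-time check, assuming the per-call sizes of the lora-serialization encoder listed in the comments (verify them against the library version you use):

```cpp
#include <cstddef>

// Byte counts per LoraEncoder call used in send_observation().
// ASSUMPTION: sizes as documented for the lora-serialization library
// (latLng 8, uint16 2, raw float 4, temperature 2, humidity 2, bitmap 1).
constexpr std::size_t kLatLng = 8, kUint16 = 2, kRawFloat = 4,
                      kTemperature = 2, kHumidity = 2, kBitmap = 1;
constexpr std::size_t kLaunchId = 8;  // writeBytes(_launch_id, 8)

constexpr std::size_t kObservationSize =
    kLatLng          // latitude / longitude
    + kUint16        // GPS altitude
    + kRawFloat      // HDOP
    + kUint16        // elevation (m)
    + kUint16        // pressure (0.1 hPa)
    + kTemperature   // temperature
    + kHumidity      // humidity
    + kLaunchId      // launch id
    + kBitmap;       // status flags

// Fails to compile if txBuffer[31] ever drifts out of sync with the fields.
static_assert(kObservationSize == 31, "observation payload must be 31 bytes");
```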

The very first Port 2 Status message is received by the gateway, but the following 5 are discarded (counts 2, 3, 4, 5 in the picture), even when the flags change (so the payload is different). Interestingly, TTN annotates the first Status message with “Retry”. The Port 1 Observation messages, however, come across just fine. In the picture you can ignore the 00 bytes; they are there only because the Lat/Lon fields haven’t been populated yet.

I’m using a TTGO T-Beam V1.1 with the LMIC library to send the messages and a 915 MHz TTN Indoor Gateway (TTIG) to receive the data. I’m forwarding to ttn-handler-eu. I’m wondering if there’s some obscure fact that I’m missing. My hunch is that the LMIC library isn’t clearing out a cache or something to that effect. Or perhaps the TTIG isn’t forwarding correctly to the TTN EU server?

Has anyone seen these port issues before?

Nick

[Screenshot, 2021-02-08 7:54 AM]

descartes · February 13, 2021, 8:54pm

I have a range of devices in the same application with different sets of sensors, and I use the port number to differentiate between them with no problems using LMiC.

I suspect your problem may be that you are flooding the airwaves and something isn’t keeping up. Slow your transmissions down to within the Fair Use Policy, and if it’s still not working we can look again.

sunbutncat · February 13, 2021, 10:33pm

Glad to hear you don’t have any issues with LMIC in this respect.

I adjusted the timing as you suggested and got the same result, but without the “retry” designation. Maybe this is an improvement, maybe not.

The observation payloads on Port 1 are 31 bytes spaced 5 seconds apart, and the status messages on Port 2 are only 1 byte spaced 15 seconds apart, so time on air shouldn’t be the problem. Also, I’m using SF7 and only send messages in rapid succession for about 10 minutes, so I shouldn’t be breaking the Fair Use Policy this way.

cslorabox · February 13, 2021, 10:35pm

It would help if you could provide the raw gateway records of these uplinks.

If you’ve often been restarting your node from frame count 0, your traffic could be seen as a retry and rejected. If your node starts over from zero, you’ll have to reset the frame counter in the network console each time; a proper LoRaWAN node continues its frame counter even if you power cycle it or change the firmware, but few nodes truly implement LoRaWAN correctly.
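To illustrate why a node that restarts at frame count 0 looks like a retry, here is a deliberately simplified, hypothetical model of a network server’s frame-counter check. It is not TTN’s actual implementation and ignores counter rollover and the other cases a real server handles; it only shows the core rule that a counter must increase.

```cpp
#include <cstdint>

// Hypothetical, simplified network-side frame-counter check (NOT TTN's code).
// An uplink whose counter is not greater than the last accepted one is treated
// as a retransmission or replay and dropped.
class FcntFilter {
 public:
  bool accept(uint32_t fcnt) {
    if (have_last_ && fcnt <= last_) return false;  // already seen (or older)
    have_last_ = true;
    last_ = fcnt;
    return true;
  }

  // Roughly what "reset frame counter" in the console achieves.
  void reset() { have_last_ = false; }

 private:
  bool have_last_ = false;
  uint32_t last_ = 0;
};
```

Usage: a node that sends counts 0, 1, 2 and then reboots back to 0 has its next uplinks (0, 1, 2) dropped until either the counter passes 2 again or the server-side state is reset.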

What the issue most definitely is not, however, is a Fair Use Policy violation: even though you are quite obviously and extremely violating that policy with your described usage, there’s no enforcement.

descartes · February 13, 2021, 10:38pm

FUP aside, and without any feedback on what your new schedule is: knowing the timings inside LMiC, I’d strongly recommend spacing uplinks at least 30 seconds apart while debugging. I generally see the scheduler take a few seconds to pick up the send request and transmit, and then there are the RX1 and RX2 windows; there’s barely time to fit all that into 5 seconds.

This way you will know whether you are overloading LMiC.

cslorabox · February 13, 2021, 10:39pm

LMiC not being ready is a distinct possibility; if v3’s 5- and 6-second RX windows apply, it simply could not be ready to transmit again within 5 seconds.
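A back-of-the-envelope version of that argument, as a sketch: assuming RX1 opens 5 s after the end of transmission, RX2 one second later, and the stack staying busy until RX2 has closed, the earliest next uplink comes more than 6 s after the previous one. `min_uplink_spacing_ms` is a hypothetical helper and ignores the scheduler latency mentioned above.

```cpp
#include <cstdint>

// Rough lower bound on how soon a class A device can start its next uplink.
// ASSUMPTIONS: RX1 opens rx1_delay_ms after the end of TX, RX2 opens 1 s after
// RX1, and the stack stays busy until the RX2 window has closed.
// Scheduling latency inside the stack is ignored.
uint32_t min_uplink_spacing_ms(uint32_t airtime_ms, uint32_t rx1_delay_ms,
                               uint32_t rx2_window_ms) {
  const uint32_t rx2_delay_ms = rx1_delay_ms + 1000;  // RX2 follows RX1 by 1 s
  return airtime_ms + rx2_delay_ms + rx2_window_ms;
}
```

For a roughly 93 ms packet and a 5000 ms RX1 delay it returns 6093 ms before even counting the RX2 window itself, which already exceeds a 5-second send interval.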

And anyway, such an interval is absurd.

But there’s a 65-second gap from the port 2 message that does show up to the first of the far-too-frequent port 1 messages.

No doubt slowing things down should be the first step; after that, look at the LMiC serial logs and the raw gateway view.

Jeff-UK · February 13, 2021, 10:53pm

I hope you are using SF7. Depending on where in the world you are, the problem you may face is a knock on the door: I think 31 bytes at SF7 comes in at close to 90 to 100 ms of airtime, so at one packet every 5 seconds you’re running at nearly 2× the 1% duty-cycle legal limit that applies in many places. As Nick and Chris said, you need to back it down whatever…
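That estimate can be checked with the standard LoRa time-on-air formula from the Semtech SX127x datasheet. A sketch, assuming 125 kHz bandwidth, coding rate 4/5, an 8-symbol preamble, explicit header, CRC on, and 13 bytes of LoRaWAN overhead (no MAC options) on top of the 31-byte application payload:

```cpp
#include <algorithm>
#include <cmath>

// LoRa time on air (ms) using the formula from the Semtech SX127x datasheet.
// phy_bytes: full PHY payload (application payload + LoRaWAN overhead).
// Assumes explicit header and CRC on; coding_rate is 1..4 for 4/5..4/8.
double lora_airtime_ms(int phy_bytes, int sf, double bw_hz = 125000.0,
                       int coding_rate = 1, int preamble = 8) {
  // Low data rate optimisation is required for SF11/SF12 at 125 kHz.
  const bool low_dr_opt = sf >= 11 && bw_hz == 125000.0;
  const double t_sym_ms = std::pow(2.0, sf) / bw_hz * 1000.0;
  const double t_preamble_ms = (preamble + 4.25) * t_sym_ms;
  const int crc = 1, implicit_header = 0;
  const double num = 8.0 * phy_bytes - 4.0 * sf + 28 + 16 * crc - 20 * implicit_header;
  const double den = 4.0 * (sf - 2 * (low_dr_opt ? 1 : 0));
  const double payload_symbols =
      8 + std::max(std::ceil(num / den) * (coding_rate + 4), 0.0);
  return t_preamble_ms + payload_symbols * t_sym_ms;
}

// Fraction of time spent transmitting when sending one packet every interval_ms.
double duty_cycle(double airtime_ms, double interval_ms) {
  return airtime_ms / interval_ms;
}
```

With these assumptions a 44-byte PHY payload at SF7 takes about 92.4 ms, so one packet every 5 seconds is a duty cycle of about 1.85%. For comparison, `lora_airtime_ms(1 + 13, 7)` for the 1-byte status message comes out at about 46.3 ms.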

sunbutncat · February 13, 2021, 11:13pm

It looks like what we see on TTN for the application matches what the gateway is forwarding to the network server. The 65-second gap you mention is the problem I’m trying to fix: there should be additional Status messages at counts 2, 3, 4, 5 until the first Observation message arrives at count 6.

To address the airtime constraint concerns, the rapid succession of messages is just for 10 minutes. It’s for a weather balloon project, and we want to collect as much data as possible below 900 millibars. Afterwards, the messages get spread out to 1 per minute.

And the entire balloon flight lasts 90 minutes, so from the calculation others can see that the 1% duty cycle shouldn’t be broken, at least with respect to daily allowances, since we will not fly more than once per day.

But if v3 does indeed require 5- and 6-second RX windows, then yes, that’s cause for me to bump down the transmission rates, because we’ll be using 2 downlinks per launch.

[Screenshot, 2021-02-13 5:01 PM]

cslorabox · February 13, 2021, 11:16pm

That’s still not legitimate.

What you should be doing is caching those rapid readings and packing several into a packet transmitted less often (perhaps with some overlap of coverage so each reading gets transmitted more than once).

Ideally try to balance the application part of the payload against the weight of the header - transmitting packets with only tiny application payloads frequently makes no sense, as if you do so you’re overwhelmingly transmitting overhead and not data.

E.g., don’t transmit one payload byte every five seconds (which will end up over 90% overhead); transmit 12 bytes (covering a minute) every 30 seconds, with half of them repeated in the next packet. At SF7 you might as well go twice as large and only transmit once a minute.
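A sketch of that scheme, assuming one-byte status readings taken every 5 seconds: `StatusBatcher` (a hypothetical helper) keeps the last 12 readings, and calling `payload()` every 6 readings (30 seconds) yields 12-byte packets in which each reading appears twice. `overhead_fraction` shows the header cost, assuming the usual 13-byte LoRaWAN overhead with no MAC options.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// MHDR (1) + DevAddr (4) + FCtrl (1) + FCnt (2) + FPort (1) + MIC (4), no FOpts.
constexpr std::size_t kLoRaWanOverhead = 13;

// Share of each packet that is protocol overhead rather than application data.
double overhead_fraction(std::size_t app_bytes) {
  return static_cast<double>(kLoRaWanOverhead) / (kLoRaWanOverhead + app_bytes);
}

// Hypothetical helper: keeps the most recent `window` one-byte readings so each
// uplink carries the last `window` samples. Sending every window/2 samples means
// each reading appears in two consecutive packets.
class StatusBatcher {
 public:
  explicit StatusBatcher(std::size_t window) : window_(window) {}

  void add(uint8_t reading) {
    buf_.push_back(reading);
    if (buf_.size() > window_) buf_.pop_front();  // drop the oldest sample
  }

  std::vector<uint8_t> payload() const {
    return std::vector<uint8_t>(buf_.begin(), buf_.end());
  }

 private:
  std::size_t window_;
  std::deque<uint8_t> buf_;
};
```

A real decoder would also need to know which time slot each byte belongs to (for example via a sample index), which is left out here. The overhead drops from about 93% for a 1-byte payload to 52% for 12 bytes.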

sunbutncat · February 13, 2021, 11:28pm

You’re right: the 1-byte payload turns into 14 bytes with the header, most of it needless overhead, so in the future I’ll transmit less often.

Unfortunately, using a 30-second interval doesn’t seem to fix the original problem. Here is a new series of packets. Only the first Status message makes it to the network and no more after it.

[Screenshot, 2021-02-13 5:34 PM]

cslorabox · February 13, 2021, 11:49pm

You need to be monitoring the debug log from LMiC, and possibly make it more verbose.

But note also that you don’t have a 30 second interval, you’ve actually tried to pack 4 packets into 83 seconds. An actual 30 second interval would have been:

17:19:03 frame count 0
17:19:33 frame count 1
17:20:03 frame count 2
17:20:33 frame count 3

not 17:19:23 frame count 4

sunbutncat · February 14, 2021, 12:02am

Yes, the count difference is simply because I begin broadcasting observations after 1 minute. It’s still waiting 30 seconds in between send_status() messages.

_send_interval = 30000; // Status message every 30 seconds
uint32_t last = 0;
while ( !_ifLaunched ) {

  if ( last == 0 || millis() - last > _send_interval) {
    // Check whether pressure has dropped by more than 2 mb
    if ( get_pressure() + 2.0 < _sfc_pressure ) {
      _ifLaunched = true;
    }

    // Used only for testing purposes. NEEDS REMOVAL!
    if (millis() > 60000) {  // Trigger observations after 1 min no matter what
      _ifLaunched = true;
    }

    // Send status message every 30 seconds
    send_status();

    // Store for next iteration
    last = millis();

  } // End 30 second loop
} // End _ifLaunched check

But I’ll take a look at the LMIC debug log and report back.

Thanks,
Nick

cslorabox · February 14, 2021, 12:06am

No. Your log quite clearly shows that either it isn’t waiting as long as you wanted it to, or it’s skipping numbers without having transmitted a corresponding packet.

It’s possible that part of the issue is that your timebase is wrong, and what your software thinks is 30 seconds is actually 20. That would probably also mean that all the receive windows are broken, which will not play well with TTN v3.

LoRaTracker · February 14, 2021, 9:18am

My recollection is that duty cycle limits are calculated on an hourly basis.
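If the limit is evaluated over an hour, a device can enforce it on its own side with a sliding window. This is a hypothetical sketch, not part of LMIC; the window length and the 1% limit are parameters, so check the rules that actually apply where the device flies.

```cpp
#include <cstdint>
#include <deque>
#include <utility>

// Hypothetical sliding-window airtime budget: a transmission is allowed only if
// the airtime used in the last window_ms plus the new packet stays within
// limit * window_ms.
class DutyCycleLimiter {
 public:
  DutyCycleLimiter(uint32_t window_ms, double limit)
      : window_ms_(window_ms), budget_ms_(limit * window_ms) {}

  bool can_send(uint32_t now_ms, double airtime_ms) {
    expire(now_ms);
    return used_ms_ + airtime_ms <= budget_ms_;
  }

  void record(uint32_t now_ms, double airtime_ms) {
    expire(now_ms);
    log_.emplace_back(now_ms, airtime_ms);
    used_ms_ += airtime_ms;
  }

 private:
  // Forget transmissions that started a full window ago or earlier.
  // Unsigned subtraction keeps this correct across a millis() rollover.
  void expire(uint32_t now_ms) {
    while (!log_.empty() && now_ms - log_.front().first >= window_ms_) {
      used_ms_ -= log_.front().second;
      log_.pop_front();
    }
  }

  uint32_t window_ms_;
  double budget_ms_;
  double used_ms_ = 0.0;
  std::deque<std::pair<uint32_t, double>> log_;  // (start time, airtime)
};
```

Usage: `DutyCycleLimiter limiter(3600000, 0.01)` gives a 36-second budget per hour; call `can_send()` before queuing an uplink and `record()` once it has been sent.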

descartes · February 14, 2021, 11:12am

@sunbutncat, please can we actually debug the problem that you came to this forum with rather than alter code that is raising eyebrows all over?

You wanted to know why uplinks were being lost. The only way to tell is to send them at more reasonable intervals for testing, because that way you can see whether you are, as I’ve said before, overloading the LMiC stack. Two summers back I printed the source out and read it so I could understand what it did, and I can assure you some parts of the internal mini-RTOS/scheduler are quite convoluted. The scheduler can queue internal messages. I’ve not tried putting two send requests in the queue, and that would be a challenge anyway: as soon as the scheduler sees a pending send job, it normally rejects a request to queue another one.

The most scientific test will be to remove all the other code and just run an LMiC stack on a loop at 30 second intervals (I’ve done that before on constrained devices) and then run a binary search on where it starts to drop uplinks.
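The binary search can be sketched as follows, assuming a hypothetical bench-test predicate `runs_clean(interval_ms)` that sends a batch of uplinks at that interval and reports whether they all arrived, assuming longer intervals never make things worse (so the predicate is monotonic), and assuming the upper bound passes.

```cpp
#include <cstdint>
#include <functional>

// Returns the smallest interval in [lo, hi], on a grid of `step` ms, for which
// runs_clean(interval) is true. Assumes runs_clean is monotonic (false below
// some threshold, true above it) and that runs_clean(hi) is true.
uint32_t find_min_clean_interval(uint32_t lo, uint32_t hi, uint32_t step,
                                 const std::function<bool(uint32_t)>& runs_clean) {
  uint32_t a = lo / step;  // search over grid indices
  uint32_t b = hi / step;
  while (a < b) {
    const uint32_t mid = a + (b - a) / 2;
    if (runs_clean(mid * step)) {
      b = mid;      // mid works: the answer is mid or smaller
    } else {
      a = mid + 1;  // mid drops uplinks: the answer is larger
    }
  }
  return a * step;
}
```

Each probe is a full bench run, so the logarithmic number of probes matters: searching 1 s to 60 s in 100 ms steps takes about ten runs instead of hundreds.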

I have a good idea of what the answer will be, as I’ve just had a device go off piste, possibly because the extreme cold it experienced affected something. It’s on the bench (and in the freezer) for investigation, but it got out of hand with a series of transmissions until it drained the battery. And my first backup tracker for HAB, which has backup SMS on it, got stuck up a tree and spent 36 hours sending me texts every minute until the batteries ran out. I’ve not yet added and tested backoff code for that. So I know how easy it is to end up with seemingly good code that goes rogue when you haven’t figured out all the situations it can end up in.

Once that’s been verified or not, we can look at other potential reasons.

Why are you using a 915MHz gateway and the EU handler? And with a TTIG, how do you manage that?

There is a possibility that the TTIG can’t keep up if there is any significant latency in the WiFi.

I’m not sure a TTGO T-Beam has the build quality I’d want to use to track the Helium/Hydrogen and Latex I’ve spent money on. And its firmware builds can add in an RTOS so you end up with LMiC’s mini-RTOS being scheduled by a bigger RTOS …

Some things to note:

@cslorabox is building a custom gateway at present
I fly HAB with my own trackers and create LoRaWAN solutions for companies.
@LoRaTracker’s not called LoRa Tracker for nothing, his boards are used for HAB (and his own personal satellite)
@Jeff-UK was around when LoRa was invented back in the 1600s

So you have the attention of some reasonably knowledgeable people.

You should also consider that your device will be received by other people’s gateways over quite a distance and will then be using the TTN back-end servers, so as well as the duty cycle and the Fair Use Policy, there is also the Fair Play strategy to consider.

sunbutncat · March 20, 2021, 2:29pm

Apologies for taking so long to respond here. We were hit by a winter storm here in Dallas, TX shortly after this conversation. And then my work on the issue stopped for a while after we got power back.

To answer the unresolved questions: I’m using a 915 MHz gateway here in the U.S. for testing, but the devices are sent to Africa, where they use the 868 MHz band and the EU handler. There are essentially no gateways in the region where I am transmitting (the ones in southern Nigeria that appear on TTN are my own). But after this discussion, I will explore packing multiple observations into a single packet to reduce airtime.

In case anyone comes to this post with a similar problem: it turned out to be unrelated to the selected LoRaWAN port. The problem was actually that os_runloop_once() needs to be called continuously, presumably to clear out LMIC’s internal buffers.

Since I had the “while (!_ifLaunched)” loop in the Arduino setup() function, the first status message was sent fine and the following messages were not. Simply placing os_runloop_once() inside the while loop (and outside the if statement) allowed all subsequent messages to flow normally through LMIC.
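A sketch of the corrected pre-launch loop. So that it can run off the device, the Arduino and LMIC calls are passed in as hypothetical hooks (`LoopHooks`); on the T-Beam they would be `millis()`, `os_runloop_once()`, `send_status()` and a check of `_ifLaunched`. The pressure and test-trigger checks are left out, and the `last == 0` sentinel is replaced by an explicit flag, which is my own change rather than part of the original fix.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical hooks so the loop can be exercised off-target.
struct LoopHooks {
  std::function<uint32_t()> millis;    // millis() on the device
  std::function<void()> runloop_once;  // os_runloop_once() on the device
  std::function<void()> send_status;   // send_status() from the sketch
  std::function<bool()> launched;      // reads _ifLaunched
};

// Pre-launch loop with the fix described above: LMIC's run loop is serviced
// on every pass, not only when a status message is due.
void pre_launch_loop(const LoopHooks& h, uint32_t send_interval_ms) {
  bool first = true;
  uint32_t last = 0;
  while (!h.launched()) {
    h.runloop_once();  // outside the timing check, every iteration
    if (first || h.millis() - last > send_interval_ms) {
      first = false;
      h.send_status();
      last = h.millis();
    }
  }
}
```

With a simulated clock that advances 10 ms per pass, a 30 s interval and launch at 95 s, this sends four status messages while servicing the run loop on every one of the 9500 passes.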

descartes · March 20, 2021, 2:54pm

Good to hear you resolved the problem and you got over, as the British would say, your ‘cold snap’!

It’s not so much to clear the internal buffers as to give it time to do anything at all: everything LMiC does is called from that loop.
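To make that concrete, here is a toy model (not LMIC’s real code) of a cooperative scheduler in which queued radio work only happens when the run loop is serviced, and a new uplink is refused while one is pending, much like the `OP_TXRXPEND` check in `ttn_send()`. Without calls to `runloop_once()`, only the first uplink is accepted and nothing is transmitted, which is consistent with the symptoms in this thread.

```cpp
#include <deque>
#include <functional>

// Toy model (NOT the real LMIC code) of a cooperative scheduler: jobs only run
// when runloop_once() is called, and only one uplink may be pending at a time.
class ToyStack {
 public:
  // Mirrors the OP_TXRXPEND check in ttn_send(): refuse while a cycle is pending.
  bool queue_uplink() {
    if (txrx_pending_) return false;  // caller would report EV_PENDING
    txrx_pending_ = true;
    // The radio work is deferred: it becomes jobs the run loop must execute.
    jobs_.push_back([this] { ++sent_; });                // transmit
    jobs_.push_back([this] { txrx_pending_ = false; });  // RX windows done
    return true;
  }

  // Runs at most one queued job, like a single scheduler pass.
  void runloop_once() {
    if (jobs_.empty()) return;
    std::function<void()> job = jobs_.front();
    jobs_.pop_front();
    job();
  }

  int sent() const { return sent_; }

 private:
  std::deque<std::function<void()>> jobs_;
  bool txrx_pending_ = false;
  int sent_ = 0;
};
```

Calling `queue_uplink()` repeatedly without ever calling `runloop_once()` returns `true` once and `false` afterwards, and `sent()` stays at 0 until the run loop is serviced.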
