normalizeUplink vs the mandatory schema validation

Hello,

I am trying to understand why validation of the normalizeUplink() output against the Normalized Payload Schema is mandatory.

I really like the idea of decoupling decodeUplink() and normalizeUplink(). It allows using a vendor-provided decoder as-is while letting my own normalizer restructure the output the way I want it.

I am feeding an Elastic cluster through webhooks and the output of the decoder is not how I want the data structured. Without touching the decoder, I have two choices: either send my data through some intermediate ingest pipeline to reformat it, or use the normalizeUplink() function and keep all of this inside TTN.
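
To make the use case concrete, here is a rough sketch of the kind of normalizeUplink() I have in mind, keeping the vendor decoder untouched. The field names (my wx.* structure and the decoder output fields it reads) are made up for illustration, and this output would of course still have to pass the mandatory schema validation, which is the crux of my question.

```javascript
// Hypothetical sketch: the vendor decodeUplink() stays as-is; only the
// normalizer reshapes its output for my Elastic mapping.
function normalizeUplink(input) {
  // input.data is whatever the vendor-provided decodeUplink() returned
  return {
    data: {
      wx: {
        temperature_c: input.data.temperature, // made-up field names
        humidity_pct: input.data.humidity,
      },
    },
  };
}
```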

My problem is that making the normalizeUplink() output validation against the schema mandatory leads to two issues:

  • I am obliged to follow that schema, even though it may not be of value to me, since this data is being fed into my own infrastructure and I may not necessarily want or need to follow it.
  • The schema does not support some of the fields/metrics of the device I am using. This has just happened to me with the SenseCAP S2120. That problem has also been discussed in the SenseCAP S2120 8-in-1 payload edit thread.

Wouldn’t it make sense either to allow disabling the schema validation of the normalizeUplink() output or to allow providing an alternate schema for validation?

If this makes sense, I’ll be happy to file an issue about it. LMKWYT.

Thanks,
Colin

Not really - the idea is that it is standardised - you push in a temperature and everyone knows what’s coming out the other side. It totally stops being useful if you can bend its output to your will. However, if there are missing schemas then yes, propose away, in detail, on GitHub.

You can replicate the exact same functionality with a standard hand-rolled payload formatter.
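
For what it’s worth, a hand-rolled decodeUplink() can emit whatever structure you like, because decoded_payload is not schema-validated. A minimal sketch with a made-up byte layout:

```javascript
// Minimal hand-rolled decodeUplink() sketch - the byte layout is hypothetical
// (unsigned values for simplicity); the point is that the output structure is
// entirely yours to choose.
function decodeUplink(input) {
  var bytes = input.bytes;
  return {
    data: {
      wx: {
        temperature_c: ((bytes[0] << 8) | bytes[1]) / 100,
        humidity_pct: bytes[2],
      },
    },
    warnings: [],
    errors: [],
  };
}
```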

That said, PF are for whims :wink: - real programmers decode on their own platform - for two very very good reasons:

  1. One source of truth entirely in your control
  2. PF are not 100% guaranteed to run - if the server is under pressure and it times out, no decoding (see the sketch below).
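
For the "decode on your own platform" bit, a minimal sketch: Node.js/Express is assumed, the raw payload arrives base64-encoded in uplink_message.frm_payload of the standard webhook body, and the byte layout below is made up for illustration.

```javascript
// Sketch of decoding on your own platform: the server-side PF never has to run
// for this pipeline to work.
const express = require("express");

const app = express();
app.use(express.json());

app.post("/ttn-uplink", (req, res) => {
  const msg = req.body.uplink_message;
  // Keep the raw bytes regardless of whether the PF ran on the server side.
  const bytes = Buffer.from(msg.frm_payload, "base64");
  const reading = {
    temperature_c: bytes.readInt16BE(0) / 100, // hypothetical layout
    humidity_pct: bytes.readUInt8(2),          // hypothetical layout
  };
  // ...index `reading` into Elastic, a database, etc.
  res.sendStatus(200);
});

app.listen(3000);
```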

Thanks for the quick answer.

the idea is that it is standardised - you push in a temperature and everyone knows what’s coming out the other side. It totally stops being useful if you can bend its output to your will.

You can replicate the exact same functionality with a standard hand-rolled payload formatter.

I’m not sure I fully understand the logic here. On one hand, you can bend the payload formatter output to your will with the decoder, and “everyone” would need to adjust to its format. On the other hand, the normalizer output schema is mandatory so that “everyone knows what’s coming out.” But who exactly is this “everyone”? Ultimately, the relationship between the producer and the consumer seems either top-down (I get to choose the format of what will be fed to the consumer — in my case, I am on both sides) or bottom-up (I have to conform to a specific schema provided by the consumer). Why does TTN need to police that relationship between the producer and the consumer?

Maybe I’m missing something here?

For me, having a decoupling between the decoder and the normalizer allows me to retain the vendor-provided decoder as-is (which simplifies maintenance) while handling the formatting in the normalizer. But I don’t quite see the value in necessarily adhering to your schema. As I said, I might be missing a bigger vision — please enlighten me if that’s the case!

real programmers decode on their own platform

This really depends on the overall architecture. Your vision of how to build things may not align with that of the people designing it. As long as the limitations (such as PFs not being 100% guaranteed to run) are understood, choosing to do some or all decoding upstream can be entirely reasonable.

Thanks,
Colin

You’re missing who you are talking to - this is the TTN forum, 99.999% volunteer run, with a mix of community, maker & professional users. TTI are the police, they invented it, it’s their baby, you need to talk to them.

LOL, having run a few integration webinars for TTN/TTI plus summer school, having had my own code quoted back at me on the forum, having lodged a number of issues about the PF when v3 was launched, and having been in the room when they announced the normalised payload system, I think I’ve got a good handle on “the people designing it”.

During the v3 launch period there were compromises made to the PF size because it transpired that the majority of vendors just chucked everything into one and got their marketing staff to write it for good measure. There are plenty of topics on here about vendor PFs not working, one as recent as a couple of weeks ago that I answered/fixed with my WOPR.

Apart from not being guaranteed to run, which admittedly is very rare, it also suckers people into only saving the numbers rather than saving the raw payload. Quite a few people can’t translate an imperative language (JS) into another imperative language (PHP, Python, Perl and others that don’t start with P) or vice versa, so it’s easier for them to save the raw payload and decode locally in the language of their choice.

As well as delivering solutions, I do eat my own dog food - two days ago I saw a spike in a graph and it turned out an MCU had dropped below 0 °C, which was rather unexpected even for the far-flung northern parts of England; minus temperatures outside, sure, but the cold getting through the case and chilling the MCU was not something I’d anticipated. As I had the raw payload, this was easily debugged.

TL;DR: Nothing wrong with using the PF as long as you have a backup in terms of the raw payload. If there are missing data types in the normalised schema, post a request on GitHub and haggle it out with the TTI dev team.
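
By way of illustration, keeping the raw payload next to whatever the PF decoded is cheap insurance when feeding Elastic via webhooks. The uplink_message fields used below are the standard webhook ones (worth checking against the data-formats docs); the output field names are made up.

```javascript
// Sketch of "keep the raw payload as a backup" for a webhook-fed Elastic setup.
function toElasticDoc(webhookBody) {
  const msg = webhookBody.uplink_message;
  return {
    received_at: msg.received_at,
    device_id: webhookBody.end_device_ids.device_id,
    raw_payload_b64: msg.frm_payload,     // the backup: re-decodable any time
    decoded: msg.decoded_payload || null, // whatever the PF produced, if it ran
  };
}
```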


Again, thanks for your response and perspective on this. I do not yet fully grasp the power dynamics of TTI/TTN or the pulse of this open-source community. Having a discussion here feels like the right way to step in and get a sense of that, I guess.

I think I’ve got a good handle on “the people designing it.”

Yikes, I didn’t mean to question your perspective on TTN/TTI. My comment wasn’t clear—sorry about that. I wasn’t questioning your knowledge of the design of TTN per se; I was referring to the design of a system using TTN. My point was about your statement on how things should be done: “real programmers decode on their own platform”, which I assumed to be TTN’s perspective on how I should approach things. What I meant to say is that the people designing systems that use TTN might not necessarily share your vision—or TTN’s vision, for that matter.

In the world of data ingestion, transformation, storage, ETL, etc., there are so many architectural considerations and visions that there’s no single “right” way of doing things.

Open sourcing (part of) the stack, providing free access, and having public forums is a fantastic way to gather community feedback, which is hopefully valuable for TTI. I hope this discussion about a TTN feature, its purpose, and usefulness, along with my perspective as a user, is of some value.

That being said, I still feel like I’m at the same place in my (mis)understanding of the usefulness of the normalizeUplink() function and its mandatory output schema validation. Here’s what I understand so far (please correct me if I’m wrong):

  • We have complete control of the output format of decodeUplink().
  • normalizeUplink() was added as a way to standardize the output of the PFs against a TTN-provided schema.
  • The rationale for this, as you explained, is:

the idea is that it is standardized — you push in a temperature, and everyone knows what’s coming out the other side. It totally stops being useful if you can bend its output to your will.

  • The conclusion, as I understand it, is that you can perform any transformation you want with decodeUplink(), but if you use normalizeUplink(), you must respect the provided schema so that “everyone knows what’s coming out” (a sketch of what that looks like follows this list).
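
To check my understanding, here is roughly what a schema-conformant normalizeUplink() would look like. The measurement keys (air.temperature in °C, air.relativeHumidity in %) come from my reading of the schema docs, so treat them as assumptions rather than the authoritative list.

```javascript
// Rough sketch of a schema-conformant normalizeUplink(): the output is forced
// into the schema's measurement structure rather than my own field names.
function normalizeUplink(input) {
  // input.data is the output of decodeUplink()
  return {
    data: {
      air: {
        temperature: input.data.temperature,   // °C, per my reading of the schema
        relativeHumidity: input.data.humidity, // %, per my reading of the schema
      },
    },
  };
}
```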

Maybe it’s just me, but I still don’t understand who this “everyone” is that TTN is trying to protect by strictly enforcing the output schema of normalizeUplink(). As far as I know, the data coming out of normalizeUplink() is ultimately only valuable for me and the consumers I work with. So why can’t I decide what the schema for normalization should be?

Also, the fact that the schema is not (and will never be) complete significantly limits its usefulness. I understand that we can submit issues or PRs for additions, but practically speaking, who wants to wait for an issue/PR cycle (weeks? months? years?) for a field or type to be added for a device they’re trying to connect? I just don’t get it. At the very least, we should be able to dynamically install a new schema if having mandatory schema validation is a central idea of this feature, no?

On the other hand, I do like the idea of decoupling decodeUplink() and normalizeUplink(). I just don’t understand why the schema has to be dictated by TTN, especially since anything in any format can already come out of decodeUplink(). This makes normalizeUplink() useful only for what is supported today, rendering it unusable for devices with fields that aren’t supported in the current schema version.

Thanks,
Colin

You need to create three separate Voodoo Dolls: one of me, one for TTN and one for TTI. You are muddling up the community that is given this stuff for free, the members of the community who answer stuff on here, and the benefactor, TTI, that creates solutions based on commercial requirements with a side offering of input from smaller developers (like myself, small in staff count, not in stature or kg). I don’t speak for TTN, but I’ve been around long enough to have a good handle on the bigger picture of TTN, and I’m prone to sharing my opinion.

But there are a few well-known things:

  1. You should PLAN for 10% packet loss but figure on it being 2% on a bad day.
  2. You should not rely on the decoder - it is deliberately coded with a timeout, so if you do too much or the servers are under pressure with multiple items being processed, they are dropped.

This isn’t opinion, this is in the docs. Hence real programmers decode on their own platform.

I think you’ll find that the number of people on here using normalizeUplink() is very close to zero. It’s not well publicised, it’s very much a corporate install sort of thing. TTN isn’t trying to protect anyone, except that we do need everyone to stick to the Fair Use Policy to keep the server loads manageable. TTI, however, are definitely not going to want to break a design based on their experience of deploying LW solutions, which started about 10 minutes after LoRaWAN was born, so that’s a fair few years. They have many, many T-shirts, mostly from the annual conference they run that gets visitors from around the world.

As for you lodging a request, if it’s straightforward to fulfil then it will be included in an update in about two or three cycles (a cycle is a fortnight). However, if you’re really on the ball, you could change the JSON doc and submit a PR - it will be reviewed and, if found good, incorporated in the next update.

It would also be jolly useful if you could say what you think is missing so we have some tangibles. At present this is just a philosophical debate about who’s in charge of the spec, which comes down to who pays, and that’s definitely NOT us, so not much of a debate. But once the penny drops that we get many tens of thousands of £$€ of servers in three Amazon data centres, with staff looking after those servers, all for free, then you’ll understand who to ask to get changes implemented.

Those that pay the bills to keep the lights on, aka the customers of TTI, work on timescales of weeks if not months; they are deploying hundreds if not thousands of sensors. They buy them in bulk and test them to destruction. More than enough time to get a new schema put in place.

And to that end, as you keep referring to “me and the consumers I work with”, may I introduce @rish1, who will walk you through the commercial onboarding and can then direct your requirements for changes to the engineer who’d enact them for you as a new customer, because TTN is not for commercial use, just testing & community activities.

Thanks again, Nick. I appreciate the context you’ve shared about TTN, TTI, and the challenges around maintaining the platform, as well as your insight into the practical realities of its operation.

That said, I feel like my original questions about the mandatory schema validation for normalizeUplink() and the possibility of more flexibility (e.g., allowing user-defined schemas or bypassing validation) remain unanswered. While I understand the infrastructure and policy constraints you’ve outlined, my intention was to explore practical alternatives or clarify the rationale behind the current design for this feature.

I also want to point out that this discussion isn’t entirely philosophical. In my first message, I referred to a real issue I encountered with the SenseCAP S2120 device and its missing schema fields, which has already been discussed in a related thread. These missing schema elements are exactly what prompted me to start this conversation because they directly impacted my ability to use normalizeUplink() as intended. I believe this issue serves as a tangible example of the kind of challenges I was hoping to discuss here.

I don’t want to muddle the conversation further, but I’m genuinely curious if there’s space for these kinds of changes or if this is simply a “fixed point” in the system. I think this discussion could benefit others as well, so I hope we can keep the dialogue open. Thanks again for your time and perspective!

Hello @rish1,
I don’t want to spoil the fun here, but I think there’s been a misinterpretation of my usage of TTN. The confusion seems to stem from my use of the term “consumers,” which in this context refers to what’s consuming my data (e.g., my Elastic cluster), not people or entities in a commercial sense.

I can assure you that I’m not using TTN in any commercial way—I’m essentially prototyping my own environmental monitoring setup. If this ever evolves into a commercially viable project, I promise we’ll have that conversation when the time comes.

Thanks for understanding!

Colin

TTN has several default integrations available and some of them expect data in a certain format. Allowing users to define their own format will impact those integrations.
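
As a rough illustration, with a fixed schema an integration can consume measurements without knowing which device produced them. The field carrying the normalized output in the uplink message (normalized_payload below) and the measurement keys are assumptions to verify against the data-formats documentation.

```javascript
// Device-agnostic consumer sketch: pull all temperatures out of the
// normalized measurements, whatever device produced them.
function extractTemperatures(webhookBody) {
  const measurements = webhookBody.uplink_message.normalized_payload || [];
  return measurements
    .filter((m) => m.air && typeof m.air.temperature === "number")
    .map((m) => m.air.temperature); // degrees Celsius per the schema
}
```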

I believe I’ve been clear: post an issue or create a PR on GitHub - that’s what the rest of us do when we have a change in mind.

Be explicit - you’ve not said “I need X measurement with units of Z”. Everyone here is a volunteer; we just don’t have the time to figure out what you need from that device that’s currently missing, mostly because most of us don’t use normalizeUplink().

I’ve posted suggested changes, I have colleagues I work with closely who post changes we think would be helpful, and I see others post changes. Some are approved, some are refined before approval, some are rejected. If it makes TTS better, it will result in changes.

To be blunt, stop talking and propose something tangible, copy it here but to actually get it actioned, create an issue on GitHub or create a PR.

Thanks, @kersing, for the heads-up—this is actually very helpful. I looked into integrations and searched related issues about normalization, and I came across this issue, which provides details on the rationale behind normalization.

Having a known data model/format will allow building integrations/apps using data coming from The Things Stack without caring about what device is providing that data, the units used…just caring about the capabilities/sensors that it has.

This makes sense. Sorry if this was obvious to everyone else—I’m still a noob on TTN, learning as I go.

My reference for trying to make sense of normalizeUplink() and its implementation in the PF was from the docs. Based on what I’ve learned, I’ll move forward with the following actions:

  • Open an issue to suggest improvements to the documentation to better explain the context and expectations of normalizeUplink().
  • Open an issue for possible additions to the schema for the SenseCAP S2120’s UV Index and Rain Gauge fields, which are currently unsupported.

Thanks again for pointing me in the right direction!
