I'm racking my brain over the storage part of my application. I can't decide which database to use for storing the IoT sensor data.
I want to store data from different kinds of sensors: sometimes just one temperature sensor, sometimes an additional UV sensor, and so on. Programmatically I've built several device profiles with these parameters, but storing the data is not yet implemented.
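To make the setup concrete, here is a minimal sketch of what such device profiles might look like as plain data. All names (profiles, sensors, types) are hypothetical, not from the original post:

```python
# Hypothetical device profiles: each profile lists the sensors a device
# type carries and the value type of each reading, so the storage layer
# knows what to expect. Names are illustrative only.
DEVICE_PROFILES = {
    "outdoor-basic": {"sensors": {"temperature": "float"}},
    "outdoor-uv":    {"sensors": {"temperature": "float", "uv_index": "int"}},
}

def expected_fields(profile_name):
    """Return the sensor fields (and their value types) a device should report."""
    return DEVICE_PROFILES[profile_name]["sensors"]
```

A storage handler could look up `expected_fields("outdoor-uv")` to decide which columns, tables, or JSON keys a given uplink should produce.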
I thought of several approaches:
Because of the multitude of 'schemas', I was thinking of using a NoSQL solution like MongoDB.
Another approach was to use a 'normal' relational database and split the different sensor readings by type (integer, float, etc.), storing each type in a separate table in, e.g., a PostgreSQL database.
InfluxDB (I had never heard of it before last week), which describes itself as "The platform for time series" and sounds like a good fit.
So, in short: has anyone on this forum found themselves in the same position of having to choose the right database?
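For the "one table per value type" approach, here is a runnable sketch using an in-memory SQLite database as a stand-in for PostgreSQL. Table and column names are illustrative, not a recommended schema:

```python
import sqlite3

# Sketch of the per-type-table approach: float readings and integer
# readings go into separate tables. SQLite stands in for PostgreSQL here;
# the idea carries over unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE readings_float (
    device_id TEXT, sensor TEXT, ts INTEGER, value REAL)""")
conn.execute("""CREATE TABLE readings_int (
    device_id TEXT, sensor TEXT, ts INTEGER, value INTEGER)""")

# A temperature (float) and a UV index (int) land in different tables.
conn.execute("INSERT INTO readings_float VALUES (?, ?, ?, ?)",
             ("node-1", "temperature", 1700000000, 21.5))
conn.execute("INSERT INTO readings_int VALUES (?, ?, ?, ?)",
             ("node-1", "uv_index", 1700000000, 3))

rows = conn.execute(
    "SELECT sensor, value FROM readings_float WHERE device_id = 'node-1'"
).fetchall()
```

The cost of this layout is that reading "everything node-1 reported" requires a UNION or one query per table, which is worth keeping in mind.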
As always, it depends. InfluxDB is designed for 'time series' (metrics) at large scale, performs really well, and integrates nicely with Grafana as a dashboard. InfluxDB downsamples the data: it keeps the high-precision raw data for only a limited time and stores the lower-precision, summarized data for much longer, or forever. If your application needs the original data rather than data summarized over time, InfluxDB is not your solution.
If you need the exact data, I would suggest Elasticsearch: it scales, it's fast, and it also has a beautiful dashboard in Grafana.
Can't advise you on MongoDB or other NoSQL databases due to lack of experience…
I would not recommend using Elasticsearch as a primary datastore. It has been designed as an analytics engine and has some drawbacks (which will be corrected).
InfluxDB sounds like a cool competitor, but the fact that it loses resolution over time is something to bear in mind.
I also know Elasticsearch as an 'add-on' to your database for proper full-text search and indexing. I never knew it could also act as primary storage (with the limitations jym mentioned).
Maybe @johan knows what the backend of the storage handler is going to be?
@jpmeijers What do you use for ttnmapper? Plain MySQL or PostgreSQL?
I'm working on Ubuntu VMs, so not on RDS or similar; not yet, that is.
Connecting to MQTT brokers, running a simple UDP server for cellular connections, and running a Socket.IO server are all handled in NodeJS.
The web-facing application itself is built in PHP.
Hi,
I'm a big fan of PostgreSQL, since it offers very powerful JSONB columns where you can query and index JSON properties much like in MongoDB. Plus you have all the benefits of a relational database. I would always start with PostgreSQL and only later think about moving some data into different storage, e.g. a more scalable key-value store.
As a simple key-value database, Accumulo is an option; for search features, Elasticsearch or Solr.
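To illustrate the JSONB idea with something runnable, here is a sketch using SQLite's JSON1 `json_extract` function as a stand-in; in PostgreSQL you would query a JSONB column with the `->>` operator (and could add a GIN index). Table and field names are illustrative:

```python
import sqlite3

# Store whole sensor payloads as JSON, but still query individual
# properties in SQL, MongoDB-style. SQLite's json_extract stands in for
# PostgreSQL's JSONB operators here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (device_id TEXT, payload TEXT)")
conn.execute("INSERT INTO messages VALUES (?, ?)",
             ("node-1", '{"temperature": 21.5, "uv_index": 3}'))
conn.execute("INSERT INTO messages VALUES (?, ?)",
             ("node-2", '{"temperature": 18.0}'))

# Note node-2 has no uv_index: differing 'schemas' coexist in one table.
hot = conn.execute(
    "SELECT device_id FROM messages "
    "WHERE json_extract(payload, '$.temperature') > 20"
).fetchall()
```

This is why JSONB is attractive for mixed sensor payloads: devices with different sensor sets share one table, yet the data stays queryable.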
Hi Vannut,
Yes, we had exactly the same challenge a year ago. We eventually chose both a SQL and a NoSQL database for different purposes: SQL for structured use (reporting, analysis) and NoSQL for 'near real time' actions (geofencing, etc.).
What you need depends on your use case, so here are some of the things we ran into; I hope it helps!
We use the Orion context broker (https://hub.docker.com/r/fiware/orion/), which provides an HTTP API and some handy functions like geofencing and publish/subscribe on changes. Under the hood it uses MongoDB. There is a mapper from TTN to Orion provided by a Dutch company: https://hub.docker.com/r/fiware/orion/ but we wrote about 30 lines of Groovy code that does the same for us. We use Orion/MongoDB to store the 'current state' of our sensors.
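The mapping itself really is tiny. Here is a hedged sketch (in Python rather than Groovy) of what such a mapper might do: turn a TTN uplink into an entity for Orion's NGSIv2 `POST /v2/entities` API. The uplink field names here follow the old TTN v2 shape and are illustrative, not guaranteed to match your payload:

```python
# Sketch of a TTN-to-Orion mapper: convert a TTN uplink message into an
# NGSIv2 entity dict that could be sent to Orion's POST /v2/entities.
# The uplink shape (dev_id, payload_fields) is illustrative.
def ttn_to_orion_entity(uplink):
    fields = uplink["payload_fields"]
    return {
        "id": uplink["dev_id"],
        "type": "Sensor",
        **{name: {"type": "Number", "value": value}
           for name, value in fields.items()},
    }

entity = ttn_to_orion_entity({
    "dev_id": "node-1",
    "payload_fields": {"temperature": 21.5},
})
```

In practice you would POST this dict as JSON to Orion (or PATCH the attributes of an existing entity) and let subscribers react to the change.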
Pros: you can mess around easily and quickly without having to worry about data migrations and schemas. Ideal for prototypes and MVPs. If you are using JavaScript or NodeJS, many NoSQL databases fit nicely because they accept JSON or BSON, so you don't have to map to a schema.
Cons: your data quickly becomes a mess with duplicates and such. Problems like:
"Oh, we renamed that attribute last week, but last month's data with the old attribute name is still in there as well."
Typos in programs not raising errors: attribute names dateReceived vs date_received vs date-received are all accepted.
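One common mitigation for that typo problem is to validate documents against an allowed key set before writing them to the schemaless store. A minimal sketch, with illustrative key names:

```python
# Guard against attribute-name typos in a schemaless store: reject any
# document containing keys outside an agreed-upon set. Key names are
# illustrative only.
ALLOWED_KEYS = {"dateReceived", "deviceId", "temperature", "uvIndex"}

def validate_document(doc):
    unknown = set(doc) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return doc
```

With this in the write path, `date_received` would fail loudly at insert time instead of silently creating a second, misspelled attribute.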
Then we have a SQL backed API for historical data.
Pros: you know exactly how the data is structured and how to relate different datasets to each other.
Cons:
new data structures require a schema change and extra code to create, read, update, etc.
you need to know how to design a proper database schema; otherwise you will curse the database for not doing what you want and have to write really complex SQL statements to compensate, which of course kills performance.
the less you know about your use-case, the harder it is to design your database
Thanks for the insightful responses. We're (also) going with a hybrid solution:
SQL (PostgreSQL in production, MySQL in dev), containing all the structured data like users, nodes, teams, etc.
InfluxDB, which handles the sensor data as time series.
The problem of losing resolution over time is user-definable via Retention Policies and Continuous Queries.
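What a continuous query does can be sketched in plain Python: roll raw points up into per-hour means, while the retention policy decides how long the raw points survive. In InfluxDB 1.x this would be expressed as `CREATE CONTINUOUS QUERY ... SELECT mean(value) ... GROUP BY time(1h)` plus a `CREATE RETENTION POLICY`; the sketch below only emulates the aggregation step:

```python
from collections import defaultdict

# Emulate an hourly continuous query: group raw (unix_ts, value) points
# into hour buckets and keep the mean per bucket. In InfluxDB the raw
# points would then expire per the retention policy while the means live on.
def downsample_hourly(points):
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % 3600].append(value)
    return {hour: sum(vals) / len(vals)
            for hour, vals in sorted(buckets.items())}

hourly = downsample_hourly([(0, 20.0), (1800, 22.0), (3600, 18.0)])
```

The two readings in the first hour collapse into one mean, which is exactly the resolution trade-off discussed above, now under your control.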