Streaming data from Raspberry Pi to Oracle NoSQL via Node-RED

Starting from version 4.2, Oracle NoSQL now offers drivers for Node.js and Python, in addition to the existing ones for Java, C, and C++; this is good news for data science people, like myself, since we are normally not accustomed to code in Java or C/C++. So, I thought to build a short demo project, putting into test both the new drivers as well as some recently acquired skills on IoT development and Node-RED in particular.

In this post we will walk through building a demo application, in which we get processor temperature measurements from a Raspberry Pi and store them to a remotely-hosted Oracle NoSQL database using a Node-RED flow; we will also get a quick glimpse at the Python driver, using it to write rows in Oracle NoSQL outside the main Node-RED flow.

Prerequisites & system configuration

Here is what we will use:

Raspberry Pi 3
Oracle NoSQL v4.3.10 (hosted in an Oracle Big Data Lite VM instance)
Node.js v6.9.5 LTS
Node-RED v0.16.2 (you may have issues with older versions)
Python 2.7
The nosqldb-oraclejs Node.js package v4.3.10 (install with npm)
The nosqldb Python package v4.3.10 (install with pip)

Despite the claims in the documentation, we need Python 2.7 – Python 2.6 will not work (see the discussion here); Also, the host running Oracle NoSQL must be known by name to the other devices involved (the IP will not suffice – see the discussion here), so we need to add the following entry to Pi’s /etc/hosts file:

192.168.70.74   bigdatalite.localdomain

Overview & preliminaries

Our application will involve two tables in Oracle NoSQL: a parent table named devices, holding information about the device ID (MAC address), type, and location; and a child table named measurements, holding the timestamped measurements from a particular device. Here are the necessary DDL statements from the kvlite console:

[oracle@bigdatalite ~]$ java -jar $KVHOME/lib/kvstore.jar runadmin -port 5000 -host localhost
kv-> connect store -name kvstore
Connected to kvstore at localhost:5000.
kv-> execute "CREATE TABLE devices (
  ->  device_id STRING,
  ->  type STRING,
  ->  location STRING,
  ->  PRIMARY KEY (device_id)
  ->  )";
Statement completed successfully
kv-> execute "CREATE TABLE devices.measurements (
  ->  time STRING,
  ->  temperature DOUBLE,
  ->  PRIMARY KEY (time)
  ->  )";
Statement completed successfully
kv-> show tables;
Tables: 
	[...]
	devices
	  devices.measurements
	[...]

Notice that, although in the second DDL statement above it appears that our measurements table has only the time field as its primary key, the fact that it is a child table of devices means that the parent’s primary key is included , not only as a field, but as a primary key as well:

kv-> execute "DESCRIBE AS JSON TABLE devices.measurements"
{
  "type" : "table",
  "name" : "measurements",
  "owner" : null,
  "parent" : "devices",
  "shardKey" : [ "device_id" ],
  "primaryKey" : [ "device_id", "time" ],
  "fields" : [ {
    "name" : "device_id",
    "type" : "STRING",
    "nullable" : false,
    "default" : null
  }, {
    "name" : "time",
    "type" : "STRING",
    "nullable" : false,
    "default" : null
  }, {
    "name" : "temperature",
    "type" : "DOUBLE",
    "nullable" : true,
    "default" : null
  } ]
}

This makes our demo project scalable in a real situation with possibly hundreds of devices, since the combination of MAC address (device_id) and timestamp will always be unique, hence there is no danger of record loss due to primary key duplication. This is also the reason why we keep our time field as type STRING instead of TIMESTAMP, since the latter type cannot be used as a table primary key in Oracle NoSQL.

Get the MAC address of the Pi

Getting the MAC address of our Pi is straightforward in Node.js:

pi@raspberrypi:~ $ node
> require('os');
[...]
> tt = os.networkInterfaces()
{ lo: 
   [ { address: '127.0.0.1',
       netmask: '255.0.0.0',
       family: 'IPv4',
       mac: '00:00:00:00:00:00',
       internal: true },
     { address: '::1',
       netmask: 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff',
       family: 'IPv6',
       mac: '00:00:00:00:00:00',
       scopeid: 0,
       internal: true } ],
  eth0: 
   [ { address: '192.168.70.73',
       netmask: '255.255.255.0',
       family: 'IPv4',
       mac: 'b8:27:eb:0a:0e:e8',
       internal: false },
     { address: 'fe80::7a93:3806:deea:da60',
       netmask: 'ffff:ffff:ffff:ffff::',
       family: 'IPv6',
       mac: 'b8:27:eb:0a:0e:e8',
       scopeid: 2,
       internal: false } ] }
> tt.eth0[0].mac
'b8:27:eb:0a:0e:e8'

Proxy server and Python driver

Before proceeding to the Node-RED flow for capturing and storing our measurements, notice that we now have enough information to populate our parent table devices with the first entry. For test & demonstration purposes, we choose to do it using the Python driver.

Both Python & Node.js drivers require a lightweight proxy server in order to communicate with Oracle NoSQL; the necessary jar files are already included in the respective packages nosqldb (Python) and nosqldb-oraclejs (Node.js), so we first need to provide their location in the CLASSPATH environment variable (empty by default in Pi) and then start the proxy:

pi@raspberrypi:~ $ export CLASSPATH="/usr/local/lib/python2.7/dist-packages/nosqldb/kvproxy/lib/*"
pi@raspberrypi:~ $ java oracle.kv.proxy.KVProxy -port 7010 -helper-hosts 192.168.70.74:5000 -store kvstore
Feb 08, 2017 4:35:55 PM oracle.kv.proxy.KVProxy <init>
INFO: PS: Starting KVProxy server
Feb 08, 2017 4:35:55 PM oracle.kv.proxy.KVProxy <init>
INFO: PS: Connect to Oracle NoSQL Database kvstore nodes : 192.168.70.74:5000
Feb 08, 2017 4:35:56 PM oracle.kv.proxy.KVProxy <init>
INFO: PS:   ... connected successfully
Feb 08, 2017 4:35:58 PM oracle.kv.proxy.KVProxy startServer
INFO: PS: Starting listener ( Half-Sync/Half-Async server - 20 no of threads on port 7010, version 4.3.10, protocol version 3)

We have prepared two separate Python scripts – one for connecting to the store (nosqldb_connect.py) and one for inserting a row to the parent table devices (nosqldb_write_parent.py); here they are:

# nosqldb_connect.py

import logging, sys, os

# set logging level to debug and log to stdout
def setup_logging():
    logger = logging.getLogger("nosqldb")
    logger = logging.StreamHandler(sys.stdout)
    logger.setLevel(logging.DEBUG)
    formatter = logging.Formatter('\t%(levelname)s - %(message)s')
    logger.setFormatter(formatter)
    rootLogger.addHandler(logger)
    
from nosqldb import ConnectionException
from nosqldb import IllegalArgumentException
from nosqldb import Factory
from nosqldb import StoreConfig
from nosqldb import ProxyConfig

storehost = "bigdatalite.localdomain:5000"
proxy = "localhost:7010"

# configure and open the store
def open_store():
    try:
       kvstoreconfig = StoreConfig('kvstore', [storehost])
       kvproxyconfig = ProxyConfig()
       print 'Store connection open'
       return Factory.open(proxy, kvstoreconfig)
    except ConnectionException, ce:
       logging.error("Store connection failed.")
       logging.error(ce.message)
       sys.exit(-1)
       
# follow by
# store = open_store()

# nosqldb_write_parent.py

row_parent = { 'device_id' : 'b8:27:eb:0a:0e:e8',
               'type'      : 'Raspberry Pi 3',
               'location'  : 'office' }

try:
    store.put("devices", row_parent)
    print "Store write succeeded."
    logging.debug("Store write succeeded.")
except IllegalArgumentException, iae:
    logging.error("Could not write to table.")
    logging.error(iae.message)

# follow by	
# store.close()

Let us now run them from a Python console in our Pi:

>>> execfile('/home/pi/scripts/nosqldb_connect.py')
>>> store = open_store()
Store connection open
>>> execfile('/home/pi/scripts/nosqldb_write_parent.py')
Store write succeeded.
>>> store.close()

and confirm that we indeed have one row inserted at the parent table devices in kvstore:

kv-> execute 'SELECT * FROM devices';
 +-------------------+----------------+----------+
 |    device_id      |      type      | location |
 +-------------------+----------------+----------+
 | b8:27:eb:0a:0e:e8 | Raspberry Pi 3 | office   |
 +-------------------+----------------+----------+

1 row returned

Looks we are OK!

We have kept the scripts for connection and writing to Oracle NoSQL deliberately distinct; that way, the writing script can be easily modified in order, say, to read the MAC addresses and locations of new devices programatically from a text file, and even to call the connection script once in the beginning with execfile().

The main Node-RED flow

Having finished with the preliminaries and the short demonstration of the Python driver, we now turn to the main task of our demo project, i.e. constructing a flow in Node-RED which will get processor temperature measurements from our Pi every 5 seconds and store them in the Oracle NoSQL store. Here is the flow schematic in Node-RED:

Node-RED flow

The unconnected sub-flow in the lower part is executed only once in the beginning, and it is needed in order to automatically get the MAC address of our Pi and use it in the measurements transmitted to Oracle NoSQL.

Recall how we got the Pi MAC address above, using require('os')? Unfortunately, the require() command is not available for Node-RED nodes; what we have to do is modify accordingly the functionGlobalContext in the /home/pi/.node-red/settings.js file, so as to make any such additional requirements available to Node-RED through its global context. And apart from os, we have to include also the nosqldb-oraclejs package. In my case, the said function begins at line 159 of the file, and here is how it must be modified (highlighted lines):

functionGlobalContext: {
        os:require('os'),
        nosqldb:require('nosqldb-oraclejs'),
        // octalbonescript:require('octalbonescript'),
        // jfive:require("johnny-five"),
        // j5board:require("johnny-five").Board({repl:false})
        crc:require("crc")
    },

We have to restart Node-RED in order for the changes to be applied. Notice that if you use an old version of Node-RED, you may encounter some issues here with nosqldb-oraclejs; but with Node-RED 0.16.2 (latest version at the time of writing), everything runs smoothly:

pi@raspberrypi:~ $ node-red-start

Start Node-RED

[...]

Started Node-RED graphical event wiring tool..
Welcome to Node-RED
===================
8 Feb 19:42:56 - [info] Node-RED version: v0.16.2
8 Feb 19:42:56 - [info] Node.js  version: v6.9.5
8 Feb 19:42:56 - [info] Linux 4.1.18-v7+ arm LE
8 Feb 19:42:56 - [info] Loading palette nodes
pi : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/python -u /usr/lib/node_modules/node-red/nodes/core/hardware/nrgpio.py info
pam_unix(sudo:session): session opened for user root by (uid=0)
pam_unix(sudo:session): session closed for user root
8 Feb 19:42:59 - [info] Settings file  : /home/pi/.node-red/settings.js
8 Feb 19:42:59 - [info] User directory : /home/pi/.node-red
8 Feb 19:42:59 - [info] Flows file : /home/pi/.node-red/flows_raspberrypi.json
8 Feb 19:42:59 - [info] Server now running at http://127.0.0.1:1880/
8 Feb 19:42:59 - [info] Starting flows
8 Feb 19:42:59 - [info] Started flows

Getting the Pi MAC address in Node-RED

Here is the Node.js code inside our get_mac node:

// We need to include 'os' in functionGlobalContext
// located at /home/pi/settings.js
var tt = global.get('os').networkInterfaces();
global.set('mac', tt.eth0[0].mac);

Notice the difference in accessing the os module compared with our plain Node.js code above: now os is included in our global context, and we access it using global.get('os').

The `format` node

Here is the code inside our format node:

var temp = parseFloat(msg.payload.replace( /[^\d\.]*/g, ''));
var time = new Date().toISOString().replace(/\..+/, ''); 
var mac = global.get('mac');

msg.payload = {'device_id'    : mac,
               'measurements' : [ { 'time'        : time }, 
                                  { 'temperature' : temp } ]
};

return msg;

As the name implies, its role is simply to parse the temperature and time measurements, bundle them with the global variable mac (set in the node get_mac), and pass the data along to the next node write_db.

The `write_db` node

This is where we invoke the Node.js driver for connecting and writing to the Oracle NoSQL store; here is the node code:

var nosqldb = global.get('nosqldb');

// Create a configuration object
var configuration = new nosqldb.Configuration();
configuration.proxy.startProxy = true;
configuration.proxy.host = 'localhost:7010';
configuration.storeHelperHosts = ['192.168.70.74:5000', 'bigdatalite.localdomain:5000'];
configuration.storeName = 'kvstore';

// Create a store with the specified configuration
var store = nosqldb.createStore(configuration);

// Create the row to be inserted into the child table 'devices.measurements'
var childRow = {'device_id'    : msg.payload.device_id,
                'time'         : msg.payload.measurements[0].time,
                'temperature'  : msg.payload.measurements[1].temperature
};

// Define the insert action
store.on('open', function () {
         console.log('Store opened.');
         
         store.put('devices.measurements', childRow,
                   function (err) {
                               if (err)
                                  throw err;
                               else
                                  console.log("Child row inserted.");
});
   }).on('error', function(error) {
          console.log('Error in the store.');
          console.log(error);
});

// Perform the row insert
store.open();

// for diagnostics in Node-RED only:
msg.payload = childRow; 
return msg;

A nice feature of the Node.js driver (currently missing from the Python one) is that we don’t need to manually start the proxy server; it suffices to set configuration.proxy.startProxy = true in the code, as in line 5 of the above snippet.

We are now ready to start inserting measurements from our Raspberry Pi into Oracle NoSQL; deploying the flow from the Node-RED web UI, we get the following info at the console:

9 Feb 16:49:18 - [info] Starting flows
9 Feb 16:49:18 - [info] Started flows
Store opened.
Child row inserted.
Store opened.
Child row inserted.
Store opened.
Child row inserted.
Store opened.
Child row inserted.

At which point we can confirm that we have already 4 records in the devices.measurements table, with 5 seconds separation between them (the trigger time interval in our flow, defined in the inject node):

kv-> execute 'SELECT * FROM devices.measurements';
 +-------------------+---------------------+-------------+
 |     device_id     |        time         | temperature |
 +-------------------+---------------------+-------------+
 | b8:27:eb:0a:0e:e8 | 2017-02-09T14:49:23 |        36.9 |
 | b8:27:eb:0a:0e:e8 | 2017-02-09T14:49:28 |        40.1 |
 | b8:27:eb:0a:0e:e8 | 2017-02-09T14:49:33 |        38.5 |
 | b8:27:eb:0a:0e:e8 | 2017-02-09T14:49:38 |        37.9 |
 +-------------------+---------------------+-------------+

4 rows returned

Deploy the flow to more devices

It seems we are set. But there is one final step: we have shown how to connect to and get measurements from just one Raspberry Pi device; what if we had 50 (or 500…) such devices? Would we have to connect manually to each one of them with a browser, load the flow in Node-RED GUI, and press ‘Deploy’?

Fortunately no. Although rather buried and obscured in the documentation, there is a straightforward way for deploying a flow to a remote machine running Node-RED via a REST API.

So, as a last demonstration, here is how we can deploy the flow remotely to a second Raspberry Pi located, say, in the roof; first, we get its MAC address, and we use the Python driver as above to register it to our parent table devices, so now it looks like that:

kv-> execute 'SELECT * FROM devices';
 +-------------------+----------------+----------+
 |     device_id     |      type      | location |
 +-------------------+----------------+----------+
 | b8:27:eb:00:98:1b | Raspberry Pi 3 | roof     |
 | b8:27:eb:0a:0e:e8 | Raspberry Pi 3 | office   |
 +-------------------+----------------+----------+

2 rows returned

Then, after we have saved our flow in a file nosqldb_demo.json in a remote machine (here we use the Oracle Big Data Lite VM), we send a POST request to the Node-RED service of our second Pi like this:

[oracle@bigdatalite scripts]$ curl -v -X POST http://192.168.70.77:1880/flows -H "Content-Type: application/json"  --data @nosqldb_demo.json
* About to connect() to 192.168.70.77 port 1880 (#0)
*   Trying 192.168.70.77... connected
* Connected to 192.168.70.77 (192.168.70.77) port 1880 (#0)
> POST /flows HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.21 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: 192.168.70.77:1880
> Accept: */*
> Content-Type: application/json
> Content-Length: 3517
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
< HTTP/1.1 204 No Content
< X-Powered-By: Express
< Date: Mon, 13 Feb 2017 10:49:10 GMT
< Connection: keep-alive
< 
* Connection #0 to host 192.168.70.77 left intact
* Closing connection #0

We see that we get a 204 response (highlighted) meaning success. Here is the Node-RED console output in this remote Pi:

13 Feb 12:49:10 - [info] Starting flows
13 Feb 12:49:10 - [info] Started flows
Store opened.
Child row inserted.
Store opened.
Child row inserted.
Store opened.
Child row inserted.

i.e. 3 new records added to our devices.measurements table, which we can easily verify in the kvlite console:

kv-> execute 'SELECT * FROM devices.measurements';
 +-------------------+---------------------+-------------+
 |     device_id     |        time         | temperature |
 +-------------------+---------------------+-------------+
 | b8:27:eb:00:98:1b | 2017-02-13T10:49:15 |        39.7 |
 | b8:27:eb:00:98:1b | 2017-02-13T10:49:20 |        42.9 |
 | b8:27:eb:00:98:1b | 2017-02-13T10:49:25 |        40.2 |
 | b8:27:eb:0a:0e:e8 | 2017-02-09T14:49:23 |        36.9 |
 | b8:27:eb:0a:0e:e8 | 2017-02-09T14:49:28 |        40.1 |
 | b8:27:eb:0a:0e:e8 | 2017-02-09T14:49:33 |        38.5 |
 | b8:27:eb:0a:0e:e8 | 2017-02-09T14:49:38 |        37.9 |
 +-------------------+---------------------+-------------+

7 rows returned

Summary

We have demonstrated a complete and scalable end-to-end application for storing data from multiple Raspberry Pi devices into Oracle NoSQL via Node-RED; to the best of our knowledge, this is the first demonstration of a connection to Oracle NoSQL using Node-RED. Despite the fact that the Node.js driver for Oracle NoSQL was probably not built with Node-RED in mind, we can happily report that it indeed works smoothly with Node-RED.

As always, comments and remarks most welcome.-

Author
Recent Posts

Christos - Iraklis Tsatsoulis

Christos - Iraklis is one of our resident Data Scientists. He holds advanced graduate degrees in applied mathematics, engineering, and computing. He has been awarded both Chartered Engineer and Chartered Manager status in the UK, as well as Master status in Kaggle.com due to "consistent and stellar results" in predictive analytics contests.

Latest posts by Christos - Iraklis Tsatsoulis (see all)

Streaming data from Raspberry Pi to Oracle NoSQL via Node-RED - February 13, 2017
Dynamically switch Keras backend in Jupyter notebooks - January 10, 2017
sparklyr: a test drive on YARN - November 7, 2016