This is the second part of our three-part blog post series (see the first part here), which deals with incremental data updates. In our scenario we assume that we acquire small batches of data updates using some kind of web scraping mechanism. We will not deal with the details of that mechanism, as it is beyond the scope of this …
Graph analysis of Stack Overflow tags with Oracle PGX – Part 1: Data Engineering
Intoduction Oracle Parallel Graph Analytics (PGX) is a toolkit for graph analysis, both for running algorithms such as PageRank and for performing SQL-like pattern-matching against graphs. Extreme performance is offered through algorithm parallelization, and graphs can be loaded from a variety of sources such as flat files, SQL and NoSQL databases etc. So, in order to get a deeper feeling, …
Bulk load data to HBase in Oracle Big Data Appliance
I ran into an issue recently, while trying to bulk load some data to HBase in Oracle Big Data Appliance. Following is a reproducible description and solution using the current version of Oracle Big Data Lite VM (4.4.0). Enabling HBase in Oracle Big Data Lite VM (Feel free to skip this section if you do not use Oracle Big Data …
Cloudera Manager configuration issues in Oracle Big Data Lite VM 4.1.0
Oracle has recently announced the release of a new version (4.1.0) of its Big Data Lite VM. Compared to the previous release (4.0.1), we now have more recent versions of Oracle Enterprise Linux (6.5), Oracle NoSQL database (3.2.5), Cloudera distribution of Apache Hadoop (CDH 5.3.0) and Cloudera Manager (5.3.0). The new version of CDH, by itself, also brings forward several …