In the just-released version 4.4.0 of Oracle Big Data Lite VM, as in the previous one (4.3.0.1), the provided script install_additional_packages.sh installs a rather large number of additional R packages: 28, not counting their dependencies (the respective number in version 4.2.1 was only 10).
Unfortunately, what has also changed is the form of the commands issued for installing these additional packages. Consider for example the package igraph; while in the previous VM versions, the command in the script was
Rscript --verbose -e 'install.packages("igraph",repos="http://cran.us.r-project.org",dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library")'
the respective command now is
Rscript --verbose -e 'install.packages("http://cran.r-project.org/src/contrib/igraph_1.0.1.tar.gz",repos=NULL,dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library",type="source")'
i.e. the packages are now referenced down to specific file names (including versions), with the argument repos now being NULL instead of "http://cran.us.r-project.org".
I suspect that the reason for this change is a kind of version control for the packages to be installed, since in the past there have been some issues, mainly because Oracle R Distribution (ORD), lagging behind the latest version of GNU R, was sometimes incompatible with the latest versions of some R packages (package arules was such an example). Nevertheless, the fact that ORD is now at version 3.2.0 seems not to have been taken into account here: in the VM, ORD still ships with an older version of the package arules (1.1-9), although the latest arules version (1.3-1 at the time of writing) is indeed supported by R 3.2.0. This, in turn, has implications for the dependent packages, in this case arulesViz, which depends on arules and is included in the additional packages to be installed.
Anyway, whatever the reason, the net result of this change is that the majority of the 28 packages simply fail to install, for two different reasons. Some, like package igraph, fail to install due to missing dependencies:
[oracle@bigdatalite scripts]$ Rscript --verbose -e 'install.packages("http://cran.r-project.org/src/contrib/igraph_1.0.1.tar.gz",repos=NULL,dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library",type="source")'
running
  '/usr/lib64/R/bin/R --slave --no-restore -e install.packages("http://cran.r-project.org/src/contrib/igraph_1.0.1.tar.gz",repos=NULL,dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library",type="source")'
trying URL 'http://cran.r-project.org/src/contrib/igraph_1.0.1.tar.gz'
Content type 'application/x-gzip' length 3328353 bytes (3.2 MB)
==================================================
downloaded 3.2 MB
ERROR: dependencies ‘magrittr’, ‘NMF’, ‘irlba’ are not available for package ‘igraph’
* removing ‘/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library/igraph’
Warning message:
In install.packages("http://cran.r-project.org/src/contrib/igraph_1.0.1.tar.gz", :
  installation of package ‘/tmp/Rtmp6BgT76/downloaded_packages/igraph_1.0.1.tar.gz’ had non-zero exit status
while some others, like arulesViz, fail due to a wrong URL provided (it should be https://cran.r-project.org/src/contrib/Archive/arulesViz/arulesViz_1.0-4.tar.gz, since 1.0-4 is not the current version):
[oracle@bigdatalite scripts]$ Rscript --verbose -e 'install.packages("http://cran.r-project.org/src/contrib/arulesViz_1.0-4.tar.gz",repos=NULL,dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library",type="source")'
running
  '/usr/lib64/R/bin/R --slave --no-restore -e install.packages("http://cran.r-project.org/src/contrib/arulesViz_1.0-4.tar.gz",repos=NULL,dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library",type="source")'
trying URL 'http://cran.r-project.org/src/contrib/arulesViz_1.0-4.tar.gz'
Error in download.file(p, destfile, method, mode = "wb", ...) :
  cannot open URL 'http://cran.r-project.org/src/contrib/arulesViz_1.0-4.tar.gz'
In addition: Warning message:
In download.file(p, destfile, method, mode = "wb", ...) :
  cannot open: HTTP status was '404 Not Found'
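CRAN keeps only the current source tarball of a package directly under src/contrib and moves superseded versions to src/contrib/Archive/<package>/, so the two candidate locations for a pinned version can be built mechanically. Here is a minimal sketch in shell; the names cran_base, cran_current_url and cran_archive_url are my own hypothetical helpers and are not part of the VM scripts:

```shell
# Hypothetical helpers (not part of the VM scripts): build the two places
# where CRAN may serve the source tarball of a given package version.
cran_base="https://cran.r-project.org/src/contrib"

# Location used while the requested version is still the current one.
cran_current_url() {
  printf '%s/%s_%s.tar.gz\n' "$cran_base" "$1" "$2"
}

# Location used once the requested version has been superseded.
cran_archive_url() {
  printf '%s/Archive/%s/%s_%s.tar.gz\n' "$cran_base" "$1" "$1" "$2"
}
```

For instance, cran_archive_url arulesViz 1.0-4 prints the corrected URL mentioned above. A script that insists on pinned versions could try the first URL and fall back to the second if the download fails.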
Why the missing dependencies? Well, it is simply due to the repos=NULL argument in the install.packages() commands used, which deactivates the dependencies argument, as clearly mentioned in the documentation of install.packages():
dependencies logical indicating whether to also install uninstalled packages which these packages depend on/link to/import/suggest (and so on recursively). Not used if repos = NULL.
Here is a detailed table with the installation results for all 28 additional packages, along with the respective reason when installation fails (packages marked with an asterisk are already pre-installed, but they are included in the script nevertheless). Only 7 out of 28 packages are successfully installed, i.e. the ones for which the requested version is the latest one and which have no dependencies:
[table] #,Package,Installed successfully?,Reason (if no),Latest CRAN version compatible with ORD 3.2?
1,igraph,NO,Missing dependencies,Yes
2,arulesViz,NO,Wrong URL (404),Yes (requires update of arules)
3,tseries,NO,Missing dependencies,Yes
4,fracdiff,YES,,Yes
5,Rcpp,NO,Wrong URL (404),Yes
6,RcppArmadillo,NO,Wrong URL (404),Yes
7,nnet*,NO,Wrong URL (404),Yes
8,colorspace,YES,,Yes
9,timeDate,YES,,Yes
10,forecast,NO,Missing dependencies,Yes
11,sandwich,NO,Missing dependencies,Yes
12,gmm,NO,Missing dependencies,Yes
13,kernlab,NO,Wrong URL (404),Yes
14,nlme*,NO,Wrong URL (404),Yes
15,minqa,NO,Missing dependencies,Yes
16,nloptr,YES,,Yes
17,RcppEigen,NO,Wrong URL (404),Yes
18,lme4,NO,Wrong URL (404),Yes
19,glmnet,NO,Missing dependencies,Yes
20,RSNNS,NO,Missing dependencies,Yes
21,neuralnet,YES,,Yes
22,NeuralNetTools,NO,Wrong URL (404),Yes
23,assertthat,YES,,Yes
24,R6,NO,Wrong URL (404),Yes
25,lazyeval,YES,,Yes
26,BH,NO,Wrong URL (404),Yes
27,dplyr,NO,Missing dependencies,Yes
28,tidyr,NO,Wrong URL (404),Yes
[/table]
I have included the rightmost column in order to highlight the argument I made earlier: now, with ORD in version 3.2, there is no need for such tight version control of the additional packages (of course, I only assume that this is the reason for the particular format of install.packages() used here), and we can safely install the latest package versions available at CRAN. Hence, here is a way to finally install the additional packages (we have included arules in order to update it to its latest version, which is required by arulesViz; also, we have omitted Rcpp, since it will be installed as a dependency of the other packages). First, save the following R script in the ~/scripts directory (name it additional_packages.R):
pkgs <- c("arules", "igraph", "arulesViz", "tseries", "fracdiff",
          "RcppArmadillo", "nnet", "colorspace", "timeDate", "forecast",
          "sandwich", "gmm", "kernlab", "nlme", "minqa", "nloptr",
          "RcppEigen", "lme4", "glmnet", "RSNNS", "neuralnet",
          "NeuralNetTools", "assertthat", "R6", "lazyeval", "BH",
          "dplyr", "tidyr")
install.packages(pkgs,
                 dependencies=TRUE,
                 repos="http://cran.us.r-project.org",
                 lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library",
                 type="source")
Then, in the same folder, save the following bash script as additional_packages.sh:
#!/bin/bash
echo "Configuring JAVA Environment for R"
sudo R CMD javareconf
echo "Installing additional packages"
Rscript --verbose 'additional_packages.R'
and make it executable with chmod +x additional_packages.sh.
There is a certain advantage in installing all necessary packages in a single command as in our code above, instead of issuing separate Rscript commands for each package: this way, all dependencies are handled globally, i.e. a package like Rcpp, which is a dependency of more than one package, will only be downloaded and installed once (instead of once for every package of which it is a dependency).
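The package names for a vector like pkgs can also be pulled out of the pinned tarball URLs instead of being typed by hand. The following is only a sketch: pinned_package_names is a hypothetical helper of mine, and it assumes that every pinned command contains a URL of the form .../src/contrib/<name>_<version>.tar.gz, as in the examples shown earlier.

```shell
# Hypothetical helper: print the package name found in each pinned
# CRAN tarball URL, reading from the given files or from standard input.
# Lines without such a URL are skipped.
pinned_package_names() {
  sed -n 's#.*/src/contrib/\([^_/]*\)_[^/]*\.tar\.gz.*#\1#p' "$@"
}
```

Running pinned_package_names install_additional_packages.sh in the scripts directory should then list one package name per pinned command; the result still needs manual adjustments such as adding arules and dropping Rcpp.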
Here is a part of the output when running the bash script:
Warning: dependencies ‘graph’, ‘Rgraphviz’, ‘pbkrtest’ are not available
also installing the dependencies ‘memoise’, ‘xtable’, ‘gtools’, ‘gdata’, ‘SparseM’, ‘MatrixModels’, ‘mime’, ‘optextras’, ‘bitops’, ‘whisker’, ‘rstudioapi’, ‘git2r’, ‘withr’, ‘curl’, ‘openssl’, ‘digest’, ‘crayon’, ‘praise’, ‘pkgmaker’, ‘registry’, ‘rngtools’, ‘stringr’, ‘gridBase’, ‘RColorBrewer’, ‘doParallel’, ‘plyr’, ‘munsell’, ‘labeling’, ‘TSP’, ‘qap’, ‘gclus’, ‘gplots’, ‘fma’, ‘expsmooth’, ‘quantreg’, ‘Formula’, ‘latticeExtra’, ‘acepack’, ‘gtable’, ‘gridExtra’, ‘evaluate’, ‘formatR’, ‘highr’, ‘markdown’, ‘yaml’, ‘ucminf’, ‘BB’, ‘Rcgmin’, ‘Rvmmin’, ‘setRNG’, ‘dfoptim’, ‘svUnit’, ‘iterators’, ‘htmltools’, ‘caTools’, ‘chron’, ‘jsonlite’, ‘rex’, ‘devtools’, ‘httr’, ‘pmml’, ‘XML’, ‘testthat’, ‘magrittr’, ‘NMF’, ‘irlba’, ‘igraphdata’, ‘rgl’, ‘ape’, ‘scales’, ‘scatterplot3d’, ‘vcd’, ‘seriation’, ‘iplots’, ‘quadprog’, ‘zoo’, ‘its’, ‘longmemo’, ‘urca’, ‘Rcpp’, ‘RUnit’, ‘pkgKitten’, ‘mvtnorm’, ‘dichromat’, ‘date’, ‘fpp’, ‘car’, ‘lmtest’, ‘strucchange’, ‘AER’, ‘stabledist’, ‘timeSeries’, ‘Hmisc’, ‘inline’, ‘knitr’, ‘PKPDmodels’, ‘MEMSS’, ‘ggplot2’, ‘mlmRev’, ‘optimx’, ‘gamm4’, ‘HSAUR2’, ‘numDeriv’, ‘foreach’, ‘lars’, ‘reshape2’, ‘caret’, ‘microbenchmark’, ‘pryr’, ‘rmarkdown’, ‘RSQLite’, ‘RMySQL’, ‘RPostgreSQL’, ‘data.table’, ‘Lahman’, ‘nycflights13’, ‘stringi’, ‘covr’, ‘gapminder’
[...]
Warning messages:
1: In install.packages(pkgs, dependencies = TRUE, repos = "http://cran.us.r-project.org", :
  installation of package ‘car’ had non-zero exit status
2: In install.packages(pkgs, dependencies = TRUE, repos = "http://cran.us.r-project.org", :
  installation of package ‘AER’ had non-zero exit status
3: In install.packages(pkgs, dependencies = TRUE, repos = "http://cran.us.r-project.org", :
  installation of package ‘caret’ had non-zero exit status
Of the three packages reported as “not available”, graph and Rgraphviz reside in Bioconductor and not in CRAN, hence it is natural for the installation script not to be able to locate them; they are merely suggested by arulesViz, so their absence is not critical (the interested reader can always install them from the Bioconductor repo, following the instructions posted there).
The third unavailable package, pbkrtest, is the only one (out of about 150 packages we have just installed, including the dependencies) that indeed requires an R version (3.2.3) later than ours, and it is also the root cause of the installation failure of car, AER, and caret, none of which is included in our initial package list.
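To confirm which packages from a list actually landed in the library after such a run, one can rely on the fact that R keeps every installed package in its own subdirectory of the library, together with a DESCRIPTION file. Here is a minimal sketch; missing_packages is a hypothetical helper of mine, and the library path in the usage note is the one used throughout this post.

```shell
# Hypothetical helper: print every package name that has no
# <library>/<package>/DESCRIPTION file, i.e. that is not installed there.
# Usage: missing_packages LIBRARY_DIR PACKAGE...
missing_packages() {
  lib_dir=$1
  shift
  for pkg in "$@"; do
    [ -f "$lib_dir/$pkg/DESCRIPTION" ] || printf '%s\n' "$pkg"
  done
}
```

For example, missing_packages /u01/app/oracle/product/12.1.0.2/dbhome_1/R/library igraph arulesViz dplyr tidyr should print nothing if all four were installed. Note that this only checks the given library directory, not any other library paths R may be configured to use.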