H2O installation in the R capsule
Two options are available: you can download H2O from the internet every time you launch a job, or you can install it from HDFS to speed up the process. Both options are described below.
Option 1: Install from internet
    pkgs <- c("RCurl", "jsonlite")
    for (pkg in pkgs) {
      if (!(pkg %in% rownames(installed.packages()))) {
        install.packages(pkg)
      }
    }
    install.packages("h2o", type = "source", repos = "http://h2o-release.s3.amazonaws.com/h2o/rel-wright/1/R")
    library(h2o)
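Reinstalling on every launch can be skipped when the expected version is already present. The following is a minimal sketch, assuming the rel-wright/1 repository provides version 3.20.0.1 (as the zip file name in Option 2 suggests) and that the R library directory persists between jobs, which may not hold in your environment. `needs_install` and `install_h2o_if_needed` are hypothetical helper names, not part of H2O.

```r
# Hypothetical helper: TRUE when `pkg` is missing or its installed
# version differs from `version` (given as a string).
needs_install <- function(pkg, version) {
  if (!(pkg %in% rownames(installed.packages()))) {
    return(TRUE)
  }
  as.character(utils::packageVersion(pkg)) != version
}

# Hypothetical helper: install H2O from the release repository only when needed.
# Assumption: rel-wright/1 corresponds to H2O 3.20.0.1.
install_h2o_if_needed <- function(version = "3.20.0.1") {
  if (needs_install("h2o", version)) {
    install.packages("h2o", type = "source",
                     repos = "http://h2o-release.s3.amazonaws.com/h2o/rel-wright/1/R")
  }
  invisible(NULL)
}
```

Call `install_h2o_if_needed()` and then `library(h2o)` at the top of the script.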
Option 2: Install from HDFS
Download H2O from the following URL: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/1/h2o-3.20.0.1.zip
Unzip it, go to the R/ folder, and upload the file "h2o_3.20.0.1.tar.gz" to HDFS, in the folder of your choice (the examples below assume /user/hdfs). Then use the following code in your script to install H2O:
    # Install the package directly from HDFS. Replace webhdfs_ip with the correct value
    install.packages('http://webhdfs_ip:50070/webhdfs/v1/user/hdfs/h2o_3.20.0.1.tar.gz?op=OPEN', repos = NULL, type = 'source')
    library(h2o)
If the previous code does not work, you can try the alternatives below:
    # This line works in the R capsule and notebooks. Replace webhdfs_ip with the correct value
    download.file('http://webhdfs_ip:50070/webhdfs/v1/user/hdfs/h2o_3.20.0.1.tar.gz?op=OPEN', destfile = 'h2o_3.20.0.1.tar.gz')
    # This line is simpler but only works in the capsule
    # system('hdfs dfs -get /user/hdfs/h2o_3.20.0.1.tar.gz', intern = TRUE)
    install.packages('h2o_3.20.0.1.tar.gz', repos = NULL, type = 'source')
    library(h2o)
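The direct install and the download-then-install alternative can be chained so that the script falls back automatically. This is a sketch under assumptions: the tarball sits at /user/hdfs/h2o_3.20.0.1.tar.gz, WebHDFS listens on port 50070 as in the URLs above, and `webhdfs_url` and `install_h2o_from_hdfs` are hypothetical helper names. Success is judged only by whether a package named h2o ends up installed.

```r
# Hypothetical helper: build the WebHDFS "OPEN" URL for a file stored in HDFS.
webhdfs_url <- function(host, hdfs_path, port = 50070) {
  sprintf("http://%s:%d/webhdfs/v1%s?op=OPEN", host, as.integer(port), hdfs_path)
}

# Hypothetical helper: try the direct install first, then fall back to
# downloading the tarball and installing it from the local file.
install_h2o_from_hdfs <- function(host,
                                  hdfs_path = "/user/hdfs/h2o_3.20.0.1.tar.gz") {
  url <- webhdfs_url(host, hdfs_path)

  # First attempt: install straight from the URL; errors are swallowed here
  # and detected by the check below.
  try(install.packages(url, repos = NULL, type = "source"), silent = TRUE)

  if (!("h2o" %in% rownames(installed.packages()))) {
    # Fallback: download the archive in binary mode, then install it locally.
    tarball <- basename(hdfs_path)
    download.file(url, destfile = tarball, mode = "wb")
    install.packages(tarball, repos = NULL, type = "source")
  }
  invisible(NULL)
}
```

Usage would look like `install_h2o_from_hdfs("webhdfs_ip")` followed by `library(h2o)`, with webhdfs_ip replaced by the correct value.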
Connection from R to H2O
    # Replace 'docker_address' with the correct value
    h2o.connect(ip = 'docker_address', port = 443, strict_version_check = TRUE, https = TRUE, insecure = TRUE)
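After connecting, it can help to confirm that the cluster is reachable before sending data to it. A minimal sketch, assuming a connection has already been opened with h2o.connect; `check_h2o_cluster` is a hypothetical helper name.

```r
# Hypothetical helper: stop early if the H2O cluster behind the current
# connection is not responding, otherwise print its summary.
check_h2o_cluster <- function() {
  if (!h2o::h2o.clusterIsUp()) {
    stop("H2O cluster is not reachable; check the address and port.")
  }
  # Prints version, node count, memory and other cluster details.
  h2o::h2o.clusterInfo()
  invisible(TRUE)
}
```

Call `check_h2o_cluster()` right after `h2o.connect(...)`.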
Creating a new model in H2O
    # Import the dataset into H2O RAM
    iris_h2o <- as.h2o(iris)

    # Create a split for train and test dataset
    iris.split <- h2o.splitFrame(iris_h2o)
    train <- iris.split[[1]]
    test <- iris.split[[2]]

    # Create a Random forest model with our dataset as input
    rf <- h2o.randomForest(y = 'Species', training_frame = train, validation_frame = test)

    # Print the result in console
    rf

    # Results are also available in the H2O web interface, with more details than this simple print
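Beyond printing the model, the metrics and predictions on a given frame can be pulled back into R. The sketch below assumes a trained classification model such as `rf` and an H2O frame such as `test` from the code above; `evaluate_on_frame` is a hypothetical helper name.

```r
# Hypothetical helper: print the confusion matrix of `model` on `frame`
# and return the predictions as a regular R data frame.
evaluate_on_frame <- function(model, frame) {
  # Compute performance metrics on the given frame
  perf <- h2o::h2o.performance(model, newdata = frame)
  print(h2o::h2o.confusionMatrix(perf))

  # Score the frame and copy the predictions from H2O into R memory
  preds <- h2o::h2o.predict(model, newdata = frame)
  as.data.frame(preds)
}
```

For example, `predictions <- evaluate_on_frame(rf, test)` followed by `head(predictions)`.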
Complete documentation on data import, processing, and model creation is available here: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html