Complete documentation on data import, processing and model creation is available here: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html

Installation

Two options are available: install H2O from HDFS, which speeds up the process, or download it from the internet every time you launch a job. Both options are described below.

Option 1 (recommended): Install from HDFS

Automatic upload (recommended)

Run the script found here: Upload H2O library to HDFS.

Manual upload

Download H2O from the following URL: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/1/h2o-3.20.0.12.zip

Unzip it, go to the R/ folder, and upload the file "h2o_3.20.0.12.tar.gz" to HDFS in the folder of your choice (recommended: /user/h2o/install_R/).

Install to R

Use the following code in your script to install H2O:

Code Block
# Install the package directly from HDFS. Replace nn1 with the correct value if needed
install.packages('http://nn1:50070/webhdfs/v1/user/h2o/install_R/h2o_3.20.0.12.tar.gz?op=OPEN', repos = NULL, type = 'source')

library(h2o)
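
The WebHDFS address used above follows a fixed pattern, so it can be assembled from its parts. The helper below is a minimal sketch: `webhdfs_open_url` is a hypothetical name (it is not part of H2O or base R), and the host, port and path are the placeholder values used on this page.

```r
# Hypothetical helper: build the WebHDFS "OPEN" URL for a file stored in HDFS.
webhdfs_open_url <- function(host, hdfs_path, port = 50070) {
  # hdfs_path must be absolute, e.g. "/user/h2o/install_R/file.tar.gz"
  stopifnot(startsWith(hdfs_path, "/"))
  sprintf("http://%s:%d/webhdfs/v1%s?op=OPEN", host, as.integer(port), hdfs_path)
}

# Placeholder host and path; adjust both to your cluster.
pkg_url <- webhdfs_open_url("nn1", "/user/h2o/install_R/h2o_3.20.0.12.tar.gz")
print(pkg_url)

# The result can then be passed to install.packages():
# install.packages(pkg_url, repos = NULL, type = "source")
```

Keeping the host and path as arguments means only one line changes when the package is uploaded to another folder.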

...

Code Block
# This line works in the R capsule and notebooks. Replace nn1 with the correct value if needed
download.file('http://nn1:50070/webhdfs/v1/user/hdfs/h2o_3.20.0.12.tar.gz?op=OPEN', destfile = 'h2o_3.20.0.12.tar.gz')

# This line is simpler but only works in the capsule
# system('hdfs dfs -get /user/hdfs/h2o_3.20.0.12.tar.gz', intern = TRUE)


install.packages('h2o_3.20.0.12.tar.gz', repos = NULL, type = 'source')
library(h2o)

Option 2: Install from internet

Code Block
pkgs <- c("RCurl","jsonlite")
for (pkg in pkgs) {
  if (! (pkg %in% rownames(installed.packages()))) { install.packages(pkg) }
}
install.packages("h2o", type="source", repos="http://h2o-release.s3.amazonaws.com/h2o/rel-wright/1/R")
library(h2o)

Connection from R to H2O

Code Block
# Replace the ip with the correct value
h2o.connect(ip = 'h2o_custom_url.internal.pX', port = 443, strict_version_check = TRUE, https = TRUE, insecure = TRUE)
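
With strict_version_check enabled, the connection fails when the installed R package and the H2O cluster run different versions. A quick check before connecting could look like the sketch below. `same_version` is a hypothetical helper, and 3.20.0.12 is only assumed to be the cluster version because it is the package version installed earlier on this page.

```r
# Hypothetical helper: compare two version strings component by component.
same_version <- function(installed, expected) {
  package_version(installed) == package_version(expected)
}

expected_version <- "3.20.0.12"  # assumption: version running on the cluster

# Typical use once the h2o package is installed:
# same_version(as.character(packageVersion("h2o")), expected_version)
print(same_version("3.20.0.12", expected_version))
print(same_version("3.20.0.1", expected_version))
```

Comparing with package_version() rather than plain strings avoids treating "3.20.0.1" and "3.20.0.12" as a prefix match.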

...


Import data

From HDFS

Code Block
# Change the url as needed
iris_h2o <- h2o.importFile('hdfs://nn1:8020/user/h2o/data/iris/iris.csv')

From a local R object

Code Block
# Import the dataset into H2O RAM
iris_h2o <- as.h2o(iris)
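
Whichever import route is used, it can help to look at the frame before training. The sketch below assumes an active H2O connection and a frame such as iris_h2o. `inspect_frame` is a hypothetical helper, and "Species" is assumed to be the response column, as in the modelling step on this page.

```r
# Hypothetical helper: print a short summary of an H2O frame and make sure
# the response column is categorical. Requires a running H2O cluster.
inspect_frame <- function(frame, response = "Species") {
  print(dim(frame))                # rows and columns held by the cluster
  print(h2o::h2o.describe(frame))  # per-column type, missing values, basic statistics
  # Classification needs a categorical (factor) response column
  frame[[response]] <- h2o::h2o.asfactor(frame[[response]])
  frame
}

# Usage, once connected:
# iris_h2o <- inspect_frame(iris_h2o)
```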


Creating a new model in H2O

Code Block
# Create a split for train and test dataset
iris.split <- h2o.splitFrame(iris_h2o)
train <- iris.split[[1]]
test <- iris.split[[2]]

# Create a Random forest model with our dataset as input
rf <- h2o.randomForest(y = 'Species', training_frame = train, validation_frame = test)

# Print the result in console
rf

# Results are also available in the H2O web interface, with more details than this simple print
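
The split above is random, so results can differ between runs. The sketch below shows one way to make the split repeatable and to score the model on the held-out rows only. It assumes an active H2O connection and a frame such as iris_h2o. `train_and_evaluate` is a hypothetical helper, and the 80/20 ratio and the seed value 42 are arbitrary choices for illustration.

```r
# Hypothetical helper: seeded split, random forest, and evaluation on the test part.
# Requires a running H2O cluster.
train_and_evaluate <- function(frame, response = "Species", seed = 42) {
  # Roughly 80/20 split; fixing the seed makes the split repeatable
  parts <- h2o::h2o.splitFrame(frame, ratios = 0.8, seed = seed)
  model <- h2o::h2o.randomForest(y = response,
                                 training_frame = parts[[1]],
                                 seed = seed)
  # Score the held-out rows only
  perf <- h2o::h2o.performance(model, newdata = parts[[2]])
  list(model = model,
       performance = perf,
       confusion = h2o::h2o.confusionMatrix(perf))
}

# Usage, once connected:
# result <- train_and_evaluate(iris_h2o)
# result$confusion
```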

...

Saving model to HDFS

Code Block
# Change the url as needed
h2o.saveModel(rf, 'hdfs://nn1:8020/user/h2o/models/', force = TRUE)
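
A saved model can be read back later with h2o.loadModel(). The sketch below assumes an active H2O connection and a trained model such as rf. `save_and_reload` is a hypothetical helper, and the HDFS folder in the usage line is the placeholder path from this page.

```r
# Hypothetical helper: save a model, then load it back from the returned path.
# Requires a running H2O cluster.
save_and_reload <- function(model, dir) {
  # h2o.saveModel() returns the full path of the saved model
  saved_path <- h2o::h2o.saveModel(model, path = dir, force = TRUE)
  h2o::h2o.loadModel(saved_path)
}

# Usage, once connected:
# rf_restored <- save_and_reload(rf, "hdfs://nn1:8020/user/h2o/models/")
```

Binary models saved this way are tied to the H2O version that wrote them, so the cluster that loads the model should run the same version.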