Github project : https://github.com/saagie/example-R-querying-impala/tree/master/JDBC
Dependencies
R package
rjdbc : https://cran.r-project.org/web/packages/RJDBC/index.html
This allows R to connect to any DBMS that has a JDBC driver.
Impala JDBC drivers
Download the Impala JDBC drivers
Code Block | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
download.file('https://github.com/Mu-Sigma/RImpala/blob/master/impala-jdbc-cdh5.zip?raw=true', 'jdbc.zip',
method = 'curl', extra = '-L')
unzip('jdbc.zip') |
Parameters
impalaConnectionUrl : Url used to communicate with Impala through a datanode without authentication.
If no athentication, the format is as follows : "jdbc:hive2://datanode1dns:port/;auth=noSasl". The impala port is usually 21050.
Code explanation for querying impala
Loading JDBC driver and connection
Code Block |
---|
drv <- JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver", classPath = list.files("impala_jdbc_folder_path-jdbc-cdh5", pattern = "jar$", full.names =T TRUE), identifier.quote = "`") impalaConnectionUrl <- "jdbc:hive2://datanode1dns:port/;auth=noSasl" conn <- dbConnect(drv, impalaConnectionUrl) |
Query and Insert : Examples
Show Tables
Code Block |
---|
# All databases dbListTables(conn) # The database "default" dbGetQuery(conn, "show tables") |
Get all elements of a table
Code Block |
---|
# In database "default" d <- dbReadTable(conn, "table_name") # OR d <- dbGetQuery(conn, "select * from table_name") # Other that the database "default") d <- dbReadTable(conn, nameBDD.table_name) |
Create a table in parquet format
Code Block |
---|
dbSendUpdate(conn, "CREATE TABLE table_name (attribute1 string, attribute2 int) STORED AS PARQUET") |
Insert data into a table
Code Block |
---|
dbGetQuery(conn, "INSERT INTO table_name VALUES ('test', 1)") |
Refresh Tables
Code Block |
---|
dbGetQuery(conn, "INVALIDATE METADATA") |