Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 12 Current »

Dependencies

R package

rjdbc : https://cran.r-project.org/web/packages/RJDBC/index.html

This allows R to connect to any DBMS that has a JDBC driver.

Impala JDBC drivers

Download the Impala JDBC drivers

Parameters

impalaConnectionUrl : Url used to communicate with Impala through a datanode without authentication.

If no athentication, the format is as follows : "jdbc:hive2://datanode1dns:port/;auth=noSasl". The impala port is usually 21050.

Code explanation for querying impala

Loading JDBC driver and connection

drv <- JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver",
            classPath = list.files("impala_jdbc_folder_path",pattern="jar$",full.names=T),
            identifier.quote="`")
impalaConnectionUrl <- "jdbc:hive2://datanode1dns:port/;auth=noSasl"
conn <- dbConnect(drv, impalaConnectionUrl)

Query and Insert : Examples

Show Tables

# All databases
dbListTables(conn) 
# The database "default"
dbGetQuery(conn, "show tables")

Get all elements of a table

# In database "default"
d <- dbReadTable(conn, "table_name")
# OR
d <- dbGetQuery(conn, "select * from table_name")

# Other that the database "default")
d <- dbReadTable(conn, nameBDD.table_name)

Create a table in parquet format

dbSendUpdate(conn, "CREATE TABLE table_name (attribute1 string, attribute2 int) STORED AS PARQUET")

Insert data into a table

dbGetQuery(conn, "INSERT INTO table_name VALUES ('test', 1)")

Refresh Tables

dbGetQuery(conn, "INVALIDATE METADATA")



  • No labels