R - Dynamically creating Hive tables
GitHub project: https://github.com/saagie/Create_Table_Hive_R
An R script that automatically creates raw Hive tables from an HDFS directory.
The script creates the database if it does not exist, then walks through every subfolder of the given directory and creates a raw Hive table from the gz file found in each subfolder.
Each table takes the name of its subfolder.
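The steps above can be illustrated with a small shell sketch. It is not the project's code: the helper names `liststatus_url` and `hive_ddl`, the example column list, and the choice of `OpenCSVSerde` are assumptions made for illustration, since the HiveQL the R script actually generates is not shown here. The sketch builds the WebHDFS URL that lists a directory (authentication omitted) and prints the kind of statements that would create the database and one raw table for a subfolder.

```shell
#!/bin/sh
# Hypothetical helper: WebHDFS URL that lists the entries of a directory.
# $1 = WebHDFS base URL, $2 = absolute HDFS path
liststatus_url() {
  printf '%s%s?op=LISTSTATUS\n' "$1" "$2"
}

# Hypothetical helper: HiveQL for one subfolder.
# $1 = database, $2 = HDFS directory, $3 = subfolder (also the table name),
# $4 = separator, $5 = quote character, $6 = column list (assumed known)
# Values are inserted verbatim; a single quote would need escaping first.
hive_ddl() {
  db=$1; dir=$2; sub=$3; sep=$4; quote=$5; cols=$6
  cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db};
CREATE EXTERNAL TABLE IF NOT EXISTS ${db}.${sub} (${cols})
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = '${sep}', 'quoteChar' = '${quote}')
STORED AS TEXTFILE
LOCATION '${dir}/${sub}';
EOF
}

# Example with placeholder values
liststatus_url "http://192.0.2.10:50070/webhdfs/v1" "/data/raw"
hive_ddl raw_db /data/raw customers ';' '"' 'id STRING, name STRING'
```

The table is declared `EXTERNAL` with a `LOCATION` pointing at the subfolder, so the gz file stays where it is; Hive reads gzip-compressed text files in a `TEXTFILE` table without a separate decompression step.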
To run the script:
- Upload 'Create_Table_Hive.tar' directly to the platform
- Add this command:
Rscript Create_Table.R "http://IP_HDFS:PORT_HDFS/webhdfs/v1" "jdbc:hive2://IP_HIVE:PORT_HIVE/;ssl=false" "USER_HDFS" "PWD_HDFS" "NAME_BDD" "PATH_DIRECTORY" "SEPARATOR_FILE" "QUOTE_FILE"
Parameters
IP_HDFS: IP address of the HDFS (WebHDFS) host
PORT_HDFS: WebHDFS port
IP_HIVE: IP address of the Hive server
PORT_HIVE: Hive port
USER_HDFS: HDFS user name
PWD_HDFS: HDFS password
NAME_BDD: name of the Hive database (created if it does not exist)
PATH_DIRECTORY: HDFS path of the directory that contains the subfolders
SEPARATOR_FILE: field separator used in the files
QUOTE_FILE: quote character used in the files
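As an illustration of how the parameters fit together, the sketch below assembles the two URLs and prints the full argument list, one argument per line, without running anything. Every value is a placeholder chosen for the example (the IP addresses, the ports 50070 and 10000, the database name, and the path are assumptions), and `print_args` is a hypothetical dry-run helper.

```shell
#!/bin/sh
# Placeholder values for illustration only; replace with your own.
IP_HDFS="192.0.2.10"
PORT_HDFS="50070"
IP_HIVE="192.0.2.20"
PORT_HIVE="10000"
USER_HDFS="hdfs_user"
PWD_HDFS="change_me"
NAME_BDD="raw_db"
PATH_DIRECTORY="/data/raw"
SEPARATOR_FILE=";"   # quoted so the shell does not treat ; as a command separator
QUOTE_FILE='"'       # single quotes keep the double quote literal

# The first two arguments are full URLs built from the host and port values.
WEBHDFS_URL="http://${IP_HDFS}:${PORT_HDFS}/webhdfs/v1"
HIVE_URL="jdbc:hive2://${IP_HIVE}:${PORT_HIVE}/;ssl=false"

# Hypothetical dry-run helper: prints each argument on its own line.
print_args() { printf '%s\n' "$@"; }

# Remove "print_args" to actually launch the script.
print_args Rscript Create_Table.R "$WEBHDFS_URL" "$HIVE_URL" \
  "$USER_HDFS" "$PWD_HDFS" "$NAME_BDD" "$PATH_DIRECTORY" \
  "$SEPARATOR_FILE" "$QUOTE_FILE"
```

Quoting each argument matters here: an unquoted `;` would end the command early, and an unquoted `"` would leave the shell waiting for a closing quote.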