R - Dynamically creating Hive tables

GitHub project: https://github.com/saagie/Create_Table_Hive_R


An R script that automatically creates raw Hive tables from an HDFS directory.

The script creates the database if it does not exist, then goes through every sub-folder of the directory and creates a raw Hive table from the gz file in each sub-folder.

The name of the table is the same as the sub-folder name.
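
As a rough illustration of the behaviour described above, the statements issued for one sub-folder might resemble the output of the shell sketch below. The helper name `build_hiveql`, the column list, the choice of an external text table, and the example values are all assumptions made for illustration; the actual script may generate different HiveQL, and handling of the quote character is left out here.

```shell
# Hypothetical helper: prints the HiveQL for one sub-folder.
# Arguments: database name, parent directory, sub-folder name, field separator.
# The columns (col1, col2) are placeholders; the real ones depend on the data files.
build_hiveql() {
  db=$1
  dir=$2
  sub=$3
  sep=$4
  cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db};
CREATE EXTERNAL TABLE IF NOT EXISTS ${db}.${sub} (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '${sep}'
STORED AS TEXTFILE
LOCATION '${dir}/${sub}';
EOF
}

# Example call with made-up values: the table is named after the sub-folder.
build_hiveql raw_data /data/raw customers ';'
```

Hive reads gzip-compressed text files transparently when the table is stored as TEXTFILE, which is why the sketch needs no decompression step.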

To run the script:

- Upload 'Create_Table_Hive.tar' directly to the platform.
- Add this command:

Rscript Create_Table.R "http://IP_HDFS:PORT_HDFS/webhdfs/v1" "jdbc:hive2://IP_HIVE:PORT_HIVE/;ssl=false" "USER_HDFS" "PWD_HDFS" "NAME_BDD" "PATH_DIRECTORY" "SEPARATOR_FILE" "QUOTE_FILE"


Parameters

  • IP_HDFS: IP address of the HDFS (WebHDFS) host
  • PORT_HDFS: WebHDFS port
  • IP_HIVE: IP address of the Hive host
  • PORT_HIVE: Hive port
  • USER_HDFS: HDFS user name
  • PWD_HDFS: HDFS password
  • NAME_BDD: Name of the database
  • PATH_DIRECTORY: Path of the HDFS directory that contains the sub-folders
  • SEPARATOR_FILE: Field separator used in the files
  • QUOTE_FILE: Quote character used in the files
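
To show how the parameters fit into the command, here is a small shell sketch that assembles it from variables. Every value is a made-up placeholder (addresses, ports, credentials, database and path), not a default of the script, and the variables `HDFS_URL` and `HIVE_URL` are introduced only for readability.

```shell
# Placeholder values for illustration only.
IP_HDFS="10.0.0.10"
PORT_HDFS="50070"
IP_HIVE="10.0.0.11"
PORT_HIVE="10000"
USER_HDFS="hdfs_user"
PWD_HDFS="change_me"
NAME_BDD="raw_data"
PATH_DIRECTORY="/data/raw"
SEPARATOR_FILE=";"
QUOTE_FILE='"'

# The two URLs are assembled from the host and port parameters.
HDFS_URL="http://${IP_HDFS}:${PORT_HDFS}/webhdfs/v1"
HIVE_URL="jdbc:hive2://${IP_HIVE}:${PORT_HIVE}/;ssl=false"

# Store the full command in the positional parameters.
# Quoting keeps ';' and '"' as single arguments instead of shell syntax.
set -- Rscript Create_Table.R "$HDFS_URL" "$HIVE_URL" \
  "$USER_HDFS" "$PWD_HDFS" "$NAME_BDD" "$PATH_DIRECTORY" \
  "$SEPARATOR_FILE" "$QUOTE_FILE"

# Print one argument per line for review. To launch the job, run: "$@"
printf '%s\n' "$@"
```

Quoting matters most for the last two arguments: an unquoted `;` would end the command early, and an unquoted `"` would open a string that is never closed.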