SQOOP - Import data from PostgreSQL
Preamble
This article describes how to import data from a PostgreSQL database with Sqoop.
Make sure you have enough rights to access the database.
Download data from a table
Sqoop import
```shell
# Driver
driver=postgresql
# IP or DNS
ip=127.0.0.1
# Port
port=5432
# User
username=myuser
# Password
password=mypwd
# Database
database=mydb
# Table
table=mytable
# Destination folder in HDFS
hdfsdest=/user/hdfs/$table

# Import the table
sqoop import --connect jdbc:$driver://$ip:$port/$database \
  --username $username --password $password \
  --target-dir $hdfsdest \
  --num-mappers 1 \
  --table $table
```
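Passing the password with `--password` exposes it in the shell history and process list. As a safer alternative, Sqoop also accepts `-P` (interactive prompt) or `--password-file`. A minimal sketch, reusing the variables above (the path `/tmp/sqoop.pwd` is an arbitrary example):

```shell
# Store the password in a file readable only by the current user
# (/tmp/sqoop.pwd is a hypothetical path; pick a protected location).
echo -n "mypwd" > /tmp/sqoop.pwd
chmod 400 /tmp/sqoop.pwd

# Same import as above, but without the password on the command line;
# the file:// prefix tells Sqoop to read from the local filesystem.
sqoop import --connect jdbc:$driver://$ip:$port/$database \
  --username $username --password-file file:///tmp/sqoop.pwd \
  --target-dir $hdfsdest \
  --num-mappers 1 \
  --table $table
```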
Options
Option | Operation |
---|---|
--verbose | Show more logs |
--delete-target-dir | Delete HDFS directory before import |
--query "SELECT * FROM $table WHERE \$CONDITIONS" | Import data from a query instead of a table. The option --table must be removed from the command line |
--num-mappers n (replace n with a number) | Split the import into n parallel map tasks (often set to the number of datanodes). This option must be used with --split-by 'column', which specifies the column used to split the data |
--as-textfile | Import data in text mode |
--null-non-string '\\N' | Write null values of non-string columns as '\\N'. Used for Impala and Hive |
--null-string '\\N' | Write null values of string columns as '\\N'. Used for Impala and Hive |
--hive-import | Create a Hive/Impala table with imported data |
--hive-table $table | Specify the name of the Hive/Impala table |
--hive-overwrite | Overwrite the data in the table. If this option is absent, data are appended |
--hive-database db (replace db with your database) | Specify the name of the Hive/Impala database |
--hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive |
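Several of these options are commonly combined. The following sketch imports the result of a free-form query in parallel and loads it into Hive (the table `mytable`, the database `mydb`, and the numeric column `id` used for splitting are hypothetical examples). Note that `--table` is removed and the literal `$CONDITIONS` token must appear in the WHERE clause so Sqoop can partition the query across mappers:

```shell
# Parallel free-form query import into Hive (example values throughout)
sqoop import --connect jdbc:postgresql://127.0.0.1:5432/mydb \
  --username myuser --password mypwd \
  --query "SELECT * FROM mytable WHERE \$CONDITIONS" \
  --split-by id \
  --num-mappers 4 \
  --delete-target-dir \
  --target-dir /user/hdfs/mytable \
  --null-string '\\N' \
  --null-non-string '\\N' \
  --hive-import \
  --hive-database mydb \
  --hive-table mytable \
  --hive-overwrite
```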