/
SQOOP - Import data from Postresql

SQOOP - Import data from Postresql

Preamble

This article describe how import data from Postresql database.

Be sure to have enought rights to access database.

Download data from table

Sqoop import
#Driver
driver=postgresql
#IP or DNS
ip=127.0.0.1
#Port
port= 5432
#User
username=myuser
#Password
password=mypwd
#Database
database=mydb
#Table
table=mytable
#Folder in HDFS
hdfsdest=/user/hdfs/$table

#Importation of tables
sqoop import --connect jdbc:$driver://$ip:$port/$database --username $username --password $password \
--target-dir $hdfsdest \
--num-mappers 1 \
--table $table



Options

OptionOperation
--verboseShow more logs
--delete-target-dirDelete HDFS directory before import
--query "SELECT * FROM $table WHERE \$CONDITIONS"Import data from a request instead of a table. The option --table must be removed from command line
--num-mappers n (replace n by a number)Cut the data flow in n data flow, n is the number of datanodes. This option must be used with option --split-by 'column' witch specify the column to split.
--as-textfileImport data in text mode
--null-non-string '\\N'Change null values of non-string type in '\\N'. Used for Impala and Hive
--null-string '\\N'Change null values of string type in '\\N'. Used for Impala and Hive
--hive-importCreate a Hive/Impala table with imported data
--hive-table $tableSpecify the name of the Hive/Impala table
--hive-overwriteOverwrite the data of the table. If this option is not present data are concatenate
--hive-database db (change db by your database)Specify the name of the Hive/Impala database
--hive-drop-import-delimsDrops \n, \r, and \01 from string fields when importing to Hive