Talend - Query & Insert from Hive
GitHub project: example-talend-query-insert-from-hive
Preamble
Configuration: Context
To create the jobs described in this article, you first have to create a context in the repository and fill in its values (the jobs below use the context variables Table_Hive and New_Table_Hive).
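Talend context variables are named values that the job's generated Java code reads as context.<name>. As a minimal sketch, assuming the values are kept as key=value pairs (the class HiveJobContext and the sample values are hypothetical and not part of Talend), the two variables used in this article could be loaded and checked like this:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical model of the job context used in this article.
public class HiveJobContext {
    final String tableHive;     // value of Table_Hive (source table)
    final String newTableHive;  // value of New_Table_Hive (table to create)

    HiveJobContext(String tableHive, String newTableHive) {
        this.tableHive = tableHive;
        this.newTableHive = newTableHive;
    }

    // Parses "key=value" lines (assumed format) and fails fast if a variable is missing.
    static HiveJobContext fromProperties(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String table = props.getProperty("Table_Hive");
        String newTable = props.getProperty("New_Table_Hive");
        if (table == null || newTable == null) {
            throw new IllegalArgumentException("Table_Hive and New_Table_Hive must both be set");
        }
        return new HiveJobContext(table, newTable);
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        HiveJobContext ctx = fromProperties("Table_Hive=my_table\nNew_Table_Hive=my_table_sample\n");
        System.out.println(ctx.tableHive + " -> " + ctx.newTableHive);
    }
}
```

Failing early on a missing variable is a design choice: an empty table name would otherwise only surface later as a Hive syntax error.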
Query from Hive (with all versions of Data Fabric)
Example: count the number of rows in a Hive table.
Create a new job
Add the component "tHiveConnection" : allows the creation of a Hive connection
Add the component "tHiveInput": reads a Hive table and extracts fields based on a Hive query
Add the component "tLogRow" : displays results
Create links:
"tHiveConnection" is connected with "tHiveInput" (through "OnSubjobOk")
"tHiveInput" is connected with "tLogRow" (through "Main")
Double-click "tHiveConnection" and set its properties:
Select the "Cloudera" distribution and the latest Cloudera version
Enter the host
Enter the port
Enter the database name
Enter the user
Enter the NameNode URL
Enter the Hadoop user name
Enter the password
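The host, port, database, user and password above are the same pieces a plain JDBC client needs to reach HiveServer2. The sketch below assumes HiveServer2 and its jdbc:hive2://<host>:<port>/<database> URL form; the class name and placeholder values are hypothetical, and actually connecting requires the Hive JDBC driver on the classpath. The NameNode URL and Hadoop user are not part of this URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical sketch: the connection settings expressed as a JDBC connection to HiveServer2.
public class HiveConnectionSketch {

    // Builds a HiveServer2 URL of the form jdbc:hive2://<host>:<port>/<database>.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Opens a connection with the user and password; needs the Hive JDBC driver at runtime.
    static Connection connect(String host, int port, String database,
                              String user, String password) throws SQLException {
        return DriverManager.getConnection(jdbcUrl(host, port, database), user, password);
    }

    public static void main(String[] args) {
        // Placeholder host and database; 10000 is the usual HiveServer2 port.
        System.out.println(jdbcUrl("hive-host.example.com", 10000, "default"));
    }
}
```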
Double-click the "tHiveInput" component:
Tick "Use an existing connection"
Enter the table name
Enter your query: " SELECT COUNT(*) FROM "+context.Table_Hive+" "
Use context variables where possible, e.g. "+context.Table_Hive+"
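The query field holds a Java string expression, so "+context.Table_Hive+" is ordinary string concatenation. A minimal sketch of the same query outside Talend, assuming an open JDBC connection to Hive (class and method names are hypothetical, and the table-name check is an addition not present in the original job):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch of the count query run by the first job.
public class HiveCountSketch {

    // Same concatenation as the query field, with the table name in place of context.Table_Hive.
    static String countQuery(String table) {
        // Added guard: the name is concatenated, not bound, so accept only
        // a plain identifier, optionally qualified as database.table.
        if (!table.matches("[A-Za-z0-9_]+(\\.[A-Za-z0-9_]+)?")) {
            throw new IllegalArgumentException("Unexpected table name: " + table);
        }
        return "SELECT COUNT(*) FROM " + table;
    }

    // Runs the query and returns the single count value (what tLogRow would display).
    static long countRows(Connection conn, String table) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(countQuery(table))) {
            if (!rs.next()) {
                throw new SQLException("COUNT(*) returned no row");
            }
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) {
        System.out.println(countQuery("my_table"));
    }
}
```

Because the table name is concatenated rather than passed as a parameter, it should come from trusted configuration such as the job context.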
Run the job
Insert from Hive (with all versions of Data Fabric)
Example: create a new table and insert ten rows copied from an existing table
Create a new job
Add the component "tHiveConnection": allows the creation of a Hive connection.
Add the component "tHiveRow": executes a HiveQL query at each iteration of the Talend flow
Create links:
"tHiveConnection" is connected with "tHiveRow" (through "OnSubjobOk")
Double-click "tHiveConnection" and set its properties:
Select the "Cloudera" distribution and the latest Cloudera version
Enter the host
Enter the port
Enter the database name
Enter the user
Enter the NameNode URL
Enter the Hadoop user name
Enter the password
Double-click the "tHiveRow" component:
Tick "Use an existing connection"
Enter the source table name
Enter the new table name
Enter your query: "CREATE TABLE "+context.New_Table_Hive+" AS SELECT * FROM "+context.Table_Hive+" LIMIT 10 "
Use context variables where possible, e.g. "+context.Table_Hive+" and "+context.New_Table_Hive+"
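The CREATE TABLE ... AS SELECT (CTAS) statement creates the new table and fills it in one step. A minimal sketch of the same statement, assuming an open JDBC connection to Hive (class and method names are hypothetical):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch of the CTAS query run by the second job.
public class HiveCtasSketch {

    // Same concatenation as the query field: new table, source table, row limit.
    static String ctasQuery(String newTable, String sourceTable, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        return "CREATE TABLE " + newTable + " AS SELECT * FROM " + sourceTable + " LIMIT " + limit;
    }

    // Executes the statement on an open connection; it returns no result set.
    static void createSample(Connection conn, String newTable, String sourceTable, int limit)
            throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(ctasQuery(newTable, sourceTable, limit));
        }
    }

    public static void main(String[] args) {
        System.out.println(ctasQuery("my_table_sample", "my_table", 10));
    }
}
```

Without an ORDER BY, LIMIT 10 returns an arbitrary ten rows rather than the first ten in any particular order. Running the job a second time fails while the new table already exists, so drop it or pick another name before rerunning.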
Run the job