Preamble
Configuration: Context
To create the jobs described in this article, you first need to create a context in the Repository and give its variables values.
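A context can be thought of as a named set of values that the job reads at runtime. The sketch below models that idea with `java.util.Properties`. The `JobContext` class and its `require` method are hypothetical illustrations, not the class Talend generates; only the variable name `Table_Hive` comes from this article.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical model of a Talend context: named values read at runtime.
// Talend generates its own context class; this is only an illustration.
public class JobContext {

    private final Properties values = new Properties();

    // Loads key=value lines, e.g. text read from a file kept next to the job.
    JobContext(String keyValueText) {
        try {
            values.load(new StringReader(keyValueText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fails fast when a variable the job relies on is missing.
    String require(String name) {
        String v = values.getProperty(name);
        if (v == null) {
            throw new IllegalStateException("Missing context variable: " + name);
        }
        return v;
    }

    public static void main(String[] args) {
        // Example value only; Table_Hive is the variable used in the queries of this article.
        JobContext ctx = new JobContext("Table_Hive=sample\n");
        System.out.println(ctx.require("Table_Hive"));
    }
}
```

Failing fast on a missing variable mirrors the practical advice of the article: a job built on context variables only works if every variable it references has a value.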
Query from Hive
Example: Count the number of rows in a table.
- Create a new job
- Add the "tHiveConnection" component: creates a connection to Hive.
- Add the "tHiveInput" component: reads a Hive table and extracts fields based on a Hive query.
- Add the "tLogRow" component: displays the results.
- Create links:
- "tHiveConnection" is connected to "tHiveInput" (through "OnSubjobOk")
- "tHiveInput" is connected to "tLogRow" (through "Main")
- Double-click "tHiveConnection" and set its properties:
- Select the "Cloudera" distribution and the latest available Cloudera version
- Enter the host
- Enter the port
- Enter the database name
- Enter the user name
- Enter the NameNode URI
- Enter the Hadoop user name
- Double-click the "tHiveInput" component:
- Tick "Use an existing connection"
- Enter a table name
- Enter your query: " SELECT COUNT(*) FROM "+context.Table_Hive+" "
Use context variables where possible, such as context.Table_Hive here.
- Run the job
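For readers who want to see what this job amounts to outside Talend, here is a rough plain-JDBC analogue; it is not Talend's generated code. It assumes a HiveServer2 endpoint, the standard `jdbc:hive2://<host>:<port>/<database>` URL form, the hive-jdbc driver on the classpath at runtime, and an empty password. The class name `CountHiveRows` and its helper methods are hypothetical. The NameNode URI and Hadoop user are left out because they are not part of the JDBC URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Rough plain-JDBC analogue of the "Query from Hive" job (hypothetical names).
// Assumes HiveServer2 and the hive-jdbc driver on the classpath at runtime.
public class CountHiveRows {

    // HiveServer2 URL built from the host, port and database set in tHiveConnection.
    static String hiveJdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Same concatenation as the tHiveInput query; tableHive stands in for context.Table_Hive.
    static String countQuery(String tableHive) {
        return " SELECT COUNT(*) FROM " + tableHive + " ";
    }

    public static void main(String[] args) throws SQLException {
        // Usage: CountHiveRows <host> <port> <database> <user> <table>
        String url = hiveJdbcUrl(args[0], Integer.parseInt(args[1]), args[2]);
        String user = args[3];
        String tableHive = args[4];

        // One connection (tHiveConnection) reused by the query
        // ("Use an existing connection" in tHiveInput). Empty password assumed.
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(countQuery(tableHive))) {
            // Print each result row, as tLogRow does.
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}
```

Opening the connection once and passing it to the query step is the same design as ticking "Use an existing connection": connection settings live in one place instead of being repeated in every component.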
Insert from Hive
Example: Create a new table and insert ten rows from an existing table (created in Impala)
- Create a new job
- Add the "tHiveConnection" component: creates a connection to Hive.
- Add the "tHiveRow" component: executes an SQL query at each iteration of the Talend flow.
- Create links:
- "tHiveConnection" is connected to "tHiveRow" (through "OnSubjobOk")
- Double-click "tHiveConnection" and set its properties:
- Select the "Cloudera" distribution and the latest available Cloudera version
- Enter the host
- Enter the port
- Enter the database name
- Enter the user name
- Enter the NameNode URI
- Enter the Hadoop user name
- Double-click the "tHiveRow" component:
- Tick "Use an existing connection"
- Enter a table name
- Enter your query: " CREATE TABLE sample2 AS SELECT * FROM "+context.Table_Hive+" LIMIT 10 "
Use context variables where possible, such as context.Table_Hive here.
- Run the job
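The tHiveRow step can likewise be sketched with plain JDBC. This is an illustration under assumptions, not Talend's generated code: the class `CreateSampleTable` and method `ctasQuery` are hypothetical, the hive-jdbc driver is assumed to be on the classpath, the password is assumed empty, and the JDBC URL is passed in directly.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Rough plain-JDBC analogue of the "Insert from Hive" job (hypothetical names).
// Assumes the hive-jdbc driver is on the classpath at runtime.
public class CreateSampleTable {

    // Builds the same CREATE TABLE ... AS SELECT statement as the tHiveRow query.
    static String ctasQuery(String targetTable, String sourceTable, int limit) {
        return " CREATE TABLE " + targetTable + " AS SELECT * FROM " + sourceTable
                + " LIMIT " + limit + " ";
    }

    public static void main(String[] args) throws SQLException {
        // Usage: CreateSampleTable <jdbc:hive2://host:port/database> <user> <table>
        String url = args[0];
        String user = args[1];
        String tableHive = args[2]; // stands in for context.Table_Hive

        // Empty password assumed.
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement()) {
            // DDL returns no result set, so execute() is used instead of executeQuery().
            stmt.execute(ctasQuery("sample2", tableHive, 10));
        }
    }
}
```

Two points apply to the Talend job as well: LIMIT without ORDER BY copies an arbitrary ten rows, and rerunning the statement fails once "sample2" already exists, so drop or rename the table between runs.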