Talend - Query & Insert from Impala
GitHub project: example-talend-query-insert-from-impala
The approach described in this article is not available with Data Fabric >= 1.6 because it is not secure.
Solution:
- Use Hive components
- Modify the configuration file in Talend
Preamble
Configuration: Context
To create the jobs shown in this article, you first have to create a context in the repository, with values for the variables that the jobs use (for example context.Table_Impala).
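As a rough illustration of what the context provides, the sketch below models the two variables that the queries in this article reference (Table_Impala and New_Table_Impala). The class name ExampleContext and the sample values are hypothetical; in a real job Talend supplies the context object, so this only shows the idea of named values that are checked before a job uses them.

```java
import java.util.Properties;

// Simplified, hypothetical stand-in for the `context` object of a Talend job.
// Field names mirror the context variables used in this article, which is why
// they do not follow the usual Java naming style.
final class ExampleContext {
    final String Table_Impala;
    final String New_Table_Impala;

    ExampleContext(Properties values) {
        this.Table_Impala = require(values, "Table_Impala");
        this.New_Table_Impala = require(values, "New_Table_Impala");
    }

    // Fails early when a value is missing or blank, instead of producing a broken query later.
    private static String require(Properties values, String key) {
        String value = values.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing context value: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        Properties values = new Properties();
        values.setProperty("Table_Impala", "old_table");     // placeholder value
        values.setProperty("New_Table_Impala", "new_table"); // placeholder value
        ExampleContext context = new ExampleContext(values);
        System.out.println(context.Table_Impala + " / " + context.New_Table_Impala);
    }
}
```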
Query from Impala (with Data Fabric <= v.1.5)
Example: Count the number of rows in a table.
- Create a new job
- Add the component "tImpalaConnection": Allows the creation of an Impala connection.
- Add the component "tImpalaInput": Reads an Impala table and extracts fields based on an Impala query.
- Add the component "tLogRow": Displays the result.
- Create links:
  - "tImpalaConnection" is connected with "tImpalaInput" (through "OnSubjobOk")
  - "tImpalaInput" is connected with "tLogRow" (through "Main")
- Double-click on "tImpalaConnection" and set its properties:
  - Add a "Cloudera" distribution and select the latest version of Cloudera
  - Enter a host
  - Enter a port
  - Enter a database name
- Double-click on the component "tImpalaInput" and set its properties:
  - Tick "Use an existing connection"
  - Click on "Edit a schema"
  - Enter a variable "flow"
  - Enter a table name
  - Enter your query: " SELECT COUNT(*) FROM "+context.Table_Impala+" "
  Use context variables where possible, for example "+context.Table_Impala+"
- Run the job
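The query field above holds a Java string expression, so the context variable is simply concatenated into the SQL text. The sketch below reproduces that concatenation outside Talend. The class name CountQuery and the identifier check are assumptions added for illustration; the check is one possible safeguard against unexpected table names, not something the Talend component does for you.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the expression typed in the query field of "tImpalaInput".
final class CountQuery {
    // Assumed rule for an acceptable name: letters, digits, underscores,
    // with an optional "database." prefix.
    private static final Pattern IDENTIFIER =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?");

    static String build(String tableImpala) {
        if (tableImpala == null || !IDENTIFIER.matcher(tableImpala).matches()) {
            throw new IllegalArgumentException("Unexpected table name: " + tableImpala);
        }
        // Same concatenation as: " SELECT COUNT(*) FROM "+context.Table_Impala+" "
        return " SELECT COUNT(*) FROM " + tableImpala + " ";
    }

    public static void main(String[] args) {
        // "my_db.my_table" is a placeholder for the value of context.Table_Impala.
        System.out.println(build("my_db.my_table"));
    }
}
```

Because the table name comes from the context, the same job can be pointed at another table by changing the context value only.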
Insert from Impala (with Data Fabric <= v.1.5)
Example: Create a new table and insert ten rows into it (taken from an existing table in Impala)
- Create a new job
- Add the component "tImpalaConnection": Allows the creation of an Impala connection.
- Add the component "tImpalaRow": Executes an Impala query at each iteration of the Talend flow.
- Create links:
  - "tImpalaConnection" is connected with "tImpalaRow" (through "OnSubjobOk")
- Double-click on "tImpalaConnection" and set its properties:
  - Add a "Cloudera" distribution and select the latest version of Cloudera
  - Enter a host
  - Enter a port
  - Enter a database name
- Double-click on the component "tImpalaRow" and set its properties:
  - Tick "Use an existing connection"
  - Enter a table name
  - Enter a new table name
  - Enter your query: "CREATE TABLE "+context.New_Table_Impala+" AS SELECT * FROM "+context.Table_Impala+" LIMIT 10"
  Use context variables where possible, for example "+context.Table_Impala+"
- Run the job
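The statement run by "tImpalaRow" combines both context variables in a single CREATE TABLE ... AS SELECT. As a minimal sketch, the helper below builds the same text; the class name CreateTableAsSelect, the limit parameter, and the two sanity checks are hypothetical additions for illustration.

```java
// Hypothetical helper that mirrors the expression typed in the query field of "tImpalaRow".
final class CreateTableAsSelect {
    static String build(String newTableImpala, String tableImpala, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        if (newTableImpala.equals(tableImpala)) {
            throw new IllegalArgumentException("source and target table must differ");
        }
        // Same shape as:
        // "CREATE TABLE "+context.New_Table_Impala+" AS SELECT * FROM "+context.Table_Impala+" LIMIT 10"
        return "CREATE TABLE " + newTableImpala
                + " AS SELECT * FROM " + tableImpala
                + " LIMIT " + limit;
    }

    public static void main(String[] args) {
        // Placeholder names standing in for context.New_Table_Impala and context.Table_Impala.
        System.out.println(build("new_table", "old_table", 10));
    }
}
```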