Talend - Query & Insert from Impala

The method described in this article is not available with Data Fabric >= 1.6 because it is not secure.

Solution:

  • Use the Hive components
  • Modify the configuration file in Talend

Preamble

Configuration: Context

To create the jobs described in this article, you first have to create a context in the repository, with values for the variables used below (for example Table_Impala and New_Table_Impala).
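As a rough illustration of what such a context holds, the sketch below loads two key/value pairs into a java.util.Properties object. The class name ContextValuesExample, the helper loadContext, and the table names are hypothetical placeholders; only the variable names Table_Impala and New_Table_Impala come from this article. It is not the code Talend generates.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: context values represented as simple key=value pairs.
class ContextValuesExample {

    // Parses "key=value" lines into a Properties object.
    static Properties loadContext(String text) {
        Properties values = new Properties();
        try {
            values.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with your own table names.
        String text = "Table_Impala=my_db.source_table\n"
                    + "New_Table_Impala=my_db.new_table\n";
        Properties context = loadContext(text);
        System.out.println(context.getProperty("Table_Impala"));
        System.out.println(context.getProperty("New_Table_Impala"));
    }
}
```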

Query from Impala (with Data Fabric <= v.1.5)

Example: Count the number of rows in a table.

  • Create a new job
  • Add the component "tImpalaConnection": Allows the creation of an Impala connection.
  • Add the component "tImpalaInput": Reads an Impala table and extracts fields based on an Impala query.
  • Add the component "tLogRow": Displays the result.
  • Create links:
    • "tImpalaConnection" is connected with "tImpalaInput" (through "OnSubjobOk")
    • "tImpalaInput" is connected with "tLogRow" (through "Main")

  • Double click on "tImpalaConnection" and set its properties:
    • Select the "Cloudera" distribution and the latest version of Cloudera
    • Enter a host
    • Enter a port
    • Enter a database name

  • Double click on the component "tImpalaInput":
    • Tick "Use an existing connection"
    • Click on "Edit schema"
      • Add a column named "flow"
    • Enter a table name
    • Enter your query: " SELECT COUNT(*) FROM "+context.Table_Impala+" "
      Use context variables where possible, for example "+context.Table_Impala+"

  • Run the job
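
The query field of "tImpalaInput" is a Java string expression, so the context variable is simply concatenated into the SQL text. The sketch below shows how that expression evaluates. The Context class is a hypothetical stand-in for the "context" object available inside a Talend job, and the table name is a placeholder.

```java
// Sketch of how the query expression from the job evaluates.
class CountQueryExample {

    // Hypothetical stand-in for the job's "context" object.
    static class Context {
        String Table_Impala;

        Context(String tableImpala) {
            this.Table_Impala = tableImpala;
        }
    }

    // Same expression as the one entered in the query field of tImpalaInput.
    static String countQuery(Context context) {
        return " SELECT COUNT(*) FROM " + context.Table_Impala + " ";
    }

    public static void main(String[] args) {
        Context context = new Context("my_db.source_table"); // placeholder value
        // Prints the SQL text that would be sent to Impala.
        System.out.println(countQuery(context));
    }
}
```

Because the table name comes from the context, the same job can be pointed at another table by changing the context value only.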

Insert from Impala (with Data Fabric <= v.1.5)

Example: Create a new table and insert ten rows taken from an existing Impala table.

  • Create a new job
  • Add the component "tImpalaConnection": Allows the creation of an Impala connection.
  • Add the component "tImpalaRow": Executes an Impala query at each iteration of the Talend flow.
  • Create links:
    • "tImpalaConnection" is connected with "tImpalaRow" (through "OnSubjobOk")

  • Double click on "tImpalaConnection" and set its properties:
    • Select the "Cloudera" distribution and the latest version of Cloudera
    • Enter a host
    • Enter a port
    • Enter a database name

  • Double click on the component "tImpalaRow":
    • Tick "Use an existing connection"
    • Enter a table name
    • Enter a new table name
    • Enter your query: "CREATE TABLE "+context.New_Table_Impala+" AS SELECT * FROM "+context.Table_Impala+" LIMIT 10"
      Use context variables where possible, for example "+context.Table_Impala+"

  • Run the job
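
Since the table names are concatenated directly into the statement, it can help to check them before the query is built. The sketch below assembles the same CREATE TABLE ... AS SELECT statement and rejects names that are not plain identifiers. The class name, the helper requireIdentifier, and the identifier pattern are assumptions made for illustration; they are not part of Talend or Impala.

```java
import java.util.regex.Pattern;

// Sketch: build the CREATE TABLE ... AS SELECT statement with a basic name check.
class CreateTableQueryExample {

    // Assumption: names are plain identifiers, optionally qualified as db.table.
    private static final Pattern IDENTIFIER =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?");

    // Returns the name unchanged, or throws if it is not a plain identifier.
    static String requireIdentifier(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Unexpected table name: " + name);
        }
        return name;
    }

    // Same statement as the one entered in the query field of tImpalaRow.
    static String createTableQuery(String newTable, String sourceTable) {
        return "CREATE TABLE " + requireIdentifier(newTable)
             + " AS SELECT * FROM " + requireIdentifier(sourceTable)
             + " LIMIT 10";
    }

    public static void main(String[] args) {
        // Placeholder values standing in for context.New_Table_Impala
        // and context.Table_Impala.
        System.out.println(createTableQuery("my_db.new_table", "my_db.source_table"));
    }
}
```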