Talend - Query & Insert from Hive

Preamble

Configuration: Context

To create the jobs shown in this article, first create a context group, with values, in the repository.
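
The jobs below refer to two context variables, `Table_Hive` and `New_Table_Hive`, through expressions such as `context.Table_Hive`. As a rough illustration of what such a context amounts to, the sketch below loads the two variables from key/value text with plain Java. The class name `ContextSketch` and the table names are hypothetical, and this is not the code Talend generates.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration of a context: a set of named values shared by the jobs.
final class ContextSketch {

    // Parses "key=value" lines into a Properties object.
    static Properties load(String text) {
        Properties context = new Properties();
        try {
            context.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return context;
    }

    public static void main(String[] args) {
        // Assumed values; replace them with the tables of your own cluster.
        String text = "Table_Hive=default.customers\n"
                    + "New_Table_Hive=default.customers_sample\n";
        Properties context = load(text);
        System.out.println("Source table: " + context.getProperty("Table_Hive"));
        System.out.println("New table:    " + context.getProperty("New_Table_Hive"));
    }
}
```

Keeping the table names in the context means that the same job can be pointed at another environment without editing its queries.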

Query from Hive (with all versions of Data Fabric)

Example: Count the number of lines.

  • Create a new job

  • Add the component "tHiveConnection": allows the creation of a Hive connection

  • Add the component "tHiveInput": reads a Hive table and extracts fields based on a Hive query

  • Add the component "tLogRow": displays the results

  • Create links:

    • "tHiveConnection" is connected with "tHiveInput" (through "OnSubjobOk")

    • "tHiveInput" is connected with "tLogRow" (through "Main")

  • Double click on "tHiveConnection" and set its properties:

    • Add a "Cloudera" distribution and select the latest version of Cloudera

    • Enter a host

    • Enter a port

    • Enter a database name

    • Enter a user

    • Enter a NameNode URL

    • Enter a Hadoop user

    • Enter a password

  • Double click on the component "tHiveInput":

    • Tick "Use an existing connection"

    • Enter a table name

    • Enter your query: " SELECT COUNT(*) FROM "+context.Table_Hive+" "
      Use context variables where possible, for example "+context.Table_Hive+" for the table name.

  • Run the job
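
The query field of "tHiveInput" holds a Java string expression, so the context variable is concatenated into the SQL text before it is sent to Hive. The sketch below reproduces that concatenation outside Talend. The `Context` class is a hypothetical stand-in for the object Talend provides, and the identifier check is an added precaution that assumes table names of the form `table` or `database.table`.

```java
import java.util.regex.Pattern;

final class HiveCountQuery {

    // Hypothetical stand-in for the Talend context; Table_Hive is the variable used in the job.
    static final class Context {
        String Table_Hive;

        Context(String tableHive) {
            this.Table_Hive = tableHive;
        }
    }

    // Assumption: a table name is a plain identifier, optionally prefixed by a database name.
    private static final Pattern IDENTIFIER =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?");

    static String countQuery(Context context) {
        // Reject anything that is not a simple identifier before concatenating it into SQL.
        if (context.Table_Hive == null || !IDENTIFIER.matcher(context.Table_Hive).matches()) {
            throw new IllegalArgumentException("Unexpected table name: " + context.Table_Hive);
        }
        // Same expression as the one entered in the query field.
        return " SELECT COUNT(*) FROM " + context.Table_Hive + " ";
    }

    public static void main(String[] args) {
        Context context = new Context("default.customers"); // hypothetical table name
        System.out.println(countQuery(context));
    }
}
```

Because the value is pasted into the statement as text, it is worth keeping the context variable restricted to trusted values.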

Insert from Hive (with all versions of Data Fabric)

Example: Create a new table and insert ten rows into it from an existing Hive table.

  • Create a new job

  • Add the component "tHiveConnection": allows the creation of a Hive connection.

  • Add the component "tHiveRow": executes a SQL query on each iteration of the Talend flow

  • Create links:

    • "tHiveConnection" is connected with "tHiveRow" (through "OnSubjobOk")

  • Double click on "tHiveConnection" and set its properties:

    • Add a "Cloudera" distribution and select the latest version of Cloudera

    • Enter a host

    • Enter a port

    • Enter a database name

    • Enter a user

    • Enter a NameNode URL

    • Enter a Hadoop user

    • Enter a password

  • Double click on the component "tHiveRow":

    • Tick "Use an existing connection"

    • Enter a table name

    • Enter a new table name

    • Enter your query: "CREATE TABLE "+context.New_Table_Hive+" AS SELECT * FROM "+context.Table_Hive+" LIMIT 10 "
      Use context variables where possible, for example "+context.New_Table_Hive+" and "+context.Table_Hive+" for the table names.

  • Run the job
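
The statement entered in "tHiveRow" is a CREATE TABLE ... AS SELECT built by string concatenation. The sketch below assembles the same statement in plain Java so that the resulting SQL text can be inspected before the job is run. The class name `HiveCtasQuery`, the `limit` parameter and the table names are hypothetical. In the job itself the two names come from `context.New_Table_Hive` and `context.Table_Hive`.

```java
final class HiveCtasQuery {

    // Builds the CREATE TABLE ... AS SELECT statement used in the example job.
    static String createTableAsSelect(String newTable, String sourceTable, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive: " + limit);
        }
        return "CREATE TABLE " + newTable
                + " AS SELECT * FROM " + sourceTable
                + " LIMIT " + limit + " ";
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for context.New_Table_Hive and context.Table_Hive.
        String sql = createTableAsSelect("default.customers_sample", "default.customers", 10);
        System.out.println(sql);
    }
}
```

Printing the statement first makes it easier to spot a missing space or an empty context variable, both of which would otherwise only show up as a Hive error at run time.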