Preamble

Configuration : Context

To create the jobs shown in this article, you first have to create a context in the repository, with values for the context variables used below (such as Table_Hive).


Query from Hive

Example: Count the number of rows in a table.

  • Create a new job
  • Add the component "tHiveConnection": Creates a Hive connection.
  • Add the component "tHiveInput": Reads a Hive table and extracts fields based on a Hive query.
  • Add the component "tLogRow": Displays the result.
  • Create links:
    • "tHiveConnection" is connected with "tHiveInput" (through "OnSubjobOk")
    • "tHiveInput" is connected with "tLogRow" (through "Main")

  • Double click on "tHiveConnection" and set its properties:
    • Select the "Cloudera" distribution and the latest Cloudera version
    • Enter the host
    • Enter the port
    • Enter the database name
    • Enter the user name
    • Enter the NameNode URL
    • Enter the Hadoop user name

  • Double click on the component "tHiveInput":
    • Tick "Use an existing connection"
    • Enter a table name
    • Enter your query: " SELECT COUNT(*) FROM "+context.Table_Hive+" "
      Use context variables where possible, for example "+context.Table_Hive+"

  • Run the job
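
For readers who want to see what the job amounts to outside Talend Studio, the steps above can be sketched in plain Java. This is a minimal sketch, not the code Talend generates: the class `HiveCountSketch`, its method names, the host `hive-host.example.com`, and the table name `sample` are hypothetical. The `jdbc:hive2://host:port/database` URL layout assumes a HiveServer2 connection, and the table-name check is a conservative guard added here for illustration only.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration; this is not Talend-generated code.
final class HiveCountSketch {

    // Assumed HiveServer2 JDBC URL layout: jdbc:hive2://<host>:<port>/<database>
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Same concatenation as the tHiveInput query field;
    // tableName stands in for context.Table_Hive.
    static String countQuery(String tableName) {
        // Conservative guard (an assumption of this sketch): the name is pasted
        // into the SQL text, so reject anything that is not a simple identifier.
        if (!tableName.matches("[A-Za-z_][A-Za-z0-9_.]*")) {
            throw new IllegalArgumentException("Unexpected table name: " + tableName);
        }
        return "SELECT COUNT(*) FROM " + tableName;
    }

    // Runs the count on an already open connection
    // (the role tHiveConnection plays in the job).
    static long countRows(Connection connection, String tableName) throws SQLException {
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(countQuery(tableName))) {
            rows.next(); // COUNT(*) always returns exactly one row
            return rows.getLong(1);
        }
    }

    public static void main(String[] args) {
        // Only prints the strings; no Hive server is contacted here.
        System.out.println(jdbcUrl("hive-host.example.com", 10000, "default"));
        System.out.println(countQuery("sample"));
    }
}
```

To actually run `countRows`, a Hive JDBC driver would have to be on the classpath and a connection opened with `java.sql.DriverManager.getConnection(...)`; inside Talend, "tHiveConnection" takes care of that part.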

Insert from Hive

Example: Create a new table and insert ten rows taken from an existing table.

  • Create a new job
  • Add the component "tHiveConnection": Creates a Hive connection.
  • Add the component "tHiveRow": Executes a SQL query at each iteration of the Talend flow.
  • Create links:
    • "tHiveConnection" is connected with "tHiveRow" (through "OnSubjobOk")

  • Double click on "tHiveConnection" and set its properties:
    • Select the "Cloudera" distribution and the latest Cloudera version
    • Enter the host
    • Enter the port
    • Enter the database name
    • Enter the user name
    • Enter the NameNode URL
    • Enter the Hadoop user name

  • Double click on the component "tHiveRow":
    • Tick "Use an existing connection"
    • Enter a table name
    • Enter your query: " CREATE TABLE sample2 AS SELECT * FROM "+context.Table_Hive+" LIMIT 10 "
      Use context variables where possible, for example "+context.Table_Hive+"

  • Run the job
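
The statement that "tHiveRow" sends can also be sketched in plain Java. Again this is only an illustration under stated assumptions: the class `HiveCtasSketch` and its method names are hypothetical, `sample` stands in for the value of context.Table_Hive, and the code is not what Talend generates. It shows how the CREATE TABLE ... AS SELECT text is assembled and how it could be executed on an open JDBC connection.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration; this is not Talend-generated code.
final class HiveCtasSketch {

    // Builds the same statement as the tHiveRow query field;
    // sourceTable stands in for context.Table_Hive.
    static String createTableAsSelect(String newTable, String sourceTable, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must not be negative: " + limit);
        }
        return "CREATE TABLE " + newTable
                + " AS SELECT * FROM " + sourceTable
                + " LIMIT " + limit;
    }

    // Executes a statement that returns no result set, on an already open
    // connection (the role tHiveConnection plays in the job).
    static void run(Connection connection, String sql) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(sql);
        }
    }

    public static void main(String[] args) {
        // Only prints the statement; no Hive server is contacted here.
        System.out.println(createTableAsSelect("sample2", "sample", 10));
    }
}
```

Because the statement creates `sample2`, running the job a second time would normally fail unless the table has been dropped in between.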

Update from Hive

Example : 
