Talend - Download an Open Data file and write it to HDFS


Configuration: Context

To create the jobs described in this article, you first have to define a context group in the repository, with values for each variable (for example IP_HDFS and Port_HDFS, which are used below).

Download an Open Data file and write it to HDFS (works with all versions of Talend Data Fabric)

  • Create a new job
  • Add the component "tHDFSConnection": creates a connection to HDFS.
  • Add the component "tREST": sends a request to a RESTful web service and retrieves the response.
  • Add the component "tHDFSOutput": writes data to HDFS.
  • Create links:
    • "tHDFSConnection" is connected to "tREST" (through an "OnSubjobOk" trigger)
    • "tREST" is connected to "tHDFSOutput" (through a "Main" row link)
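
The link semantics above can be sketched in plain Python (this is an illustrative stand-in, not Talend-generated code): an "OnSubjobOk" trigger runs the next subjob only if the previous one succeeded, while a "Main" link streams rows from one component to the next.

```python
# Hypothetical sketch of the job's control and data flow.
# connect/fetch/write stand in for tHDFSConnection, tREST, tHDFSOutput.

def run_job(connect, fetch, write):
    connect()              # subjob 1: open the HDFS connection
    # "OnSubjobOk": the code below runs only if connect() did not raise
    for row in fetch():    # tREST emits the HTTP response as row(s)
        write(row)         # "Main" link: each row flows into tHDFSOutput

# Usage with stand-in callables:
written = []
run_job(lambda: None, lambda: ["response body"], written.append)
```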

  • Double click on "tHDFSConnection" and set its properties:
    • Select the "Cloudera" distribution and its latest version
    • Enter the Name Node URI.
      The URI has to respect this format: "hdfs://ip_hdfs:port_hdfs/"
      Use context variables if possible: "hdfs://"+context.IP_HDFS+":"+context.Port_HDFS+"/"
    • Enter the user name
    • Uncheck "Use Datanode Hostname"
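
The context expression above simply concatenates the variables into the Name Node URI; a minimal Python equivalent, with hypothetical values for IP_HDFS and Port_HDFS (8020 is a common default NameNode port on Cloudera clusters):

```python
# Stand-ins for the repository context variables (values are examples only).
IP_HDFS = "192.168.1.10"   # context.IP_HDFS
Port_HDFS = "8020"         # context.Port_HDFS

# Same concatenation as the Talend expression:
namenode_uri = "hdfs://" + IP_HDFS + ":" + Port_HDFS + "/"
print(namenode_uri)  # hdfs://192.168.1.10:8020/
```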

  • Double click on "tREST" and enter the URL of the Open Data file to download
  • Double click on "tHDFSOutput":
    • Check "Use an existing connection"
    • Enter a file name

  • Run the job
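
For readers curious about what the write step does under the hood, here is a hedged sketch of the WebHDFS two-step CREATE protocol that HDFS exposes over HTTP (an alternative way to write the file, not what tHDFSOutput literally executes). The NameNode answers the first PUT with a 307 redirect to a DataNode, and the file content goes to that second URL; the `http_put` transport is injected so the sketch needs no live cluster, and the host names and ports are hypothetical.

```python
def webhdfs_create(namenode, path, data, user, http_put):
    """Two-step WebHDFS write: first PUT (no body) to the NameNode, which
    redirects to a DataNode; second PUT sends the actual bytes there."""
    create_url = f"http://{namenode}/webhdfs/v1{path}?op=CREATE&user.name={user}"
    status, location = http_put(create_url, None)   # step 1: ask where to write
    assert status == 307, "NameNode should redirect to a DataNode"
    status, _ = http_put(location, data)            # step 2: send the payload
    return status

# Stub transport standing in for a real cluster:
def fake_put(url, body):
    if body is None:  # first call: NameNode redirects
        return 307, "http://datanode:9864/webhdfs/v1/data/file.csv?op=CREATE"
    return 201, None  # second call: DataNode confirms creation

print(webhdfs_create("namenode:9870", "/data/file.csv", b"payload", "hdfs", fake_put))  # 201
```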