Talend - Download a file OpenData and write file with HDFS
GitHub project: example-talend-download-file-opendata-and-write-file-with-HDFS
Preamble
Configuration: Context
To create the jobs described in this article, you first have to create a context group in the repository with the required values.
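As a rough sketch, the context group can be thought of as a set of key/value pairs that the job's expressions reference. The variable names below are taken from the URL expression used later in this article; the values are placeholders you would replace with your own cluster's address.

```python
# Hypothetical equivalent of the Talend context group (placeholder values).
context = {
    "IP_HDFS": "192.168.1.10",   # Name Node host (assumption)
    "Port_HDFS": "8020",         # Name Node port (assumption)
}

# Same expression as in the tHDFSConnection settings:
# "hdfs://" + context.IP_HDFS + ":" + context.Port_HDFS + "/"
hdfs_url = "hdfs://" + context["IP_HDFS"] + ":" + context["Port_HDFS"] + "/"
```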
Download an Open Data file and write it to HDFS (works with all versions of Talend Data Fabric)
- Create a new job
- Add the "tHDFSConnection" component: opens a connection to HDFS.
- Add the "tREST" component: sends a request to a RESTful web service and retrieves the response.
- Add the "tHDFSOutput" component: writes data to HDFS.
- Create the links:
- "tHDFSConnection" is connected to "tREST" (through "OnSubjobOk")
- "tREST" is connected to "tHDFSOutput" (through "Main")
- Double-click "tHDFSConnection" and set its properties:
- Select the "Cloudera" distribution and the latest Cloudera version
- Enter the Name Node URL. The URL has to respect this format: "hdfs://ip_hdfs:port_hdfs/"
Use context variables if possible: "hdfs://"+context.IP_HDFS+":"+context.Port_HDFS+"/"
- Set the user name
- Uncheck "Use Datanode Hostname"
- Double-click "tREST" and enter the URL of the Open Data file to download
- Double-click "tHDFSOutput":
- Check "Use an existing connection"
- Enter the output file name
- Run the job
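The steps above can be sketched outside of Talend as well: fetch the resource over HTTP (the tREST step) and push the bytes to HDFS (the tHDFSOutput step). The sketch below uses Python's standard library and the WebHDFS REST API; the host names, paths, and the Open Data URL are placeholders, not values from the article.

```python
# Minimal sketch of the job's logic, assuming a WebHDFS endpoint is available.
import urllib.request


def build_namenode_url(ip: str, port: str) -> str:
    # Mirrors the tHDFSConnection expression:
    # "hdfs://" + context.IP_HDFS + ":" + context.Port_HDFS + "/"
    return "hdfs://" + ip + ":" + port + "/"


def download_open_data(url: str) -> bytes:
    # tREST equivalent: GET the Open Data resource and return its body.
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def write_to_hdfs(webhdfs_base: str, path: str, data: bytes) -> None:
    # tHDFSOutput equivalent via WebHDFS CREATE. Note: a real WebHDFS
    # CREATE is a two-step operation (the Name Node answers with a 307
    # redirect to a Data Node); a production client must follow it.
    req = urllib.request.Request(
        f"{webhdfs_base}/webhdfs/v1{path}?op=CREATE&overwrite=true",
        data=data,
        method="PUT",
    )
    urllib.request.urlopen(req)


if __name__ == "__main__":
    # Placeholder URLs and paths for illustration only.
    payload = download_open_data("https://example.org/opendata.csv")
    write_to_hdfs("http://namenode:9870", "/data/opendata.csv", payload)
```

The `build_namenode_url` helper corresponds to the context-variable expression configured in "tHDFSConnection", which is why keeping the host and port in context variables makes the job portable across environments.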