Talend - Write files to HDFS
GitHub project: example-talend-write-files-with-HDFS
Preamble
Configuration: Context
To create the jobs described in this article, you first have to create a context group in the repository with the required values (HDFS IP address, port, and user).
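For reference, the sketch below is a plain-Java equivalent of such a context group. The names IP_HDFS and Port_HDFS match the expressions used later in this article; the user variable and all the values shown are assumptions, not values from the original article.

    // Hypothetical equivalent of the Talend context group used in this article.
    // IP_HDFS and Port_HDFS reappear in the Name Node URL expression below;
    // the values are placeholders to adapt to your own cluster.
    public final class HdfsContext {
        public static final String IP_HDFS   = "192.168.1.10"; // NameNode host (example)
        public static final String PORT_HDFS = "8020";         // NameNode RPC port (example)
        public static final String USER      = "talend";       // HDFS user (example)
    }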
Write a file on HDFS (with all versions of Data Fabric)
- Create a new job
- Add the component "tHDFSConnection" : Allows the creation of a HDFS connection.
- Add the component "tFileInputDelimited": Reads a file located on your computer.
- Add the component "tHDFSOutput": Writes data to HDFS.
Create links:
"tHDFSConnection" is connected with "tFileInputDelimited" (through "OnSubjobOk")
"tFileInputDelimited" is connected with "tHDFSOutput" (through "Main")
- Double click on "tHDFSConnection" and set its properties:
- Add a "Cloudera" distribution and select the latest version of Cloudera
- Enter the Name Node URL.
The URL must follow this format: "hdfs://ip_hdfs:port_hdfs/"
Use context variables if possible: "hdfs://"+context.IP_HDFS+":"+context.Port_HDFS+"/"
- Add the user
- Uncheck "Use Datanode Hostname"
- Double click on the component "tFileInputDelimited" :
- Add the name of the local file (with its path)
- If needed, tick "CSV options" and set your options.
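tFileInputDelimited reads the local file row by row and splits each line on the field separator. A rough plain-Java equivalent is sketched below; the separator is passed as a parameter since the actual value depends on your CSV options.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    public class DelimitedFileReaderSketch {
        // Equivalent of tFileInputDelimited: read a local delimited file into rows of fields.
        public static List<String[]> readRows(String localPath, String separator) throws IOException {
            List<String[]> rows = new ArrayList<>();
            try (BufferedReader reader = Files.newBufferedReader(Paths.get(localPath))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    rows.add(line.split(separator, -1)); // -1 keeps trailing empty fields
                }
            }
            return rows;
        }
    }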
- Double click on the component "tHDFSOutput" :
- Click on "Edit a schema"
- Enter a variable "flow" in Input and Output
- Enter a variable "flow" in Input and Output
- Check "Use an existing connection"
- Enter the name of your file (in HDFS). If you want, you can change these options.Â
- Click on "Edit a schema"
- Run the job
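For readers who want to see the whole flow outside the Studio, the sketch below chains the three sketches above: it opens the HDFS connection, reads the local delimited file, and writes each row to a file on HDFS. The local path "/tmp/input.csv", the HDFS path "/user/talend/output.csv", the ";" separator and the connection values are all placeholders, not values from this article.

    import java.io.BufferedWriter;
    import java.io.OutputStreamWriter;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteFileToHdfsSketch {
        public static void main(String[] args) throws Exception {
            // tHDFSConnection: open the connection (see the connection sketch above).
            FileSystem fs = HdfsConnectionSketch.connect(
                    HdfsContext.IP_HDFS, HdfsContext.PORT_HDFS, HdfsContext.USER);

            // tHDFSOutput: target file on HDFS (placeholder path, overwritten if it exists).
            Path target = new Path("/user/talend/output.csv");

            try (BufferedWriter writer = new BufferedWriter(
                    new OutputStreamWriter(fs.create(target, true), StandardCharsets.UTF_8))) {
                // tFileInputDelimited -> Main -> tHDFSOutput: copy each row unchanged.
                for (String[] row : DelimitedFileReaderSketch.readRows("/tmp/input.csv", ";")) {
                    writer.write(String.join(";", row));
                    writer.newLine();
                }
            }

            fs.close();
        }
    }

Running this class with the Hadoop client libraries on the classpath should produce the same result as running the Talend job above.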