Talend - Read files with HDFS
GitHub project: example-talend-read-files-with-hdfs
Preamble
Configuration: Context
To create the jobs in this article, you first have to create a context group in the repository and set its values. The jobs use the context variables IP_HDFS and Port_HDFS.
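Both jobs build the Name Node URL from the context variables IP_HDFS and Port_HDFS. The plain-Java sketch below reproduces that concatenation and checks that the result follows the "hdfs://ip_hdfs:port_hdfs/" format with java.net.URI. It assumes both context variables are strings; the class name, method names, and sample values are hypothetical and are not part of Talend's generated code.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class NameNodeUrlSketch {

    // Same concatenation as the Talend expression
    // "hdfs://"+context.IP_HDFS+":"+context.Port_HDFS+"/"
    // (assumption: both context variables are Strings)
    static String nameNodeUrl(String ipHdfs, String portHdfs) {
        return "hdfs://" + ipHdfs + ":" + portHdfs + "/";
    }

    // True when the URL follows the "hdfs://ip_hdfs:port_hdfs/" format:
    // hdfs scheme, a host, an explicit port and "/" as the path
    static boolean isValidNameNodeUrl(String url) {
        try {
            URI uri = new URI(url);
            return "hdfs".equals(uri.getScheme())
                    && uri.getHost() != null
                    && uri.getPort() > 0
                    && "/".equals(uri.getPath());
        } catch (URISyntaxException e) {
            // Not even a syntactically valid URI
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical context values, for illustration only
        String url = nameNodeUrl("192.168.1.10", "8020");
        System.out.println(url + " -> valid: " + isValidNameNodeUrl(url));
        // A URL without a port is rejected by the check
        System.out.println(isValidNameNodeUrl("hdfs://192.168.1.10/"));
    }
}
```

Checking the URL this way catches an empty or non-numeric port early, which is easy to miss when the value comes from a context variable.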
Read a file from HDFS and display it in the console (all versions of Data Fabric)
- Create a new job
- Add the component "tHDFSConnection": creates the HDFS connection.
- Add the component "tHDFSInput": reads a file from HDFS.
- Add the component "tLogRow": displays the result.
- Create links:
- "tHDFSConnection" is connected to "tHDFSInput" (through "OnSubjobOk")
- "tHDFSInput" is connected to "tLogRow" (through "Main")
- Double-click on "tHDFSConnection" and set its properties:
- Select "Cloudera" as the distribution, then the latest Cloudera version
- Enter the Name Node URL.
The URL must follow the format "hdfs://ip_hdfs:port_hdfs/".
Use context variables if possible: "hdfs://"+context.IP_HDFS+":"+context.Port_HDFS+"/"
- Add the user
- Uncheck "Use Datanode Hostname"
- Double-click on the component "tHDFSInput":
- Click on "Edit schema"
- Add a column named "flow"
- Tick "Use an existing connection"
- Enter the name of the HDFS file to read
- Run the job
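To see what this job does without a cluster, here is a plain-Java sketch of the tHDFSInput to tLogRow flow. It is not Talend's generated code: it reads from a local Reader instead of HDFS, and it assumes each line is split on the component's field separator, with only the first field landing in the single "flow" column. The class name, method name, and sample data are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class FlowReaderSketch {

    // Turns each line of the input into one row of a single-column schema ("flow").
    // Assumption: the line is split on the field separator and, because the
    // schema has only one column, only the first field is kept.
    static List<String> readFlow(Reader input, String fieldSeparator) {
        List<String> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Pattern.quote: treat separators such as ";" or "|" literally
                String[] fields = line.split(Pattern.quote(fieldSeparator), -1);
                rows.add(fields[0]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Local stand-in for the HDFS file content (hypothetical sample data)
        Reader sample = new StringReader("first line\nsecond line;ignored field\n");
        for (String flow : readFlow(sample, ";")) {
            // One printed line per row, roughly what tLogRow shows for one column
            System.out.println(flow);
        }
    }
}
```

If the file contains the separator character, part of each line is dropped under this assumption, so a one-column schema is best suited to files without that character.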
Copy a file from HDFS to the local computer (all versions of Data Fabric)
- Create a new job
- Add the component "tHDFSConnection": creates the HDFS connection.
- Add the component "tHDFSGet": copies the HDFS file to a local directory.
- Create links:
- "tHDFSConnection" is connected to "tHDFSGet" (through "OnSubjobOk")
- Double-click on "tHDFSConnection" and set its properties:
- Select "Cloudera" as the distribution, then the latest Cloudera version
- Enter the Name Node URL.
The URL must follow the format "hdfs://ip_hdfs:port_hdfs/".
Use context variables if possible: "hdfs://"+context.IP_HDFS+":"+context.Port_HDFS+"/"
- Add the user
- Uncheck "Use Datanode Hostname"
- Double-click on the component "tHDFSGet":
- Tick "Use an existing connection"
- Enter the HDFS directory to copy from
- Enter the local directory to copy to
- Add a file mask and, if needed, a new file name
- Run the job
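The file mask decides which files of the HDFS directory are copied. The sketch below models only that selection step in plain Java, on a hard-coded list of names instead of a real HDFS listing, so it runs without a cluster. It assumes the mask follows glob-style wildcards such as "*.csv" and that a non-empty new name replaces the original name; the class, method, and file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class FileMaskSketch {

    // Computes the local destination of every HDFS file name that matches the mask.
    // Assumptions: the mask behaves like a glob (e.g. "*.csv"), and a non-empty
    // newName replaces the original name of each matching file.
    static List<Path> plannedCopies(List<String> hdfsFileNames, String mask,
                                    String newName, Path localFolder) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        List<Path> targets = new ArrayList<>();
        for (String name : hdfsFileNames) {
            if (matcher.matches(Paths.get(name))) {
                boolean keepName = newName == null || newName.isEmpty();
                targets.add(localFolder.resolve(keepName ? name : newName));
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Hypothetical listing of the HDFS directory and hypothetical local directory
        List<String> hdfsFiles = List.of("sales_2023.csv", "sales_2024.csv", "readme.txt");
        Path localFolder = Paths.get("/tmp/hdfs_copy");
        for (Path target : plannedCopies(hdfsFiles, "*.csv", "", localFolder)) {
            System.out.println(target);
        }
    }
}
```

When several files match and a fixed new name is set, every target shares the same local path under this model, so a new name is mainly useful when the mask selects a single file.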