Java - Read & Write files with HDFS
GitHub project: example-java-read-and-write-from-hdfs
Common part
Maven Dependencies
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
HDFS URI
An HDFS URI has the following form: hdfs://namenodedns:port/user/hdfs/folder/file.csv
The default port is 8020.
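To make the URI layout concrete, here is a minimal sketch that splits an HDFS URI into its parts using only java.net.URI from the JDK. The class name HdfsUriParts, its helper methods, and the host namenode.example.com are hypothetical and chosen for illustration; the fallback to port 8020 simply mirrors the default stated above.

```java
import java.net.URI;

// Sketch: inspect the parts of an HDFS URI with the JDK only (no Hadoop classes).
final class HdfsUriParts {

    // Default port taken from the text above; used when the URI omits the port.
    static final int DEFAULT_PORT = 8020;

    // Returns the port given in the URI, or DEFAULT_PORT when none is present.
    static int portOf(URI uri) {
        return uri.getPort() == -1 ? DEFAULT_PORT : uri.getPort();
    }

    // Returns "scheme://host:port", the kind of value passed as fs.defaultFS.
    static String fileSystemRoot(URI uri) {
        return uri.getScheme() + "://" + uri.getHost() + ":" + portOf(uri);
    }

    public static void main(String[] args) {
        // Hypothetical NameNode host, for illustration only.
        URI uri = URI.create("hdfs://namenode.example.com:8020/user/hdfs/folder/file.csv");
        System.out.println(fileSystemRoot(uri)); // file system root
        System.out.println(uri.getPath());       // path of the file inside HDFS
    }
}
```

The first part (scheme, host, port) identifies the NameNode, and the remaining path identifies the file or folder inside the file system.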
Init HDFS FileSystem Object
// ====== Init HDFS File System Object
Configuration conf = new Configuration();
// Set FileSystem URI
conf.set("fs.defaultFS", hdfsuri);
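// Register the implementations explicitly: when Maven packages everything into a
// single jar, the META-INF/services entries that declare them can be overwritten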
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
// Set HADOOP user
System.setProperty("HADOOP_USER_NAME", "hdfs");
System.setProperty("hadoop.home.dir", "/");
//Get the filesystem - HDFS
FileSystem fs = FileSystem.get(URI.create(hdfsuri), conf);
Init Subfolders
//==== Create folder if not exists
Path workingDir=fs.getWorkingDirectory();
Path newFolderPath= new Path(path);
if(!fs.exists(newFolderPath)) {
// Create new Directory
fs.mkdirs(newFolderPath);
logger.info("Path "+path+" created.");
}
How to write a file to HDFS with Java?
Code example
//==== Write file
logger.info("Begin Write file into hdfs");
//Create a path
Path hdfswritepath = new Path(newFolderPath, fileName);
//Init output stream
FSDataOutputStream outputStream=fs.create(hdfswritepath);
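//Classical output stream usage: write the content as UTF-8 bytes
//(writeBytes would keep only the low byte of each char and corrupt non-ASCII text)
outputStream.write(fileContent.getBytes(StandardCharsets.UTF_8));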
outputStream.close();
logger.info("End Write file into hdfs");
How to read a file from HDFS with Java?
Code example
//==== Read file
logger.info("Read file from hdfs");
//Create a path
Path hdfsreadpath = new Path(newFolderPath, fileName);
//Init input stream
FSDataInputStream inputStream = fs.open(hdfsreadpath);
//Classical input stream usage (IOUtils here is org.apache.commons.io.IOUtils)
String out= IOUtils.toString(inputStream, "UTF-8");
logger.info(out);
inputStream.close();
fs.close();
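
A note on text encoding: FSDataOutputStream extends java.io.DataOutputStream, whose writeBytes method keeps only the low eight bits of each char, while the read step above decodes the file as UTF-8. The sketch below uses only JDK classes to show the difference between writeBytes and writing explicit UTF-8 bytes. The class name WriteBytesVsUtf8 and its helper methods are hypothetical, and an in-memory buffer stands in for the HDFS stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch: compare DataOutputStream.writeBytes with an explicit UTF-8 write.
final class WriteBytesVsUtf8 {

    // Writes the text with writeBytes, then decodes the stored bytes as UTF-8.
    static String viaWriteBytes(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeBytes(text); // keeps only the low 8 bits of each char
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Writes the text as UTF-8 bytes, then decodes the stored bytes as UTF-8.
    static String viaUtf8(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // contains one non-ASCII character
        System.out.println(text.equals(viaWriteBytes(text))); // round trip via writeBytes
        System.out.println(text.equals(viaUtf8(text)));       // round trip via UTF-8 bytes
    }
}
```

Pure ASCII content survives both variants, which is why the difference often goes unnoticed until the first accented character shows up in a file.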