Java - Read & Write files with HDFS

Github Project : example-java-read-and-write-from-hdfs

Common part

Maven Dependencies

<dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
 </dependency>

HDFS URI

HDFS URI are like that : hdfs://namenodedns:port/user/hdfs/folder/file.csv

Default port is 8020.

Init HDFS FileSystem Object

// ====== Init HDFS File System Object
Configuration conf = new Configuration();
// Set FileSystem URI
conf.set("fs.defaultFS", hdfsuri);
// Because of Maven
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
// Set HADOOP user
System.setProperty("HADOOP_USER_NAME", "hdfs");
System.setProperty("hadoop.home.dir", "/");
//Get the filesystem - HDFS
FileSystem fs = FileSystem.get(URI.create(hdfsuri), conf);

Init Subfolders

//==== Create folder if not exists
Path workingDir=fs.getWorkingDirectory();
Path newFolderPath= new Path(path);
if(!fs.exists(newFolderPath)) {
   // Create new Directory
   fs.mkdirs(newFolderPath);
   logger.info("Path "+path+" created.");
}

How to write a file to HDFS with Java?

Code example

//==== Write file
logger.info("Begin Write file into hdfs");
//Create a path
Path hdfswritepath = new Path(newFolderPath + "/" + fileName);
//Init output stream
FSDataOutputStream outputStream=fs.create(hdfswritepath);
//Cassical output stream usage
outputStream.writeBytes(fileContent);
outputStream.close();
logger.info("End Write file into hdfs");

How to read a file from HDFS with Java?

Code example

//==== Read file
logger.info("Read file from hdfs");
//Create a path
Path hdfsreadpath = new Path(newFolderPath + "/" + fileName);
//Init input stream
FSDataInputStream inputStream = fs.open(hdfsreadpath);
//Classical input stream usage
String out= IOUtils.toString(inputStream, "UTF-8");
logger.info(out);
inputStream.close();
fs.close();