Java - Read & Write files with HDFS
GitHub project: example-java-read-and-write-from-hdfs
Common part
Maven Dependencies
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
</dependency>
HDFS URI
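An HDFS URI looks like this: hdfs://namenodedns:port/user/hdfs/folder/file.csv
It is made of the hdfs scheme, the NameNode host name, the NameNode port, and the absolute path of the file.
The default port is 8020.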
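To make the parts of such a URI concrete, here is a minimal sketch that splits one with the JDK's java.net.URI class. The class name HdfsUriParts and the helper portOf are hypothetical names chosen for this illustration, and namenodedns is only the placeholder host used above. It does not touch Hadoop at all, it only parses the string.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of the Hadoop API.
final class HdfsUriParts {

    // Default NameNode port mentioned in the text above.
    static final int DEFAULT_PORT = 8020;

    // Returns the port written in the URI, or the default when the URI has none.
    static int portOf(URI uri) {
        return uri.getPort() == -1 ? DEFAULT_PORT : uri.getPort();
    }

    public static void main(String[] args) {
        // "namenodedns" is a placeholder host name, not a real server.
        URI uri = URI.create("hdfs://namenodedns:8020/user/hdfs/folder/file.csv");
        System.out.println("scheme = " + uri.getScheme());
        System.out.println("host   = " + uri.getHost());
        System.out.println("port   = " + portOf(uri));
        System.out.println("path   = " + uri.getPath());
    }
}
```

The same string can then be passed to URI.create(...) when obtaining the FileSystem object, so checking it this way first can help spot a missing scheme or a wrong port.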
Init HDFS FileSystem Object
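import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// ====== Init HDFS File System Object
Configuration conf = new Configuration();
// Set FileSystem URI
conf.set("fs.defaultFS", hdfsuri);
// Because of Maven: when everything is packaged into a single jar, the
// service entries that register these file systems can be lost, so set
// the implementations explicitly
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
// Set HADOOP user
System.setProperty("HADOOP_USER_NAME", "hdfs");
System.setProperty("hadoop.home.dir", "/");
// Get the filesystem - HDFS
FileSystem fs = FileSystem.get(URI.create(hdfsuri), conf);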
Init Subfolders
import org.apache.hadoop.fs.Path;

//==== Create folder if not exists
Path workingDir = fs.getWorkingDirectory();
Path newFolderPath = new Path(path);
if (!fs.exists(newFolderPath)) {
    // Create new directory (missing parent directories are created too)
    fs.mkdirs(newFolderPath);
    logger.info("Path " + path + " created.");
}
How to write a file to HDFS with Java?
Code example
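import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.FSDataOutputStream;

//==== Write file
logger.info("Begin Write file into hdfs");
// Create a path
Path hdfswritepath = new Path(newFolderPath + "/" + fileName);
// Init output stream
FSDataOutputStream outputStream = fs.create(hdfswritepath);
// Classical output stream usage. Encode explicitly as UTF-8:
// writeBytes() keeps only the low byte of each char, which corrupts non-ASCII text
outputStream.write(fileContent.getBytes(StandardCharsets.UTF_8));
outputStream.close();
logger.info("End Write file into hdfs");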
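FSDataOutputStream extends java.io.DataOutputStream, so the way a String becomes bytes follows the JDK rules. The sketch below is a JDK-only illustration (no Hadoop involved) of the difference between writeBytes(String), which writes only the low-order byte of each character, and writing explicitly UTF-8 encoded bytes. The class name WriteBytesDemo and its methods are hypothetical names for this example, and an in-memory buffer stands in for the HDFS file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; an in-memory buffer stands in for the HDFS output stream.
final class WriteBytesDemo {

    // Writes the text with writeBytes(): one byte per char, high-order byte discarded.
    static byte[] viaWriteBytes(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeBytes(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Writes the text as explicitly UTF-8 encoded bytes.
    static byte[] viaUtf8(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decodes bytes the same way the read example does, as UTF-8.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9;10\u20ac"; // contains two non-ASCII characters
        System.out.println("UTF-8 round trip intact:      "
                + decodeUtf8(viaUtf8(text)).equals(text));
        System.out.println("writeBytes round trip intact: "
                + decodeUtf8(viaWriteBytes(text)).equals(text));
    }
}
```

For pure ASCII content both approaches produce the same bytes. The difference only shows up with accented letters, currency symbols, or other non-ASCII characters, which is why matching the encoding used when reading the file back matters.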
How to read a file from HDFS with Java?
Code example
import org.apache.commons.io.IOUtils; // Apache Commons IO, not org.apache.hadoop.io.IOUtils
import org.apache.hadoop.fs.FSDataInputStream;

//==== Read file
logger.info("Read file from hdfs");
// Create a path
Path hdfsreadpath = new Path(newFolderPath + "/" + fileName);
// Init input stream
FSDataInputStream inputStream = fs.open(hdfsreadpath);
// Classical input stream usage
String out = IOUtils.toString(inputStream, "UTF-8");
logger.info(out);
inputStream.close();
fs.close();
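Since FSDataInputStream is a regular java.io.InputStream, the "read everything into a String" step can also be done with the JDK alone. The following is a minimal sketch assuming Java 9 or later (for InputStream.readAllBytes). The class StreamText and the method readUtf8 are hypothetical names, and a ByteArrayInputStream stands in for the stream returned by fs.open(...).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; works on any InputStream, including one opened on HDFS.
final class StreamText {

    // Reads the whole stream as UTF-8 text and closes it, even if reading fails.
    static String readUtf8(InputStream in) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for the HDFS input stream used in the article.
        byte[] fileBytes = "id;name\n1;caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        String content = readUtf8(new ByteArrayInputStream(fileBytes));
        System.out.print(content);
    }
}
```

The try-with-resources block closes the stream on every path, which avoids leaking it when an exception is thrown between opening and closing. Like IOUtils.toString, this loads the whole file into memory, so it is only suitable for small files.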