Java - HDFS with High Availability

Some platforms are configured with High Availability (HA): they run two NameNodes, one active and one passive, so that HDFS stays available if the active NameNode fails. Learn more here.

If you want to write a Java client that fully benefits from this feature, you need to specify the following properties in the Hadoop configuration of your application:


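import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// Map the hdfs:// and file:// schemes to their FileSystem implementations.
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
// Point the client at the logical nameservice rather than at a single NameNode.
conf.set("fs.defaultFS", "hdfs://cluster");
// fs.default.name is the deprecated name of fs.defaultFS, kept for older code.
conf.set("fs.default.name", conf.get("fs.defaultFS"));
// Declare the nameservice and the IDs of its two NameNodes.
conf.set("dfs.nameservices", "cluster");
conf.set("dfs.ha.namenodes.cluster", "nn1,nn2");
// RPC address (host:port) of each NameNode.
conf.set("dfs.namenode.rpc-address.cluster.nn1", "<url_of_your_namenode_1>:8020");
conf.set("dfs.namenode.rpc-address.cluster.nn2", "<url_of_your_namenode_2>:8020");
// Class the client uses to locate the active NameNode and fail over to the other one.
conf.set("dfs.client.failover.proxy.provider.cluster", "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");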

The above configuration can be used as is. Simply replace the placeholders <url_of_your_namenode_1> and <url_of_your_namenode_2> in the values of the dfs.namenode.rpc-address.cluster.nn1 and dfs.namenode.rpc-address.cluster.nn2 parameters with the hostnames of your two NameNodes.
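
Several of these property names embed the nameservice name ("cluster") and the NameNode IDs ("nn1", "nn2"), so they must stay consistent with each other. The sketch below shows one way the keys could be derived from a nameservice name and a list of hosts. It is only an illustration: the HaClientProperties class, its build method and the example.com hostnames are hypothetical and not part of Hadoop, and the nn1, nn2 numbering simply follows the convention used above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): builds the client-side HA
// properties for one nameservice as plain key/value pairs.
class HaClientProperties {

    static Map<String, String> build(String nameservice, List<String> namenodeHosts, int rpcPort) {
        if (namenodeHosts.size() < 2) {
            throw new IllegalArgumentException("HA needs at least two NameNodes");
        }
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.defaultFS", "hdfs://" + nameservice);
        props.put("dfs.nameservices", nameservice);

        // Assign the IDs nn1, nn2, ... and register one RPC address per NameNode.
        StringBuilder ids = new StringBuilder();
        for (int i = 0; i < namenodeHosts.size(); i++) {
            String id = "nn" + (i + 1);
            if (i > 0) {
                ids.append(',');
            }
            ids.append(id);
            props.put("dfs.namenode.rpc-address." + nameservice + "." + id,
                    namenodeHosts.get(i) + ":" + rpcPort);
        }
        props.put("dfs.ha.namenodes." + nameservice, ids.toString());
        props.put("dfs.client.failover.proxy.provider." + nameservice,
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder hostnames; replace them with those of your NameNodes.
        Map<String, String> props = build("cluster",
                Arrays.asList("namenode1.example.com", "namenode2.example.com"), 8020);
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Each entry of the returned map can then be copied into the Hadoop Configuration object with conf.set(key, value). The fs.hdfs.impl, fs.file.impl and fs.default.name entries do not depend on the nameservice name and are left out of this sketch.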