Java - HDFS with High Availability
Some platforms are configured with High Availability: they run 2 Namenodes, 1 active and 1 passive (standby), so that HDFS stays reachable if the active Namenode fails.
If you want to write a Java client that fully benefits from this feature, you need to specify the following lines in the Hadoop configuration of your application:
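import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// File system implementations for the hdfs:// and file:// schemes
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
// Point the client at the logical nameservice, not at a single Namenode
conf.set("fs.defaultFS", "hdfs://cluster");
// fs.default.name is the deprecated alias of fs.defaultFS
conf.set("fs.default.name", conf.get("fs.defaultFS"));
// Declare the nameservice and the two Namenodes behind it
conf.set("dfs.nameservices", "cluster");
conf.set("dfs.ha.namenodes.cluster", "nn1,nn2");
conf.set("dfs.namenode.rpc-address.cluster.nn1", "<url_of_your_namenode_1>:8020");
conf.set("dfs.namenode.rpc-address.cluster.nn2", "<url_of_your_namenode_2>:8020");
// Proxy provider that lets the client fail over between the configured Namenodes
conf.set("dfs.client.failover.proxy.provider.cluster", "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");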
The above configuration can be used as is. Simply replace the placeholder values of the dfs.namenode.rpc-address.cluster.nn1 and dfs.namenode.rpc-address.cluster.nn2 parameters with the hostnames of your two namenodes.
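The property keys above all embed the nameservice name ("cluster") and the Namenode IDs ("nn1", "nn2"), so they have to stay consistent with each other. As a minimal sketch of that naming pattern, the helper below builds the same set of HA properties from a nameservice name and a list of hostnames. HaClientSettings and its build method are hypothetical names introduced for illustration, not part of Hadoop, and the example.com hostnames are placeholders. The sketch uses only the Java standard library so it can be run without a cluster.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): derives the client-side HA
// properties from a nameservice name and the Namenode hostnames.
final class HaClientSettings {

    static Map<String, String> build(String nameservice, List<String> namenodeHosts, int rpcPort) {
        if (namenodeHosts.size() < 2) {
            throw new IllegalArgumentException("High Availability needs at least two Namenodes");
        }
        Map<String, String> props = new LinkedHashMap<>();
        // The client addresses the logical nameservice, never one physical host
        props.put("fs.defaultFS", "hdfs://" + nameservice);
        props.put("dfs.nameservices", nameservice);

        StringBuilder ids = new StringBuilder();
        for (int i = 0; i < namenodeHosts.size(); i++) {
            String id = "nn" + (i + 1); // nn1, nn2, ... as in the configuration above
            if (i > 0) {
                ids.append(',');
            }
            ids.append(id);
            props.put("dfs.namenode.rpc-address." + nameservice + "." + id,
                    namenodeHosts.get(i) + ":" + rpcPort);
        }
        props.put("dfs.ha.namenodes." + nameservice, ids.toString());
        props.put("dfs.client.failover.proxy.provider." + nameservice,
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder hostnames: replace with your own Namenodes
        Map<String, String> props = build("cluster",
                Arrays.asList("namenode1.example.com", "namenode2.example.com"), 8020);
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In a real client, each entry of the returned map would be copied into the Hadoop Configuration with conf.set(key, value), after which FileSystem.get(conf) should return a file system handle that goes through the failover proxy provider.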