GitHub project: example-talend-high-availability
This article explains how to use Talend with an HDFS that has the high availability option enabled. With high availability, a single HDFS has two namenodes, so that one can take over if the other fails.
The aim of this job is to work with both a classical HDFS and a high availability HDFS.
Create a context group with 2 contexts. Alternatively, create only one context and override the variable values on the command line with the --context_param option.
In this example, DEV has no high availability and PROD has high availability.
The URI for the namenode in DEV is built from the namenode DNS name and the port. In PROD, the URI uses the name of the HDFS instead, which is cluster in this example.
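To make the difference between the two URIs concrete, here is a minimal Java sketch. It is not taken from the project: the class name `HdfsUriDemo`, the DEV host name `namenode.dev.example.com`, and the DEV port 8020 are placeholders chosen for illustration. It only shows that the DEV URI carries a host and a port, while the PROD URI carries the logical name cluster and no port.

```java
import java.net.URI;

// Hypothetical helper, for illustration only.
class HdfsUriDemo {

    // Classical HDFS (DEV): the URI points at one namenode, host plus port.
    static URI devUri(String namenodeHost, int port) {
        return URI.create("hdfs://" + namenodeHost + ":" + port);
    }

    // High availability HDFS (PROD): the URI holds only the logical HDFS name.
    static URI haUri(String nameservice) {
        return URI.create("hdfs://" + nameservice);
    }

    public static void main(String[] args) {
        // Placeholder host and port, not values from the project.
        URI dev = devUri("namenode.dev.example.com", 8020);
        URI prod = haUri("cluster");

        // getPort() returns -1 when the URI defines no port.
        System.out.println(dev + " has port " + dev.getPort());
        System.out.println(prod + " has port " + prod.getPort());
    }
}
```

Because the PROD URI names no physical machine, the client needs extra properties to resolve cluster to the actual namenodes.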
Add the following 5 properties:
Property | Value |
---|---|
dfs.nameservices | cluster |
dfs.ha.namenodes.cluster | nn1,nn2 |
dfs.client.failover.proxy.provider.cluster | org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider |
dfs.namenode.rpc-address.cluster.nn1 | nn1.p1.saagie.prod.saagie.io:8020 |
dfs.namenode.rpc-address.cluster.nn2 | nn2.p1.saagie.prod.saagie.io:8020 |
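The five property names in the table follow a pattern: each one is derived from the HDFS name (cluster) and the namenode identifiers (nn1, nn2). The Java sketch below rebuilds the same key/value pairs from those inputs, which can help when adapting the table to another cluster. The class `HaProperties` and its `build` method are hypothetical names introduced for this example; they are not part of Talend or Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class HaProperties {

    // Failover proxy provider class, as listed in the table above.
    static final String PROXY_PROVIDER =
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider";

    // rpcAddresses maps each namenode identifier (e.g. nn1) to its host:port.
    static Map<String, String> build(String nameservice, Map<String, String> rpcAddresses) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("dfs.nameservices", nameservice);
        props.put("dfs.ha.namenodes." + nameservice,
                  String.join(",", rpcAddresses.keySet()));
        props.put("dfs.client.failover.proxy.provider." + nameservice, PROXY_PROVIDER);
        for (Map.Entry<String, String> e : rpcAddresses.entrySet()) {
            props.put("dfs.namenode.rpc-address." + nameservice + "." + e.getKey(),
                      e.getValue());
        }
        return props;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps nn1 before nn2 in the generated list.
        Map<String, String> namenodes = new LinkedHashMap<>();
        namenodes.put("nn1", "nn1.p1.saagie.prod.saagie.io:8020");
        namenodes.put("nn2", "nn2.p1.saagie.prod.saagie.io:8020");

        // Print one "key | value" line per property.
        build("cluster", namenodes)
            .forEach((key, value) -> System.out.println(key + " | " + value));
    }
}
```

With a different HDFS name or different namenode hosts, only the arguments to `build` change; the structure of the five keys stays the same.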
To find the host names of nn1 and nn2 to use in dfs.namenode.rpc-address.cluster.nn1 and dfs.namenode.rpc-address.cluster.nn2, create a Sqoop job, type hostname as its command, and run it.