Hadoop failover (and hopefully failback)
We’ve decided to test Linux Heartbeat together with Hadoop, to enable failover (and failback) capabilities.
Infrastructure: take 3 servers: A is the NN, B is the SNN (and will later take over as the NN), and C runs a DataNode. The hadoop-site.xml files on both A and B use THE SAME LOCATION for the checkpoint directory (see the config sketch after the list below).
3 servers:
A - Hadoop NN; its fs.checkpoint.dir points at the checkpoint directory that physically lives on server B
B - Hadoop SNN; its fs.checkpoint.dir is a local directory (the one the SNN actually writes its checkpoints to)
C - Hadoop DN
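A minimal sketch of the relevant hadoop-site.xml fragment, identical on A and B (the property names are the stock Hadoop ones; the checkpoint path below is illustrative, and it presumably has to resolve to the same physical directory from both machines, e.g. via a shared mount):

```xml
<!-- hadoop-site.xml fragment on A and B (checkpoint path is illustrative) -->
<property>
  <name>dfs.name.dir</name>
  <value>/usr/apps/hadoop/name</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <!-- the SNN on B writes checkpoints here; the NN on the other box
       must see the same data for importCheckpoint to work -->
  <value>/usr/apps/hadoop/checkpoint</value>
</property>
```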
Scenario 1: Failover
- Run the regular (above) configuration.
- Insert some files
- Kill the NN on A.
- Stop the DN on C (this is only required because we don’t use Heartbeat yet).
- (Create a /usr/apps/hadoop/name dir on B and update the hadoop-site files on B and C.)
- Start the NN on B with the flag: hadoop namenode -importCheckpoint
- Start DN on C.
- Check that all the relevant files still exist (see the shell sketch below).
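For concreteness, a rough shell transcript of the scenario, assuming a stock Hadoop install (the pid-file location and hadoop-daemon.sh are the Hadoop defaults; adjust paths to your layout):

```sh
# On A: simulate a NameNode crash (/tmp is the default Hadoop pid dir)
kill -9 $(cat /tmp/hadoop-$USER-namenode.pid)

# On C: stop the DataNode (only needed because Heartbeat isn't wired in yet)
bin/hadoop-daemon.sh stop datanode

# On B: create the name dir, then start a NN from the SNN's checkpoint
# (hadoop-site.xml on B and C must already point at B as the NN)
mkdir -p /usr/apps/hadoop/name
bin/hadoop namenode -importCheckpoint

# On C: bring the DataNode back up against the new NN
bin/hadoop-daemon.sh start datanode

# Verify the files inserted before the crash are still visible
bin/hadoop fs -ls /
```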
Status: Works
Scenario 2: Failback
Continue from the previous scenario (NN and SNN on B, DN on C, nothing on A):
- Insert some files
- Kill the NN on B.
- Stop the DN on C (this is only required because we don't use Heartbeat yet).
- (Update the hadoop-site files on B and C.)
- Start the NN on A with the flag: hadoop namenode -importCheckpoint
- Start DN on C.
- Check if all the relevant files exist.
Status: Fail (not back, just fail).
The NN on A doesn't fail back. It claims it already has a valid image file in its local location, but then detects errors and shuts down.
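A guess at the cause: stock Hadoop's importCheckpoint refuses to run when dfs.name.dir already contains an image (it aborts with an error along the lines of "NameNode already contains an image"). If that is what's happening here, one untested workaround would be to park A's stale name dir before retrying:

```sh
# On A: move the stale image aside so -importCheckpoint starts from an
# empty dfs.name.dir (path taken from the config above; the .stale
# suffix is just a naming convention)
mv /usr/apps/hadoop/name /usr/apps/hadoop/name.stale
mkdir -p /usr/apps/hadoop/name
bin/hadoop namenode -importCheckpoint
```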