Hadoop failover (and hopefully failback)

Posted Nov 13, 2008 Updated Aug 8, 2024

By Yossi Ittach


We've decided to test Linux Heartbeat together with Hadoop, to enable failover (and failback) capabilities.

Infrastructure: Take 3 servers: A is the NameNode (NN), B is the Secondary NameNode (SNN) and will later be used as the NN, and C is a DataNode (DN). The hadoop-site.xml files on both A and B use THE SAME LOCATION for the SNN checkpoint directory.

3 servers:
A - Hadoop NN - fs.checkpoint.dir is configured to point to the checkpoint directory on server B
B - Hadoop SNN - fs.checkpoint.dir is configured to be local, at that same checkpoint directory
C - Hadoop DN
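
Since the setup depends on A and B agreeing on fs.checkpoint.dir, it can help to verify that before testing. The following is a minimal sketch, not part of the original test: the two hadoop-site.xml snippets and the path /mnt/server-b/hadoop/checkpoint are hypothetical placeholders, and it assumes the shared directory is visible under the same path string on both servers.

```python
import xml.etree.ElementTree as ET

# Hypothetical hadoop-site.xml contents for servers A and B.
# The checkpoint path is a placeholder, not the value used in the post.
SITE_A = """<configuration>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/mnt/server-b/hadoop/checkpoint</value>
  </property>
</configuration>"""

SITE_B = """<configuration>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/mnt/server-b/hadoop/checkpoint</value>
  </property>
</configuration>"""


def read_property(site_xml, key):
    """Return the value of `key` from a hadoop-site.xml string, or None if absent."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == key:
            return prop.findtext("value", "").strip()
    return None


def same_checkpoint_dir(site_a, site_b):
    """True when both configs define fs.checkpoint.dir with the same path string."""
    a = read_property(site_a, "fs.checkpoint.dir")
    b = read_property(site_b, "fs.checkpoint.dir")
    return a is not None and a == b


if __name__ == "__main__":
    print(same_checkpoint_dir(SITE_A, SITE_B))
```

In a real check, the two strings would be read from the hadoop-site.xml file on each server instead of being embedded in the script.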

Scenario 1: Failover

Run the regular configuration described above.
Insert some files.
Kill the NN on A.
Stop the DN on C (this is only required because we don't use Heartbeat yet).
(Create a /usr/apps/hadoop/name directory on B and update the hadoop-site files on B and C.)
Start the NN on B with the flag: hadoop namenode -importCheckpoint
Start the DN on C.
Check that all the relevant files exist.
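
The manual steps above can be written down as an ordered plan. This is only a sketch: the install location /opt/hadoop is a hypothetical placeholder, the use of bin/hadoop-daemon.sh to stop and start the DN is an assumption about how the daemons were managed, and the hadoop-site edits are left as a manual step. It prints the commands per host instead of running them.

```python
import shlex


def failover_plan(new_nn="B", dn="C",
                  hadoop_home="/opt/hadoop",          # hypothetical install path
                  name_dir="/usr/apps/hadoop/name"):  # name dir mentioned in the post
    """Return (host, command) pairs for moving the NN role to `new_nn`, in order."""
    daemon = f"{hadoop_home}/bin/hadoop-daemon.sh"
    hadoop = f"{hadoop_home}/bin/hadoop"
    return [
        # Stop the DN by hand, since Heartbeat is not managing it yet.
        (dn, [daemon, "stop", "datanode"]),
        # Make sure the new NN has a name directory to import into.
        (new_nn, ["mkdir", "-p", name_dir]),
        # Start the NN from the latest SNN checkpoint (runs in the foreground).
        (new_nn, [hadoop, "namenode", "-importCheckpoint"]),
        # Bring the DN back once the new NN is up.
        (dn, [daemon, "start", "datanode"]),
    ]


def render(plan):
    """Format the plan as '[host] command' lines for a dry run."""
    return [f"[{host}] {shlex.join(cmd)}" for host, cmd in plan]


if __name__ == "__main__":
    for line in render(failover_plan()):
        print(line)
```

Keeping the plan as data makes it easy to review the order of operations before handing it to whatever remote-execution mechanism is in use.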

Status: Works

Scenario 2: Failback
Continue from the previous scenario (NN and SNN on B, DN on C, nothing on A).

Insert some files.
Kill the NN on B.
Stop the DN on C (this is only required because we don't use Heartbeat yet).
(Update the hadoop-site files on B and C.)
Start the NN on A with the flag: hadoop namenode -importCheckpoint
Start the DN on C.
Check that all the relevant files exist.

Status: Fail (not back, just fail).
The NN on A doesn't fail back. It claims it already has a valid image file in its local location, but then detects errors and shuts down.
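
One possible explanation, going by the Hadoop documentation for -importCheckpoint, is that the NameNode refuses to import when its dfs.name.dir already contains a legal image, and A still holds the image from before the failover. The sketch below checks for that condition before starting the NN. It is an assumption that the image lives at current/fsimage under the name directory, and the demo uses a temporary directory as a stand-in for the real one.

```python
from pathlib import Path
import tempfile


def has_existing_image(name_dir):
    """True if `name_dir` already appears to hold a NameNode image.

    Assumes the on-disk layout <name_dir>/current/fsimage.
    """
    return (Path(name_dir) / "current" / "fsimage").is_file()


if __name__ == "__main__":
    # Demo against a temporary directory instead of a real dfs.name.dir.
    with tempfile.TemporaryDirectory() as tmp:
        print("empty name dir:", has_existing_image(tmp))
        current = Path(tmp) / "current"
        current.mkdir()
        (current / "fsimage").write_bytes(b"")
        print("after adding an image file:", has_existing_image(tmp))
```

If the check returns True on A, moving the old name directory aside before running the import would be one thing to try; this was not tested in the scenarios above.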


This post is licensed under CC BY 4.0 by the author.
