
When will the Hadoop NN recognize a failure?

I’ve been trying to make the NameNode (NN) recognize that its metadata is inconsistent and turn to the SecondaryNameNode (SNN) for data.

Scenario 1: Will the NN recognize that its data is not up to date? (No)

  1. Start with the NN on server A, the SNN on B, and the DN on C.
  2. Put three files into HDFS (testFileX.test).
  3. Wait for the SNN to write a checkpoint image (in my setup this happens after about 5 minutes).
  4. Kill the NN, SNN, and DN.
  5. Start the NN on B with -importCheckpoint, start the SNN on B, and start the DN on C.
  6. Put three new files into HDFS (failbackFileX.test).
  7. Run stop-dfs.sh.
  8. Restart the original structure (NN on A, SNN on B, DN on C).
  9. Which files does Hadoop recognize? Only the testFileX.test files (the first ones).

Result: the Hadoop NN doesn’t recognize that its data is out of date.
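
For reference, here is roughly how scenario 1 runs as a shell sketch. It assumes a 0.20/1.x-era Hadoop cluster, passwordless ssh to hosts A, B, and C, the Hadoop bin directory on each machine’s PATH, and fs.checkpoint.period lowered so the SNN checkpoints every ~5 minutes; none of those specifics come from the post itself.

```bash
# 1. Start the original layout: NN on A, SNN on B, DN on C.
ssh A "hadoop-daemon.sh start namenode"
ssh B "hadoop-daemon.sh start secondarynamenode"
ssh C "hadoop-daemon.sh start datanode"

# 2. Put three marker files into HDFS.
for i in 1 2 3; do touch "testFile$i.test"; hadoop fs -put "testFile$i.test" /; done

# 3. Wait for the SNN to write a checkpoint image (fs.checkpoint.period seconds).
sleep 360

# 4. Kill all three daemons.
ssh A "hadoop-daemon.sh stop namenode"
ssh B "hadoop-daemon.sh stop secondarynamenode"
ssh C "hadoop-daemon.sh stop datanode"

# 5. Bring the NN up on B from the SNN checkpoint (-importCheckpoint runs the
#    NN in the foreground, hence the backgrounding), then restart SNN and DN.
ssh B "hadoop namenode -importCheckpoint" &
ssh B "hadoop-daemon.sh start secondarynamenode"
ssh C "hadoop-daemon.sh start datanode"

# 6. Put three new marker files in (clients must now point at B,
#    e.g. via fs.default.name).
for i in 1 2 3; do touch "failbackFile$i.test"; hadoop fs -put "failbackFile$i.test" /; done

# 7-8. Stop everything and restart the original structure.
stop-dfs.sh
start-dfs.sh

# 9. See which files the NN knows about.
hadoop fs -ls /
```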

Scenario 2: Force it to load with -importCheckpoint

Same as scenario 1 up to and including step 7.

  1. Run hadoop namenode -importCheckpoint. It fails with:
     NameNode already contains an image in /usr/apps/hadoop/name

That didn’t work either.
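
The error itself hints at why: -importCheckpoint refuses to run when dfs.name.dir already holds an image, and only loads the SNN’s checkpoint (from fs.checkpoint.dir) into an empty name directory. A sketch of the variant that should work, taking the dfs.name.dir path from the error message above:

```bash
# -importCheckpoint only works against an empty dfs.name.dir, so move the
# existing image aside first.
mv /usr/apps/hadoop/name /usr/apps/hadoop/name.bak
mkdir -p /usr/apps/hadoop/name

# The NN now loads the checkpoint from fs.checkpoint.dir, saves it into the
# empty dfs.name.dir, and starts up (in the foreground).
hadoop namenode -importCheckpoint
```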

Scenario 3: Corrupt the VERSION file

Hadoop recognizes that the VERSION file is corrupted, but it simply fails; it never contacts the SNN. Doesn’t work.
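
For reproducibility, the corruption step looks roughly like this (a sketch; the name directory path is assumed to be the one from scenario 2, and in 1.x-era Hadoop the VERSION file lives under its current/ subdirectory):

```bash
# VERSION holds the storage metadata (layoutVersion, namespaceID, ...) that
# the NN validates on startup.
echo "garbage" > /usr/apps/hadoop/name/current/VERSION

# The NN now fails during storage analysis instead of falling back to the
# SNN's copy.
hadoop-daemon.sh start namenode
```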

Scenario 4: Corrupt the fsimage file

ERROR fs.FSNamesystem (FSNamesystem.java:&lt;init&gt;(277)) - FSNamesystem initialization failed.

No SNN involvement.
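
The corruption itself can be sketched like this (same assumed name directory as above; clobbering the first few bytes of the image is enough to break loading):

```bash
# Overwrite the start of the on-disk namespace image in place.
dd if=/dev/urandom of=/usr/apps/hadoop/name/current/fsimage \
   bs=1 count=16 conv=notrunc

# Startup now dies in FSNamesystem initialization; the SNN is never consulted.
hadoop-daemon.sh start namenode
```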

Conclusion:
I have no idea when Hadoop decides to turn to the SNN; it seems it should have done so in any of the above scenarios, but it never does.
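
In the meantime, the only recovery path I can see is manual: either -importCheckpoint into an emptied dfs.name.dir (the scenario 2 fix above), or copying the SNN’s checkpoint directory over by hand. A sketch of the copy approach, where the checkpoint path is a placeholder (it is whatever fs.checkpoint.dir is set to on the SNN host):

```bash
# Stop the broken NN, move its image aside, and pull the SNN's checkpoint
# from host B (/path/to/namesecondary is a placeholder for fs.checkpoint.dir).
hadoop-daemon.sh stop namenode
mv /usr/apps/hadoop/name /usr/apps/hadoop/name.bak
mkdir -p /usr/apps/hadoop/name
scp -r "B:/path/to/namesecondary/*" /usr/apps/hadoop/name/

# Restart the NN from the copied checkpoint.
hadoop-daemon.sh start namenode
```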

