When will the Hadoop NN recognize a failure?


I’ve been trying to get the NameNode (NN) to recognize that its metadata is inconsistent and turn to the Secondary NameNode (SNN) for data.

Scenario 1: Will the NN recognize that its data is not up to date? (No)

  1. Start with the NN on server A, the SNN on B, and a DataNode (DN) on C.
  2. Add 3 files (testFileX.test).
  3. Wait for the SNN to write a checkpoint image (this usually happens after 5 minutes).
  4. Kill the NN, SNN, and DN.
  5. Start the NN on B with -importCheckpoint, start the SNN on B, and start the DN on C.
  6. Add 3 new files (failbackFileX.test).
  7. Run stop-dfs.sh.
  8. Restart the original layout (NN on A, SNN on B, DN on C).
  9. Which files does Hadoop recognize? Only the testFileX.test files (the first ones).

Result: the Hadoop NN doesn’t recognize that its data is stale.
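On restart, the NN on A simply reloads whatever is in its own name directory. Because it never compares that against other copies, one manual check is to compare checkpoint times across storage directories yourself before deciding where to start the NN. Below is a minimal Python sketch. It assumes the Hadoop 1.x layout, where each name or checkpoint directory contains a `current/fstime` file that stores the checkpoint time as a single big-endian 8-byte integer, so verify this against your version. The function names are hypothetical.

```python
import struct
from pathlib import Path


def read_checkpoint_time(storage_dir):
    """Read the checkpoint time from <storage_dir>/current/fstime.

    Assumes (Hadoop 1.x-style layout) that fstime holds one big-endian
    signed 64-bit integer. Returns None if the file does not exist.
    """
    fstime = Path(storage_dir) / "current" / "fstime"
    if not fstime.is_file():
        return None
    data = fstime.read_bytes()
    if len(data) < 8:
        raise ValueError(f"{fstime} is truncated ({len(data)} bytes)")
    return struct.unpack(">q", data[:8])[0]


def newest_storage_dir(storage_dirs):
    """Return the directory with the latest checkpoint time, or None if none has one."""
    times = {d: read_checkpoint_time(d) for d in storage_dirs}
    candidates = {d: t for d, t in times.items() if t is not None}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

For Scenario 1, you would need copies of A’s and B’s name directories on one machine. Keep in mind that fstime records the last checkpoint, not the last edit, so this is only a coarse signal of which copy is newer.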

Scenario 2: Force it to load with -importCheckpoint

Same as Scenario 1 up to and including step 7, then:

  8. Run hadoop namenode -importCheckpoint. It fails with:

       NameNode already contains an image in /usr/apps/hadoop/name

That didn’t work either: -importCheckpoint refuses to run while the NN’s name directory already contains an image.
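The error message indicates that -importCheckpoint only proceeds when the NN’s name directory holds no image. One way to prepare for that is to move the existing name directory aside instead of deleting it, then recreate it empty, so the stale image can still be recovered. Below is a minimal Python sketch. The helper name and the `.bak-<timestamp>` naming scheme are hypothetical, and it should only be run while the NN is stopped.

```python
import shutil
import time
from pathlib import Path


def set_aside_name_dir(name_dir):
    """Move a non-empty NN name directory to a timestamped backup and
    recreate it empty, so the NN finds no existing image there.

    Returns the backup path, or None if the directory was already empty
    or did not exist (in which case it is created empty).
    """
    name_dir = Path(name_dir)
    if name_dir.exists() and any(name_dir.iterdir()):
        stamp = time.strftime("%Y%m%d%H%M%S")
        backup = name_dir.with_name(f"{name_dir.name}.bak-{stamp}")
        if backup.exists():
            # Don't move into an existing directory by accident.
            raise FileExistsError(f"backup target {backup} already exists")
        shutil.move(str(name_dir), str(backup))
        name_dir.mkdir()
        return backup
    name_dir.mkdir(parents=True, exist_ok=True)
    return None
```

For example, call `set_aside_name_dir("/usr/apps/hadoop/name")` and then rerun `hadoop namenode -importCheckpoint`. Before doing so, make sure `fs.checkpoint.dir` points at a copy of the SNN’s checkpoint that the NN host can read.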

Scenario 3: Corrupt the VERSION file

Hadoop detects that the VERSION file is corrupted, but it simply fails to start and never turns to the SNN. Doesn’t work.
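In Scenario 3 the NN checks its VERSION file locally and then stops. Below is a rough Python sketch of a similar sanity check that you can run on a name directory before starting the NN. It is not Hadoop’s own parser. It assumes a simplified version of the Java-properties style that Hadoop 1.x VERSION files use, with the keys `namespaceID`, `cTime`, `storageType` and `layoutVersion`. Confirm those keys against a healthy VERSION file from your cluster. The function name is hypothetical.

```python
from pathlib import Path

# Keys assumed to appear in a Hadoop 1.x NameNode VERSION file.
REQUIRED_KEYS = ("namespaceID", "cTime", "storageType", "layoutVersion")


def check_version_file(path):
    """Parse a VERSION file; return a list of problems (empty if it looks sane)."""
    props = {}
    problems = []
    text = Path(path).read_text(errors="replace")
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the timestamp comment
        key, sep, value = line.partition("=")
        if not sep:
            problems.append(f"line {lineno}: not a key=value pair")
            continue
        props[key.strip()] = value.strip()
    for key in REQUIRED_KEYS:
        if key not in props:
            problems.append(f"missing {key}")
    for key in ("namespaceID", "cTime", "layoutVersion"):
        if key in props:
            try:
                int(props[key])
            except ValueError:
                problems.append(f"{key} is not an integer: {props[key]!r}")
    if "storageType" in props and props["storageType"] != "NAME_NODE":
        problems.append(f"unexpected storageType {props['storageType']!r}")
    return problems
```

Run it against both the NN’s `current/VERSION` and the SNN checkpoint’s copy to tell which one is damaged.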

Scenario 4: Corrupt the fsimage file

The NN fails with:

    ERROR fs.FSNamesystem (FSNamesystem.java:(277)) - FSNamesystem initialization failed.

Again, the SNN is not involved.
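Scenario 4 shows the NN failing on a corrupted fsimage without falling back. A cheap pre-start check is to read the header of each candidate fsimage, both the NN’s and the SNN checkpoint’s. Below is a minimal Python sketch. It assumes the pre-protobuf (Hadoop 0.20/1.x) fsimage format, which starts with two big-endian 32-bit integers: the layout version and the namespace ID. Later formats differ, so treat this as version-specific. The function name is hypothetical.

```python
import struct


def fsimage_header(fsimage_path):
    """Return (layout_version, namespace_id) from the start of an fsimage.

    Assumes the Hadoop 0.20/1.x on-disk format, which begins with two
    big-endian signed 32-bit ints. Raises ValueError if the file is too short.
    """
    with open(fsimage_path, "rb") as f:
        header = f.read(8)
    if len(header) < 8:
        raise ValueError(f"{fsimage_path}: too short to hold a header")
    return struct.unpack(">ii", header)
```

If the header is truncated, or if it disagrees with the `layoutVersion` and `namespaceID` in the same directory’s VERSION file, that image is suspect. This check only inspects the first 8 bytes, so corruption further into the file will not show up here.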

I have no idea when Hadoop decides to turn to the SNN. It seems it should have done so in every one of the scenarios above, but it doesn’t.