What I don't see, then, is how someone saying "I don't know" changes any of the above.
Think about what the 'other' players must be seeing.
First, let's assume that all players know there is a mix of hats (at least one of each colour). This could be established by the teacher telling them. It can also be deduced at the start of the game, but that is a little complex to do visibly, for the reasons I've already posted.
Anyway, let's take that one piece of knowledge as given, because it is crucial: there is at least one hat of each colour.
Player 1 sees a mix of hats and has no idea what his own hat is, so he says 'don't know'.
Player 2 knows what player 1's hat is, but he also sees a mix of colours (two black hats, 4 and 9), so he says 'don't know'.
All the other players know player 2 is black, so they now know he saw at least one other black hat. If he hadn't seen any other black hat, he would have known he was the one and only black hat.
We continue like this until player 4's turn. He also sees two black hats (2 and 9). As far as he can tell, player 2 might have seen only one black (player 9), or he might have seen two (4 and 9). So he's unsure whether he (4) is red or black, and says 'don't know'.
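When we get to player 9's turn, he reasons that if he were red, player 4 would have seen only one black hat (player 2). Player 2 had said 'don't know', so he must have seen another black, and in that case the only candidate would be 4. So player 4 would have known he was black and won the game.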
Since player 4 said 'don't know', he must have seen both 2 and 9 as black, which means 9 knows he must be black.
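If it helps, here's a rough Python sketch that brute-forces the same reasoning. It's only a sketch under my own assumptions: 10 players (just a number I picked), everyone sees every hat except his own, players speak in order 1, 2, 3, ..., and a player who can work out his hat says so straight away. The function names are made up for illustration. The idea is to keep a list of every hat arrangement still consistent with what's been said; each 'don't know' publicly rules out every arrangement in which that player would have known.

```python
from itertools import product

# Assumed setup (my choice, not part of the puzzle as stated): 10 players,
# numbered 1..10, with black hats on players 2, 4 and 9 as in the example.
N_PLAYERS = 10
ACTUAL_BLACKS = frozenset({2, 4, 9})


def all_worlds(n):
    """Every hat arrangement for players 1..n with at least one of each colour.

    An arrangement ("world") is represented by the frozenset of black-hatted players.
    """
    worlds = set()
    for bits in product((False, True), repeat=n):
        blacks = frozenset(i + 1 for i, is_black in enumerate(bits) if is_black)
        if 0 < len(blacks) < n:  # at least one black and at least one red
            worlds.add(blacks)
    return worlds


def knows_own_hat(player, world, candidates):
    """Could `player` deduce his own hat in `world`, given the surviving worlds?

    He sees everyone else's hat, so the only alternative he has to rule out is
    the world that differs from this one in his own hat alone.
    """
    alternative = world ^ {player}
    return alternative not in candidates


def play(n, actual):
    """Let players speak in turn; return the first one who knows, or None."""
    candidates = all_worlds(n)
    for player in range(1, n + 1):
        if knows_own_hat(player, actual, candidates):
            return player
        # Everyone hears "don't know" and discards every world in which this
        # player *would* have known his hat.
        candidates = {w for w in candidates
                      if not knows_own_hat(player, w, candidates)}
    return None


print(play(N_PLAYERS, ACTUAL_BLACKS))  # → 9
```

Running it prints 9: players 1 to 8 all say 'don't know', and player 9 is the first to be sure. Changing ACTUAL_BLACKS to {2, 4} makes player 4 the first to know, which is exactly the case player 9 rules out above.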
Probably not explained the best, sorry.