In an interview the other day, the interviewer showed me a method that counts the number of words, characters and newlines in its input. It was something along the lines of the code below (this is untested; I'm going from memory).
Code:
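import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;

public class WordCount {
    // Counts newlines, words and "characters" the way the interview code did:
    // every byte read is treated as one character.
    public void readWordCharLineCount(InputStream in, PrintStream out, PrintStream err) {
        int nw = 0; // words
        int nc = 0; // characters (really bytes, see the cast below)
        int nl = 0; // newlines
        boolean inWord = false;
        byte[] buff = new byte[4096];
        try {
            int n;
            // read() may fill only part of the buffer, so only look at the n bytes it returned
            while ((n = in.read(buff)) != -1) {
                for (int i = 0; i < n; i++) {
                    // Casting a byte to char assumes a single-byte encoding;
                    // a multi-byte UTF-8 character is counted once per byte here.
                    char c = (char) buff[i];
                    if (c == '\n') {
                        ++nl;
                    }
                    // A word starts at a non-whitespace char that follows whitespace (or the start)
                    if (Character.isWhitespace(c)) {
                        inWord = false;
                    } else if (!inWord) {
                        inWord = true;
                        ++nw;
                    }
                    ++nc;
                }
            }
            out.println(nl + " " + nw + " " + nc);
        } catch (IOException e) {
            err.println(e);
        }
    }
}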
He then asked whether I could see any problems if the input were UTF-8 encoded. I wasn't sure what he meant, but apparently in UTF-8 a character can "use one to four 8-bit bytes" (pinched from Wikipedia).
Does that affect the character count in the code above? Is that what he was getting at?
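A minimal sketch of a UTF-8-aware version, assuming "character" should mean a Unicode code point rather than a byte: it decodes the stream with an InputStreamReader set to StandardCharsets.UTF_8 before counting, instead of casting each byte to char. The class name Utf8WordCount and the Counts holder are hypothetical, made up for this example. For the string "naïve café" followed by a newline, which is 11 characters but 13 bytes in UTF-8 (ï and é take two bytes each), a loop that increments nc once per byte reports 13 characters, while this version reports 11. The newline count is the same either way, because every byte of a multi-byte UTF-8 sequence is 0x80 or higher and so can never equal '\n'.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf8WordCount {

    // Hypothetical holder for the three counts.
    public static final class Counts {
        public final long lines;
        public final long words;
        public final long chars;

        Counts(long lines, long words, long chars) {
            this.lines = lines;
            this.words = words;
            this.chars = chars;
        }

        @Override
        public String toString() {
            return lines + " " + words + " " + chars;
        }
    }

    // Decodes the stream as UTF-8, then counts newlines, words and code points.
    // The stream is not closed here; the caller owns it.
    public static Counts count(InputStream in) throws IOException {
        Reader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        long lines = 0;
        long words = 0;
        long chars = 0;
        boolean inWord = false;
        int c;
        while ((c = reader.read()) != -1) {
            char ch = (char) c;
            // A code point outside the BMP arrives as a high + low surrogate pair;
            // skipping the low half counts the pair as one character.
            if (!Character.isLowSurrogate(ch)) {
                chars++;
            }
            if (ch == '\n') {
                lines++;
            }
            // A word starts at a non-whitespace char that follows whitespace (or the start)
            if (Character.isWhitespace(ch)) {
                inWord = false;
            } else if (!inWord) {
                inWord = true;
                words++;
            }
        }
        return new Counts(lines, words, chars);
    }

    public static void main(String[] args) throws IOException {
        // "naïve café" plus a newline, written with escapes so the source encoding doesn't matter
        String text = "na\u00efve caf\u00e9\n";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + utf8.length);
        System.out.println(count(new ByteArrayInputStream(utf8)));
    }
}
```

Running main prints `bytes: 13` and then `1 2 11` (lines, words, characters). Reading one char at a time through a BufferedReader keeps the sketch simple rather than fast; an emoji such as U+1F600, which Java stores as two chars, is still counted as one character.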