(counting) Words, don’t come easy… to MS Word
Sorry for the grammatical blunder in the header - but that song struck me this morning - tough luck. Anyway - I’ve talked before about how MS Word counts words - or, to be more exact, how it doesn’t count words but the spaces between the words - or, to put it very precisely: MS Word counts how often a character or a string of characters in a document is followed by whitespace. Given that MS Word is the software most widely used by translation clients, you kind of want to be close to their wordcount when you quote for a job.
Now there was this funny thing in a table I saw recently which almost went undetected: for some reason the table contained dots - a long row of them. But instead of just dotting along (…) the dots had spaces in between (. . .). To MS Word, every one of those dots was a “word”, so its wordcount for the document came out at about 2,800.
I have to admit that I didn’t even look at the file myself and simply ran a CAT tool analysis on it. The result was more like 1,200 words in the document. When I was challenged on that result I finally had a look at the document and found lots of dotted lines with spaces in between - each dot of which MS Word had counted as a word.
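To make the gap between the two numbers concrete, here is a minimal Python sketch. It is not MS Word’s or any CAT tool’s actual algorithm - just two toy counters that mimic the behaviour described above. The function names and the sample string are made up for illustration, and the “at least one letter or digit” rule is my assumption about what a stricter counter might do.

```python
def whitespace_count(text: str) -> int:
    # Mimics the behaviour described for MS Word: every run of
    # non-whitespace characters counts as one "word".
    return len(text.split())


def alnum_count(text: str) -> int:
    # Assumed stricter rule: a chunk only counts as a word if it
    # contains at least one letter or digit.
    return sum(
        1 for chunk in text.split()
        if any(ch.isalnum() for ch in chunk)
    )


# Hypothetical table cell with a spaced-out dotted leader.
sample = "Chapter one . . . . . . 12"

print(whitespace_count(sample))  # 9
print(alnum_count(sample))       # 3
```

Six dots with spaces in between are enough to triple the count of this one line, so a table full of them easily explains a jump from roughly 1,200 to 2,800.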
The interesting bit is that doing a wordcount with Open Office Writer will give you 1,200 words - which is very intelligent of Open Office and again one of those things… By the way, OO Writer also handles hyphens quite smartly by counting two words for “word-count” where MS Word would only count one.
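The hyphen difference comes down to which characters are treated as word delimiters. A small sketch, again with hypothetical names and no claim to be how either program is really implemented: the same counter, with the hyphen optionally added to the delimiter set.

```python
import re


def count_words(text: str, split_on_hyphen: bool) -> int:
    # Choose the delimiter set: whitespace only, or whitespace plus hyphen.
    delimiters = r"[\s\-]+" if split_on_hyphen else r"\s+"
    pieces = re.split(delimiters, text)
    # Ignore empty pieces and pieces without any letter or digit.
    return sum(1 for piece in pieces if any(ch.isalnum() for ch in piece))


phrase = "a quick word-count"
print(count_words(phrase, split_on_hyphen=False))  # 3
print(count_words(phrase, split_on_hyphen=True))   # 4
```

Neither answer is wrong as such - it just shows that the “wordcount” of a document depends on the delimiter definition behind it.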
It will be interesting to see whether Office 12 is smarter with wordcounts - there’s no reason for it not to be - it is all about the way they define word delimiters - and there are guidelines and industry standards for that…

