What we love about XML
After we "hinted" extensively in our last issue that we are absolute XML fans, today we will try to give you a few reasons for our affection (without being too technical). You might remember from the last article that our fondness goes beyond the avalanche of abbreviations or the technology hype. It would be more accurate to say that XML is our favourite because it is future-proof and because it offers enormous possibilities when it comes to data storage and exchange. This article touches on two important issues which highlight this.
Firstly there is the display problem, and secondly what is usually referred to as portability.
"Show it all" courtesy of "Unicode support"
As translators, we face language and cross-cultural challenges as well as numerous computer issues. Our goal is that the struggle should be invisible to the end user, i.e. you, the client. But technology did not always play along, and users of translations could see our struggles. Remember the old days (and perhaps even more recent times) when more and more Asian web pages popped up on the Internet, seemingly overnight? Back then you might have come across garble like this when viewing files that originated in Asian countries:
"ã€å¤§å ç²è ¨Šã€'ç¾åœ‹ é§é¦ å°¼æ‹‰å ¤§ä½¿é¤¨ç™¼è".
But what looks like a random string of characters at first glance is actually (did you guess?) the following Chinese sentence:
"【大公網訊】美國駐馬尼拉大使館發表聲明稱,由於收到「可能的威脅資訊」"
The problem is: it is displayed wrongly.
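For the curious, the effect is easy to reproduce. The following is only a minimal sketch in Python, under one assumption of ours: that the text was saved as UTF-8 and then read back as if it were Latin-1 (the garble shown above may have come from a slightly different mismatch). The variable names are ours, chosen purely for illustration.

```python
# The sentence as the translator wrote it (shortened from the example above).
original = "【大公網訊】美國駐馬尼拉大使館發表聲明"

# What ends up in the file: the sentence encoded as UTF-8 bytes.
raw_bytes = original.encode("utf-8")

# What a misconfigured viewer does: it reads the very same bytes,
# but assumes a one-byte-per-character Western set (Latin-1 here, by assumption).
garbled = raw_bytes.decode("latin-1")

# What a correctly configured viewer does: it reads the bytes as UTF-8.
restored = raw_bytes.decode("utf-8")

print(garbled)   # a string of accented Western letters and symbols
print(restored)  # the Chinese sentence again
```

Note that nothing in the file is damaged: the bytes are identical in both cases, and only the assumption about the character set differs.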
We should explain that you have just come across one of a translation company's common nightmares: we've translated a sentence into Chinese, but the application in which it is being used is not able to display our translation correctly. The technical issue is that every computer application (like Word, Internet Explorer and even your accounting software) is only capable of displaying a certain range of character sets. Historically, there were fewer character sets, and the most common one was the American set called "ASCII". It contained all the characters necessary to display English sentences and punctuation. But because it didn't contain, for example, French accents or German umlauts (or any Asian characters for that matter), ASCII was very limiting when it came to displaying translations.
The first, uncoordinated solution was the emergence of a raft of character sets, almost one for each language. Some were free and others were proprietary and had to be purchased. Due to deep dissatisfaction with this situation, Unicode was developed and agreed on as a standard. Unicode was originally designed as a 16-bit character set with room for 65,536 characters; it has since been extended to more than a million code points, covering almost every written language in the world.
Of course 'our' XML supports Unicode (but it allows other character sets too). The ingenious part of this solution is that XML allows, and even asks, you to declare at the top of the file which character encoding its content uses. We love it for that, because we can tell at a glance whether you are likely to encounter any problems with the translation and, if so, we can help you solve them before you even notice. The reason we can promise this has to do with the other issue we want to highlight today: portability.
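To make the declaration a little more concrete, here is a small sketch in Python using the standard xml.etree.ElementTree module. The element name greeting, the sample text and the helper declared_encoding are our own inventions for illustration; the only thing taken from XML itself is the encoding="..." declaration on the first line of the file.

```python
import re
import xml.etree.ElementTree as ET

text = "Grüße aus Köln"

# A tiny XML file that is NOT stored as Unicode, but says so on its first line.
doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    f"<greeting>{text}</greeting>"
).encode("iso-8859-1")


def declared_encoding(xml_bytes, default="UTF-8"):
    """Return the encoding named in the XML declaration (hypothetical helper).

    Falls back to UTF-8, which is what XML assumes when nothing is declared.
    """
    match = re.match(rb'''<\?xml[^>]*?encoding=["']([A-Za-z0-9._-]+)["']''', xml_bytes)
    return match.group(1).decode("ascii") if match else default


# "At a glance": which character set does this file claim to use?
print(declared_encoding(doc))

# The parser reads the same declaration and decodes the content accordingly.
root = ET.fromstring(doc)
print(root.text)
```

Because the file announces its own character set, both a human and a program can tell how to read it before displaying a single character.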
Move it along - Portability
Portability doesn't necessarily mean that you can take your XML file with you on holiday (although you actually can) but rather that XML can be exchanged between applications quite easily. The beauty of this is not only that different software applications can do lots of different things with the same data. XML also became something of a killer application because it can be used across all kinds of platforms, be it Windows, Linux, Apple or even your mobile phone. The other aspect is what you store: XML is so flexible that you can use it to store the source text in parallel with the translated target text in the same file.
So not only do you have a file that is quite likely to survive version changes and hardware development, but you also have options as to what you want to store. If you wanted to, you could, for example, store your content in 23 languages in the same file and publish it in 23 languages from that one source to all the applications you want. File management becomes a breeze and your publications can be easily managed across languages.
We've kept this article pretty general, so next time we will have a look at XML solutions that are specifically designed to address translation-related issues, and we will tell you about the XML Localisation Interchange File Format (or XLIFF for short).
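As a rough illustration of storing several languages side by side, here is a minimal Python sketch. The element names catalogue, segment and variant and both helper functions are hypothetical and chosen by us for this example; they are not part of any standard. The xml:lang attribute, on the other hand, is XML's own built-in way of labelling the language of a piece of content.

```python
import xml.etree.ElementTree as ET

# ElementTree's spelling of the built-in xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_catalogue(entries):
    """Store every piece of text with all its language versions in one tree."""
    root = ET.Element("catalogue")
    for key, translations in entries.items():
        segment = ET.SubElement(root, "segment", id=key)
        for lang, text in translations.items():
            variant = ET.SubElement(segment, "variant")
            variant.set(XML_LANG, lang)
            variant.text = text
    return root


def texts_for(root, lang):
    """Pull out one language, e.g. to publish that language's edition."""
    return {
        segment.get("id"): variant.text
        for segment in root.iter("segment")
        for variant in segment.iter("variant")
        if variant.get(XML_LANG) == lang
    }


entries = {
    "welcome": {"en": "Welcome", "de": "Willkommen", "zh-Hant": "歡迎"},
    "goodbye": {"en": "Goodbye", "de": "Auf Wiedersehen", "zh-Hant": "再見"},
}

# One file for all languages, saved as UTF-8 with the encoding declared.
xml_bytes = ET.tostring(build_catalogue(entries), encoding="utf-8", xml_declaration=True)

# Any application on any platform can reload it and pick the language it needs.
reloaded = ET.fromstring(xml_bytes)
print(texts_for(reloaded, "de"))
```

Adding a twenty-third language would simply mean adding one more variant to each segment; the file and the publishing step stay the same.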

