Data Simplification
Cut the cost and accelerate the translation of large data sets without impacting quality
What does it mean and where is it suitable?
Data Simplification extends Translation Memory savings with techniques that identify the genuinely “new content” within large data sets, reducing the scope of translation projects.
It works best on large product data sets in which keywords and related attributes vary only slightly across versions or categories.
The result is quicker, cheaper projects that make the best use of language resources for long-term benefit.
Customers have saved up to 30%, and have completed projects in a fraction of the time they had thought possible.
Case Study: US-based DIY chain
The results: of the 12 million words in the project, the number of new words was reduced to 2 million by combining Translation Memory and Data Simplification savings.
This was achieved by filtering and simplifying content based on SKU, product size (e.g. ¼ inch vs ½ inch) and product batch size (e.g. 150 in a pack vs 250 in a pack).
These simplification approaches greatly reduced what was then considered “new” content. Beyond this, the work identified other patterns in subsets of translation units that could safely be broken down into their constituent parts.
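A minimal Python sketch of this kind of filtering, under stated assumptions: the regular expressions, the placeholder tokens ({SKU}, {SIZE}, {COUNT}) and the sample descriptions are all hypothetical, not the rules used in the project. Variable values are masked so that descriptions differing only in SKU, size or pack count collapse into a single template that needs translating once.

```python
import re
from typing import List

# Hypothetical masking rules: each pattern replaces a variable value
# (SKU, size, pack count) with a placeholder token.
MASK_RULES = [
    (re.compile(r"\bSKU\s*\d+\b"), "{SKU}"),
    (re.compile(r"(?:\d+/\d+|[¼½¾])\s*inch(?:es)?\b"), "{SIZE}"),
    (re.compile(r"\b\d+ in a pack\b"), "{COUNT} in a pack"),
]


def mask(text: str) -> str:
    """Apply every masking rule in turn and return the resulting template."""
    for pattern, placeholder in MASK_RULES:
        text = pattern.sub(placeholder, text)
    return text


def unique_templates(descriptions: List[str]) -> List[str]:
    """Masked templates in first-seen order, with duplicates removed."""
    return list(dict.fromkeys(mask(d) for d in descriptions))


# Hypothetical catalogue lines.
catalogue = [
    "SKU 10442 Wood screws, ¼ inch, 150 in a pack",
    "SKU 10443 Wood screws, ½ inch, 150 in a pack",
    "SKU 10444 Wood screws, ½ inch, 250 in a pack",
    "SKU 20911 Drywall anchors, 1/4 inch, 50 in a pack",
]
templates = unique_templates(catalogue)
for t in templates:
    print(t)
```

Here four catalogue lines reduce to two templates. The masked values would be re-inserted after translation, which assumes they need no localisation; in practice fractions and counts may still need locale-specific formatting.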
Example savings
Project 1:
Reduced word count by 35%, resulting in £40,000 of translation savings.
Project 2:
Reduced the “new” word count by 32%, equivalent to a 9% reduction in total word count, or a saving of around £20,000 per language (across 10 language combinations).
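The Project 2 figures also imply how much of the content was new to begin with: if new words made up a share s of the total, a 32% cut in new words equals a 9% cut overall, so s × 0.32 = 0.09. A small sketch of that arithmetic, using only the figures quoted above (the results are approximate because the inputs are rounded):

```python
def implied_new_share(new_cut: float, total_cut: float) -> float:
    """Share of the total word count that was 'new' before simplification.

    Cutting new words by new_cut (a fraction) removes new_share * new_cut
    of the total, which must equal total_cut.
    """
    return total_cut / new_cut


share = implied_new_share(0.32, 0.09)  # roughly 0.28, i.e. about 28% new content
total_saving_gbp = 20_000 * 10         # around £20,000 per language, 10 languages
print(f"{share:.1%} new content; about £{total_saving_gbp:,} saved in total")
```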
Process workflow
A step beyond Translation Memory savings
While Translation Memory looks for matches at segment level, we use a variety of techniques to match content at a finer, more structured level. This means Data Simplification can spot similar content that Translation Memory would miss.
Extending catalogue management software capability
PIM (Product Information Management) and other catalogue management tools commonly use concatenation to build a unique set of descriptions from a finite set of attributes. We extend this to identify cases where even smaller sub-elements can be treated independently, whether or not concatenation was used.
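To illustrate the difference, here is a toy sketch contrasting an exact segment-level lookup with a lookup on comma-separated sub-elements. The memory contents, the French translations and the comma delimiter are hypothetical assumptions; real sub-segment matching would need rules about which parts can safely be translated independently.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical translation memory (English -> French) holding sub-elements,
# not the full segments being translated.
TM = {
    "Wood screws": "Vis à bois",
    "zinc plated": "zinguées",
    "countersunk head": "tête fraisée",
    "Drywall anchors": "Chevilles pour plaque de plâtre",
}

# Assumed delimiter between independent sub-elements.
SPLIT = re.compile(r"\s*,\s*")


def segment_match(segment: str) -> Optional[str]:
    """Segment-level lookup: only an exact match of the whole segment counts."""
    return TM.get(segment)


def subsegment_match(segment: str) -> Tuple[Optional[str], List[str]]:
    """Split into sub-elements and look each one up.

    Returns the reassembled translation (or None) and the sub-elements
    that are genuinely new.
    """
    parts = SPLIT.split(segment)
    missing = [p for p in parts if p not in TM]
    if missing:
        return None, missing
    return ", ".join(TM[p] for p in parts), []


print(segment_match("Wood screws, zinc plated, countersunk head"))
print(subsegment_match("Wood screws, zinc plated, countersunk head"))
print(subsegment_match("Drywall anchors, nylon"))
```

The first segment has no exact match, yet every sub-element is already known; the second isolates “nylon” as the only new content. Reassembling translated fragments is only safe when the parts really are independent, which is why the splitting rules matter.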
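A sketch of the concatenation case, assuming a hypothetical PIM record with three attributes and a simple comma-joined template: translating each unique attribute value once covers every generated description.

```python
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Product:
    """Hypothetical PIM record whose description is built by concatenation."""
    product_type: str
    material: str
    finish: str


# Assumed concatenation rule used by the catalogue tool.
TEMPLATE = "{product_type}, {material}, {finish}"


def describe(product: Product) -> str:
    """Build the description by concatenating attribute values."""
    return TEMPLATE.format(**asdict(product))


def word_count(text: str) -> int:
    return len(text.split())


def translation_scope(products: List[Product]) -> Tuple[int, int]:
    """Words in the full descriptions vs. words in unique (attribute, value) pairs."""
    full = sum(word_count(describe(p)) for p in products)
    unique_pairs = {(field, value) for p in products for field, value in asdict(p).items()}
    return full, sum(word_count(value) for _, value in unique_pairs)


# Hypothetical sample records.
products = [
    Product("Hinge", "stainless steel", "brushed"),
    Product("Hinge", "brass", "polished"),
    Product("Door handle", "stainless steel", "polished"),
    Product("Door handle", "brass", "brushed"),
]
print(translation_scope(products))  # (16, 8)
```

With these sample records the words to translate halve (16 in the full descriptions versus 8 in the unique attribute values). Translated values would then be recombined with a target-language template, whose word order may differ from the source.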
Going further with Neural Machine Translation
Content that benefits from Data Simplification is typically well suited to neural machine translation. So not only can we reduce the amount of content that needs to be touched, but we can also speed up translation of the parts that still need it.