
Data Simplification

Cut the cost and accelerate the translation of large data sets without impacting quality.

What does it mean and where is it suitable?

Data Simplification extends Translation Memory savings with techniques that help identify “new content” within large data sets, reducing the scope of translation projects.

It works best on large product data sets in which keywords and related attributes vary only slightly between versions or categories.

The result is quicker, cheaper projects, making best use of language resources for long-term benefits.

Customers have saved up to 30%, and projects were completed in a fraction of the time they thought possible.

Reduce word count

Get more products to more markets quicker

Reduce costs to get the most from your budget

No impact on translation quality

Case Study: US-based DIY chain

The results: the project covered 12 million words, and combining Translation Memory and Data Simplification savings reduced the number of new words to 2 million.

This was achieved by filtering and simplifying content based on SKU, product size (e.g. ¼ inch vs ½ inch) and product batch size (e.g. 150 in a pack vs 250 in a pack).

These simplification approaches had a massive impact on what was then considered “new” content. Beyond this, the work identified other patterns in subsets of translation units which could be broken down safely into constituent parts.
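As an illustration only, the kind of simplification described in the case study can be sketched in Python. The patterns, placeholder names and sample descriptions below are assumptions made for this example (sizes are written as "1/4 inch" rather than "¼ inch" to keep the pattern short); they are not the rules used in the actual project.

```python
import re

# Hypothetical patterns: a size such as "1/4 inch" and a pack count such as "150 in a pack".
SIZE = re.compile(r"\b\d+(?:/\d+)?\s*inch\b")
PACK = re.compile(r"\b\d+\s+in a pack\b")

def simplify(description: str) -> str:
    """Replace variable attributes with placeholders so variants share one template."""
    text = SIZE.sub("{SIZE}", description)
    return PACK.sub("{PACK}", text)

def unique_templates(descriptions):
    """Return the distinct templates in first-seen order."""
    seen = {}
    for description in descriptions:
        seen.setdefault(simplify(description), None)
    return list(seen)

# Invented sample data: four product lines that differ mostly in size and pack count.
descriptions = [
    "Wood screw 1/4 inch, 150 in a pack",
    "Wood screw 1/2 inch, 150 in a pack",
    "Wood screw 1/2 inch, 250 in a pack",
    "Masonry nail 1/4 inch, 250 in a pack",
]
templates = unique_templates(descriptions)
print(templates)
```

In this invented sample, four descriptions collapse to two templates, so only the two templates would count as “new” content; the sizes and pack counts are values that can be slotted back in after translation.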

Example savings

Project 1:
Reduced word count by 35%, resulting in £40,000 of translation savings.

Project 2:
Reduced “new” word count by 32%, which equated to a reduction in total word count of 9%, or around a £20,000 saving per language (with 10 language combinations).
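The two percentages quoted for Project 2 are linked by the share of the content that counted as “new” before simplification. A rough check, assuming the 9% total reduction comes entirely from the 32% cut in “new” words:

```python
new_reduction = 0.32    # reduction in "new" word count (Project 2)
total_reduction = 0.09  # resulting reduction in total word count (Project 2)

# Implied share of the total word count that was "new" before simplification,
# assuming the total reduction comes only from removing "new" words.
new_share = total_reduction / new_reduction
print(f"{new_share:.0%}")  # prints 28%
```

Under that assumption, roughly 28% of the content was “new” once Translation Memory matches had been applied.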

Process workflow

1. Content received

Variety of formats accepted. We can provide advice on transferring large content volumes.

2. Identify content for translation

We isolate translatable content from the document skeleton and capture standard code (HTML, for example) as tags, giving the team a clean translation view.
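A minimal sketch of the tag-capture idea, assuming simple inline HTML and a numbered placeholder format such as {1}; the function names and placeholder syntax are hypothetical, and real translation tools use their own tag-handling schemes.

```python
import re

# Matches a simple HTML tag such as <b> or </b>; not a full HTML parser.
TAG = re.compile(r"<[^>]+>")

def protect_tags(segment: str):
    """Swap each tag for a numbered placeholder and return the clean text plus the tags."""
    tags = []

    def replace(match):
        tags.append(match.group(0))
        return f"{{{len(tags)}}}"

    return TAG.sub(replace, segment), tags

def restore_tags(text: str, tags):
    """Put the original tags back in place of their placeholders."""
    for index, tag in enumerate(tags, start=1):
        text = text.replace(f"{{{index}}}", tag)
    return text

clean, tags = protect_tags("Buy <b>now</b> and save")
print(clean)                      # Buy {1}now{2} and save
print(restore_tags(clean, tags))  # Buy <b>now</b> and save
```

The translator sees only the text and the numbered placeholders, and the original markup is restored once the translation is delivered.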

3. Content matching to identify new material

We use a variety of techniques to identify previously translated material in the document, and costs are discounted accordingly.

4. Project proposal confirmed

Prices and project timelines are provided based on the reduced word count.

A step beyond Translation Memory savings

While Translation Memory looks for matches at a segment level, we use a variety of techniques to increase the granularity and structure of content matching.  This means that through Data Simplification we can spot similar content in ways that would be missed by Translation Memory.

Extending catalogue management software capability

PIM and other catalogue management tools regularly use concatenation to create a unique set of descriptions from a finite set of attributes.  We extend this to identify cases where even smaller sub-elements could be treated independently whether concatenation was used or not.
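To make the idea concrete, here is a small sketch that assumes descriptions are built by joining attributes with ", " and that each attribute can be translated independently. Both are assumptions for the example; in practice, grammar and word order in the target language decide whether a part can safely stand alone. The sample data is invented.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())

def unique_parts(descriptions, separator=", "):
    """Split each description into its parts and keep each distinct part once."""
    seen = {}
    for description in descriptions:
        for part in description.split(separator):
            seen.setdefault(part.strip(), None)
    return list(seen)

# Invented sample data: descriptions concatenated from a small set of attributes.
descriptions = [
    "Cordless drill, 18V, blue",
    "Cordless drill, 18V, red",
    "Cordless drill, 12V, blue",
    "Hammer drill, 18V, red",
]

total_words = sum(word_count(d) for d in descriptions)
parts = unique_parts(descriptions)
reduced_words = sum(word_count(p) for p in parts)
print(total_words, reduced_words)  # 16 8
```

Segment-level matching treats all four descriptions as different, whereas matching on the parts leaves only six distinct items to translate.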

Going further with Neural Machine Translation

Content which benefits from Data Simplification is typically well suited to neural machine translation. So not only can we reduce the amount of content that needs to be touched, but we can also speed up translation of the content that does need it.
