Statistics

Figure: Crawl statistics

To understand the way statistics work in Easyling, we have to start with the most important fact: in our application, the largest unit of measurement is a block, which is usually represented by a <p> or a <div>. Blocks break down to segments, segments to words, words to letters. Since the translation proxy deals exclusively with content in webpages, HTML tags also play an important part in weighing the repetitions.

It is also important to note that the statistics in the translation proxy report different degrees of repetition. During a crawl, the website’s content is “repetitioned” against itself, simulating a translation process not unlike the Homogeneity feature in MemoQ.

Repetitions

Below is a breakdown of the various repetition percentages in the Statistics.

102% - Strong contextual repetitions: These are block repetitions. Every segment in the block is a 101% repetition, and all the tags are identical. We do not charge for these repetitions and they are propagated automatically within the project.

101% - Contextual repetitions: These repetitions are comparable to the 101% repetitions in MemoQ, or Context Matches in SDL Trados Studio. Both the tags in the segment and its context (the segments immediately before and after) repeat.

100% - Regular repetitions: This one is straightforward, and comparable to the Repetitions count in MemoQ or Trados. The segment is repeated exactly, including all tags.

99% - Strong fuzzy repetitions: In this case, a repetition is found after a few transformations are applied to the segment before comparing: tags at the ends are stripped out, words are lowercased, and numbers are ignored.

98% - Weak fuzzy repetitions: Here, all tags are stripped out, not just the ones at the ends; words are lowercased, and numbers are ignored.
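The transformations behind the 100%, 99% and 98% levels can be sketched roughly as follows. This is a minimal illustration, not the proxy's actual implementation: the function names, the regular expressions used to recognize tags and numbers, and the whitespace handling are all assumptions. The 101% and 102% levels are left out because they depend on the neighbouring segments and on the enclosing block.

```python
import re

# Assumption: anything between "<" and ">" is treated as a tag.
TAG = re.compile(r"<[^>]+>")
# Tags (with surrounding whitespace) at the very start or very end of a segment.
END_TAGS = re.compile(r"^(?:\s*<[^>]+>)+|(?:<[^>]+>\s*)+$")
NUMBER = re.compile(r"\d+")


def _normalize(text):
    """Lowercase, drop numbers, and collapse whitespace."""
    text = NUMBER.sub("", text.lower())
    return " ".join(text.split())


def strong_fuzzy_key(segment):
    """99% key: only the tags at the ends are stripped."""
    return _normalize(END_TAGS.sub("", segment))


def weak_fuzzy_key(segment):
    """98% key: all tags are stripped."""
    return _normalize(TAG.sub("", segment))


def fuzzy_level(candidate, seen):
    """Return 100, 99 or 98 for the best match among `seen`, else None."""
    if candidate in seen:
        return 100
    if strong_fuzzy_key(candidate) in {strong_fuzzy_key(s) for s in seen}:
        return 99
    if weak_fuzzy_key(candidate) in {weak_fuzzy_key(s) for s in seen}:
        return 98
    return None


if __name__ == "__main__":
    seen = ["The doge lives happily ever after.", "The End of story 3."]
    print(fuzzy_level("The End of story 4.", seen))
    print(fuzzy_level("The DOGE lives happily ever after.<br/>", seen))
    print(fuzzy_level("The <b>doge</b> lives happily ever after.", seen))
```

In this sketch a segment that differs only in a number, in letter case, or in a trailing tag falls into the 99% level, while a segment with a tag in the middle only matches once all tags are stripped, which puts it at 98%.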

102% repetitions

102% repetitions warrant special mention. The proxy deals with HTML block entries, and has a very “overarching” view of them, since it strives to ensure that the same entry is never translated twice, regardless of which page it shows up on.

Consider a navigation bar of a website. During a Discovery, the proxy will come across the navbar for the first time on the landing page. It will aggregate the word count of the block elements in the navigation bar as unique and move on to analyze the next page.

Of course, navigation bars are such that they are shown on all pages of a website, so when the proxy sees the same navigation bar for the second time somewhere else, it doesn’t count it as unique: instead, it adds the associated word count as a 102% repetition.

So, the navigation bar is liable to be counted as many times as there are pages. Don’t be alarmed if you see large numbers in the 102% repetition row: any sort of repeated content is mercilessly added there. Just keep in mind that this is work that the proxy is saving you.
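The navigation bar scenario can be illustrated with a small tally. This is a simplified sketch under stated assumptions: the `tally_blocks` helper, the whitespace-based word count, and the sample pages are all hypothetical, and only exact block repeats are considered (the lower repetition levels are ignored here).

```python
import re
from collections import Counter

TAG = re.compile(r"<[^>]+>")


def word_count(block):
    """Assumed word count: whitespace-separated tokens once tags are removed."""
    return len(TAG.sub(" ", block).split())


def tally_blocks(pages):
    """Count words of first-seen blocks as unique and of repeats as 102%."""
    seen = set()
    counts = Counter()
    for blocks in pages:
        for block in blocks:
            if block in seen:
                counts["102%"] += word_count(block)
            else:
                seen.add(block)
                counts["unique"] += word_count(block)
    return counts


if __name__ == "__main__":
    navbar = "<div>Home About Contact</div>"
    pages = [
        [navbar, "<p>Welcome to our site</p>"],
        [navbar, "<p>Our team</p>"],
        [navbar],
    ]
    print(dict(tally_blocks(pages)))
```

With these made-up pages, the three-word navigation bar is counted once as unique and twice as a 102% repetition, so the 102% row grows with every additional page that carries it.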

Cost

102% repetitions should not be counted when creating cost projections based on a word count result. So, remember the formula: for any given word count result, total minus 102% repetitions is the maximum amount that you need to extract or translate.
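As a quick arithmetic illustration of the formula above, the sketch below subtracts the 102% row from the total. The function name and the word counts are invented for the example and do not come from a real crawl.

```python
def max_words_to_translate(counts):
    """Total word count minus the 102% repetitions."""
    total = sum(counts.values())
    return total - counts.get("102%", 0)


if __name__ == "__main__":
    # Hypothetical word count result, keyed by statistics row.
    example = {"unique": 9, "100%": 2, "102%": 6}
    print(max_words_to_translate(example))  # 17 total - 6 = 11
```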

Examples

You will find an illustration of the various repetitions in the examples below.


Original: The quick, brown fox jumps over the lazy dog. The dog gets really angry, and chases away the fox. The fox regrets the whole thing and quits jumping, leading to its ultimate demise. The dog lives happily ever after. The End of story 1.
Repetition: The quick, brown fox jumps over the lazy dog. The dog gets really angry, and chases away the fox. The fox regrets the whole thing and quits jumping, leading to its ultimate demise. The dog lives happily ever after. The End of story 1.
Explanation: 102% match. They are completely identical.

Original: The quick, brown fox jumps over the lazy dog. The dog gets really angry, and chases away the fox. The fox regrets the whole thing and quits jumping, leading to its ultimate demise. The dog lives happily ever after. The End of story 1.
Repetition: The quick, brown fox jumps over the lazy dog. The dog gets really angry, and chases away the fox. The fox regrets the whole thing and quits jumping, leading to its ultimate demise. The dog lives happily ever after. The End of story 1. Not!
Explanation: 101%, 5 repetitions. By adding another segment to the block, the first 5 sentences become 101% repetitions, but the last one is unique, therefore it is not a 102% match.

Original: The quick, brown fox jumps over the lazy doge. The doge gets really angry, and chases away the fox. The foxe regrets the whole thing and quits jumping, leading to its ultimate demise. The doge lives happily ever after. The End of story 2.
Repetition: The End of story 2. The doge lives happily ever after. The foxe regrets the whole thing and quits jumping, leading to its ultimate demise. The doge gets really angry, and chases away the fox. The quick, brown fox jumps over the lazy doge.
Explanation: 100%, 5 repetitions. The contents are the same, but the order is reversed, therefore they are not 101% matches anymore.

Original: The quick, brown fox jumps over the lazy doge. The doge gets really angry, and chases away the fox. The foxe regrets the whole thing and quits jumping, leading to its ultimate demise. The doge lives happily ever after. The End of story 3.
Repetition: The End of story 4. The DOGE lives happily ever after. The foxe regrets the whole thing and quits jumping, leading to its ultimate demise.<br/> The doge gets really angry, and chases away the FOX. The quick, brown fox JUMPS over the lazy doge.
Explanation: 99%, 5 repetitions. Aside from the reversed order, some words are in a different case, and one segment even has a tag at the end (<br/>) which is not found in the source.

Original: The quick, brown fox jumps over the lazy doge. The doge gets really angry, and chases away the fox. The foxe regrets the whole thing and quits jumping, leading to its ultimate demise. The doge lives happily ever after. The End of story 3.
Repetition: The End Of Story 4. The DOGE lives happily ever after. The foxe regrets the whole thing and quits jumping, leading to its ultimate demise.<br/> The doge gets Really Angry, and chases away the FOX. The quick, brown fox JUMPS over the lazy doge.
Explanation: 98%, 5 repetitions. Many tags are changed and/or inserted, and more words are in different cases now.