木村 尚磨

Announcing a new translator quality score


We know that quality is never black and white, nor can it be summed up perfectly in a number. However, as a data-driven translation platform, we believe in empowering our translators with information and statistics on their performance so they can deliver quality translations at scale.

So, we’re happy to introduce a new and improved translator quality score (TQS) that will provide our translators with one number as a fair and representative indicator of their quality performance. More importantly, we believe it will serve as a helpful tool for our translators to gauge their quality and further foster a culture of learning and progression.

Quality scores are provided through an assessment tool we developed to numerically assess the quality of a translation. This is based on objective rule-based errors rather than relying on subjective opinions, and gives our active translators new check scores each month.

However, until now, translators were only shown a simple average of their last five check scores. This TQS proved to be too simplistic and failed to reflect the recent performance of a translator. In addition, we often received feedback that a translator’s TQS was being brought down by an outlying low score they received several months ago. After experimenting with different models, we’ve found one that resolves these issues and puts the translator’s needs at the forefront.

For example, with the new calculation, a 500-word job submitted two weeks ago will influence a translator’s TQS much more than a 100-word job submitted five months ago.

The main differences between the old and new TQS are summed up below.

  • Non-weighted vs. weighted average

The old TQS was a non-weighted average (the most common type of average). This meant that all five check scores influenced the TQS equally and, as a result, an older outlying low score could substantially bring down a translator’s TQS.

The new TQS is instead a weighted average. This means that certain scores will influence the TQS more than others. The two determinants that will set the influence level of each score are the job size and job recency.

Our research shows that longer texts more accurately reflect the ability of a translator, so a score of a larger job will influence the TQS more than that of a smaller one.

We also recognize that the TQS should take into account quality improvements (or deterioration), so a score of a job recently submitted will influence the TQS more than that of an older job. Job size and recency have the same weight (50/50).
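
To make the difference concrete, here is a minimal sketch that compares the two averages. The exact formula is not given in this post, so the details are assumptions for illustration only: the `Check` class, the exponential recency decay, the 90-day half-life, and the way each factor is normalized before the 50/50 blend are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Check:
    score: float      # check score for one reviewed job
    words: int        # job size in words
    age_days: float   # days since the job was submitted


def simple_average(checks):
    """Old-style TQS: every check score counts equally."""
    return sum(c.score for c in checks) / len(checks)


def weighted_tqs(checks, half_life_days=90.0):
    """Hypothetical new-style TQS: size and recency each carry half the weight."""
    # Size factor: proportional to word count (assumption).
    size_raw = [c.words for c in checks]
    # Recency factor: exponential decay with an assumed half-life.
    recency_raw = [0.5 ** (c.age_days / half_life_days) for c in checks]
    size_total = sum(size_raw)
    recency_total = sum(recency_raw)
    # Normalize each factor so it sums to 1, then blend them 50/50.
    weights = [
        0.5 * s / size_total + 0.5 * r / recency_total
        for s, r in zip(size_raw, recency_raw)
    ]
    return sum(w * c.score for w, c in zip(weights, checks))


checks = [
    Check(score=9.0, words=500, age_days=14),   # large, recent job
    Check(score=5.0, words=100, age_days=150),  # small, older outlier
]
print(round(simple_average(checks), 2))
print(round(weighted_tqs(checks), 2))
```

Under these assumptions, the weighted result sits closer to the recent 500-word job’s score than the simple average does, which is the behavior described above.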

  • Five vs. 20 check scores

The old TQS was implemented when fewer checks were performed on translators’ work. Our current quality system handles a much higher volume of randomized checks, so more check scores are included in the new TQS, giving a more representative sample of a translator’s work.

  • All-inclusive vs. one score per language pair

The old TQS was language-pair agnostic, so some of our most active translators couldn’t differentiate their quality performance between the various language pairs they are qualified in.

We believe it’s essential for translators to know their quality performance for each of their language pairs, so the TQS is now calculated on a per-language-pair basis for each translator. For now, the translator dashboard will display the TQS for a translator’s first language pair (by qualification date), but we plan to display one TQS per language pair in the future.
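
The per-language-pair grouping and the 20-score window can be sketched as follows. This is only an illustration: the tuple layout, the function names, the language-pair labels, and the dates are hypothetical, and a plain mean stands in for the real weighted calculation.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

MAX_CHECKS = 20  # the new TQS includes up to 20 check scores


def tqs_per_pair(checks, score_fn=mean):
    """Group checks by language pair and score the 20 most recent of each.

    `checks` is a list of (language_pair, submitted_on, score) tuples.
    `score_fn` is a stand-in for the real weighted calculation (assumption).
    """
    by_pair = defaultdict(list)
    for pair, submitted_on, score in checks:
        by_pair[pair].append((submitted_on, score))
    result = {}
    for pair, items in by_pair.items():
        items.sort(key=lambda item: item[0], reverse=True)  # newest first
        recent = [score for _, score in items[:MAX_CHECKS]]
        result[pair] = score_fn(recent)
    return result


def dashboard_tqs(scores, qualified_on):
    """Return the TQS of the earliest-qualified pair that has a score."""
    pairs = sorted((p for p in qualified_on if p in scores), key=qualified_on.get)
    return scores[pairs[0]] if pairs else None


checks = [
    ("en>ja", date(2016, 1, 10), 9.0),
    ("en>ja", date(2016, 2, 1), 8.0),
    ("ja>en", date(2016, 1, 20), 7.0),
]
qualified_on = {"en>ja": date(2014, 5, 1), "ja>en": date(2015, 3, 1)}

scores = tqs_per_pair(checks)
print(scores)
print(dashboard_tqs(scores, qualified_on))
```

Keeping the scores in a dictionary keyed by language pair means the dashboard could later show every pair instead of only the first one.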


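Shoma Kimura (木村 尚磨)
Head of Translator Operations. Originally from the United States. Completed a master’s degree in economics at the University of St Andrews in Scotland. After graduating, he came to Japan and joined Gengo as an intern during his stay. His hobby is travel; his most memorable trips include dog sledding in Norway, watching the World Cup in South Africa, and seeing the sunrise from a temple in Myanmar.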
  • hwemudua57

    Since Gengo is supposedly a translation community, these changes should be voted upon and discussed with the people doing the actual dirty work, shouldn’t they? I was disappointed to see my quality scores change overnight because of something “you decided”. The system should not be based on nor implemented merely on logic and research, but on the statistical characteristics of most translators and the impact it will have on us. Changing the TQS like this is destructive since it either implies the scores were inaccurate before and our rating suffered, or you were wrong and our rating improved. I think it should best be kept simple; a client can see the performance of my past translations and we should be able to show our skills and leanings.

    • Shoma Kimura

      Hi hwemudua57, thanks for sharing your thoughts on the new TQS. We agree that neither the TQS nor any other number, for that matter, can perfectly sum up one’s translation quality. Rather, the TQS aims only to provide a general summary of your recent quality check scores in an easy-to-understand number.

  • Lali

    I’m not happy with these new rules. They aren’t fair at all! I’d gladly take on bigger translations if I could grab them. You, Gengo, should create a survey to check other translators’ opinions.

    • Shoma Kimura

      Hi Lali, thanks for writing in. The reason that larger translations count more in the TQS is because we believe that a longer translation, say 1000 words, tells us more about a translator’s performance than a short job, say only 10 words long. Rest assured, we allow for more errors in a larger translation to ensure that jobs of all sizes are assessed fairly.

      • Lali

        But you choose what to review! I haven’t been getting less than 10 points for the last two years, but still have 9.5.

  • Nadia

    Do I understand it correctly that now one mistake in a larger job will have more negative influence on translator’s score than a similar mistake in a small job (since the score will be weighted based on job volume)? If so, it doesn’t seem fair.

    • Shoma Kimura

      Hi Nadia, thanks for your question. No, one mistake in a larger job will not have more negative influence on your score than one mistake in a small job. This is because a larger job with one error will have a much higher check score than a smaller job with one error. For example, a 500-word job with one wrong-term error will result in a higher check score than a 100-word job with the same wrong-term error. So, in this case, the larger job with a higher score will actually bring up your translator quality score.

  • JDAmadeo

    All jobs I did were all approved by the consumer, but when it came to rating my work, it was all smashed by the senior translator, whose objectivity was doubtful. This has been my experience.

    • Shoma Kimura

      Hi JDAmadeo, thanks for sharing your experience. The translation review process is meant to be a two-way street. If you feel the Senior Translator was not clear in their review of your work, please let us know so that we can look into the review and determine the correct assessment. Please send an email to our support team, and they will follow up with you. Thanks!

  • Misty

    I’m torn with this. It’s great that having one bad day will affect our scores less… Especially with the questionable way that Gengo sometimes handles reviews (ESPECIALLY for off-site jobs… never making that mistake again!)

    But I went through my email to see how often I’ve been reviewed, and I had to go back a full three years to find my last 20 reviews. It seems unfair that a bad score could affect you for a full three years. But perhaps this is a sign that Gengo is planning to do more thorough reviewing of work… Who knows?

    I hope it works out well. Gengo has usually been pretty fair with me, so I can’t complain, but I’ve had friends who were fired for incredibly questionable reasons, so I have a suspicion that I’ve just been lucky so far. We’ll just have to see what happens. :)

    • Yes. We are planning to review a higher proportion of your work in the future.

      The older the job is, the less it will affect your translator quality score, so it should not affect you for a prolonged period of time.

  • dennise

    The idea is good, but it won’t actually work. My average quality score was calculated twice in Gengo’s history, and both times the result was displaced. Why? The first score was calculated based on delivery time, so had nothing to do with translation quality.

    The second was based on god knows what. The only thing that I know for sure is that most of my delivered translations have never been checked by senior translators and even those few that have been reviewed didn’t contain any errors except for some missing commas. But my current score is still average – 8,4

    Now you plan to calculate the score based on the last 20 checked jobs. My question is: where do you plan to get so many checked jobs when most of them have never been reviewed?

    • Misty

      I have the same concern. :/ I translate 100,000+ words per month for my language pair, and I still had to go back three years to find 20 reviews.

    • Our automated system randomly selects jobs to be reviewed based on your activity. Translators who are more active will have more jobs reviewed.

      • hwemudua57

        Well unfortunately that creates bias in the system doesn’t it? If quality is important for translation, then it is also important that all translator scores do not contain bias for people that do not translate frequently. If jobs are not reviewed then you never recognize potentially talented translators.