word compounding

A colleague (non-native speaker of German) was filling out some form and asked for advice about the following:

> Bitte ergänzen Sie wenn möglich eine deutsche Voranschrift der letzten 5 Jahre:

Interestingly, Google Translate could not help him, since it produces this in English:

> Please fill in if possible, a German font previews over the past 5 years:

Why the “font previews”? The likely culprit is faulty splitting of a German compound. Google apparently tries to split German compounds in order to break up out-of-vocabulary tokens (here: _Voranschrift_, made up of _Vor_/“previous” and _anschrift_/“address”), but gets it wrong and splits the word into _Voran_ and _schrift_ instead, which translate to “before” and “font”. The occurrence of “font” then probably nudges the language model towards something with _previews_: with the tokens “German” and “font” already in the translation hypothesis, a “German font preview” makes some sense in a (trigram) context. 😉 Furthermore, _Voransicht_, which also starts with _Voran_, means “preview”.
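To make the ambiguity concrete, here is a minimal sketch of a dictionary-based compound splitter. It is not how Google Translate actually works (that is not known here); the tiny lexicon, its glosses, and the function name `split_compound` are all made up for illustration. It only shows that a splitter which accepts any segmentation into known parts finds both readings of _Voranschrift_ and has no way of telling which one is right.

```python
# Toy lexicon: hypothetical entries for illustration only,
# not the vocabulary of any real translation system.
LEXICON = {
    "vor": "previous",
    "voran": "before",
    "anschrift": "address",
    "schrift": "font",
}


def split_compound(word, lexicon=LEXICON):
    """Return every way to segment `word` into parts that are all in `lexicon`."""
    word = word.lower()
    if not word:
        # The empty remainder has exactly one segmentation: no parts at all.
        return [[]]
    splits = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            # Keep this prefix and recursively segment the rest of the word.
            for rest in split_compound(word[i:], lexicon):
                splits.append([head] + rest)
    return splits


candidates = split_compound("Voranschrift")
for parts in candidates:
    glosses = " / ".join(LEXICON[p] for p in parts)
    print(" + ".join(parts), "->", glosses)
```

Both _vor_ + _anschrift_ and _voran_ + _schrift_ come out as valid candidates. A real system needs some extra signal (corpus frequencies of the parts, for example) to rank them, and a bad ranking would produce exactly the kind of mistake seen above.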

Ah, behold the wonders of statistical machine translation. You never know what ya gonna get!™
