Localization (also known as L10n) is the adaptation of a product, software, application or document so that it meets the requirements of the specific target market or locale. [1] The localization process revolves around translation of the content. However, it can also include other elements such as:

  • Modifying graphics to suit target markets
  • Redesigning content to suit the market audience's tastes
  • Changing the layout for proper text display
  • Converting phone numbers, currencies, hours, dates to local formats
  • Adding content relevant to the target market and removing content that is irrelevant to it
  • Following legal requirements and regulations [2]
  • Considering geopolitical issues and adapting the content appropriately for the target market

The goal of localization (l10n) is to make a product speak the same language and create trust with a potential consumer base in a specific target market. To achieve this, the localization process goes beyond mere translation of words. An essential part of global product launch and distribution strategies, localization is indispensable for international growth. [3]

Localization is also referred to as "l10n," where the number 10 represents the number of letters between the l and n.

Subject classification: this is a localization resource.
Type classification: this resource is a course.

History

  Educational level: this is a tertiary (university) resource.

The history of localization dates back to the 1980s. Desktop computers were not just for engineers anymore; they began to make an appearance in homes and offices. These everyday users needed software that would help them do their work efficiently and in a way that met the local language, standards and habits. [4]

Software companies also began to look for international audiences. Having met their goals in their original target markets, they looked to expand. Everyday users in other countries needed the software to be adapted so that they could work efficiently. [5] In addition to translating the content, for the software to feel truly local, dates had to be formatted, the layout had to be adapted to display text properly, and legal requirements had to be met.

In the early 1980s most software vendors started in-house translation departments or outsourced translation work to freelance translators or in-country product distributors. The increasing size and complexity of localization projects soon pushed companies toward an outsourcing model. In the mid-1980s, the first multi-language vendors (MLVs) were formed. New companies such as INK (now Lionbridge) or IDOC (now Bowne Global Solutions) specialized in the management and translation of technical documentation and software. Existing companies with other core competencies, such as Berlitz, started translation divisions that could handle multilingual translation and localization projects. [6]:5

As the localization industry developed, standards became increasingly important. The development of Unicode in 1987, and its success at unifying character sets, had an enormous impact. Another important change was the introduction of a "single world-wide binary", i.e. development of one version of a program that supports all languages. This single binary was often combined with "resource-only DLL files", where all user interface text elements, such as dialog box options, menus and error messages, were centralized. All program code was separated from the resources which meant that applications could be run in another language by replacing the resource-only DLL with a localized version.[6]:9

  Educational level: this is a research resource.

Key Concepts and Terms

Translation is the process of recreating a written text from a source language into a target language, in a way that sounds natural to the target audience. A specialized linguist will typically translate a text into their native language. Translation is often performed using a CAT tool, which contains a translation memory and terminology for the project. Linguists also usually reference a style guide and other materials provided by the client.

Localization is the linguistic and cultural adaptation of digital content to the requirements and the locale of a foreign market; it includes the provision of services and technologies for the management of multilingualism across the digital global information flow. Thus, localization activities include translation and a wide range of additional activities.[7] True localization considers language, culture, customs and the characteristics of the target locale. It frequently involves changes to the software’s writing system and may change keyboard use and fonts as well as date, time and monetary formats. The common abbreviation for localization is l10n, where the number 10 refers to the number of letters between the l and the n.[8]

Localizability

Designing software code and resources so that resources can be localized with no changes in the source code.

Internationalization

The process of developing a program core whose features and code design are not solely based on a single language or locale. Instead, their design is developed for the input, display, and output of a defined set of Unicode-supported language scripts and data related to specific locales. [9] The common abbreviation for internationalization is i18n, where the 18 refers to the number of letters between the i and the n.

Globalization

Designing software for the input, display, and output of a defined set of Unicode supported language scripts and data relating to specific locales and cultures. The common abbreviation for globalization is g11n, where the 11 refers to the number of letters between the g and the n.

World-Readiness/Global Readiness

The state of a product when it is properly globalized and is easy to customize and localize. It has to consider the readers' cultures, beliefs, languages, locations, etc. in order to let the readers get the original intended meaning without distortion or being offended.

Customizability

Designing software that is componentized and extensible to allow for replacement, addition and/or subtraction of features necessary for a given market.

Locale

A language and geographic region that also includes common language and cultural information. Thus French-France (fr-fr), French-Canada (fr-ca), French-Belgium (fr-be) are different locales. Locale also refers to the features of a user’s computing environment that are dependent on geographic location, language and cultural information. A locale specifically determines conventions such as sort order rules; date, time and currency formats; keyboard layout; and other cultural conventions.

A locale is a collection of international preferences, generally related to a language and geographic region, that a certain category of users requires; these preferences are represented as a list of values. Locales are usually identified by a language and a sort order, and they include a local name and a shorthand identifier or token, such as a language tag. A locale identifier can be composed of a base language, the country (territory or region) of use, and a set of codes.

More than one locale can be associated with a particular language, which allows for regional differences. For example, Spanish for Spain vs. Spanish for Latin America, or French for France vs. French for Canada.

Usually, in the l10n industry, locales are written as a combination of a 2-letter language code (ISO 639[10]) and a 2-letter country code (ISO 3166[11]). For example, Microsoft publishes the list of Locale IDs (LCIDs)[12] that it uses (e.g. French-France as fr-FR, Spanish-Spain as es-ES, Dutch-Netherlands as nl-NL).

A language tag is a string used as an identifier for a language. These language tags consist of one or more subtags.

A subtag is a sequence of ASCII letters or digits separated from other subtags by the hyphen-minus character and identifying a specific element of meaning within the overall language tag. Subtags are limited to no more than eight characters.

Notes:

  • Language tags are not case-sensitive. en-AU, EN-AU, en-au, En-Au, etc., are all the same tag, and denote the same language (Australian English).
  • Language tags are not for computer languages.
  • Language tags are not country codes.
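
The rules above (hyphen-separated subtags of at most eight ASCII letters or digits, compared without regard to case) can be illustrated with a minimal Python sketch. The function names are hypothetical, and the casing helper makes a simplifying assumption: any two-letter subtag after the first is treated as a region code, and script or variant subtags are not handled specially.

```python
import re

# A subtag is 1 to 8 ASCII letters or digits, as described above.
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def split_tag(tag: str) -> list[str]:
    """Split a language tag into subtags, rejecting malformed ones."""
    subtags = tag.split("-")
    for subtag in subtags:
        if not SUBTAG.fullmatch(subtag):
            raise ValueError(f"invalid subtag {subtag!r} in {tag!r}")
    return subtags


def same_tag(a: str, b: str) -> bool:
    """Compare two language tags without regard to case."""
    return [s.lower() for s in split_tag(a)] == [s.lower() for s in split_tag(b)]


def conventional_case(tag: str) -> str:
    """Write a simple language-REGION tag in the customary casing, e.g. en-AU."""
    subtags = split_tag(tag)
    result = [subtags[0].lower()]
    for subtag in subtags[1:]:
        # Assumption: a two-letter subtag after the first one is a region code.
        is_region = len(subtag) == 2 and subtag.isalpha()
        result.append(subtag.upper() if is_region else subtag.lower())
    return "-".join(result)
```

With these helpers, `same_tag("en-AU", "En-au")` is true, and `conventional_case("FR-ca")` gives `fr-CA`.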

Locale names and codes

| Locale Name | Locale Code | Native Name | Language Family |
|---|---|---|---|
| Afrikaans | af | Afrikaans | |
| Albanian | sq | Shqip | |
| Algerian Arabic | arq | الدارجة الجزايرية | |
| Amharic | am | አማርኛ | |
| Arabic | ar | العربية | |
| Armenian | hy | Հայերեն | |
| Aromanian | rup | Armãneashce | |
| Arpitan | frp | Arpitan | |
| Assamese | as | অসমীয়া | |
| Azerbaijani | az | Azərbaycan dili | Azərbaycan |
| Azerbaijani (Turkey) | az-TR | Azərbaycan Türkcəsi | Azerbaijani |
| Balochi Southern | bcc | بلوچی مکرانی | |
| Bashkir | ba | башҡорт теле | |
| Basque | eu | Euskara | |
| Belarusian | bel | Беларуская мова | |
| Bengali | bn | বাংলা | |
| Bosnian | bs | Bosanski | |
| Breton | br | Brezhoneg | |
| Bulgarian | bg | Български | |
| Catalan | ca | Català | |
| Cebuano | ceb | Cebuano | |
| Chinese (China) | zh-CN | 简体中文 | Chinese |
| Chinese (Hong Kong) | zh-HK | 香港中文版 | Chinese |
| Chinese (Taiwan) | zh-TW | 繁體中文 | Chinese |
| Corsican | co | Corsu | |
| Croatian | hr | Hrvatski | |
| Czech | cs | Čeština | |
| Danish | da | Dansk | |
| Dhivehi | dv | ދިވެހި | |
| Dutch (Netherlands) | nl-NL | Nederlands | Dutch |
| Dutch (Belgium) | nl-BE | Nederlands (België) | Dutch |
| Emoji | art-xemoji | 🌏🌍🌎 (Emoji) | |
| English (United States) | en-US | English (United States) | English |
| English (Australia) | en-AU | English (Australia) | English |
| English (Canada) | en-CA | English (Canada) | English |
| English (New Zealand) | en-NZ | English (New Zealand) | English |
| English (South Africa) | en-ZA | English (South Africa) | English |
| English (UK) | en-GB | English (UK) | English |
| Estonian | et | Eesti | |
| Faroese | fo | Føroyskt | |
| Finnish | fi | Suomi | |
| French (Belgium) | fr-BE | Français de Belgique | French |
| French (Canada) | fr-CA | Français du Canada | French |
| French (France) | fr-FR | Français | French |
| Frisian | fy | Frysk | |
| Friulian | fur | Friulian | |
| Galician | gl | Galego | |
| Georgian | ka | ქართული | |
| German (Germany) | de-DE | Deutsch (Deutschland) | German |
| German (Luxembourg) | de-LU | Deutsch (Luxemburg) | German |
| German (Switzerland) | de-CH | Deutsch (Schweiz) | German |
| Greek | el | Ελληνικά | |
| Greenlandic | kal | Kalaallisut | |
| Guaraní | gn | Avañe'ẽ | |
| Hawaiian | haw | Ōlelo Hawaiʻi | |
| Hazaragi | haz | هزاره گی | |
| Hebrew | he | עִבְרִית | |
| Hindi | hi | हिन्दी | |
| Hungarian | hu | Magyar | |
| Icelandic | is | Íslenska | |
| Indonesian | id | Bahasa Indonesia | |
| Irish | ga | Gaeilge | |
| Italian | it | Italiano | |
| Japanese | ja-JP | 日本語 | |
| Javanese | jv | Basa Jawa | |
| Kabyle | kab | Taqbaylit | |
| Kinyarwanda | kin | Ikinyarwanda | |
| Korean | ko-KR | 한국어 | |
| Lao | lo | ພາສາລາວ | |
| Latvian | lv | Latviešu valoda | |
| Limburgish | li | Limburgs | |
| Lithuanian | lt | Lietuvių kalba | |
| Luxembourgish | lb | Lëtzebuergesch | |
| Macedonian | mk | Македонски јазик | |
| Maori | mri | Te Reo Māori | |
| Mongolian | mn | Монгол | |
| Nepali | ne | नेपाली | |
| Norwegian (Bokmål) | nb | Norsk bokmål | |
| Occitan | oci | Occitan | |
| Ossetic | os | Ирон | |
| Persian | fa | فارسی | |
| Polish | pl | Polski | |
| Portuguese (Brazil) | pt-BR | Português do Brasil | Portuguese |
| Portuguese (Portugal) | pt-PT | Português | Portuguese |
| Romanian | ro | Română | |
| Russian | ru | Русский | |
| Serbian | sr | Српски језик | |
| Slovenian | sl | Slovenščina | |
| Somali | so | Afsoomaali | |
| Spanish (Argentina) | es-AR | Español de Argentina | Spanish |
| Spanish (Chile) | es-CL | Español de Chile | Spanish |
| Spanish (Colombia) | es-CO | Español de Colombia | Spanish |
| Spanish (Guatemala) | es-GT | Español de Guatemala | Spanish |
| Spanish (Mexico) | es-MX | Español de México | Spanish |
| Spanish (Peru) | es-PE | Español de Perú | Spanish |
| Spanish (Puerto Rico) | es-PR | Español de Puerto Rico | Spanish |
| Spanish (Spain) | es-ES | Español de España | Spanish |
| Spanish (Venezuela) | es-VE | Español de Venezuela | Spanish |
| Sundanese | su | Basa Sunda | |
| Swedish | sv | Svenska | |
| Tahitian | tah | Reo Tahiti | |
| Thai | th | ไทย | |
| Tibetan | bo | བོད་སྐད | |
| Turkish | tr | Türkçe | |
| Ukrainian | uk | Українська | |
| Urdu | ur | اردو | |
| Vietnamese | vi | Tiếng Việt | |
| Walloon | wa | Walon | |
| Welsh | cy | Cymraeg | |
| Yoruba | yor | Yorùbá | |

Transcreation

Transcreation is typically employed by marketing teams and describes the process of translating a message from one language to another while preserving the emotion and implications [13] of the original message, not necessarily its exact sentence and paragraph structure. While localization never employs word-for-word translation, transcreation goes a step further and provides greater freedom in the cultural adaptation of its content, sometimes extending to the adaptation of company logos and images used in print ads.

The transcreator needs to understand the cultural background of both locales: where the original content was created and where the translated text will be used. The linguist recreates the text in every aspect, conveying not only the message but also the style and the images and emotions it evokes. This requires not only a lot of creativity and cultural sensitivity but also familiarity with the product being advertised.[14] Using a transcreator instead of a copywriter in the target language has several advantages. For example, transcreation tailors a global message to a specific market, making a global advertising campaign more flexible while keeping it aligned with a primary strategy.[15]

Variant

When a language is spoken in more than one country or is used by more than one culture, it develops specific forms that are called variants [16]. For example, the variants of German spoken in Germany and Austria differ, as do the variants of Russian spoken in Russia and Belarus.
The aspects of a language that may differ include phonemes (accent), morphemes, syntactic structures (grammar) and meanings (lexicon)[17]. Different variants of the same language can also be found within one country. In this case, the variants reflect differences in social status or in the types of situations in which they are used.

Term extraction

Also called term mining or term harvesting, term extraction occurs when a given text (or corpus) is analyzed to identify terms relevant to the task at hand within their context. It is the first step in terminology management, with the second step being the elimination of any inconsistencies. As one of the core concepts of terminology management, term extraction is important because it improves the quality of communication by maintaining consistency while also reducing both translation costs and time to market for the project.

Term extraction can be monolingual or bilingual, and there are several existing tools that help by automating this task, each with strengths and weaknesses; there is no one solution for all situations. These tools fall into three main categories: linguistic, statistical and hybrid. Linguistic tools search for language patterns that match a defined set of rules. Statistical tools detect repetitions of a certain sequence of terms. Hybrid tools, the most frequently used, combine the two approaches. Tools in any category can be stand-alone or web-based.
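
As a rough illustration of the statistical approach, the Python sketch below counts repeated word sequences and reports those that occur at least twice. It is not how any particular commercial tool works: the stop-word list, the thresholds and the function name are all assumptions made for the example, and word sequences are allowed to run across sentence boundaries for simplicity.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the illustration.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "on"}


def candidate_terms(text: str, max_len: int = 3, min_count: int = 2) -> list[tuple[str, int]]:
    """Return word sequences that are repeated at least min_count times."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip sequences that start or end with a stop word.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    repeated = [(term, count) for term, count in counts.items() if count >= min_count]
    # Most frequent first; on ties, longer terms first, then alphabetical.
    return sorted(repeated, key=lambda tc: (-tc[1], -len(tc[0].split()), tc[0]))
```

A human reviewer would still need to go through the candidate list, since raw frequency cannot tell a genuine term such as "translation memory" from an ordinary repeated phrase.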

Term extraction can be completed using different strategies:

  1) Automated analysis and term extraction from source files and existing TMs using terminology extraction software such as Term Extract from SDL Studio.
  2) Human review and manual extraction of terminology from reference materials: source & target files (print, web, software, multimedia), TMs, existing legacy glossaries, list of non-translatable terms, lists of forbidden words, translatable and non-translatable database exports.
  3) A combination of automated and manual terminology extraction from reference materials, which provides the best results allowing a human linguist to capture key terminology omitted by software based on linguistic or statistical parameters.

The goal of term extraction is to identify and list key terminology such as: company names, legal names, brand names, product names, service(s) names, copyright names, service marks, programs, promotions, plans, discounts, features, functions, general terminology, legal terms, business terms... or any other term that is important depending on the purpose of the content to be translated. As part of the extraction process, it is important to provide context (the complete sentence from which the source term originates) and a visual (a screenshot of the application UI) for each extracted term, so that the translation team can provide a recommendation that is accurate, suitable, and useful.

Before Localization

Getting a product ready for international markets

Internationalization, often abbreviated as i18n, is the process of designing and developing a software application so that it can be adapted to different languages and regions without changing the source code. It is also called translation or localization enablement. During the internationalization process, many differences must be taken into account that go well beyond the mere translation of words and phrases, for example different national conventions and standard locale data.

A collection of such differences is provided by the Unicode Common Locale Data Repository.

The purpose of internationalization is to enable easy localization for the target locale.

Internationalization typically entails:

  • Designing and developing in a way that removes barriers to localization or international deployment. This includes such things as enabling the use of Unicode, or ensuring the proper handling of legacy character encodings where appropriate, taking care over the concatenation of strings, avoiding dependence in code on user-interface string values, etc.
  • Providing support for features that may not be used until localization occurs. For example, adding markup in your DTD to support bidirectional text, or for identifying language. Or adding to CSS support for vertical text or other non-Latin typographic features.
  • Enabling code to support local, regional, language, or culturally related preferences. Typically this involves incorporating predefined localization data and features derived from existing libraries or user preferences. Examples include date and time formats, local calendars, number formats and numeral systems, sorting and presentation of lists, handling of personal names and forms of address, etc.
  • Separating localizable elements from source code or content, such that localized alternatives can be loaded or selected based on the user's international preferences as needed.[18]
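
Two of the points above, separating localizable strings from code and avoiding string concatenation, can be sketched in a few lines of Python. The catalog contents, keys and function names are hypothetical, and the date patterns are hard-coded only for illustration; a real application would normally take such formats from locale data such as the Unicode Common Locale Data Repository.

```python
from datetime import date

# Hypothetical resource catalogs, kept apart from the program logic.
CATALOGS = {
    "en-US": {
        "greeting": "Hello, {name}! Your order ships on {when}.",
        "date_pattern": "{month:02d}/{day:02d}/{year}",
    },
    "de-DE": {
        "greeting": "Hallo {name}! Ihre Bestellung wird am {when} versandt.",
        "date_pattern": "{day:02d}.{month:02d}.{year}",
    },
}


def format_date(locale_tag: str, d: date) -> str:
    """Format a date using the pattern stored for the locale."""
    pattern = CATALOGS[locale_tag]["date_pattern"]
    return pattern.format(day=d.day, month=d.month, year=d.year)


def greeting(locale_tag: str, name: str, ship_date: date) -> str:
    """Build the message from a whole-sentence template with named placeholders."""
    template = CATALOGS[locale_tag]["greeting"]
    return template.format(name=name, when=format_date(locale_tag, ship_date))
```

Because each message is a complete sentence with named placeholders, a translator can reorder the placeholders to suit the target language, which is not possible when the code glues sentence fragments together.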

Localizability is an intermediate process for verifying that a globalized application is ready for localization.[19] Localizability testing verifies that the user interface of the program being tested can be easily adapted to any local market (locale) without modifying source code. This important process helps ensure the functionality of the application by discovering and fixing errors in source code.

Culturalization

Culturalization is a key component of the localization process that transcends language and focuses on the cultural, geopolitical, and historical adaptation of the content into different languages for specific locales. The term was introduced in the localization industry in the past decade and has been mainly associated with the video gaming industry. Increasingly, companies that generate non-gaming content have started integrating culturalization into their globalization and localization processes. With culturalization, companies create content that is globally appropriate and locally relevant, while ensuring that it is adapted to local cultural sensitivities and free of locally offensive statements. Where localization has the power to meet users’ language needs, culturalization goes one step further to ensure that users from different cultures can interact with a product in a more meaningful way.

People who work in culturalization are included in brainstorming sessions and in the initial design of the game. Culturalization experts have deep international historical, cultural, and geopolitical knowledge that they apply when adapting content to local audiences. Experience has shown that it is easier and more thoughtful to include culturalization in the overall design and marketing strategy than to mend mistakes after launch. Political sensitivities include border delimitations and historical events. Cultural sensitivities can revolve around misappropriation of religious symbols, belief systems and ethnicity.

Below are a few areas to be taken into consideration in the culturalization phase of a localization project:

  • Avoiding offending users

Culture has a huge impact on the way we do and say things, on what we read and listen to, and on the way we think. This is why the impact of culture on website localization is enormous. Companies invest a lot of money to make sure that their content is culturally appropriate and does not offend their audience.

  • Use of images and symbols

Images and symbols are the first thing on a website that people pay attention to. Just because they are not in a written language, it doesn't mean they are without meaning. They carry many subtle cultural messages and can convey a lot of information about a service or product. For this reason, it is very important to be aware that they could have negative connotations that will affect consumers. For example, a travel website showing women in bikinis, partying and being informal will not be considered appropriate and will not be successful in Muslim countries. In some circumstances, it could even lead to legal restrictions and penalties. It is usually recommended to avoid the use of hand symbols, gestures and body parts, religious symbols that are not globally recognized, animal symbols, and graphical elements with text or a single letter. They can typically be replaced by abstract illustrations, geometric shapes, globally recognized symbols or other standardized images.

  • Historical background of a locale and Maps

The treatment of maps in localization can present problems for countries or regions with disputed borders or territories. Current examples of disputed areas include Kashmir (India and Pakistan), the West Bank (Israel and Palestine), Taiwan (The People's Republic of China and The Republic of China (Taiwan)), and Crimea (Russia and Ukraine). For localization, decisions have to be made about how to portray the disputed territories. This problem is especially relevant when localizing map applications and GIS software. One solution is to mark such territories as "disputed" on maps. But this is not always possible if local regulations stipulate how the maps should be displayed. In some cases, there can also be conflicting geographic names, e.g. when the countries or entities that lay claim to disputed regions have different official languages and/or writing systems.

Issues with maps can also arise in marketing, advertising, and other creative domains. For example, in January 2016 a Coca-Cola campaign on the Russian social media site VKontakte posted an Orthodox Christmas greeting using a map of Russia without Crimea. Groups of Russians criticized the omission of Crimea in the advertisement, and Coca-Cola responded by publishing a new advertisement with a map of Russia that included Crimea. But Coca-Cola's second attempt drew criticism from groups of Ukrainians, who protested the company's decision to indicate Crimea as part of Russia and not Ukraine.[20]

India claims the entire erstwhile princely state of Jammu and Kashmir based on an instrument of accession signed in 1947. Pakistan claims Jammu and Kashmir based on its majority Muslim population, whereas China claims the Shaksgam Valley and Aksai Chin.

  • Flags

Problems may also arise from the use of flags. A clear example of how a small mistake can affect an entire population is the event that took place during the Olympic Games in Rio 2016. On the first day of the games, the incorrect Chinese flag was used during the women's volleyball medal ceremony. The errors in the flag were immediately pointed out, and new, correct flags were used for the remaining medal ceremonies in which China was involved. Unfortunately, the mistake reappeared at the end of the games with the display of the wrong flag once again. The proper Chinese flag features one big star surrounded by four smaller stars against a red background. The larger yellow star represents the Communist Party of China, and the four smaller stars symbolize the solidarity of Chinese people of all social classes and ethnic groups under the leadership of the CPC. In the correct version the four smaller stars all point towards the big star, but in the incorrect one that appeared at Rio 2016 the four smaller stars were parallel to each other. [21]

Flag-related issues also arise when flags are used to represent languages, for example in a menu option for changing the display language of a website or application. Some examples to illustrate this point:

  • Some countries have more than one widely spoken language. In Belgium, French and Dutch are spoken by large portions of the population, and German is a third official language. In India, more than 30 languages are each spoken by over 1 million people[22].
  • There are many languages that are commonly spoken in more than one country. For example, the Portuguese language could in theory be represented by the flags of Portugal, Brazil, or a number of other Lusophone countries such as Cape Verde or East Timor.
  • Some countries have very similar flags. The national flags of Romania and Chad are almost identical, yet the official language of Romania is Romanian, while Chad's official languages are Arabic and French.

In each of these cases, using a single national flag to represent a language could be at best ambiguous, and at worst contentious or offensive.

Having a good understanding of what is and is not acceptable is crucial, especially for countries where religion is present in law.

Countries like Saudi Arabia, the United Arab Emirates (UAE) and Pakistan operate mostly or entirely under Sharia law (the moral code and religious law of the Islamic faith). Depicting scantily clad women, beer drinking or gambling in marketing materials is forbidden in countries under Sharia law, and food must not be advertised during Ramadan, a time of fasting (see also: Fasting in Islam). Localization strategies in these countries have to take religious law into consideration.

Another example of localization without religious consideration involves a Microsoft Xbox game. Kakuto Chojin: Back Alley Brutal, developed by Dream Publishing and published by Microsoft Game Studios in 2002, was pulled off shelves in 2003, just a few months after its release, following an extremely vocal and negative reaction to religious content deemed offensive. The game featured a level whose background sound effect repeated a passage from the Qur’an over and over. Since the majority of Muslims believe the Qur’an should be handled with the utmost respect as the literal word of God, there was considerable outrage among many groups at the perceived lack of deference afforded to the Qur’an in the game. To date, there has been no evidence that Kakuto Chojin was re-released.

  • Animals

Animals can have different meanings in different cultures, and people from different cultures may have positive or negative feelings towards the same animal.

For example, when Apple launched the iPhone X with the new Animoji function, its US webpage included the introductory sentence "Reveal your inner panda, pig, or robot."[23] The three characters were localized as panda, rabbit, robot in China[24]; as panda, robot, unicorn in Portugal[25]; and as panda, monkey, robot in Egypt[26].

  • Colors

Colors mean different things to different people. They can have different connotations based on age, gender or the cultural symbolism they have acquired. For instance, younger consumers and women tend to prefer bright and warm colors, whereas older consumers and men usually prefer darker and cooler colors. This basic example shows the importance of establishing the target audience. Colors are also loaded with cultural meanings and open to many different interpretations that need to be taken into consideration in website localization. For example, in Japan and China white is commonly associated with mourning and death, while in Western cultures it is usually considered pure and holy. In Europe and Western countries red is associated with danger, death and passion, while in China it stands for good luck and happiness. In Africa certain colors represent different tribes. This doesn't mean that other cultures are not aware of the different meanings of colors or how they are used. However, it is important to know what a color represents in the product and what message that color choice is attempting to communicate.

Design for Language Switching

Concepts

  • Language fallback

Language fallback is a concept by which, if a product user in a specified region does not have their settings set to the locale that has been chosen for localization in that region, the product automatically falls back to a default language. In many cases this may be English for a global audience, but that is not necessarily so. For example, countries and regions that have multiple official and/or regional languages may use the most widely spoken language as the fallback. As a case in point, Spain could have three to five locale options for users, depending on how finely a product's user base is segmented in that country. Using this example, if the product were being used in Catalonia, the likely default setting would be Catalan (Català), with Spanish as a likely fallback.

Language fallback can also apply to situations where products are not completely localized to a specified market. In these situations, a product would be partially localized and the specified language may not support all features. Therefore, for the unsupported features, a default language fallback will either need to be selected by the user or will be automatically chosen by the product's settings. To illustrate this point, imagine that a product has features ABCD and the default product language is English. However, the user would like to use the product in Swedish, but Swedish is only currently supported for features ABC. The likely outcome to this situation would be that the user would be able to use features ABC that have been localized into Swedish and if they choose to use feature D, then the product will revert to the English version.
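
The ABCD example can be sketched in Python as a per-string lookup that falls back to the default language when the requested language has no entry. The string tables, keys and function name are hypothetical and only mirror the scenario described above.

```python
DEFAULT_LANGUAGE = "en"

# Hypothetical string tables: Swedish covers features A to C only.
STRINGS = {
    "en": {"feature_a": "Open", "feature_b": "Save", "feature_c": "Print", "feature_d": "Share"},
    "sv": {"feature_a": "Öppna", "feature_b": "Spara", "feature_c": "Skriv ut"},
}


def ui_string(key: str, language: str) -> tuple[str, str]:
    """Return the text for a key and the language that was actually used."""
    table = STRINGS.get(language, {})
    if key in table:
        return table[key], language
    # The requested language does not cover this feature: use the default.
    return STRINGS[DEFAULT_LANGUAGE][key], DEFAULT_LANGUAGE
```

Returning the language that was actually used lets the caller mark the fallback text with the correct language, for instance for screen readers.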

Language fallback settings have both an advantage and a disadvantage. If properly chosen, a fallback language helps keep users engaged, because they can continue to use the product's features even though these are not in their preferred language. However, this only holds if the user has a high level of competency and comprehension in the fallback language. If not, the result may be lower customer satisfaction.

Since user satisfaction is a primary concern, a lot of attention is often paid to choosing the right fallback language to get the best possible engagement. Some popular blogging platforms such as WordPress even offer plugins that help ensure that a proper fallback language is chosen. Other methods include adding rules that specify falling back to related dialects with a high level of mutual intelligibility. American English and British English would be one common pairing. Brazilian Portuguese to Continental Portuguese could be another obvious transition.
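
Such fallback rules can be expressed as an ordered chain of candidate locales. The Python sketch below is one possible arrangement, not a standard algorithm: the rule table, the global default and the function names are assumptions, and locale tags are assumed to be already written in a consistent casing.

```python
# Hypothetical explicit rules for related, mutually intelligible locales.
RELATED = {
    "ca-ES": ["es-ES"],
    "pt-BR": ["pt-PT"],
    "en-GB": ["en-US"],
}
GLOBAL_DEFAULT = "en-US"


def fallback_chain(requested: str) -> list[str]:
    """List the locales to try, from most to least specific."""
    chain = [requested]
    if "-" in requested:
        chain.append(requested.split("-")[0])  # e.g. ca-ES -> ca
    chain.extend(RELATED.get(requested, []))
    chain.append(GLOBAL_DEFAULT)
    # Remove duplicates while keeping the order.
    return list(dict.fromkeys(chain))


def resolve(requested: str, available: set[str]) -> str:
    """Pick the first locale in the chain that the product actually ships."""
    for candidate in fallback_chain(requested):
        if candidate in available:
            return candidate
    raise LookupError(f"no usable locale for {requested!r}")
```

For a product that ships only Spanish and US English, `resolve("ca-ES", {"es-ES", "en-US"})` selects `es-ES`, matching the Catalan-to-Spanish example given earlier.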

Language Switching in Software

Software developers use different solutions for detecting and displaying different languages. Common methods include detecting the OS region setting or the default language or simply allowing the user to choose what language they wish to install. Additionally, software may be localized to different languages but will require the user to install additional language files in order to enable them. Language switching interface buttons, usually seen on websites, are not common in software. Most solutions allow the user to change the display language of the software through a settings or preferences menu and may require a restart of the program before the new language can display.
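
As one illustration of detecting the system setting, the Python sketch below derives a display language from POSIX-style locale variables (`LC_ALL`, `LC_MESSAGES`, `LANG`) such as those found on Linux and macOS. This is an assumption made for the example; Windows exposes the setting through other means. The environment is passed in as a mapping so that the function can be exercised without changing the real system settings, and the function name and the default language are hypothetical.

```python
from typing import Mapping


def language_from_environment(env: Mapping[str, str], supported: set[str], default: str = "en") -> str:
    """Pick a display language from POSIX-style locale variables."""
    value = ""
    # The first non-empty variable wins, in this order of precedence.
    for variable in ("LC_ALL", "LC_MESSAGES", "LANG"):
        value = env.get(variable, "")
        if value:
            break
    if not value:
        return default
    # "fr_CA.UTF-8@euro" -> "fr_CA" -> "fr-CA"
    tag = value.split(".")[0].split("@")[0].replace("_", "-")
    if tag in supported:
        return tag
    language = tag.split("-")[0]
    return language if language in supported else default
```

An application could call this once at startup with `os.environ` and still let the user override the result in a settings or preferences menu.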

For Windows OS, every locale, such as English (United States), corresponds to a language identifier in the registry[27]. Every supported language of the operating system has a unique identifier. Identifiers for the default system locale and the user-defined locale exist for use, as well. While programming their software for Windows, developers may use the Multilingual User Interface (MUI)[28] which enables support for localization of user interfaces and allows the application to retrieve these language identifiers.

Some older software is not coded with Unicode support. If the user tries to run this software with a mismatched locale setting, the text displayed within the program may become unreadable. This happens because Windows loads the code page for the user's locale and maps the text bytes within the software to characters in that code page. If the intended character has no equivalent in the code page, it will not display properly; otherwise, the bytes are displayed as whatever characters they correspond to in the current locale's code page rather than the characters intended by the original locale's code page. To solve this, the user must change their locale in the "Language for non-Unicode programs"[29] setting within Control Panel and restart their system to load the correct code page.
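
The effect of a mismatched code page can be reproduced in a few lines of Python. This is only a sketch of the byte-to-character mapping described above; it does not call any Windows API, and the choice of code pages 1251 (Cyrillic) and 1252 (Western European) is an assumption made for illustration.

```python
text = "Привет"  # Russian text as the developer intended it

# A non-Unicode program stores the text as single bytes in the Cyrillic code page.
stored_bytes = text.encode("cp1251")

# On a system whose non-Unicode locale is Western European, the same bytes
# are looked up in code page 1252 instead, which yields unrelated characters.
garbled = stored_bytes.decode("cp1252")

# With the matching code page selected, the bytes map back to the intended text.
restored = stored_bytes.decode("cp1251")

print(garbled)
print(restored)
```

The bytes never change; only the table used to interpret them does, which is why switching the system setting is enough to make the text readable again.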

Language Switching on Websites

There are many things to consider when designing a language switcher for a website, such as:

  • Number of available languages.

The number of available languages will influence the style of language switcher to choose. It is always good to have a UI that requires the fewest clicks possible.

  • Fallback language, in case no default language is found.

It’s very important to programmatically set the current language of the webpage, even when it is not possible to determine the user's language. This allows braille devices and screen readers to identify which language to use from the start.

  • Graphic representation of the available languages.

Using flags to represent languages is often a poor choice: a flag stands for a country rather than a language, and it may not suit the languages currently available or languages that are added later.

  • Location of the language switcher on the page.

The language switcher has to be in a visible area, so that if a language selection error occurs and the user cannot read the current language, they can still find the switcher and change the language. The best approach is to follow the convention used by most international websites and place the language switcher at the top right. However, if your UI is good enough, you might be able to place the language switcher elsewhere and still succeed.

  • Accessibility

The language switcher should use HTML elements that are made for listing options, so that a screen reader can present the available languages to the user. Make use of ARIA attributes wherever necessary to make it easier for the user to find the language switcher.
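
The accessibility points above can be illustrated with a short Python function that renders a switcher as an HTML list. This is a sketch under stated assumptions rather than a recommended implementation: the `LANGUAGES` list, the `/<tag>/` URL pattern, and the function name are hypothetical. The `lang`, `hreflang`, `aria-label`, and `aria-current` attributes are standard HTML and ARIA attributes.

```python
from html import escape

# Hypothetical supported languages: (language tag, name written in that language).
LANGUAGES = [("en", "English"), ("de", "Deutsch"), ("ja", "日本語")]


def render_language_switcher(current, languages=LANGUAGES):
    """Return an HTML list of language links, marking the current language."""
    items = []
    for tag, name in languages:
        current_attr = ' aria-current="true"' if tag == current else ""
        # lang tells assistive technology how to pronounce the link text;
        # hreflang states the language of the page the link leads to.
        items.append(
            f'    <li><a href="/{escape(tag)}/" lang="{escape(tag)}" '
            f'hreflang="{escape(tag)}"{current_attr}>{escape(name)}</a></li>'
        )
    return (
        '<nav aria-label="Language">\n  <ul>\n'
        + "\n".join(items)
        + "\n  </ul>\n</nav>"
    )


print(render_language_switcher("de"))
```

Each language is written in its own language, so a user who cannot read the current page can still recognize their own entry, and no flags are needed.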

Localization Strategy

Picking markets

Companies and organizations wanting to branch out to other regions in the world and localize their products should consider the following things:[1][23][22][21][19][16][15][11][7]

  • Total Online Population (TOP)
  • Share of TOP
  • World Online Wallet (WOW)
  • Share of WOW

One example of a market that is ready for deeper penetration is the language-rich population of India. Hindi and English are co-official languages of India. Hindi, spoken more in the northern part of the country, has speakers totaling about 20% of the population. In the southern part of the country, people who don’t speak Hindi will instead communicate in English or their local language. English, at over 125 million speakers, is generally the language of India’s elite and since colonial times has been the language of government, education, technology, media, and science. However, the Indian Constitution guarantees all Indian citizens the right to express themselves to any government agency in their own tongue, and individual states have the right to adopt any regional language as a language of state government and education. No single language is spoken across the subcontinent, or even within one state.

India is home to more than one billion people and over 1,500 mother languages and, additionally, several hundred dialects.  In 2001, census data showed that 29 of these languages are spoken by over one million citizens each, and 60 of the total languages are spoken by over 100,000 citizens each.  The Indian government only uses 23 of these languages officially in settings such as government, hospitals, and business. 

This tells us that there are millions of Indian citizens who speak neither English, Hindi, nor any of the other 21 official languages, and whom multinational businesses have not been able to reach.

People all over the world are eager for access to the internet and instant communication in the language they feel most secure in. As technology such as smartphones reaches further into rural areas, there is much greater potential to close the gap between international companies and millions of new potential customers. Samsung and a handful of other companies have already begun making mobile devices that support 22 regional Indian languages. Localizing websites, mobile devices, and apps into more Indian languages will open the door to generating more profit in India’s untapped markets.

European Union – medical device information

The medical device information on the product label and instructions for use must be available in the national language(s) of the final user. Refer to European Union Council Directive 93/42/EEC, Article 4(4).

French language laws (Canada, France)

The Charter of the French Language, also known as Bill 101, is a 1977 law in the province of Quebec in Canada that defines French as the only official language of Quebec and frames fundamental language rights of all Quebecois. The provincial government body responsible for enforcing the Charter is the Office québécois de la langue française (OQLF). The law stipulates that product labels, their instructions, manuals and warranty certificates as well as public signs, posters and commercial advertising must be in French. If a sign is bilingual or if separate signs are used for different languages, the French text must be predominant (e.g. the French text is twice as big as the other language and/or there are twice as many signs in French). The Charter also regulates the use of the French language in business and commerce. For example, software used by employees must be available in French, unless no French version exists. Similarly, on September 10, 2007, the OQLF and the Entertainment Software Association of Canada announced a new agreement regarding the distribution of video games in the province of Quebec:

  • Since Sept. 10, 2007, the packaging and instructions of any new video game sold in Quebec must be in French.
  • Since Oct. 1st, 2007, any new computer software must be available in French if a French version exists elsewhere in the world.
  • Since April 1st, 2009, any new generation console video game (Microsoft Xbox 360, Nintendo Wii, Nintendo DS, Sony PlayStation 3, Sony PSP and any newer model) must be available in French if a French version exists elsewhere in the world.
  • If the French and English versions are available separately, any retailer wanting to sell or rent the English version must also offer the French version.
  • If no French version exists, the English version may be sold only if the packaging and instructions are in French.


The Toubon law of 1994 is a French law mandating the use of the French language in several areas, among them official government publications, advertisements, commercial contracts, government-financed schools, and the workplace. Since it stipulates that "any document that contains obligations for the employee or provisions whose knowledge is necessary for the performance of one's work must be written in French", software developed outside of France must have its user interface and instruction manuals translated into French.

Turkish consumer protection law

Consumer Protection Law No. 6502 regulates sales to consumers over the internet and other digital platforms and defines the rules of advertisement to protect consumers. The Regulation for Distance Contracts also aims to protect consumer rights in e-commerce transactions.[30]

Simplified Chinese language regulations (Mainland China)

Organizations in the People’s Republic of China, such as the Translators Association of China (TAC) and the Standardization Administration of China (SAC), are responsible for establishing uniform standards for language services. Below are the major regulations and publications that have been guiding the translation and localization industry:

  • Standardization Administration of the People's Republic of China. (2016). ZYF 001-2016 Quality Evaluations Code for Localization Translation and DTP. Beijing: Standards Press of China.
  • Standardization Administration of the People's Republic of China. (2005). GB/T 19682—2005 Target Text Quality Requirements for Translation Services. Beijing: Standards Press of China.
  • Translators Association of China. (2016). Localization for Beginners. Beijing: Standards Press of China.
  • Translators Association of China. (2016). 2017-2021 China Language Service Industry Development Plan. Beijing: Standards Press of China.
  • General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China. (2008). GB/T 19363.1—2008 Specification for Translation Service Part 1: Translation. Beijing: Standards Press of China.
  • General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China. (2006). GB/T 19363.2—2006 Specification for Translation Service Part 2: Interpretation. Beijing: Standards Press of China.
  • General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China. (2005). GB/T 17532-2005 Terminology Work--Computer Applications—Vocabulary. Beijing: Standards Press of China.
  • General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China. (2011). GB/T 15834—2011 General Rules for Punctuation. Beijing: Standards Press of China.


Language Access in The United States

The United States has an extremely diverse population. Although English is the country's predominant language, some of its immigrant population knows little to no English, which makes it harder for this population to access services. For this reason, in 2000 President Bill Clinton signed Executive Order 13166, “Improving Access to Services for Persons with Limited English Proficiency." The Executive Order “…requires Federal agencies to examine the services they provide, identify any need for services to those with limited English proficiency (LEP), and develop and implement a system to provide those services so LEP persons can have meaningful access to them.” The Executive Order also requires that any recipient of federal funding provide access to the LEP persons it serves. As a result, the need for translators and interpreters increased, especially in government, health, and education services. These organizations can focus their translation and interpretation efforts on the LEP population they serve. According to U.S. Census Bureau data collected from 2009 to 2013, 350 different languages are spoken at home in the U.S. The three most spoken languages are Spanish, Chinese (Cantonese, Mandarin, other varieties) and French. For more information regarding Executive Order 13166, you can visit: https://www.justice.gov/crt/executive-order-13166. For more information regarding the language data from the U.S. Census Bureau, you can visit: https://www.census.gov/newsroom/press-releases/2015/cb15-185.html.

Data Localization

Data localization refers to the set of laws enacted by a state or country that requires foreign companies to store citizens’ data within its borders. This means that data of the specific nation’s citizens has to be collected, processed, and stored inside the country, before being transferred internationally, and transferred only after meeting local privacy or data protection laws. Such regulations impact email communication, personal records, and social media services.

Data localization derives from the concept of data sovereignty, which requires that records about a nation's citizens or residents follow its personal or financial data processing laws. Data localization goes a step further and requires that the initial collection, processing, and storage of data occur within the national boundaries of the country enacting the law.

For example, Russia, China, Indonesia, and others have enacted economy-wide localization policies that require data to be stored on servers within their respective borders, while other countries, such as Australia, Germany, South Korea, and Venezuela, have enacted industry-specific laws that require certain financial, health and medical information, online publishing, and telecommunications data collected from their citizens to be stored on local servers.[31]

In terms of localizing an internet-based product and/or service, data localization is a key factor to consider when drafting a localization strategy. Localizing into a country with data localization policies will imply larger IT investment and stringent security measures for data related to business operations.
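
One place where such a requirement surfaces in a product is the decision about where each user's records are stored. The Python sketch below is purely illustrative: the `RESIDENCY_RULES` table and the region names are hypothetical, the three countries are simply those named above as having economy-wide policies, and real compliance depends on the specific law and category of data.

```python
# Hypothetical mapping from a user's country (ISO 3166-1 alpha-2 code) to the
# storage region where that user's records are kept.
RESIDENCY_RULES = {
    "RU": "ru-local",
    "CN": "cn-local",
    "ID": "id-local",
}
DEFAULT_REGION = "global"  # assumed shared region for countries without a rule


def storage_region(country_code):
    """Return the storage region for a user's data, defaulting to the shared region."""
    return RESIDENCY_RULES.get(country_code.upper(), DEFAULT_REGION)


print(storage_region("ru"))
print(storage_region("SE"))
```

Keeping the rules in one table makes the added IT cost visible: every new entry implies another in-country storage location to operate and secure.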


Data Localization Policies Around the World

Currently, at least 34 countries have data localization policies. China alone accounts for about a dozen of these policies, and other major countries, such as Russia, Indonesia, and Vietnam, have them as well.

For a list of data localization measures around the world, see the Technology Industry Council’s Snapshot of Data Localization Measures.[32]

The aforementioned document captures most of the world’s data-localization policies. The entries with icons show where countries have enacted and implemented data localization policies targeting specific types of data. Other entries cover cases where countries have proposed, but not enacted, data localization policies or provide context for data-related policies, such as in the European Union. The list shows that data localization comes in many forms: while some countries enact blanket bans on data transfers, many are sector specific, and others target specific processes or services. One of the basic problems for companies complying with data localization laws is the difficulty in determining which categories of data need to be locally stored and which can be moved abroad.[33]


Future of Data Localization

For its detractors, data localization, by adding restrictions on how and where data is stored or transferred, poses a fundamental threat to the free flow of information across borders and to the maintenance of global supply chains. As cross-border trade increasingly moves towards e-commerce and relies on internet technologies such as cloud computing and big data, data localization policies pose a major threat to the economy and to businesses’ bottom line. In addition, privacy and security can suffer when companies are forced to store data in ways that are not the most efficient or effective, so in many cases data security is weakened even though protecting it is often the officially stated purpose of this type of measure.

The reality is that data localization laws are here to stay. As companies invest in compliance and governments without these laws see the short-term benefits that accrue to the localizing government in the form of increased access to data and a boost to the local economy, more nations may want to get in the localization game.[34]


Localization Economics

Terms and Concepts

  • Localization ROI

Localization ROI, or return on investment, refers to the calculation of the benefits or outcomes gained as a result of spending money or allocating resources. For instance, it can mean the money earned back in sales as a result of the money a company has invested in a localization program. The return can also refer to increased brand awareness, expanded market share, growth in foreign visitor traffic, etc. Corporate executives often use ROI as a key indicator of global value for their firm, and localization managers often use ROI to demonstrate the business value of localization. [35]

The dynamics of market development play an important role in localization ROI. As a company progresses from market entry to market expansion and maturity, the localization ROI is likely to pick up. The more mature a product is in a country, the higher the return on localization investment is expected to be. For instance, if a product was released in Japan six years ago and has been consistently available in updated, localized versions for four years, the ROI for localizing the next Japanese release is likely to increase. One reason is that leveraging the Translation Memory (TM) built up after the first release reduces the effort and cost of localization; another is that the sales channel, brand awareness, and installed base have already been established in the target market. Therefore, one could use separate benchmarks for different market stages. [36]
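
As a rough illustration, ROI is conventionally computed as the net gain divided by the amount invested. The text above does not give a formula, so this definition is an assumption, and the figures below are invented to show how lower localization cost on a later release raises ROI for the same revenue.

```python
def localization_roi(revenue_gained, localization_cost):
    """Return ROI as a fraction: net gain divided by the amount invested."""
    if localization_cost <= 0:
        raise ValueError("localization_cost must be positive")
    return (revenue_gained - localization_cost) / localization_cost


# Hypothetical figures: a first release versus a later release that reuses
# translation memory and therefore costs less to localize.
first_release = localization_roi(revenue_gained=150_000, localization_cost=100_000)
later_release = localization_roi(revenue_gained=150_000, localization_cost=60_000)

print(f"first release: {first_release:.0%}")
print(f"later release: {later_release:.0%}")
```

Non-monetary returns such as brand awareness or visitor traffic would need their own measures; this sketch covers only the sales-based case.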

  • e-GDP
  • Economic significance of a language
  • Politically-driven localization
  • Localization based on humanitarian grounds

This refers to the localization of a product or service driven by the wish to improve the quality of life of people in a less developed economy rather than to pursue economic benefit. Even simple access to translated information has a positive impact on health and could prevent deaths.[37]

  • Profit Margin (LSP Only)

One of the key economic indicators that LSPs measure is the profit margin of a project. The profit margin is defined as the net sales less the cost of goods sold (COGS) divided by net sales. [38] The net sales is the total revenue generated from the project less any allowances or discounts. [39] The COGS is the cost of external labor directly used to create the service. [40] For LSPs the COGS would represent all fees paid to external vendors who worked on the project which could include: translators, editors/reviewers, desktop publishers, quality assurance testing, voice over artists, video editors, etc. Profit margin is typically calculated on a project by project basis. Project Management applications may calculate profit margin at the inception of a project when vendor purchase orders are created against the quote. If additional vendor purchase orders are required during the life of the project then the profit margin will be reduced unless the LSP issues a change order to the client. An LSP may have a target profit margin per client depending on the nature of the work. For example higher volume clients that represent a larger percentage of net sales for the LSP and request large projects with high word counts may have a lower target profit margin per project than lower volume clients that request smaller projects less frequently. Volume discounts can either be initiated by the LSP or requested by the client.

  • Operating Profit Margin (LSP Only)

The Operating Profit Margin is defined as Operating Earnings divided by Revenue. [41] Operating earnings is defined as Net Sales less COGS less General & Administrative Costs. [42] For the sake of simplicity let’s define General & Administrative costs as LSP Employee wages for the purpose of this article. While calculating Operating Profit Margin on a company-wide level is straightforward, calculating this metric on a per project level is challenging as it requires the accurate measurement of LSP employee(s) time on the project. In lieu of tracking internal project time, some LSPs may apply a flat Project Management fee per project to offset internal project cost. The problem with this approach is that if internal time is not measured on the project then it cannot be known to what extent the PM fee covers internal cost. It raises the question, did the PM fee on the project cover, exceed or fall short of the time the LSP employee(s) spent on the project? Some Project Management applications like XTRF allow the LSP employee(s) to input their time on each project. The problem with this approach is that the LSP employee is required to manually track their time on each project which can be difficult when juggling multiple projects on a daily basis. By measuring LSP Employee time on a per project level, LSP Management will have a clearer picture of the Operating Profit margin per project. This metric can answer questions like: Did we work efficiently on that project? Why or why not?
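
The two LSP metrics defined above can be worked through for a single project. The following Python sketch uses invented figures, and it follows this article's simplification of treating internal employee wages as the only General & Administrative cost; the function and variable names are hypothetical.

```python
def profit_margin(net_sales, cogs):
    """(net sales - COGS) / net sales."""
    return (net_sales - cogs) / net_sales


def operating_profit_margin(net_sales, cogs, internal_cost):
    """(net sales - COGS - internal employee cost) / net sales."""
    return (net_sales - cogs - internal_cost) / net_sales


# Hypothetical project figures.
net_sales = 10_000      # amount invoiced to the client after discounts
vendor_fees = 6_000     # COGS: translators, reviewers, DTP, and other vendors
pm_hours = 20           # internal time tracked on the project
hourly_wage = 50        # assumed cost of one hour of employee time
internal_cost = pm_hours * hourly_wage

margin = profit_margin(net_sales, vendor_fees)
operating_margin = operating_profit_margin(net_sales, vendor_fees, internal_cost)

print(f"profit margin: {margin:.0%}")
print(f"operating profit margin: {operating_margin:.0%}")
```

The gap between the two numbers is exactly the tracked internal time, which is why the per-project operating margin cannot be known unless that time is measured.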

Localization

Pioneers of Machine Translation

Franco-Armenian Georges Artsouni conducted some of the first experiments with machine translation. In 1933 he designed a storage mechanism out of paper, with which for any word the equivalent in another language could be found and which thus could be used as a bilingual dictionary. A patent was filed for this device, and apparently in 1937 a prototype was presented.

Russian Petr Smirnov-Troyanskii applied for a patent in 1933. His mechanism used a bilingual dictionary and a method to correlate grammatical rules in different languages. The translation process was divided into three phases: transformation of the source text into a logical form based on the source language; transformation of this logical form into a second logical form based on the target language; and transformation of this second logical form into a text in the target language.

Englishman Andrew Donald Booth (1918–2009) was a crystallographer but researched the mechanization of a bilingual dictionary at Birkbeck College (London), for which he was funded in 1947 by the Rockefeller Foundation. He also helped develop mechanical calculators.

Englishman Richard Hook Richens (1919–1984) was a botanist at Cambridge but conducted research on machine translation. In 1956 he invented the first semantic network for computers, which served as a pivot language for machine translation.

American Warren Weaver (1894–1978) was a mathematician and is often referred to as the father of machine translation. He regarded Russian as a code “in some strange symbols” that just needs to be decoded. In 1949 he set out his ideas on machine translation in a memorandum entitled simply “Translation,” in which he made four basic assumptions to overcome a simplistic word-by-word approach: 1) the translation must be performed within context, 2) there is a logical component in the language, 3) cryptographic methods are possibly applicable and 4) there may be linguistic universals.

Localization by type

We usually distinguish between software (SW) and user assistance (UA) localization (see Localization/UA). These two deliverable types are further defined below. As the creation processes and workflows of both deliverables become more and more connected, the boundaries between UA and SW are blurring in many areas. Many localization areas, processes and tools can now be applied to both UA and SW.

Software (SW) localization is the process of adapting software to meet the language, cultural, and other requirements of a specific target market. When software is localized effectively, the end user experiences a seamless, native feel in the product. In the past, software was released annually or semi-annually, but today it is expected to be released as quickly as possible, in all languages simultaneously.

1. Tasks

Common localization tasks:

- MT
- AT
- Terms
- Dictionary
- Verification
- File Parser
- Commenting
- String Status
- Review/Approval
- DTP

To increase productivity and consistency, and to work collaboratively with stakeholders, CAT tools and TMS are used as tools for localization tasks.

2. Issues

Common localizability issues:

- Hardcoded text
- Overlocalization
- String dependencies
- Resource constraints
- Strings concatenation
- Usage of variables
- UI layout

Hardcoded text is text that is embedded directly in the source code rather than stored in localizable resources, so it still appears in the source language in the localized UI. It can be caught by code review or pseudo-localization. Overlocalization (overtranslation), on the other hand, means translating text that should remain in its original language. For example, overtranslating path names and directories can cause parts of the product to stop functioning. String dependencies, resource constraints, and string concatenation can keep the product from working properly and are best avoided through product design. UI layout issues arise when translated text is longer than the source (e.g., in German) and gets truncated.
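
Pseudo-localization, mentioned above as a way to catch hardcoded text, can be sketched in a few lines of Python. This is one simple variant among many, with hypothetical names and an assumed 30% length expansion: it accents vowels, pads the string, wraps it in brackets, and leaves `{placeholder}` variables untouched.

```python
import math
import re

# Map plain vowels to accented look-alikes so the text stays readable.
ACCENTED = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")
# Variables such as {count} must survive unchanged.
PLACEHOLDER = re.compile(r"(\{[^{}]*\})")


def pseudo_localize(text, expansion=0.3):
    """Return an accented, padded, bracketed copy of a UI string."""
    pieces = PLACEHOLDER.split(text)
    # With one capturing group, split() puts placeholders at odd indexes.
    converted = "".join(
        piece if index % 2 else piece.translate(ACCENTED)
        for index, piece in enumerate(pieces)
    )
    padding = "~" * math.ceil(len(text) * expansion)
    return f"[{converted}{padding}]"


print(pseudo_localize("Save {count} files"))
```

When the product is run with pseudo-localized resources, any string that still appears without accents and brackets was hardcoded, and any string whose closing bracket is cut off points to a layout that will truncate longer translations.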

UA localization refers to the translation and adaptation of the content that accompanies a particular program or application. This can include printed materials (in-the-box instructions), but most commonly refers to internet-based user help files, API documentation, etc.[43] The reasons for localizing UA include meeting legal requirements, reducing customer support requests, and providing detailed documentation of a program's functionality for advanced users.

Pre-handoff

Typically, UA localization only begins after the software itself (i.e. the user interface) has been localized. This is to ensure that references to UI elements in the documentation match with the names of those elements in the actual interface. Additionally, it is helpful to include illustrative screenshots in the localized UA (just as you would in the development version), so a localized build should ideally be available so that locale-specific screenshots can be inserted into the localized UA.

Before translation begins, the translator should ideally receive a termbase, style guide, and TM(s) containing legacy translations from past iterations of the project, if applicable (see 4.7). UA is typically stored in a CMS as files in a markup language such as .xml (Extensible Markup Language) or .dita (Darwin Information Typing Architecture) and organized by locale. Source language documents that have changed since the previous update are gathered together and packaged as a dual-language .xliff (XML Localization Interchange File Format) file to be handed off to the translator along with the rest of the localization kit (see 4.10). Recyclable content from past .xliff versions and/or .tmx files (translation memories in TMX format) is also typically leveraged before the kit is handed off.
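
To make the dual-language format concrete, the Python sketch below parses a small XLIFF 1.2 document and lists the segments that still need translation after legacy content has been leveraged. The sample file contents, the file path in `original`, and the function name are invented for illustration; only the XLIFF 1.2 element names and namespace are taken from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the XLIFF 1.2 standard.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Invented sample: one segment already leveraged from TM, one still untranslated.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="help/install.xml" source-language="en-US"
        target-language="de-DE" datatype="xml">
    <body>
      <trans-unit id="t1">
        <source>Click Install.</source>
        <target>Klicken Sie auf Installieren.</target>
      </trans-unit>
      <trans-unit id="t2">
        <source>Restart the program.</source>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def untranslated_units(xliff_text):
    """Return (id, source text) for every trans-unit with a missing or empty target."""
    root = ET.fromstring(xliff_text)
    pending = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        target = unit.find("x:target", XLIFF_NS)
        if target is None or not (target.text or "").strip():
            pending.append((unit.get("id"), source.text))
    return pending


print(untranslated_units(SAMPLE))
```

A count of the pending units gives a quick estimate of how much new work remains in the kit.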

UA localization process

UA localization is performed in a Translation Environment Tool (also known as a CAT, or Computer Assisted Translation tool) (see 4.12). With these tools, the translator is better able to maintain stylistic and terminological consistency with previous versions of the UA. It is particularly important to make sure that all terms are consistent and up-to-date during the translation process, so the translator may additionally want to use a term extraction tool before starting the actual translation in order to clarify any ambiguities ahead of time. As the translator works through the text, they can leverage "fuzzy matches" (i.e. text segments that partially match previously translated content) in order to ensure consistency and speed up the localization process.
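
Fuzzy matching can be approximated with Python's standard `difflib` module. CAT tools use their own scoring methods, so this is only a stand-in: the sample translation memory, the 75% threshold, and the function name are assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> approved translation.
TM = {
    "Click the Save button.": "Klicken Sie auf die Schaltfläche Speichern.",
    "Restart the program.": "Starten Sie das Programm neu.",
}


def best_match(segment, memory=TM, threshold=0.75):
    """Return (score, source, translation) for the closest TM entry, or None."""
    best = None
    for source, translation in memory.items():
        # ratio() is 1.0 for identical strings and lower for partial matches.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, source, translation)
    if best is not None and best[0] >= threshold:
        return best
    return None


print(best_match("Click the Save button."))    # exact match
print(best_match("Click the Cancel button."))  # partial (fuzzy) match
print(best_match("Totally different wording here"))
```

A fuzzy match is only a suggestion: the translator still edits the retrieved translation so that it fits the new source segment.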

Post-handoff

When the translator has finished processing the UA text, they perform QA checks in their CAT tool, approve the segments they translated, and hand back the completed .xliff to the PM or loc engineer. Once the files are checked over, the locale-specific .xml files can then be committed in the CMS and the updated, localized UA can be published.

Localization operations: Models

In-House

For the In-House model, the localization process is handled within the company by employees (as opposed to an LSP or individual contractors) from start to finish. Advantages include speed of information transfer (no bottlenecks) and better communication between the content writers, developers and translators. Disadvantages include a possible increase in overhead and lack of scalability. [44]

Outsourcing

In the outsourcing model, the localization processes are outsourced to a third party in their entirety. Small and medium-sized companies are likely to have the greatest interest in this model, especially if they have no prior experience with localization and its practices. For them, outsourcing localization to a third party means cutting the costs of tools, education, hiring localization experts and translators, licensing, and training, which might otherwise exceed the cost of localization itself. The right vendor will be able to ensure better control and quality of the final product because of the availability of resources. A more detailed description of the pros and cons of this model can be found at: Pros and cons. However, the outsourcing company should assign the role of subject matter expert (SME) to one of its employees, who will be the point of contact for the vendor. [45]

Hybrid in-house/outsourcing

This model supports localization through a combination of in-house resources and third-party outsourcing. In this model, you can make use of internal resources to perform pre-localization tasks (e.g., term mining/translation to build or update a term database for the third-party translator) and to serve as subject matter experts (SMEs). While a 100% in-house model has overhead and scalability constraints, this combined model lets you scale to support increased scope while retaining a smaller pool of in-house resources for specialized tasks (terminology, style guides, linguistic QA) and for high-priority or short-timeline projects.

Community Localization

Community localization is the act of taking work traditionally performed by professional translators and outsourcing it to a preexisting community of partners, end users, and volunteers. It is sometimes confused with collaborative localization, which is the act of assigning translation to a team of translators who use an online translation platform with a centralized, shared translation memory to speed up translation.

If confidentiality is essential, for example in the case of a new product release, community translation is not a good solution. It is neither practical nor reasonable to expect community members to adhere to a non-disclosure agreement. In this case, professional translation or, if possible, in-house translation is the way to go.

The linguistic quality of community translated content will not equal that of content translated by professionals, although meaning typically will be translated correctly if the community is well-matched with the content. Quality can be increased by using glossaries developed in-house or by professional translators, using peer review and a mechanism to flag totally inappropriate translation, and implementing a separate review phase in the workflow carried out by in-house specialists, professional translators, or a very experienced subset of the community.

Languages differ in how successful community translation tends to be. Speakers of some smaller languages are eager to help you translate, so don’t overlook anyone. Being able to reach many of these smaller pockets of users quickly can be an effective strategy for some organizations.

An important feature of a successful community localization program is the quality of its moderation team. The best moderators are not those who provided the best translations, but those who voted up the best translations. An important part of a moderator’s job is to actively find and repair “broken windows” in the community, for example by putting out grammar wars, organizing translatathons, and leading glossary term discussions. As a moderator, you act as a control and monitoring tool for the health of each community, making sure it has what it needs to be active and productive.

Major Challenges with Community Localization
  1. Finding communities—Most examples of community localization involve communities that were already involved with the organization in ways other than translation: end users, partners, in-house personnel, or volunteer members. Creating end-user or partner communities from scratch requires substantial work and support from experienced consultants. It is also possible to enlist help through an open call on marketplaces such as Mechanical Turk, Craigslist, or oDesk. With such an open call, however, it will be necessary to pay the translators unless the translation is for a charity or other non-profit organization with a perceived substantial social benefit.
  2. Matching the content with the communities—Community members are most often not linguists; instead, they are bilinguals with specific domain knowledge. The selected community needs to have both the capacity and the ability (domain knowledge) to translate the content. For some content, such as sales and marketing materials for a commercial enterprise, it will be very difficult to find communities with the ability to translate the content and the motivation to do so.
  3. Aligning community objectives and organization objectives—Most of the community management effort will go toward motivating the communities. Aligning the goals of the community with those of the organization seeking to use community translation is critical to the success of community translation. If there is no alignment, communities will falter or, worse, the effort will be perceived as exploitation by the organization. Community localization is not a means to get free translation for the organization’s content. Attempts by an organization to replace its translation service providers and professional translators with free community translation may lead to quality problems, and in some cases may fail and draw criticism. When semi-professional translators are used through one of the translation marketplaces, there is often a trade-off between linguistic quality and cost.
  4. Identifying project management controls—Any project manager involved with community translation will wonder how to control deadlines, confidentiality, and quality. Control over deadlines requires a community translation platform that provides a means to set up deadlines and to report progress on a granular level. If missing a deadline becomes likely, the organization can then switch to professional translators.
Common Misconceptions about Community Translation
  1. Quality will suffer—In reality, the same pitfalls that apply to working with translation vendors also apply to community localization. Translations should be done in dialog with the original creators of the content to ensure that its meaning is properly interpreted.
  2. The speed at which translations are completed cannot be controlled—In reality, with a good moderation program it is possible to keep a good enough pace by steering users to the content with the highest priority.
  3. Community localization is cheaper than hiring professional translators—In reality, the costs of a successful community localization program are equal to or greater than those of contracting with an LSP, because salaries, servers, and similar expenses can add up to large monthly costs that must be paid even when there is no new language or new feature to launch.

Crowdsourcing


In brief, crowdsourcing is when an individual, group, or company publishes the user interface strings of their website or product so that anyone with Internet access can help translate/localize the material. Since 2006, crowdsourcing translation and localization has become extremely popular. Some of the most popular websites and products available today were translated through crowdsourcing, including Facebook, Twitter, LinkedIn, Minecraft, Khan Academy, and TED. The participants range anywhere from five to 450 thousand.

Crowdsourcing is efficient for several reasons:

  1. Little to no linguistic limitations — With crowdsourcing, the number of languages into which a product can be translated/localized is limited only by the demographics of the users. This is especially beneficial for a product that is already popular worldwide.
  2. Near-immediate results — Because of the large number of people who typically participate in crowdsourced translation/localization, results often come back almost instantaneously, depending on the size of the source text and the amount of translation proofreading that is required.
  3. Financially efficient — One of the more obvious and perhaps greatest benefits of crowdsourcing is that the low-cost, low-maintenance nature of building (or implementing) a crowdsourcing platform makes it the cheapest option for a company to translate hundreds or thousands of strings.

Despite all the advantages, there are some challenges that should be considered:

  1. Technological limitations — People who are not internet- or technology-savvy, and those who do not have convenient access to the internet, contribute very little to crowdsourcing. Because of this, some important languages or dialects can be left out of the results. Companies also need to consider the effect that time-zone differences can have on the release of translated materials.
  2. Variable quality — If the quality control of submitted translations is not coordinated carefully, the hobbyist translators who most often participate in crowdsourcing may end up harming the product's quality through unprofessional work.
  3. Motivation — Because crowdsourcing is done by volunteers for free, keeping the translators/localizers motivated is extremely important. Many companies motivate their volunteers by gamifying the process and offering rewards to top contributors, or even to all contributors.
  4. Control — Managing a group of hundreds or even thousands can be very difficult. The organization of the crowdsourcing platform and its users needs to be carefully considered and executed.


Challenges of Localizing into African Languages
  1. An absence of linguistic and cultural equivalents: Many speakers of indigenous African languages also use European languages such as English and French. As a result, many foreign terms, especially technology terms, lack equivalents in the native languages, which are rarely used in technological contexts. New vocabulary for such terms does not develop in the indigenous languages, and most speakers end up using loanwords.
  2. Inadequate communication infrastructure: It is difficult to find and communicate dependably with language experts, which prolongs the localization process. It is also difficult for translators of indigenous languages to network with each other.
  3. Inadequate resources: Since secondary languages are spoken in most areas of Africa, there is a lack of written resources for indigenous languages, such as dictionaries and termbases. This makes it difficult to localize accurately into these languages.
  4. Insufficient goodwill to maintain indigenous languages: Since the field of localization is relatively new and most countries have a secondary language anyway, neither governments nor educational institutions channel much emphasis or resources toward training translators. This limits the capacity to localize into indigenous languages, even for companies that are willing to do so.

In-Country Review Process: General Review Guidelines (LSPs)


The primary goal of in-country reviews (ICRs) is to ensure that original content has been translated accurately and according to the requirements of the requester. As such, the ICR process represents a fundamental quality assurance milestone for producing high-quality and consistent product information across all target languages. To achieve this essential quality goal, client reviewers must be native speakers of the respective target language and be familiar with the subject matter as well as the product. Equally important, the translation must meet the tone, style, and nuances of the specific target locale to appeal to the regional target audience where appropriate.

In addition, ICRs provide valuable feedback for establishing and maintaining company-specific terminology that would otherwise not get formally captured and documented. The feedback also enables LSPs to maintain up-to-date terminology glossaries and translation memories as client requirements change. This in turn has a direct impact on the quality and consistency of translations. Likewise, up-to-date and complete translation memories and glossaries improve time to market and reduce overall cost by allowing greater reuse of translation assets for new or revised information.

The ICR process should not involve a conceptual review (validation) of the original information (source content). Any conceptual review must occur before the actual ICR, typically during the creation of the source content. Thus, the ICR is not designed to validate original content, a task that typically includes legal, regulatory, compliance, and marketing reviews. This means that in-country reviewers should focus their attention on translation and preferred-terminology issues. In fact, changing the content in the target languages can pose a risk for clients because of potential off-label use. Assigned reviewers should therefore not change, remove, or add content in the target language if doing so makes the meaning deviate from the original content.

In addition, the in-country review is not a copy-editing task. That is, reviewers should avoid introducing preferential terminology and rewriting translated content. Preferential changes lead to unnecessary delays and additional effort as well as cost. They should be limited and introduced only if the preferred terminology is clearly defined, so that translations remain consistent from project to project.

In summary, the main objectives of the ICR process are:

  • Perform a linguistic verification of translation accuracy.
  • Assess the overall quality and consistency of translations.
  • Satisfy translation requirements and expectations set by requester.
  • Meet the tone, style, and nuances appropriate for the regional locale and target audience (e.g., consumer vs. professional).
  • Provide feedback for establishing and maintaining up-to-date and complete translation assets, including translation memories and company-specific terminology.

Localization projects: General workflows


Pre-translation engineering work


In this stage, the localization team reviews the UI elements and strings to identify issues that could arise during translation (e.g., hard-coded strings, concatenation, text expansion). As part of this review, the team extracts the strings from the source files and runs a pseudo-localization test, which helps to identify possible usability and functionality bugs. Once the project is judged ready for localization, the team provides context for the linguists: it adds comments to segments and UI elements, lists non-translatable items, and defines key terms such as acronyms, names, product names, titles, and other specific terminology. The next step is to import the source files into a CAT tool to find out how many of the strings already have translations in the translation memory. This gives the team an idea of the word count for the project and how much it would cost to translate. The team then prepares a translation request indicating the scope of the project, the timeline, the target languages, and additional context information, and sends it out with the source files. Once the project manager on the vendor side receives the translation request, they do the preparation work and deliver the files to the translators or linguists. Finally, the translation process begins.
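
The translation-memory check in this stage can be approximated with a small sketch. It assumes the translation memory is a plain dictionary from source string to translation and counts exact matches only; real CAT tools also report fuzzy matches, which are ignored here. The function name `tm_leverage` and the sample strings are hypothetical.

```python
def tm_leverage(source_strings, translation_memory):
    """Split strings into exact TM matches and new strings, and count new words.

    translation_memory is assumed to be a dict {source: translation};
    only exact matches are considered (no fuzzy matching).
    """
    matched = [s for s in source_strings if s in translation_memory]
    new = [s for s in source_strings if s not in translation_memory]
    # A simple whitespace word count of the strings that still need translation.
    new_word_count = sum(len(s.split()) for s in new)
    return {"matched": len(matched), "new": len(new), "new_words": new_word_count}


tm = {"Save": "Enregistrer", "Cancel": "Annuler"}
strings = ["Save", "Cancel", "Delete this file", "Open recent"]
report = tm_leverage(strings, tm)
print(report)
```

The count of new words is the figure a cost estimate would most likely be based on, since matched strings can be reused from the translation memory.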

Pre-translation QA work


Pseudo-Localization


Pseudo-localization is considered part of the internationalization testing process. It is a way to simulate language characteristics during the design or development phase while keeping the UI readable. No linguistic skills are required to test with pseudo-localization, so developers and testers can use it to detect and correct a host of i18n/l10n issues early in the development and testing of new features. Moving the detection and correction of i18n/l10n bugs further upstream significantly reduces development time and localization cost. Pseudo-localization works by transforming the resource strings in a way that simulates the characteristics of foreign languages while preserving the readability of the UI and messages. In effect, the product is localized into an artificial language that includes target-language characters but is still readable by testers. This is ideally done using an automated or semi-automated process. Here are some common text-length metrics used for pseudo-localization:

  • French, Spanish, Portuguese, Italian, Arabic, Hebrew and Polish text can expand between 15% and 30%.
  • German and Dutch text can expand 35% or more.
  • Chinese, Japanese or Korean text can contract 30% to 55%, but some of these languages don't use spaces and that can also cause text expansion.
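
As a rough illustration of how these figures can be used, the sketch below estimates the translated length of a UI string and checks it against a character budget. The percentages are illustrative picks from the ranges above (30% for French and Spanish, 35% for German), and the language keys and function names are hypothetical.

```python
# Illustrative expansion percentages picked from the ranges listed above.
EXPANSION_PERCENT = {"fr": 30, "es": 30, "de": 35}


def expanded_length(source_text, lang):
    """Estimate the translated length in characters, rounded up."""
    pct = EXPANSION_PERCENT[lang]
    # Integer arithmetic avoids floating-point surprises when rounding up.
    return (len(source_text) * (100 + pct) + 99) // 100


def fits(source_text, lang, max_chars):
    """Return True if the estimated translation fits within max_chars."""
    return expanded_length(source_text, lang) <= max_chars


label = "Download now"  # 12 characters in the source language
print(expanded_length(label, "de"), fits(label, "de", 15))
```

A check like this only flags likely layout problems; actual translated lengths vary from string to string.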

There are various pseudo-localization patterns and tools. The Android and iOS platforms have added platform-level support for pseudo-localization, and there are also web browser extensions that support it. These platforms use the open-source Google pseudo-localization tool.

Detectable Issues and Best Practices

Pseudo-localization makes it possible to perform the following quality checks during localization-readiness testing:

  • Validation of the completeness and usability of the Localization Kit (files and documentation).
  • Validation that the pseudo-localized software builds successfully.
  • Validation that target language characters display correctly.
  • Validation that screen text strings are not concatenated from fragments.
  • Validation that screen layout accommodates expanded localized strings.
  • Validation that non-localizable software resources are identified and documented.

Guidelines for pseudo-localization:

  • Simulate the localization process (documented in the Localization Kit) as closely and completely as possible: pseudo-localize text and graphics, change fonts, date formats, etc.
  • Simulate the localizer’s environment as closely as possible. Perform the pseudo-localization on a “clean” machine, not on a development machine.
  • Include target-language characters in the pseudo-localized strings.
  • Choose target language characters that are most likely to be problematic. (The Generic Internationalization / Localization Issues list below provides some examples.)
  • Make string boundaries (beginning and end) obvious so string concatenation will be apparent during testing.
  • Increase the length of pseudo-localized strings to simulate what often happens during translation.
  • Insert target language characters at the beginning of strings, at the end of strings, and around string separators such as tabs and newlines. This makes it easy to distinguish localizable strings from non-localizable strings during testing.
  • Re-order the arguments in at least a sampling of formatted messages, and test that these messages then appear correctly.
  • During testing, you may discover software resources that, in fact, must remain as-is for the software to function correctly (hopefully these are not visible to the user). As you discover them, be sure to identify these strings, preferably using comments in the resource files, or in the Localization Kit documentation.
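
Several of these guidelines can be combined in a minimal sketch: it marks string boundaries with brackets, swaps some ASCII letters for accented ones, pads the string by roughly 30%, and leaves `{name}`-style placeholders untouched. The accent map, the `~` padding character, and the function name are assumptions made for illustration; this is not the algorithm of any particular pseudo-localization tool.

```python
import re

# Hypothetical accent map; any visually similar non-ASCII letters would do.
ACCENTS = str.maketrans("aeiouAEIOUcnyCNY", "åéîöûÅÉÎÖÛçñýÇÑÝ")
PLACEHOLDER = re.compile(r"(\{[^{}]*\})")  # matches placeholders such as {date}


def pseudo_localize(text, expansion=0.3):
    """Return a bracketed, accented, padded version of text."""
    # Splitting on a capturing group keeps the placeholders at odd indexes.
    parts = PLACEHOLDER.split(text)
    body = "".join(
        part if index % 2 else part.translate(ACCENTS)
        for index, part in enumerate(parts)
    )
    # Pad to simulate text expansion during translation.
    padding = "~" * max(1, round(len(text) * expansion))
    # Brackets make truncation and concatenation visible during testing.
    return f"[{body} {padding}]"


print(pseudo_localize("Saved {count} files"))
```

If a pseudo-localized string shows up on screen without its closing bracket, it has been truncated; if several bracket pairs appear in one sentence, the sentence was concatenated from fragments.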

Google's pseudo-localization patterns are one example. They include the following transformations, each serving a specific purpose:

Transformation: [ and ] added to mark the start and end of the string.
  • Purpose: To detect (1) string concatenation issues and (2) string truncation.
  • Dev best practices: Use complete sentences, with runtime argument placeholders if needed, for example: This title can only be downloaded 1 more time before {date}.
  • Correct behavior: [Ţĥîš ţîţļé çåñ öñļý ƀé ðöŵñļöåðéð ① ɱöŕé ţîɱé ƀéƒöŕé {date}· one two three four five six]

Transformation: Character replacement with letters outside of the ASCII and/or Latin-1 range.
  • Purpose: To detect (1) character encoding issues and (2) font and display issues.
  • Dev best practices: (1) Use UTF-8 as the storing and display encoding and use UTF-16 as the encoding during data transition. (2) Use the proper font and the right font size for the best visual result on devices.
  • Correct behavior: Characters should be displayed correctly, with no tofu squares or question marks and no vertical clipping or overlapping.

Transformation: String length expansion (usually by 30%).
  • Purpose: To detect horizontal truncation or overlapping issues for the world's languages.
  • Dev best practices: Design and develop with string length expansion in mind. Allow longer strings to wrap instead of being truncated. When a line wraps, allow enough vertical space between lines.
  • Correct behavior: Strings should not be truncated or overlap with other UI elements.

Transformation: Indicators >> and << around values passed at runtime.
  • Purpose: To give a visual hint that the string inside >> and << comes from a runtime value.
  • Dev best practices: Make sure the source of the runtime argument is also localized if need be.
  • Correct behavior: The value inside >> and << should contain a culturally appropriate string as well.

Transformation: Special right-to-left (RTL) simulations.
  • Purpose: To detect RTL layout issues and Bidi issues.
  • Dev best practices: Make sure to (1) set the layout to an RTL UI and (2) set the paragraph direction to rtl.
  • Correct behavior: There should be no layout or Bidi issues.

Pseudo-localization is available on the following platforms:

  • Android: https://androidbycode.wordpress.com/2015/04/19/pseudo-localization-testing-in-android/
  • iOS devices: https://developer.apple.com/library/content/documentation/MacOSX/Conceptual/BPInternational/TestingYourInternationalApp/TestingYourInternationalApp.html
  • Websites: https://developer.apple.com/library/content/documentation/MacOSX/Conceptual/BPInternational/TestingYourInternationalApp/TestingYourInternationalApp.html
  • Google: https://code.google.com/archive/p/pseudolocalization-tool/downloads

Pseudo-localization is most valuable when your product has not been localized before. It is also important when localizing to a language that is unlike any currently supported language (for example, the first time localizing for a language that uses multibyte characters). As your product matures (from an internationalization / localization perspective) you will probably scale back your use of pseudo-localization.


For more see Wikipedia article on Pseudolocalization

Pre-translation language work


Style guides


Every language has specific linguistic and grammatical rules.

In addition to grammatical and linguistic rules per language, style guides can include universal guidelines about how you would like certain items to be addressed in your documents such as: 1) what the tone of language should be on marketing documents versus technical documents; 2) how to format items like UI terms, dates, footnotes that may appear in localized content; 3) what fonts can be used.

Some companies publicly provide style guides:

Terminology work


Acronyms, synonyms, homonyms, abbreviations and product-specific terminology are frustrating puzzles that translators often need to decipher. Without the proper resources, simply trying to understand the appropriate definitions of these sorts of terms can become a difficult and time-consuming task that, despite a large amount of effort, can still result in misunderstandings and mistranslations. Terminology management is the process of selecting, defining, organizing, storing and maintaining product terms, or designators.[46] Effective terminology management makes accurate and efficient translation possible.

A terminology database, or a termbase, is a central repository which stores and manages the chosen designators for both source and target languages. A termbase is an organized and flexible way of collecting and managing designators. It is very useful to share the termbase with everyone engaged in the localization project, as well as those that are creating source files to be localized. Typically, the termbase can be provided either by the client company or created by the LSP.

When selecting designators to add to the termbase, it is important to evaluate which terms will actually be beneficial to store and manage. A termbase that is too large or too small, or that contains too much or too little information, risks being impractical or unhelpful. There are a number of things worth considering when selecting terms. Two examples are:

  • Occurrence & Distribution: If a term appears throughout multiple projects and/or is a high-frequency term, it may be worth including in the termbase to ensure consistent translations.
  • Ambiguity & Uncertainty: If a term’s usage is not clear, or it encapsulates a difficult concept, it may be worth including in the termbase to ensure accurate translations.

A company’s termbase should be consistently maintained, as source terminology may be removed, added or adjusted throughout the product lifecycle. Similarly, since target terminology may change during the localization cycle, the list may need updates during a localization project.

Terminology candidates can be extracted from an existing corpus by a variety of methods, depending on the content. Users can potentially utilize Word and Excel, or they can use an assortment of tools that automatically scan content and extract terminology candidates.[47] While automatic extraction tools are expedient, relying on them alone may result in the accidental exclusion of designators that would have been selected in a manual selection process.
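
A very simple form of automatic extraction is to count how often each word occurs and keep the frequent ones as candidates for manual review. The sketch below assumes English source text, a small hand-picked stopword list, and single-word terms only; the function name, the stopword list, and the sample corpus are hypothetical, and dedicated extraction tools are considerably more sophisticated.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "in", "your", "is", "for", "on"}


def term_candidates(segments, min_count=2):
    """Return (word, count) pairs occurring at least min_count times."""
    counts = Counter()
    for segment in segments:
        # Lowercase words of two or more letters, allowing inner hyphens.
        words = re.findall(r"[a-z][a-z-]+", segment.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    # most_common() sorts by frequency, highest first.
    return [(w, n) for w, n in counts.most_common() if n >= min_count]


corpus = [
    "Open the dashboard to review your workspace.",
    "Each workspace has its own dashboard.",
    "Share the workspace with your team.",
]
print(term_candidates(corpus))
```

Frequency covers only the occurrence criterion; ambiguous or conceptually difficult terms that appear rarely still need to be picked out by hand.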

Punctuation Style guide


Punctuation refers to the marks, such as the full stop, comma, and brackets, used in writing to separate sentences and their elements and to clarify meaning. A punctuation style guide is usually included in translation style guides.

| | English | Russian | French (France) | Italian | German | Spanish | Portuguese (Portugal) | Japanese | Chinese Simplified | Chinese Traditional (Taiwan) | Korean | Arabic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cases? | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No |
| Quotation marks (primary) | “TEXT” | «TEXT» | « TEXT » | "TEXT" | „TEXT“ | “TEXT” | “TEXT” | 「TEXT」 | “TEXT” | 「TEXT」 | “TEXT” | "TEXT" |
| Quotation marks (nested) | ‘TEXT’ | „TEXT“ | "TEXT" | 'TEXT' | ‚TEXT‘ | ‘TEXT’ | ‘TEXT’ | 『TEXT』 | ‘TEXT’ | 『TEXT』 | ‘TEXT’ | n/a |
| Parenthesis | (TEXT) | (TEXT) | (TEXT) | (TEXT) | (TEXT) | (TEXT) | (TEXT) | (TEXT) | (TEXT) | (TEXT) | (TEXT) | (TEXT) |
| Question mark | TEXT? | TEXT? | TEXT ? | TEXT? | TEXT? | ¿TEXT? | TEXT? | TEXT? | TEXT? | TEXT? | TEXT? | TEXT؟ |
| Exclamation mark | TEXT! | TEXT! | TEXT ! | TEXT! | TEXT! | ¡TEXT! | TEXT! | TEXT! | TEXT! | TEXT! | TEXT! | TEXT! |
| Colon | TEXT: | TEXT: | TEXT : | TEXT: | TEXT: | TEXT: | TEXT: | TEXT: / 14:00 | TEXT: / 14:00 | TEXT: / 14:00 | TEXT: | TEXT: |
| Semicolon | TEXT; | TEXT; | TEXT ; | TEXT; | TEXT; | TEXT; | TEXT; | TEXT; | TEXT; | TEXT; | TEXT; | TEXT؛ |
| Period | TEXT. | TEXT. | TEXT. | TEXT. | TEXT. | TEXT. | TEXT. | TEXT。 | TEXT。 | TEXT。 | TEXT. | TEXT. |
| Comma | TEXT, | TEXT, | TEXT, | TEXT, | TEXT, | TEXT, | TEXT, | TEXT、 | TEXT、 / TEXT, | TEXT、 / TEXT, | TEXT, | TEXT، |
| Apostrophe | TEXT’s | TEXT’s | TEXT’s | TEXT's | TEXT’s | TEXT’s | TEXT's | TEXT’s | TEXT’s | TEXT’s | n/a | n/a |
| Percent | 100 % | 100 % | 100 % | 100% | 100 % | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Decimal separator | Period (10.54) | Comma (10,54) | Comma (10,54) | Comma (10,54) | Comma (10,54) | Comma (10,54) | Comma (10,54) | Period (10.54) | Period (10.54) | Period (10.54) | Period (10.54) | Period (10.54) |
| Thousand separator | Comma (100,540) | Non-breaking space (100 540) | Non-breaking space (100 540) | Period (100.540) | Period (100.540) | Period (100.540) | Period (100.540) | Comma (100,540) | Comma (100,540) | Comma (100,540) | Comma (100,540) | None (100540) |

Sample text:

  • English: In 1987 my wife, Nancy, was pleading with me to send out my resume and get a “real” job. She wasn’t too convinced that my business idea of creating graphics on my brand new MacPlus 512k personal computer would ever take off.
  • Russian: В большинстве случаев единственное, что необходимо для активации продукта, — это идентификатор установки, который создается самой программой.
  • French (France): « Depuis sa création, l'une des missions essentielles du ministère de la Culture est de rendre accessibles au plus grand nombre le patrimoine architectural et artistique ainsi que les oeuvres de création contemporaine.
  • Italian: Ravenna è città d'arte e cultura con una grande eredità di monumenti e di edifici religiosi decorati con mosaici così unici che sono stati dichiarati patrimonio dell'umanità.
  • German: Unterschleißheim, 13. Januar 2011. Im Rahmen einer Ausschreibung vergibt Microsoft Deutschland bis zu 200.000 Euro an ein gemeinnütziges Projekt, das Jugendlichen mit innovativen Angeboten beim Aufbau wichtiger eSkills hilft.
  • Spanish: Microsoft anunció que ha comenzado a migrar algunas cuentas de Hotmail al nuevo servicio que cuenta, entre otras nuevas funcionalidades, con una capacidad de almacenamiento de 5 GB.
  • Portuguese (Portugal): Estou aqui — Sou mente, informação viva. Expando o meu alcance— o robô em Io acorda e obedece. Elevo o meu alcance— satélites transmitem a minha alma até aos astros.
  • Japanese: お客様が私たちのソフトウェアを使ってビジネス上の課題に対しての解決策を見出し、新たな局面につながるアイディアを展開し、最も重要なことに意識を向けられるようにすること。これが、私たちの日々の仕事に対する意欲と原動力の源なのです。
  • Chinese Simplified: 会议内容:云计算是当今 IT 产业快速发展的推动力和重要机会,本次会议讨论云计算如何帮助中小企业快速提高 IT 能力,以及如何帮助他们优化内部资源,提高竞争力,并在经济发展中占得先机。
  • Chinese Traditional (Taiwan): 微軟全球服務與技術支援組織是微軟與客戶、合作夥伴之間溝通的橋樑,同時也是微軟創新的泉源,致力於協助客戶採用適合的技術與產品,加速客戶在雲端技術上的佈署,並藉由客戶的回饋,提供產品精進的方向,以提昇客戶服務經驗的品質及顧客滿意度。
  • Korean: 비즈니스와 기술의 경계가 사라지고 클라우드 컴퓨팅이 대세로 자리잡은 현 상황 속에서, 세상을 움직이는 기술의 변화는 개발자와 IT 전문가에게는 새로운 기회의 장을 의미한다.
  • Arabic: استخدم كلمات سهلة ومباشرة. يجب أن يكون الأسلوب التحريري مبسطاً وواضحاً وصحيحاً. استخدم أكثر الكلمات بساطة ودقة في نفس الوقت، مثل كلمة "أيضاً" بدلا من "بالإضافة إلى". تجنب استخدام اللهجات العامية حيث إنها لا تلائم هذا المجال وهذا النوع من الكتابة بالإضافة إلى صعوبة فهمها من قبل نطاق عريض من الأشخاص.
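
The decimal and thousand separators in the table above can be applied mechanically. The following sketch hard-codes a few of the conventions from the table to show the idea; the `SEPARATORS` dictionary and `format_number` function are hypothetical, and production code would normally rely on a locale data library rather than a hand-written table.

```python
# (thousand separator, decimal separator) copied from the table above.
# The keys are informal labels for this sketch, not official locale identifiers.
SEPARATORS = {
    "English": (",", "."),
    "Russian": ("\u00a0", ","),  # non-breaking space
    "French": ("\u00a0", ","),
    "German": (".", ","),
    "Japanese": (",", "."),
    "Arabic": ("", "."),         # the table lists no thousand separator
}


def format_number(value, language, decimals=2):
    """Format value using the separators listed for the given language."""
    thousands, decimal = SEPARATORS[language]
    text = f"{value:,.{decimals}f}"  # English-style, e.g. "100,540.00"
    integer_part, _, fraction = text.partition(".")
    integer_part = integer_part.replace(",", thousands)
    return integer_part + (decimal + fraction if fraction else "")


print(format_number(100540, "German", 0))
print(format_number(10.54, "French"))
```

Formatting with English-style separators first and then swapping them keeps the sketch short, at the cost of ignoring locale rules that the table does not cover, such as digit grouping sizes.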

Localization Vendors, or Language Service Providers (LSPs), first began appearing in the 1980s when industries such as automotive and medical saw the need to translate their content. It took several years for processes and tools to be developed to make localization work easier. When the world wide web began in 1991, advances in standardized processes and tools developed more quickly.

Today, there are two main models for localization vendors: the multi-language vendor (MLV) and the single-language vendor (SLV). Either kind of vendor can work directly with a client. Often, though, a client will work with an MLV, which may have internal translation resources or may contract with SLVs, freelance translators, or other MLVs. While some localization vendors specialize in certain areas, such as translation for the life sciences, most serve a variety of vertical markets. Translation is not the only service that localization vendors can supply. Their services can also include:

  • Machine translation
  • Translation memory creation and/or management
  • Terminology management: Glossaries, Style Guides, Do Not Translate lists, Term mining
  • Localization engineering
  • Software testing
  • Desktop publishing
  • Website localization
  • Translation review/QA
  • Interpreting
  • Audio voiceover recordings
  • Transcription
  • Transcreation
  • eLearning and MultiMedia
  • Subtitling

Localization Vendors often have tools like client portals or some proprietary version of a translation management system (TMS) that can automate tasks like generating quotes and moving files from one party in a workflow to the next.

Localization vendors are often certified to various ISO (International Organization for Standardization) standards to ensure that quality processes and practices are in place. Depending on your organization’s requirements, an LSP’s certifications may be more or less important.

Many organizations use LSPs (language service providers) to source their translations. Employing in-house translators can be expensive and is not always feasible.

There are some steps that need to be followed, not just to select a vendor, but to select the right vendor, one that will meet your project requirements.

These are the steps to follow:

  1. Internal requirements collection:
    • What domain experience should the vendor have e.g. medical, gaming, law, etc.?
    • What are your target languages?
    • Should the vendor have an ISO 9001 certificate?
    • Do you need them to use your CAT tool or do you want to use theirs?
    • Which services do you require: translation, glossary creation, LQA (Localization Quality Assurance), etc.?
    • Estimated work volume.
  2. Request For Information
    • Select a few vendors and do a formal request for information.
    • You want information such as their list of supported languages, domain experience, rates, SLA (Service Level Agreement), etc.
  3. Request For Proposal
    • Select two or more vendors that meet the project's requirements and ask them to submit a formal business proposal.
  4. Review the proposals
    • When reviewing you'll take into consideration their rates, location, experience, supported file formats and SLAs.
    • Select two vendors at a minimum.
  5. Submitting a Test Project
    • You want to ensure upfront that their quality is as high as possible. Finding out in the middle of a project that the translation quality is poor can have a huge effect on the project's budget.
    • Submit a small test project based on your product to each of the vendors.
    • After these projects are completed, have each vendor rate the quality of the other vendor's translations in a CAT tool with an LQA feature, such as the LQA phase in XTM.
  6. Review the Test Projects Quality Rating
    • Review each of the projects' results individually
    • Should one of the vendors have performed very poorly, submit the report to them for evaluation.
  7. Select your Vendor
    • Based on the differences between the vendors in quality rating, cost, location, etc., select the most appropriate vendor for your project.
    • Sign the MSA (Master Service Agreement) and NDA (Non-Disclosure Agreement), if required, with the selected vendor.

This will ensure you pick a vendor not only based on the vendor's costs and turnaround time, but quality as well.

The localization kit provides materials pertinent to the localization project. This is edited and published by the project owner, i.e. either the client or a representative of the client. The kit provides information to address the needs of all stakeholders involved in the project (translators, engineers, subject matter experts, agencies) to ensure successful execution, collaboration, and delivery of the final product. The LocKit is therefore essentially a set of tools, resources, and instructions necessary for all team members to produce the localized version of a product, whether it be software, websites or marketing material. The kit encompasses the resources, scope, and schedule of the product.

The kit typically includes:

  • Localization instructions
  • Style guides or brand guidelines
  • Translation memory
  • Glossaries/term bases
  • Staging information for online testing
  • Tools and instructions
  • Project timeframe
  • Contact information for stakeholders
  • Files to be localized

For a more detailed explanation of the file types that may be included in a localization kit: http://globalvis.com/2010/02/the-ideal-localization-kit/.

Localization industry standards permit the efficient sharing of translation data and improved workflow among translators, vendors, and other team members. The main benefit of standards is interoperability, which means that more collaboration is possible and data can be used with different tools and different versions of software.

The key localization industry standards include XLIFF, TMX, TBX, and SRX.

Computer-assisted translation


CAT (Computer-Assisted Translation) Tools are special software programs that allow translators to translate documents and text at a faster, more efficient rate and with accuracy. The specific function of a CAT Tool is to segment text so that it can be translated in parts. There are various forms of CAT Tools such as:

  • SDL Trados
  • Cafetran
  • WordFast
  • Matecat
  • OmegaT
  • memoQ (formerly Kilgray)
  • Maxprograms

The file formats supported by CAT Tools include the following:

  • Microsoft Word (.doc, .docx)
  • Microsoft Excel (.xls, .xlsx)
  • Microsoft PowerPoint (.ppt, .pptx)
  • QuarkXPress
  • Adobe InDesign
  • Adobe FrameMaker
  • Adobe PageMaker
  • HTML (.htm, .html)
  • RESX (.resx)
  • XML (.xml)
  • XLIFF (.xliff)
  • JSON (.json)

Benefits of using this tool:

Allows collaboration among groups of translators who can share Translation Memories to ensure the same sentences are never translated twice.  A database of terminology is also stored for their benefit.  In turn, this results in improved productivity and consistency across translations.  There are also benefits for the client since it decreases their budget and ultimately saves time.

Disadvantages:

Like all automated programs, these tools do result in difficulties or errors which are usually due to insufficient knowledge on the part of the user since it takes time to learn the intricacies of these various associated programs.

How it works: First, the CAT Tool(s) will take any given text and parse it into segments of text to be translated. There will be a side by side comparison of the segmented text (original and what has already been translated).
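The segmentation and translation memory lookup described above can be sketched as follows. This is a simplified illustration, not how any particular CAT tool is implemented: the sentence-splitting rule, the 0.75 match threshold and the sample memory entries are all assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical translation memory (English -> French), for illustration only.
TRANSLATION_MEMORY = {
    "Click Save to continue.": "Cliquez sur Enregistrer pour continuer.",
    "The file was deleted.": "Le fichier a été supprimé.",
}


def segment(text):
    """Split text into sentence segments after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def best_match(source, memory, threshold=0.75):
    """Return (stored source, stored target, score) for the closest entry, or None."""
    best = None
    for stored_source, stored_target in memory.items():
        score = SequenceMatcher(None, source, stored_source).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (stored_source, stored_target, score)
    return best


text = "Click Save to continue. The folder was deleted. Restart the application."
for seg in segment(text):
    match = best_match(seg, TRANSLATION_MEMORY)
    if match is None:
        print(f"{seg!r}: no match, translate from scratch")
    else:
        print(f"{seg!r}: {match[2]:.0%} match, suggestion {match[1]!r}")
```

The first segment is an exact match and can be reused as is, the second is a close ("fuzzy") match that the translator only needs to adapt, and the third has to be translated from scratch.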

Machine translation is the process of using artificial intelligence (AI) to automatically translate content from one language (the source) to another (the target) without any human input.[48]

Wikipedia Article on Machine Translation

Benefits of Machine Translation in a Human Translation Workflow

When used with a computer-assisted translation tool, a well-trained machine translation engine can rapidly handle repetitive tasks, so that human linguists can focus on the very difficult sentences that machine translation fails to handle properly and can conduct quality assurance on the machine translation output. In effect, a well-trained machine translation engine performs like a junior linguist who handles many simple and tedious tasks, while the human linguist is the senior linguist who performs the quality assurance and handles the more difficult content. The result is that the human linguist can be much more productive and stay engaged in more challenging and interesting tasks. To reach its full potential, a language service provider should be willing to invest in well-trained machine translation engines, preferably ones that are specially trained for separate clients, locales, brands, and domains.


Weaknesses in Machine Translation

Common issues that come with machine translations include nonsensical grammar, a lack of accuracy, and sometimes text even being left untranslated. Most of what goes wrong with machine translation stems from machines having no way of knowing the context of the source unless they have received some kind of training. This can especially be true in the case of creative content. If, for example, they are translating text from a fantasy video game, they have no way of knowing that themselves, so the end result could come out completely inaccurate and unrelated to the source. Since they are unaware of what the context of the content is, it is also impossible for them to know for what purpose or audience their translation is being done. Because of this, they can create mistranslations reflecting a tone-deafness, such as not knowing if it should use terms from industrial, medical, or geopolitical fields, etc. [49]

History


Rule-based Machine Translation


Rule-based Machine Translation (RBMT) is a type of machine translation in which a source is translated into a target language through linguistic rules encoded by a language expert. In other words, this method of machine translation links the structure of the given input sentence with the output sentence. While regulating such linguistic knowledge poses a higher cost, more control over the system output exists. In addition, RBMT can be leveraged to enhance other methods of machine translation.

Originating in the 1970s, RBMT branched out into a few subcategories: Direct Machine Translation, Transfer-based Machine Translation, and Interlingual Machine Translation. Each variation prompts linguists to write rules in different ways (i.e. rules for each word, rules for syntax, rules for linguistic singularity).

Advantages of RBMT

  • No quality ceiling. Each error can be corrected with a targeted rule, even if the trigger case is extremely rare. This is in contrast to statistical systems where infrequent forms will be washed away by default.
  • Because of the nature of RBMT, it is possible to create translation systems for languages that have no texts in common, or even no digitized data at all.

Disadvantages of RBMT

  • There is a lack of really detailed dictionaries and building one from scratch is very expensive.
  • Certain linguistic information needs to be configured manually which can be time consuming.
  • Changes, such as adding new rules or extending the lexicon, can be very costly and the results often do not pay off.

RBMT Workflow

  1. Target language lexicons and grammar rules are formalized by a linguist.
  2. Software parses text to allow the system to analyze sentences in the source language.
  3. The system translates the source into the target language based on the encoded linguistic knowledge.

The initial phase requires a large time investment at a small cost to ensure higher quality translations. Translation quality improves as the linguistic knowledge increases over time. Overall, this method typically entails a hefty investment of time and resources which is the reason RBMT is rarely utilized in the modern era of translation.

The following is an example of how RBMT works:

Take the sentence "A boy reads a book." The source language is English and the target language is French. To produce a minimal translation, the following are required: a dictionary that correlates a French word with each English word; rules of English sentence structure; and rules of French sentence structure. The final necessary element is a set of rules defining how one structure relates to the other.

  1. Determine part of speech of each source word: a = indef.article; boy = noun; reads = verb; a = indef.article; book = noun
  2. Determine syntactic information about the verb "to read": read – Present simple, 3rd Person Singular, Active Voice "reads"
  3. Parse the source sentence: (NP a book) = the object of read
  4. Translate into French with matching categories: a (indefinite article) => un (indefinite article); boy (noun) => garçon (noun); read (verb) => lire (verb); a (indefinite article) => un (indefinite article); book (noun) => livre (noun)
  5. Map the translated terms into the appropriate form: A boy reads a book. => Un garçon lit un livre.
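The five steps above can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not a real RBMT engine: the lexicon, dictionary and morphology tables are hypothetical and cover only this one sentence, and gender agreement is ignored because both French nouns happen to be masculine.

```python
# English surface form -> (part of speech, English lemma). Hypothetical mini-lexicon.
ENGLISH_ANALYSIS = {
    "a": ("article", "a"),
    "boy": ("noun", "boy"),
    "reads": ("verb", "read"),  # present simple, 3rd person singular
    "book": ("noun", "book"),
}

# Bilingual dictionary: English lemma -> French lemma.
DICTIONARY = {"a": "un", "boy": "garçon", "read": "lire", "book": "livre"}

# French morphology rule: infinitive -> 3rd person singular present form.
FRENCH_THIRD_SINGULAR = {"lire": "lit"}

# The only sentence structure these toy rules cover.
EXPECTED_STRUCTURE = ["article", "noun", "verb", "article", "noun"]


def translate(sentence):
    words = sentence.rstrip(".").lower().split()
    # Steps 1-2: determine part of speech and lemma of each source word.
    analysis = [ENGLISH_ANALYSIS[word] for word in words]
    # Step 3: "parse" by checking the sentence against the known structure.
    if [pos for pos, _ in analysis] != EXPECTED_STRUCTURE:
        raise ValueError("sentence structure not covered by the rules")
    french = []
    for pos, lemma in analysis:
        # Step 4: translate each lemma into the matching French category.
        target = DICTIONARY[lemma]
        # Step 5: map the verb into the appropriate inflected form.
        if pos == "verb":
            target = FRENCH_THIRD_SINGULAR[target]
        french.append(target)
    text = " ".join(french)
    return text[0].upper() + text[1:] + "."


print(translate("A boy reads a book."))  # Un garçon lit un livre.
```

Every sentence pattern, word and inflection outside these tables would need a new hand-written rule, which illustrates why extending an RBMT system is costly.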

Example-based Machine Translation (EBMT)


In 1984 Makoto Nagao from Kyoto University suggested using already existing translations to translate texts.

Statistical Machine Translation


Statistical Machine Translation (SMT) is a type of machine translation that requires bilingual corpora, which are large, structured collections of text used for statistical analysis and hypothesis testing. Although SMT needs a large amount of text data, new language pairs can be added quickly and at a very low cost. There are different types of SMT: word-based, phrase-based, syntax-based, and hierarchical phrase-based. Phrase-based translation, which translates whole sequences of words, is the most commonly used today.

To find the most likely translation, SMT takes three basic probabilities into account.

  • P(e) - a priori probability. The chance that e (for example, English) happens. It is the chance that a person at a certain time will use the expression instead of saying something else.
  • P(f | e) - conditional probability. The chance that upon seeing e, a translator will produce f (for example, French).
  • P(e,f) - joint probability. The chance of e and f both happening.

All these probabilities are between zero and one.

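In other words, given f (the source language), we are seeking the e (target language) that maximizes P(e | f). By Bayes' rule, and because P(f) is the same for every candidate e, this is the e that maximizes the product of P(e) and P(f | e):

$$\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(e)\, P(f \mid e)$$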

SMT depends heavily on corpora. The larger and better aligned a parallel corpus is, the better the translation SMT can provide. Although SMT has been widely adopted, for example in Google Translate and Microsoft Translator, the fluency of the translation remains a challenge. In addition, creating corpora and training on them to optimize SMT is a major topic in its own right.
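The search for the most likely translation can be illustrated with a small worked example. By Bayes' rule, P(e | f) is proportional to P(e) multiplied by P(f | e), so the decoder picks the candidate e with the largest product. All probabilities below are invented for illustration; a real system would estimate them from corpora.

```python
# Hypothetical language model P(e): how likely each English candidate is on its own.
LANGUAGE_MODEL = {
    "blue house": 0.006,
    "house blue": 0.0001,
    "blue home": 0.002,
}

# Hypothetical translation model P(f | e): how likely the French input is,
# given each English candidate.
TRANSLATION_MODEL = {
    ("maison bleue", "blue house"): 0.20,
    ("maison bleue", "house blue"): 0.25,
    ("maison bleue", "blue home"): 0.10,
}


def decode(f, candidates):
    """Return the candidate e that maximizes P(e) * P(f | e)."""
    return max(candidates, key=lambda e: LANGUAGE_MODEL[e] * TRANSLATION_MODEL[(f, e)])


best = decode("maison bleue", list(LANGUAGE_MODEL))
print(best)  # blue house
```

Here the translation model alone slightly prefers the word-for-word "house blue" (0.25), but the language model makes "blue house" the overall winner (0.006 × 0.20 = 0.0012 against 0.0001 × 0.25 = 0.000025). This is how the P(e) term contributes fluency.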

References:
Statistical MT Handbook by Kevin Knight
Statistical Machine Translation Wikipedia.org

Phrase-based Machine Translation


Neural Machine Translation


Neural Machine Translation (NMT) is an approach [50] to machine translation that uses an artificial neural network to model source sentences as sequences and translate them into the target language. Unlike traditional machine translation systems, which use multiple models for translation, NMT is an end-to-end system that requires only one large neural network. [51]

A common model in NMT is the encoder-decoder model. In this model, the encoder builds an internal representation of the input sentence[52] by reading it word by word. The decoder uses this internal representation to output words until the end-of-sequence token is reached[53]. Because the neural network in this model works from a representation of the whole sentence, NMT is able to translate the source sentence in context, rather than just the individual words in it.

The advantage of NMT is its strong ability to learn and analyze, which usually leads to more accurate and fluent translation. Disadvantages of NMT include relatively slow translation speed (especially when the source sentence is long) and higher translation cost.

Zero-Shot Translation


Zero-Shot Translation is translation between a language pair that the system has never seen before.[54] It relies on an NMT system with a joint model for all languages, which allows connections between all of them. As a result, the system is able to translate directly from one language to another even if it has never been taught or trained on that pair. [55]

For example, if an A⇄C translation is needed, an SMT system that has never been trained on A⇄C will first translate A⇄B and then B⇄C. A multilingual NMT system, by contrast, is trained on A⇄B and B⇄C and shares its parameters across these four translation directions (A→B, B→A, B→C and C→B). Zero-Shot Translation uses those shared parameters to translate A⇄C directly and can generate reasonable A⇄C translations even though the system has never been taught to do so.[56]
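The bookkeeping behind this example can be sketched as follows. The sketch only enumerates which directions are trained and which are zero-shot; it does not train a model. The `<2C>`-style prefix mirrors the target-language token that Google's published multilingual system prepends to the source sentence, but the exact token format and the helper names here are assumptions for illustration.

```python
from itertools import permutations

# The four directions the shared model is trained on in the example above.
TRAINED_DIRECTIONS = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")}


def tag_source(sentence, target_lang):
    """Prefix the source sentence with a token naming the desired target language."""
    return f"<2{target_lang}> {sentence}"


def zero_shot_directions(trained):
    """List every direction between known languages that has no training data."""
    languages = {lang for pair in trained for lang in pair}
    return sorted(set(permutations(languages, 2)) - set(trained))


print(zero_shot_directions(TRAINED_DIRECTIONS))  # [('A', 'C'), ('C', 'A')]
print(tag_source("sentence in language A", "C"))  # <2C> sentence in language A
```

Because a single model serves every direction, requesting an untrained direction only requires a different target token on the input; no pivot through language B is needed.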

Google has announced Google Neural Machine Translation (GNMT), an NMT system developed by Google that is able to handle Zero-Shot Translation.

Translation Management Systems


Translation Management Systems (TMS) are also known as Globalization Management Systems (GMS).

Prominent examples are:

  • SDL WorldServer
  • Plunet
  • Wordbee
  • TMS Maestro
  • Across Language Server
  • XTRF
  • XTM International
  • MultiTrans
  • memoQ Adriatic

Content Management Systems


A content management system (CMS) is a software application or set of related programs that are used to create and manage digital content. CMSes are typically used for enterprise content management (ECM) and web content management (WCM). An ECM facilitates collaboration in the workplace by integrating document management, digital asset management and records retention functionalities, and providing end users with role-based access to the organization's digital assets. A WCM facilitates collaborative authoring for websites. ECM software often includes a WCM publishing functionality, but ECM webpages typically remain behind the organization's firewall.

Both enterprise content management and web content management systems have two components: a content management application (CMA) and a content delivery application (CDA). The CMA is a graphical user interface (GUI) that allows the user to control the creation, modification and removal of content from a website without needing to know anything about HTML. The CDA component provides the back-end services that support management and delivery of the content once it has been created in the CMA.

Features can vary among the various CMS offerings, but the core functions are often considered to be indexing, search and retrieval, format management, revision control and publishing.

  • Intuitive indexing, search and retrieval features index all data for easy access through search functions and allow users to search by attributes such as publication dates, keywords or author.
  • Format management facilitates turning scanned paper documents and legacy electronic documents into HTML or PDF documents.
  • Revision features allow content to be updated and edited after initial publication. Revision control also tracks any changes made to files by individuals.
  • Publishing functionality allows individuals to use a template or a set of templates approved by the organization, as well as wizards and other tools to create or modify content.

A CMS may also provide tools for one-to-one marketing. One-to-one marketing is the ability of a website to tailor its content and advertising to a user's specific characteristics using information provided by the user or gathered by the site -- for instance, a particular user's page sequence pattern. For example, if the user visited a search engine and searched for digital camera, the advertising banners would feature businesses that sell digital cameras instead of businesses that sell garden products.

Other popular features of CMSes include:

  • SEO-friendly URLs
  • Integrated and online help, including discussion boards
  • Group-based permission systems
  • Full template support and customizable templates
  • Easy wizard-based install and versioning procedures
  • Admin panel with multiple language support
  • Content hierarchy with unlimited depth and size
  • Minimal server requirements
  • Integrated file managers
  • Integrated audit logs

There is almost no limit to the factors that must be considered before an organization decides to invest in a CMS. There are a few basic functionalities to always look for, such as an easy-to-use editor interface and intelligent search capabilities. However, for some organizations, the software they use depends on certain requirements. For example, consider the organization's size and geographic dispersion. The CMS administrator must know how many people will be utilizing the application, whether the CMS will require multilanguage support and what size support team will be needed to maintain operations. It's also important to consider the level of control both administrators and end users will have when using the CMS. The diversity of the electronic data forms used within an organization must also be considered. All types of digital content should be indexed easily.

CMS software vendors:

  • SharePoint
  • Documentum
  • M-Files
  • Joomla
  • WordPress
  • DNN
  • Oracle WebCenter
  • Pulse CMS
  • TERMINALFOUR
  • OpenText
  • Backdrop CMS

http://searchcontentmanagement.techtarget.com/definition/content-management-system-CMS

Workflow-enabling tools

  • Handoff systems
  • Cloud translation systems

UI adjustments (Software)

  • Resizing work, re-layout

UA adjustments (Content)

  • Types, specifics

Post-translation engineering


Software


Help content


Quality Assurance Process

Quality Assurance is the evaluation process performed by testers to verify that the program is working as intended. During this process, testing teams are typically assigned specific aspects of the program to focus on and assess whether each functions as intended. Should a tester run into a bug, they write up a report describing the issues the bug causes in the program and how the behavior differs from what was intended. After this brief description, the tester includes the step-by-step process that caused the bug to occur and how reliably they were able to reproduce it. Typically the report has a video or screenshot attached to further show how to cause the issue, as well as a reference to the build of the program in which the bug was found. Lastly, the tester marks the bug with a severity, typically ranging from 1 to 5, with 1 representing the program crashing when the bug occurs and 5 representing a bug that could be dealt with at a later date. The bug is then sent along to Bug Triage for review and assignment. Once the bug has been assessed and fixed, it is the job of the tester to regress the bug: going back through the program and performing the steps to reproduce it, to make sure that the fix worked and the bug is no longer an issue.
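The contents of such a bug report can be modeled as a simple data structure. The following is a minimal sketch: the field names, the sample reports and the `triage_order` helper are hypothetical, and real bug trackers define their own schemas.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BugReport:
    title: str
    description: str               # observed behavior vs. intended behavior
    steps_to_reproduce: List[str]
    reproduction_rate: str         # e.g. "5 of 5 attempts"
    build: str                     # build of the program the bug was found on
    severity: int                  # 1 = program crashes ... 5 = can be deferred
    attachments: List[str] = field(default_factory=list)  # videos, screenshots

    def __post_init__(self):
        if not 1 <= self.severity <= 5:
            raise ValueError("severity must be between 1 and 5")


def triage_order(reports):
    """Sort reports so the most severe bugs (lowest number) are reviewed first."""
    return sorted(reports, key=lambda report: report.severity)


# Hypothetical reports, for illustration only.
truncated = BugReport(
    title="German 'Save' button label is truncated",
    description="Label shows 'Speiche' instead of 'Speichern'.",
    steps_to_reproduce=["Switch UI language to German", "Open the Settings dialog"],
    reproduction_rate="5 of 5 attempts",
    build="example-build-42",
    severity=3,
    attachments=["settings_de.png"],
)
crash = BugReport(
    title="Crash when entering Japanese text",
    description="Program closes instead of accepting the input.",
    steps_to_reproduce=["Open the search field", "Type any Japanese character"],
    reproduction_rate="3 of 5 attempts",
    build="example-build-42",
    severity=1,
)

for report in triage_order([truncated, crash]):
    print(report.severity, report.title)
```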


Linguistic quality assurance (LQA)

LQA is the term used to identify the quality evaluation process used to assess the localization/translation quality of a project against predefined standards. It usually comes with a quality score assigned to a translation using a set of criteria and a set of error categories. After giving a score to all the different evaluations, one can easily find out whether a certain translation meets the quality standards that have been previously defined.
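A quality score of this kind can be computed as a weighted error count normalized by text length. The sketch below is one possible scheme, not an industry-mandated formula: the severity weights, the per-1,000-words normalization and the pass threshold of 95 are all assumptions that a real project would define in its own quality standard.

```python
# Hypothetical penalty points per error severity.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

# Hypothetical pass mark defined before the evaluation starts.
PASS_THRESHOLD = 95.0


def lqa_score(errors, word_count):
    """Score a translation out of 100.

    errors is a list of (category, severity) tuples found by the reviewer.
    Penalty points are normalized per 1,000 words so that long and short
    texts can be compared.
    """
    penalty = sum(SEVERITY_WEIGHTS[severity] for _, severity in errors)
    penalty_per_thousand = penalty * 1000 / word_count
    return max(0.0, 100.0 - penalty_per_thousand)


errors = [
    ("terminology", "major"),
    ("accuracy", "minor"),
    ("style", "minor"),
    ("grammar", "minor"),
]
score = lqa_score(errors, word_count=2000)
print(score, "pass" if score >= PASS_THRESHOLD else "fail")  # 96.0 pass
```

In this example the reviewer found 8 penalty points in 2,000 words, or 4 points per 1,000 words, giving a score of 96 against a threshold of 95.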

• Bug Triage

Bug triage is the process where each bug is analyzed in order to determine the root cause of a bug, as well as the action or decision to take according to the bug severity.

This process has the advantage of reducing costs of fixing the bugs if carried out at the right time. Different teams might sit together and discuss bug triage to evaluate the impact of a bug and effort required to fix it, based on release dates.

In some cases, bug fixes are also postponed for upcoming releases based on their severity and priority. We can say that there are three main stages of a bug triage process:

- Bug Review

- Bug Assessment

- Bug Assignment


Bug Review

In this stage we look at the bugs that have been raised, those that are fixed, and those that are resolved, and we analyze them.

Bug Assessment

In this phase we assess the status of a bug, to understand if it’s a real bug that needs to be fixed, and in that case, who should be the person/team responsible for its fix.

Bug Assignment

At this stage of the process, we assign bugs to the correct team in order for them to provide a fix.

  • Linguists

Linguists may work independently as freelancers or be employed by a language service provider (LSP) or client company. They include interpreters, translators, editors and proofreaders. Translators translate texts from source language to target language in a timely manner. They must know how to work with various tools and technologies that support the translation process, such as translation memory, translation workflow and other computer-aided translation (CAT) tools. Interpreters translate spoken language. Interpreting can be simultaneous or consecutive. Editors and proofreaders check translations for mistakes and consistency of terminology, and generally refine the translation ensuring that the text no longer reads as a translation, but as if it was originally written in the target language.

Linguists must possess an aptitude for language and global cultures in their specialization. Sensitivity to nuance and contextual meaning is important. Strong communication skills, attention to detail and precision are a must in translation work.[57]:25

  • Localization quality assurance (QA) professionals

Localization QA professionals can be employed by an LSP or client company. They must have exceptional attention to detail, a systematic approach to working in a unified fashion, and strong technical expertise.

  • Terminologists

Terminologists are professionals who study, create, and use terminology, especially in professional translation project management. A terminologist may facilitate the writing, editing, and translation process by researching and locating information that may assist linguists and other language services professionals produce high-quality translations. Terminologists ensure accuracy, consistency, and appropriateness of term usage.

  • Internationalization engineers

An internationalization engineer is responsible for having all technology products developed in a way that facilitates and considers localization and translation processes and requirements. These individuals work closely with developers at the code level to make sure that everything that affects the success of localization (date and time formats, Unicode compliance, font compatibility, design for text expansion, etc.) is addressed in advance, at the beginning of development. Skills that successful internationalization engineers must possess include a solid understanding of software and technology product development, coding and various technical development languages, and the ability to identify flags for internationalization issues. They need to have a strong comfort level working with technical engineers in software, technology development and localization engineering. Clear communication and the ability to teach and inform peer groups and management about this area of expertise are important.[57]:26

  • Localization engineers

A localization engineer works directly with any product, document, website or device that requires translation. At an LSP, localization engineers are responsible for many things. They assess files for quoting localization and translation work. They dictate how files for localization are to be received from the client company. They take in files, process them, work with translation and localization tools, and help execute all necessary preparation of the files for translation. When files are translated, localization engineers recompile the files in any development format or system for reintegration into the final localized product. They work with localization QA to verify and fix errors. They also collaborate with client development and localization teams as necessary. Sometimes a localization engineer may be responsible for internationalization.

At a client company, localization engineers work as an integrated member of a development team to ensure that localization happens seamlessly. They alert the development teams of necessary localization requirements, receive files and special instructions on development and get to know the product that is being developed inside and out. They may work with an LSP company and their engineering team to answer questions and facilitate the technical aspect of the overall process.

Localization engineers must have an exceptionally high knowledge of development technologies that they are working with and how they fit in the localization process. They must be able to integrate various localization and translation tools such as translation memory (TM), translation workflow tools and other CAT tools.[57]:26

  • Solutions architects

This is a higher level technology professional who works with development teams, clients and sales people in an LSP to craft complex solutions for localization. Solutions architects must have technical competence and strong communication skills, be able to give presentations to decision makers and be a go-to person to solve challenging technical puzzles. To be able to assess and recommend the best path forward to a client, solutions architects must possess a solid understanding of software and technology product development, no matter the client and what they are building, and a firm understanding of the localization process at that particular LSP.[57]:27

  • Localization strategists

This is a higher level technology professional who works in a client company and defines a localization strategy for a product. A strategist is tasked with looking for the newest language technologies, finding ways to optimize the translation or localization process, and creating vendor and pricing strategies that create efficient and effective vendor and LSP relationships with the company. They are generally tasked with making everything in the translation or localization process go faster, cheaper and better, year over year.[57]:29

  • Technical managers

A technical manager handles a technical team consisting of localization engineers, internationalization engineers, localization QA professionals and solutions architects. This person ensures that all requirements for localization are met by assigning teams, resources, budget and expertise to any given project at any given time.

At a client company, technical managers may be responsible for several development departments, with localization as part of it. They work to support that everything related to localization success is in place and available for the teams to achieve their goals.

Technical managers at an LSP will run the entire technical department of localization engineers, internationalization engineers, localization QA professionals and solutions architects to perform all technical functions to support client assignments. They focus on budget, resourcing and time allocations to ensure the success of their teams. Important skill sets of the technical manager include solid people management expertise coupled with technical expertise. A technical manager only has credibility with a technical team if he or she has actually been an engineer in the past and has a strong knowledge of the complexities of technology.[57]:27

  • Executives

An executive is anyone who holds a high level management position at an LSP, or who holds any C-level position (CEO, COO or other) and has overarching responsibility for management of a language company. In the language industry, an executive must be comfortable working across cultures and in a global context. He or she must have expertise in professional business and technical services. Executives must know just enough about the language industry to be credible, but possess all executive leadership skills to pay attention to the bottom line and financial profitability. They must know how to optimize investment in technology, innovation, resources and people to do everything that their business requires. Strong skills in presenting, motivating and representing an organization publicly are essential.[57]:27

  • Operations managers

An operations manager may also be referred to as department manager, production manager or group manager. The operations manager is responsible for a team of specialists and professionals to get work completed on time, on budget and with excellent quality. An operations manager requires general people management and development skills, must know how to recruit and retain talent, take ownership of budgets and other administrative responsibilities and keep work flowing throughout an organization. These people assign resources, approve timelines and work with executive teams to ensure that all work gets done as promised to partners or clients.[57]:28

  • Project managers

Project managers are in charge of the execution of all the different projects that require translation or localization. They understand what needs to be translated or localized, organize the appropriate vendor and internal resources, and also create a schedule, timeline and associated budget. They work along the way to be sure everything is delivered on time and on budget. They track and resolve issues, work with developers and various departments to be sure that everything they are responsible for works out as planned. They usually report to an operations manager. A project manager needs to have excellent communication skills and the ability to work with people ranging from those in management to linguists to engineers to clients and others. Organizational skill, managing complexity and being able to keep track of several moving parts at once are essential. Financial budgeting skills are required, as well as the ability to negotiate and persuade people to do what is needed.[57]:29

  • Sales executives

A sales executive is responsible for finding clients for a company and bringing in revenue. Sales executive positions require excellent communication skills and the ability to identify new business opportunities, make contact with decision makers, demonstrate the abilities of the company or service organization they represent and land business. A big part of a sales position is being consistently proactive to continually generate new business and form relationships. Resiliency, focus and natural motivation are required here. A sales executive in the language industry would do best enhancing his or her skills in selling technical professional services. There is a strong focus on “relationship” selling, which means that sales executives must learn how to get to know their clients, what their clients' challenges and needs are, and what the solutions are.[57]:29

  • Procurement managers

A procurement manager is a client side position, and is responsible for services agreements between the company and its language service providers. A procurement manager only exists at large companies. A procurement manager must have excellent negotiation skills, be able to craft detailed pricing strategies and form legal agreements with legal professionals. They will likely deal with the request for proposal and request for quotation process, billing, pricing and terms negotiation.[57]:30

  • Vendor managers

A vendor manager is the person at an LSP who forms relationships with third party partners, like linguists and contractor organizations. A vendor manager is responsible for sourcing and recruiting various professionals and specialists, testing and qualifying these vendor resources, and maintaining up-to-date contact records with these vendor individuals or companies in order to call on them when their skills are required. A vendor manager is akin to a human resources recruiter, but with a specialization.[57]:30

  • Volunteers

For individuals wanting to start their career or learn more about the localization/translation industry and its processes, volunteering is a great place to start. There are many places on the web to start volunteering as a translator/localizer, video subtitler, QA analyst, etc. Some places include TED Conferences, Mozilla, Facebook, Rosetta Foundation, and so on. Volunteering is not only a great way to get one's foot in the door of the industry, but also a chance to work and collaborate with people in other localization-related roles, such as project managers and other linguists.

Multimedia Localization


Multimedia translation, also sometimes referred to as audiovisual translation, is a specialized branch of translation which deals with the transfer of multimodal and multimedial texts into another language and/or culture and which implies the use of a multimedia electronic system in the translation or in the transmission process.[58]

Basic principles of Subtitling:

There are technical and cultural aspects which must be adhered to in Subtitling. Here are a few basic principles to good subtitling:[59] [60]

Not a literal translation - Subtitles allow only a limited number of characters per line (usually 36-42 characters). The subtitler should be skilled in the art of summarizing, giving the most concise and accurate translation, interpretation or adaptation of the original text in the target language in the fewest words possible.

Translation of humor - Translating humor is always difficult, as jokes that are hilarious in English may mean nothing in the target language. The same applies to puns and proverbs. In such instances, the translator needs to be creative and find something similar in the target language that renders the original sense and contributes to understanding the message and the plot.

On-screen text - All important written information in the images, such as names of establishments, road signs, billboards, etc., that stays on screen and is significant to the plot should be translated and incorporated wherever possible.

Grammar and Spelling - Use correct spelling and verb tenses, and make sure each subject agrees with its verb.

Punctuation - Use punctuation according to the rules of the language. There should not be any space after text and before punctuation marks.

Numbers - Numbers are written using numerals. When a sentence starts with a number, the number is spelled out in letters instead.

Lines - Never use more than two lines; otherwise the subtitle would look too crowded and distracting. For an extended monologue, just use more subtitle frames.

Characters per second - The number of characters per second is important for the viewer's reading comfort. Adult programs allow 22 characters per second and children's programs allow 18 characters per second.

Characters per line - A line holds at most 42 characters.

Fonts - Font color is either white or yellow. Bold and underline are not permitted in subtitling.

Hyphenation - Dialogue between two on-screen speakers is separated using a hyphen ‘–’.

Ellipses - Ellipses are used to indicate pause or an abrupt interruption, not when a sentence is split between two continuous subtitles.

Italics - Description of the scene, relevant sounds, singing or music are put in Italics within [square brackets].

Line break - Line division is particularly important to how subtitles look on screen and most importantly the speed of reading and comprehension. Do not break a line between pronouns and verbs, articles and their nouns, a person’s full name, conjunctions, prepositions, verbal phrases, idioms, expressions and abbreviations.
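The numeric principles above (two lines, 42 characters per line, 22 or 18 characters per second) lend themselves to an automated check. The following is a minimal sketch in Python, not an industry-standard tool: the names Subtitle and check_subtitle are hypothetical, and it assumes that spaces count as characters while line breaks do not. Clients often supply their own limits, so the constants would need adjusting.

```python
from dataclasses import dataclass

# Limits taken from the principles above; real style guides vary by client.
MAX_LINES = 2
MAX_CHARS_PER_LINE = 42
MAX_CPS = {"adult": 22, "children": 18}


@dataclass
class Subtitle:
    start: float  # display start, in seconds
    end: float    # display end, in seconds
    text: str     # subtitle text, lines separated by "\n"


def check_subtitle(sub: Subtitle, audience: str = "adult") -> list:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    lines = sub.text.split("\n")

    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (max {MAX_LINES})")

    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(
                f"line {number} has {len(line)} characters (max {MAX_CHARS_PER_LINE})"
            )

    duration = sub.end - sub.start
    if duration <= 0:
        problems.append("end time is not after start time")
    else:
        # Assumption: spaces count as characters, line breaks do not.
        cps = sum(len(line) for line in lines) / duration
        if cps > MAX_CPS[audience]:
            problems.append(
                f"{cps:.1f} characters per second (max {MAX_CPS[audience]})"
            )
    return problems


# A 20-character subtitle shown for one second passes for adults
# but is too fast for a children's program.
fast = Subtitle(start=0.0, end=1.0, text="A" * 20)
print(check_subtitle(fast, "adult"))
print(check_subtitle(fast, "children"))
```

A check like this only covers the measurable rules; line-break placement, humor and punctuation still need a human reviewer.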

Common Issues and Challenges with Subtitling


Audiovisual Translation (or “AVT”) scholars have conducted subtitling studies focused on audience attention or engagement.[61] [62] By using methods like eye tracking they can make recommendations on the optimal length of time to display the subtitle on screen, text segmentation (line breaking), subtitle shape, reading speed, etc. The goal of these studies is to allow viewers to follow the text in the subtitles, comfortably, and to understand the information conveyed. Appropriate subtitle speed and segmentation allow viewers to follow the text in the subtitles with ease yet to have enough time to take in the on-screen action. If subtitle speed is too fast and segmentation does not adhere to linguistic rules, viewers may find it difficult to follow and understand the information contained in the subtitles. Even a translation considered grammatically perfect, with the most accurate terminology, could be considered useless if your viewer can’t read it. Having this in mind, here are the main challenges an audiovisual translator faces daily:

  • Time and space constraints (reading speed)

The subtitle must be displayed on the screen for as long as it is relevant. The audience should have enough time to read the text, and the text should not cover more space than necessary on the screen. Reading speed varies according to the audience (children, adults, and level of education, to name a few factors). So, the decision about how many characters to display per second depends on your viewing public.

  • Inter-semiotic translation

The audiovisual translators are not only translating text from one language to another. They are converting spoken words into written form, and must consider congruence with all the other visual signs such as gestures, expressions, and images hitting the audience at the same time. That is why the ability to render full meaning with concise text, while respecting its context, is so important. Information must be prioritized to convey the right message and avoid dissonance.

  • Shot changes

The subtitles should follow the audio but also respect what is happening visually. For example, when there is a shot change and the camera shifts away from the speaker, you risk your subtitles being left hanging. Any such glitch will affect the viewer and break their immersion in the visual experience.

  • Tech-savvy

With this type of translation, there is a constant need to deal with technical aspects. It is common to have audio or video format problems, and audiovisual translators need to know how to convert files and adhere to technical specifications presented by the client or inherent in the product to be delivered. Does the client want the subtitles in .srt, .sub, .ttml, .xml…? Do they want the subtitles embedded in the video? Is the subtitle time-coded to the correct frame rate? Many such questions can arise.
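The frame-rate question can be made concrete with a small sketch. The Python below converts frame numbers into the HH:MM:SS,mmm timestamps used by .srt files and builds one cue. The function names are hypothetical, and the sketch assumes a constant frame rate; it is an illustration of why the frame rate matters, not a full subtitle converter.

```python
def frames_to_srt_time(frame: int, fps: float) -> str:
    """Convert a frame number to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(frame * 1000 / fps)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def srt_cue(index: int, start_frame: int, end_frame: int, fps: float, text: str) -> str:
    """Build one SRT cue: index line, timing line, then the subtitle text."""
    start = frames_to_srt_time(start_frame, fps)
    end = frames_to_srt_time(end_frame, fps)
    return f"{index}\n{start} --> {end}\n{text}\n"


# The same frame lands at a different time depending on the frame rate,
# which is why subtitles timed for one rate drift on a video using another.
print(frames_to_srt_time(90, 24))  # 00:00:03,750
print(frames_to_srt_time(90, 25))  # 00:00:03,600
print(srt_cue(1, 0, 50, 25, "Hello"))
```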


Finally, it is imperative to keep in mind that in this type of translation the public will be exposed to the source language throughout the viewing experience. The greater their knowledge of the source language, the more likely they are to compare the original words spoken with your translation. You have to be prepared to educate your client on the constraints described above and how you will need to adapt your translation to navigate them.

Video Game Localization


Video game localization is the process of adapting video game software and hardware before its introduction to a particular country or region. This process may include translating text, recording new audio, modifying storylines and characters, changing art, creating manuals and appropriate packaging, and even adapting hardware to meet market standards and linguistic, cultural, and legal needs, among others. Aside from optimizing profits, the ultimate goal of video game localization is to replicate the original source experience (e.g. English, Korean, Japanese, etc.) in an equally enjoyable experience that caters to the end user's cultural context.[63]

Localization Kit


To localize a video game, like any other software, a localization kit is necessary. This kit includes the resources that translators may need to localize all video game content and related materials. It is vital for the translators to understand the product to be localized, the workflow, and their respective roles and tasks.

Tool Kit
  • Localization manual - may include word count for all text and audio, special instructions, details, tools, formats, and graphics, among other details.
  • Text - both in-game text and text used in supporting materials outside the game, as well as a glossary
  • Dubbing materials (if applicable) - list of characters and dubbing actors, dubbing text, and audio file samples
  • Graphics - interface, textures, and fonts
  • Tools - toolkit for localization and version compilations

Translation Flow


An organized and clear translation flow is vital for successful content translation and eventual localization. This flow can vary, but it usually has six steps: review, translation, proofreading, creative rewriting, debugging/testing and delivery.

  • Review the original files and determine what work is necessary, e.g. worldview, target demographic and region, file structure, tags and variables, character limits, line breaks (editable/un-editable), dialogues, etc.
  • Translation starts using a translation tool and database or translation memory to keep consistency with vocabulary and phrasing.
  • Proofreading by an editor to keep consistency of language and tone, correct mistranslations or missing content, character limits, etc.
  • Creative rewriting is necessary to adjust a character's tone, speech and personality according to the target locale.
  • Debugging and testing will report and fix any line break/position issue, truncation, broken strings, typos, etc.
  • Delivery is the last step of the process, when files are delivered in MS Word/Excel/PowerPoint, HC TraTool, SDL Passolo/Trados/WorldServer, XTM, or another compatible format.[64]

Common Issues and Challenges in Video Games Localization

Hard-coded Strings

Follow best practices for string wrapping that fit internationalization standards. When extracting text from the source code, move it into a resource file. Save one resource file for each of the game's locales.
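As a rough illustration of "one resource file per locale", the Python sketch below stores strings in JSON files named after the locale and looks them up by key. The file layout, the key names (menu.exit, menu.pet) and the load_strings helper are assumptions made for this example, not a format prescribed by any particular game engine.

```python
import json
import tempfile
from pathlib import Path


def load_strings(resource_dir: Path, locale: str, fallback: str = "en-US") -> dict:
    """Load the fallback locale's strings, then overlay the requested locale."""
    strings = json.loads((resource_dir / f"{fallback}.json").read_text(encoding="utf-8"))
    locale_file = resource_dir / f"{locale}.json"
    if locale_file.exists():
        strings.update(json.loads(locale_file.read_text(encoding="utf-8")))
    return strings


# Demo: write two tiny resource files to a temporary folder and load them.
with tempfile.TemporaryDirectory() as folder:
    resources = Path(folder)
    (resources / "en-US.json").write_text(
        json.dumps({"menu.exit": "Exit", "menu.pet": "Pet"}), encoding="utf-8"
    )
    (resources / "pt-BR.json").write_text(
        json.dumps({"menu.exit": "Desconectar"}), encoding="utf-8"
    )

    pt_br = load_strings(resources, "pt-BR")  # translated where available
    ko_kr = load_strings(resources, "ko-KR")  # no file yet: falls back to en-US

# The game code asks for a key instead of hard-coding the word "Exit".
print(pt_br["menu.exit"], "/", pt_br["menu.pet"], "/", ko_kr["menu.exit"])
```

Because the code only ever refers to keys, translators can work on the locale files without touching the source code, and a missing translation shows the fallback text instead of breaking the game.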

Contextual Information

The localization project manager or person responsible for the localization should provide contextual information or answer any query regarding the project.

Gamers as Translators

Translators need to be native speakers, preferably active gamers who are familiar with the game's genre.

On-Device Testing

Pseudo-localization testing, in which the textual elements are replaced with altered placeholder text, is a common method. A simpler and more cost-effective approach is on-device localization testing, which has the benefit of letting testers gauge the overall quality of the localization and not just glitches. Also, set text fields to autofit their text, which prevents some common UI issues.
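To make pseudo-localization more tangible, here is a minimal sketch in Python. It swaps vowels for accented look-alikes, pads the string to simulate longer translations, and wraps it in brackets so truncation is easy to spot. The 30% expansion factor, the placeholder pattern and the function name pseudo_localize are illustrative assumptions, not an established standard.

```python
import re

# Map plain vowels to accented look-alikes so untranslated or hard-coded
# strings stand out on screen while the text stays readable.
ACCENTED = str.maketrans("AEIOUaeiou", "ÀÉÎÕÜàéîõü")

# Assumption: variables look like {name}, %s or %d and must stay untouched.
PLACEHOLDER = re.compile(r"(\{[^{}]*\}|%[sd])")


def pseudo_localize(text: str, expansion: float = 0.3) -> str:
    """Return a pseudo-localized version of text, padded and bracketed."""
    parts = PLACEHOLDER.split(text)
    # With a capturing group, re.split puts the placeholders at odd indexes.
    converted = [
        part if index % 2 else part.translate(ACCENTED)
        for index, part in enumerate(parts)
    ]
    padding = "~" * round(len(text) * expansion)
    return "[" + "".join(converted) + padding + "]"


print(pseudo_localize("Exit"))
print(pseudo_localize("Hello {name}!"))
```

If a bracket is missing on screen, the string was truncated; if a string shows no accents at all, it is probably still hard-coded.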

Culturalization

Games are often story-driven or very creative in language, and can also be laden with puns and cultural references. Depending on the setting of the game, it might involve knowledge from history, technology, and many other fields of study. Since the range of contexts can be enormous, a translator capturing the "gist" of the source text should be sensitive to puns and good at researching. For game-related concepts, game-savvy native-speaker translators are the best cultural advisers!

Another side of the issue is cultural taboos. Gestures, certain practices, political and religious topics that are appropriate in one culture may be offensive in another.

Translation Management

Centralize translation management with a TMS (Translation Management System) to organize translations and reuse them.

Timing

Think about localization from the start. Wrap strings at an early stage of development and have them ready for localization or tweak the coding style to meet international standards.[65]

Turnaround

Localization is often the last step in the production of a game, and if there is a delay in the production process, it is likely to compress the time available for localization. After release, games are updated more frequently than ever before to maintain their presence in the market. For games that are available worldwide, it is crucial to ship the translation simultaneously in order to deploy the updates in local markets. To constantly deliver high-quality translation in a short amount of time, there should be a well-designed agile workflow to send the source text to localization and then back to the game.

Example of a Common Issue in Video Game Localization

Video game localization involves languages with little to no similarity in their rules. One common issue is the difference in the user interface (UI) that translators have to consider. The gaming experience must be intuitive regardless of the language the user chooses to play in, especially in text-intensive genres such as Massively Multiplayer Online Role-Playing Games (MMORPGs).

  • The following is an example of localizing the user interface from Korean to Brazilian Portuguese in a MMORPG user menu.
UI Character Limit in Game Menu
English (US) | Brazilian Portuguese | Korean | Maximum Allowed
Exit | Desconectar | 나가기 | 5 characters
Battlefield | Campo de Batalha | 전장 | 12 characters
Pet | Animal de Estimação | | 5 characters

Note that all of the three words for "Exit", "Battlefield", and "Pet" for Brazilian Portuguese are longer than the allowed character limit. This issue is commonly resolved by:

  1. Changing the translated word without altering the original meaning. (e.g. the use of the word "Mascote" instead of "Animal de Estimação" for "Pet" maintains the original meaning in Korean).
  2. Requesting a larger text field from the designer/developer. (e.g. the word "Campo de Batalha" cannot be replaced because it is a major feature in the game and Brazilian Portuguese does not offer any replaceable word).

Although these words are simple to automate through translation tools, translators have to separate certain UI texts into exclusive strings due to the text limitations. The debugging and testing steps usually resolve UI-related issues; however, the fixes should not be overwritten by new automated translation data.
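A length check like the one in the table can be automated before the debugging and testing step. The sketch below, in Python, uses the three rows from the table above; the function name find_overflows is hypothetical, and it assumes the limit is counted in Unicode characters after NFC normalization, which may differ from how a given game engine measures text.

```python
import unicodedata

UI_STRINGS = [
    # (English source, Brazilian Portuguese translation, maximum characters)
    ("Exit", "Desconectar", 5),
    ("Battlefield", "Campo de Batalha", 12),
    ("Pet", "Animal de Estimação", 5),
]


def find_overflows(rows):
    """Return (source, translation, excess) for every string over its limit."""
    overflows = []
    for source, translation, limit in rows:
        # Normalize so an accented letter counts as one character.
        length = len(unicodedata.normalize("NFC", translation))
        excess = length - limit
        if excess > 0:
            overflows.append((source, translation, excess))
    return overflows


for source, translation, excess in find_overflows(UI_STRINGS):
    print(f"{source!r} -> {translation!r}: {excess} characters over the limit")
```

The size of the overflow can help decide between the two resolutions listed above: a small excess may be fixed with a shorter synonym, while a large one on a key term is a candidate for requesting a larger text field.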

Determining what to Localize


Video Game Localization teams are no different than other teams in the industry, and thus, are equally interested in expanding their product's presence to include other territories. In order to justify the investment there are certain aspects that L10n teams will consider. The most determining aspect will be the financial viability of localizing a given game into a given locale. Once this is decided, the other main concern will be the level of localization that the game is going to go through.

1. Considering the financial viability of localization -- Localizing which games into which languages

Localization teams, and more specifically publishers, will determine which games to localize into which languages based on the potential financial success of the localization. In order to determine this, the publisher will consider the amount of money, time, and resources put into the localization against the potential return on the investment (or ROI). To know the potential ROI, the publisher will order a profit and loss statement (P&L) for the localization of a specific game and a specific language. If the statement determines that the numbers for the potential sales are higher than the numbers of development, localization, and marketing costs combined, then the localization project for that specific game and that specific locale can move forward.
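The decision rule described above can be written out as simple arithmetic. The Python sketch below is only an illustration: the function names and all figures are hypothetical, and a real profit and loss statement would include many more line items.

```python
def total_cost(development: float, localization: float, marketing: float) -> float:
    """Combined cost of bringing the game to one locale."""
    return development + localization + marketing


def is_viable(projected_sales: float, cost: float) -> bool:
    """The project moves forward only if projected sales exceed combined costs."""
    return projected_sales > cost


def return_on_investment(projected_sales: float, cost: float) -> float:
    """ROI as a fraction of cost: (sales - cost) / cost."""
    return (projected_sales - cost) / cost


# Hypothetical figures for one game and one target locale.
cost = total_cost(development=120_000, localization=60_000, marketing=40_000)
sales = 500_000

print(cost)                                # 220000
print(is_viable(sales, cost))              # True
print(f"{return_on_investment(sales, cost):.0%}")
```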

Ideally, localization teams want to simultaneously ship (sim-ship) the localization with the source version of the video game, so the marketing strategy happens at the same time, has more impact, and thus generates more sales. Although sim-ship is the ideal scenario, it does not always happen. In pre-production some locales are not accounted for in localization, or there is simply not a big enough budget to localize a game into, let us say, East Asian languages. However, once the game has shipped and the ROI is higher than expected, there may be room for other locales down the road. In this case, the team is going to have to adjust to the added difficulty of having to localize a game that has already been released.

2. The Level of Localization

It is widely known that users prefer to interact with products that are in their own languages. Similarly, gamers prefer games that create the illusion that they were designed specifically for their language, culture, and territory. From project to project, however, the level of localization may vary. The level of localization refers to how much of a game is localized and how much is left in the source language. The following are the different levels of game localization:

  • No Localization: Neither the game nor the packaging is localized. The source version is distributed globally.
  • Packaging and Manual Localization or “Box and Docs” Localization: Only the packaging and the manual of the game are localized. The game itself remains in its original state.
  • Partial Localization: The game text and the packaging are localized, and the voice lines are subtitled.
  • Full Localization: The game is localized entirely, and voice lines are dubbed and subtitled. Full localization may contain partially localized content. That is, the fully localized version may keep certain terminology, names or culturally related images from the source version if the team determines that the target locale will prefer them (especially true of non-sim-ship localized game versions).[66]

"Games as a Service" Localization


The "Games as a Service" (or GaaS) model, which has become more common with the rise of mobile gaming, changes the localization process for video games significantly. What was formerly an industry ruled by products that were considered complete after shipment has evolved into one where software updates and downloadable content are the norm. Here are some ways that localization for GaaS differs from traditional video game localization:

  • Beta Releases - Many games, especially those played online, are made available in a beta build for players to experience before the game's official release. During this time, the game developer collects feedback from the players on many aspects of the game, including its localization into the players' various native locales.
  • Continuous Updates - Games as a Service may receive continuous updates for years following their initial release to keep players interested. This means that all of the new in-game events and content must be localized, as well as corresponding promotional marketing materials. This can include all types of localization, including recording new voice lines, translating new scripts, fixing issues, and even adapting new game modes and functions.
  • Turnaround - Turnaround for GaaS can be especially quick, even as short as 24 hours, given that there can be daily updates. The amount of content for each drop may be smaller in these cases.
  • First-Time User Experience - First-Time User Experience (or FTUE) is of vital importance in GaaS, many of which use free-to-play or free-to-start models. In order to make the game profitable, the first few moments of gameplay must entice the player to keep playing, and so localization of those first few moments is given extra importance as a result.
  • Maintaining Tone - To make the players' experience the best it can be for the entire life of a game, it is important that the tone of the writing be kept as consistent as possible. Retaining translators and proofreaders helps with this task.
  • Gamer Feedback - Throughout the life of a game, the players will likely give feedback. Though all feedback is subjective, analyzing and reacting to feedback on localization can make a difference in how players perceive the game, and as a result, how much time and money they spend playing.

Localization as a communication process


We define localization as the linguistic and cultural adaptation of digital content to the requirements and the locale of a foreign market. It is a process of adjusting software for a specific region or specific languages by adapting local components and translating text. The professional practice of localization is now very important in industries such as technology, medicine and pharmaceuticals. The practice is also growing: companies that want to expand their products around the world understand that they need to consider localization efforts and include them in their business plans, and more and more companies from different industries are adopting localization practices. But localization is much more than adapting, translating, code or workflows. Localization is a communication process.

To see why, consider what communication is. Communication is much older than localization; it is as old as humanity. It is a natural process and a complex term to define. Communication professionals define it as the process of sending information. This process, which can also be a cycle, has these components: the sender, the message, the channel, the receiver and feedback. Simple as it looks, this model is how we can understand how communication works. Everyone communicates something all the time: the conversation people had this morning with their families or co-workers was part of a communication process, and so was the email that a company's employees received this morning.

When is the communication process considered successful? The answer is hard to pin down, but we can say that communication succeeds when the sender gets a reaction from the receiver. We can see this when a seller closes a sale, when a marketing campaign increases the sales of a product, or when a candidate wins an election with the votes of a group of people. These examples do not mean that the pure act of sharing information is the key. In each of them, however, the way the sender sent the message to the receiver was an important element in achieving the goal of the seller, the campaign and the candidate.

Communication professionals say that before creating a message or defining the channel, one of the most important steps is to know and understand the receiver, the people who are going to get the message, because the better the sender knows the receiver, the better the receiver will understand what the sender wants to say. This is where localization comes in. What is the final goal of localization efforts? One answer is to make a product accessible to more people around the world. For example, localization lets more people in different countries use Microsoft Office, play a video game, watch a movie, or simply understand the side effects of a pill. This is where the localization process becomes a communication process, and the localization practitioners (editors, linguists and translators) become communication professionals: they must understand the local market, and they define the way a message must sound. They are not going to write the narrative or the story of a movie or video game. Based on their local knowledge and their linguistic background, they find the right words to communicate the same meaning as the original version. Knowing the audience is as important to the successful adaptation of a text as any other component of the professional practice of localization.

Marketing Localization


Marketing localization is the production of marketing content from one distinct locale for another by adapting language, text, fonts, images and cultural references so that they are appealing and relevant to that audience. Expanding into the international market is key to a successful business. All the standard marketing rules for selling a product apply, but when you attempt to sell the product internationally, some changes may be needed. The existing product information often cannot just be translated into another language and sold. The information may make no sense, be unintentionally humorous, or be offensive in another location. Culturalization plays a big part in adapting a product to be appealing. Language, imagery, color, nationality or origin, religion, slang and sex all mean different things in different cultures. Some existing advertisements simply cannot be adapted and must be "re-skinned" or "re-branded" to work in other places.


Tips for successful Marketing Localization

1. Differentiate localization from translation. A translation can serve a wide audience of native speakers of the target language, whereas marketing localization caters to a more specific audience. For example, localization can be done in Canadian French or Brazilian Portuguese instead of standard French and Portuguese. Localization also means formatting numbers, dates, and other culture-specific aspects according to the requirements of the target locale.

2. Research potential markets. Market research is an integral part of developing a business and a marketing campaign. The research will identify which markets you can explore and which languages will be used for localization. What do you need to know about potential markets? Gather as much information as possible. Also, consider surveying the target audience. A survey will help you understand what their market lacks and how your product could be useful to them. Social channels also provide marketers with valuable social data that can be used when researching potential markets. Check web analytics and social media data.

3. Hire local talent. Hiring a local translator who is a native speaker of the target language is necessary to fit your business to the target culture.

4. Introduce localized social and web pages. Localizing social media pages is essential in implementing marketing campaigns. You can introduce your product to the new audience and communicate with them directly. Website localization is an absolute necessity if you plan on introducing your product to a foreign audience.

5. Use local product placement. Product placement is less popular than social media marketing and other digital marketing strategies. It is an expensive strategy to implement but brings great results when it comes to entering a new market. It requires cooperating with movie directors or TV show producers to get your product into their creations.

6. Work with local influencers. Influencer marketing is a powerful tool for entering a new market. If you plan to explore a new market, working with influencers who are native to it will bring you many benefits.

7. Follow local laws and regulations. It is important to consider what regulations exist around advertising, including text, imagery and video, since these vary immensely by region and country.

Language-specific Topics


French


German


Italian


Japanese


Localization and Kinsoku Shori (In Japanese: 禁則処理)

What is Kinsoku Shori?

Kinsoku Shori are word wrapping rules that apply specifically to the Japanese language. Similar rules also apply to other East Asian languages such as Chinese and Korean.

Here are some examples of characters that cannot be placed at the start of a line:

Small characters such as ッ, ャ, ョ.

Punctuation marks such as 。、

Closing brackets such as }, 」, )

For the full list of examples, please visit: https://en.wikipedia.org/wiki/Line_breaking_rules_in_East_Asian_languages

Many word processing software products have built in features to control line breaking so that an incorrect character does not come at the start of a line. The characters which cannot be separated will remain together. Examples of such software are Adobe InDesign and Microsoft Word.
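The line-start rule can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how InDesign, Word or any browser implements Kinsoku Shori: it only handles the characters listed above that may not start a line, and it resolves a violation by letting the character hang at the end of the previous line. The function name wrap_kinsoku and the fixed character-count width are assumptions for the example.

```python
# Characters from the examples above that must not start a line.
NO_LINE_START = set("ッャョ。、}」)")


def wrap_kinsoku(text: str, width: int) -> list:
    """Wrap text at `width` characters without starting a line with a
    prohibited character. A prohibited character is kept on the current
    line, so a line may run slightly over `width`."""
    lines = []
    current = ""
    for char in text:
        # Break only when the line is full AND the next character is
        # allowed to start a new line.
        if len(current) >= width and char not in NO_LINE_START:
            lines.append(current)
            current = ""
        current += char
    if current:
        lines.append(current)
    return lines


# A naive wrap at 7 characters would start the second line with "。".
for line in wrap_kinsoku("今日は晴れです。明日は雨です。", 7):
    print(line)
```

A complete implementation would also cover characters that may not end a line (such as opening brackets) and character pairs that must not be separated.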


Why is this important to localization?

Japanese text becomes hard to read if those characters appear at the start of a line or are separated across two lines. Also, for marketing headers or other highly visible text, incorrect line breaks are considered a sign of unprofessional design work.

While some word processing and desktop publishing software products apply Kinsoku Shori automatically, it is often overlooked during web page internationalization and localization. It is not a simple process because each browser behaves differently. Maintaining Kinsoku Shori across various screen sizes and multiple devices is not easy. There are ways to take care of Kinsoku Shori using Cascading Style Sheets (CSS).

Currently, layout errors caused by line breaks or a lack of Kinsoku Shori are fixed manually. For example, a linguistic QA tester logs a bug, then the developer fixes it as needed. It is not possible to apply Kinsoku Shori to the entire web page manually. Therefore, lower visibility text is often left as it is.


Reference:

https://en.wikipedia.org/wiki/Line_breaking_rules_in_East_Asian_languages

https://ja.wikipedia.org/wiki/%E7%A6%81%E5%89%87%E5%87%A6%E7%90%86

https://w3c.github.io/i18n-tests/results/line-breaks-jazh

Spanish


The list of locales.

The list of English stop words.

Translation Glossary of Terms [67]

The following list is composed of the most common terms used in interpretation or translation. Understanding all the jargon of interpretation or translation is vital when it comes to building relationships personally or professionally.

A

Adaptation — The process of converting information into an appropriate format for the target language and culture.
Algorithm — “TM” applications employ matching algorithm(s) to retrieve similar target language strings, flagging differences.
Alignment — Alignment is the task of defining translation correspondences between source and target texts.
Alignment Tool — Application that automatically pairs versions of the same text in the source and target languages in a table. Also called bi-text tool.
Ambiguity — Situation in which the intended meaning of a phrase is unclear and must be verified – usually with the source text author – in order for translation to proceed.
Antonym — Antonyms are opposite words that stand in an inherently incompatible binary relationship, e.g. the pairs male:female, long:short, up:down, and precede:follow.
Arabic Numerals — Set of ten numerals (0,1,2,3,4,5,6,7,8,9) that comprise the most commonly used symbolic representation of numbers throughout the world.
Artificial Intelligence — Branch of computer science devoted to creating intelligent machines that produced the first efforts toward machine translation.
Attribute — A property defined and applied to a Translation Memory units/segment to help sequence retrieval. Attributes are also those fields that define and qualify term bases.
Automatic Retrieval — When a translator moves through a document, TMs are automatically searched and the results displayed. (Server based).
Automatic Substitution — Exact matches come up in translating new versions of a document.
Automatic Translation — Machine-based translation process not subject to input by a human translator.

B

Back Translation — Process of translating a previously translated text back into its source language.
Bidirectional — Script that normally reads from right to left but contains some exceptions in which other characters, like numerals, read from left to right. Hebrew and Arabic are examples of bidirectional languages.

C

CAT (Tools) — Computer-assisted translation (tools) – The process by which a human translator uses computer software to facilitate translation.
Common Sense Advisory — Market research agency providing data to operationalize, benchmark, optimize, and innovate industry best practices in translation, localization and associated industries.
Character Set — Collection of symbols or characters that correspond to textual information in a language or language group.
Cognate — In linguistics, cognates are words that have a common etymological origin. An example of cognates within the same language would be English shirt and skirt.
Compilation — The activities required to check, process and output to one or multiple target formats in a single source publishing environment (e.g. Robohelp).
Collaborative Translation — Emerging approach to translation in which companies use the elements of crowdsourcing in a controlled environment for working on large corporate projects in short periods of time.
Concatenation — Procedure of linking multiple files or messages together as a single document, often to facilitate processes such as search and replacement, term list extraction, collocation finding, and repetition rate establishment.
Concordance — This feature allows translators to select one or more words in the source segment and the system retrieves segment pairs that match the search criteria.
Consistency — Measure of how often a term or phrase is rendered the same way into the target language.
Context — Information outside of the actual text that is essential for complete comprehension.
Controlled Vocabulary — Standardized terms and phrases that constitute a system’s vocabulary.
Controlled Language — Language in which grammar, vocabulary and syntax are restricted in order to reduce ambiguity and complexity, making the source language easier for native and non-native speakers to understand and easier to translate with machine and human translation.
Country Code — Abbreviation of two or three characters to signify a country or dependent area. ISO 3166 specifies country codes, such as “AL” for Albania and “CZ” for the Czech Republic. There are also country codes for telephone numbers, such as +1 for the U.S. and Can
CMS — (Content Management System) Tool that stores, organizes, maintains, and retrieves data.
Crowdsourcing — The practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, especially from the online community, rather than from traditional employees or suppliers.
CT3 — Abbreviation for community, crowdsourced, and collaborative translation.
Cultural Adaptation — Adjustment of a translation to conform with the target culture.
Cultural Assessment — Examination of an individual’s or group’s cultural preferences through comparative analyses.
Culturally-Sensitive Translation — Translation that takes into account cultural differences.
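
As a rough illustration of the Concordance entry above, the sketch below searches a small in-memory list of segment pairs for source segments that contain every selected word. The `concordance` function, the sample pairs, and the case-insensitive whole-word matching rule are assumptions made for this example; real CAT tools use their own indexing and matching.

```python
import re

# Hypothetical translation memory: (source segment, target segment) pairs.
SEGMENT_PAIRS = [
    ("Save the file before closing.", "Enregistrez le fichier avant de fermer."),
    ("Open the file menu.", "Ouvrez le menu Fichier."),
    ("Close the window.", "Fermez la fenêtre."),
]

def concordance(words, pairs):
    """Return the pairs whose source segment contains every given word."""
    wanted = [w.lower() for w in words]
    hits = []
    for source, target in pairs:
        # Tokenize the source segment into lowercase words.
        tokens = set(re.findall(r"\w+", source.lower()))
        if all(w in tokens for w in wanted):
            hits.append((source, target))
    return hits

for source, target in concordance(["file"], SEGMENT_PAIRS):
    print(source, "->", target)
```

Searching for several words at once, for example `concordance(["file", "menu"], SEGMENT_PAIRS)`, narrows the result to segments that contain all of them.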

D

DBE — Abbreviation for double-byte enabled.
Desktop Publishing — Applications like FrameMaker, PageMaker, and QuarkXPress to prepare documentation for publication.
Dialect — Variety of a language spoken by members of a particular locale and characterized by a unique vocabulary, grammar and pronunciation.
DITA — XML-Based architecture for authoring, producing and delivering technical information.
DNT — Abbreviation for do not translate. List of such phrases and words include brand names and trademarks.
Domain — The area of knowledge communicated within a text, translation, or corpus.
DTD — Document type definition. Description of how content should be structured, providing rules for tags and characteristics, to enable programs to more easily process and store the document. Commonly abbreviated DTD.
DTP (Desktop Publishing) — Use of specific software to combine and arrange text and images and to create digital files.
Double-Byte Enabled — Quality of an application or program that supports double-byte languages. Commonly abbreviated DBE.
Double-Byte Language — Language - such as Chinese, Korean, and Japanese – that requires two bytes (16 bits) to represent each character precisely.
Dubbing — Recording or replacement of voices commonly used in motion pictures and videos for which the recorded voices do not belong to the original actors or speakers and are in a different language.
Dynamic Content — Data that is changeable rather than fixed, produced in response to user requests and retrieved from a database.

E

Eastern Arabic Numerals — Set of symbols used to represent numbers in combination with the Arabic alphabet in various countries, including Afghanistan, Egypt, Iran, Pakistan, Sudan, and parts of India. Also called Arabic Eastern Numerals.
Editing — Second level of review in the traditional TEP process.
Encoding Scheme — System that assigns a numeric value to each character, in order to convert the character set to an automated form for transmitting and maintaining information.
Exact Match — During translation memory analysis, an exact match occurs when the current source segment matches a stored segment character by character. When translating a sentence, an exact match means the same sentence has been translated before. Exact matches are also referred to as 100% matches.
Extended Characters — Characters that exceed the ASCII character range of seven bits, such as characters with diacritical marks or non-Roman characters.
eXtensible markup language (XML) — Metadata language used to describe other markup languages. Commonly abbreviated XML.

F

False Friends — False friends are pairs of words or phrases in two languages or dialects (or letters in two alphabets) that look or sound similar, but differ in meaning.
FIGS — Abbreviation for French, Italian, German and Spanish.
Functional Testing — Reviewing software applications and programs to ensure that the localization process does not change the software or impair its functions or on-screen content display.
Fuzzy Match — Indication that words or sentences are partially – but not exactly – matched to previous translations. When the match (during Translation Memory analysis) has not been exact, it is a fuzzy match. Some systems assign percentages to these kinds of matches, in which case a fuzzy match is greater than 0% and less than 100%. Those figures are not comparable across systems unless the method of scoring is specified.
Fuzzy Logic — When exact matches cannot be found, fuzzy logic finds near matches between the text and translation memory entries.
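
To make the percentages in the Fuzzy Match entry concrete, here is a minimal sketch that scores a new segment against stored segments. It assumes character-based similarity from Python's standard `difflib` module as the scoring method; the `TM` dictionary, `best_match`, and `classify` are hypothetical names. As the entry notes, scores from different systems are not comparable, so these figures mean nothing outside this scoring method.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory mapping source segments to translations.
TM = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "Click the Cancel button.": "Cliquez sur le bouton Annuler.",
}

def best_match(segment, tm):
    """Return (score, stored_source, translation) for the closest TM entry."""
    scored = [
        (SequenceMatcher(None, segment, source).ratio() * 100, source, target)
        for source, target in tm.items()
    ]
    return max(scored)

def classify(score):
    """Label a score using the glossary's definitions."""
    if score == 100:
        return "exact match"
    if score > 0:
        return "fuzzy match"
    return "no match"

for text in ("Click the Save button.", "Click the Open button."):
    score, source, target = best_match(text, TM)
    print(f"{text!r}: {score:.0f}% ({classify(score)}) -> {target}")
```

In practice a tool would also apply a minimum threshold below which fuzzy matches are not offered to the translator; that threshold is a project decision and is left out here.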

G

GILT — Acronym for globalization, internationalization, localization, and translation.
GIM — Abbreviation for global information management.
Gist Translation — Use of human or machine translation to create a rough translation of the source text that allows the reader to understand the essence of the text.
Globalization (G11N) — The process by which regional economies, societies, and cultures have become integrated through a global network of political ideas through communication, transportation, and trade.
Glocal — Combination of the words ‘global’ and ‘local,’ used to describe products or services intended for international markets and have been customized for different languages, countries, and cultures.
Glossary — A glossary, also known as an idioticon, vocabulary, or clavis, is an alphabetical list of terms in a particular domain of knowledge with the definitions for those terms.
GMX — GILT Metrics, where GILT stands for Globalization, Internationalization, Localization, and Translation. The GILT Metrics standard comprises three parts: GMX-V for volume metrics, GMX-C for complexity metrics and GMX-Q for quality metrics.

H

Homonym — One of a group of words that share the same spelling and the same pronunciation but have different meanings.

I

In Context Exact (ICE) Match or Guaranteed Match — An ICE match is an exact match that occurs in exactly the same context, that is, the same location in a paragraph.
In-Country Review — The evaluation of a translated text by an individual who resides within the country where the target text will be used.
Internationalization (I18N) — Internationalization is the planning and preparation stages for a product that is built by design to support global markets.
Interpretation — Process of rendering oral spoken or signed communication from one language to another, or the output that results from this process.

L

Language — System of signed, spoken, or written communication.
Language Tags and Codes — Language codes are closely related to the localizing process. They indicate the locales involved in the translation and adaptation of the product.
Language Combination — Group of active and passive languages used by an interpreter/translator.
Language Kit — Add-on feature that permits a keyboard to produce the character sets for a given language.
Language Pair — Languages in which a translator or interpreter/translator can provide services.
Language Services Provider (LSP) — An organization or business that supplies language services, such as translation, localization, or interpretation. Commonly abbreviated LSP.
Leverage — The practice of reusing previously translated terms and phrases in new translations. Also, the rank which evaluates how much of the previously translated text can be reused.
Linguistic Parsing — Base form reduction is used to prepare lists of words and a text for automatic retrieval of terms from a term bank. Syntactic parsing, on the other hand, may be used to extract multi-word terms or phraseology from a source text. Parsing is thus used to normalize word order variation in phraseology, that is, to determine which words can form a phrase.
Literal Translation — Translation that closely follows the phrasing, order, and sentence construction of the source text.
LISA — Localization Industry Standards Association.
LISA QA Model — A metric for the evaluation of translation quality developed by the Localization Industry Standards Association.
Localization (L10N) — Process of adapting or modifying a product, service, or website for a given language, culture or region.
Localization Engineering — Software engineering carried out to support localization. Activities include internationalization, bug fixing, functionality testing, dialog box resizing, help compilation, as well as other software-related activities. Most LSPs charge for these services by the hour.
Localization Tool — Application that assists with the translation and adaptation required for localization.
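
The Language Tags and Codes entry above can be illustrated with a small parser. This is a simplified sketch, not a full implementation of any standard: it assumes tags made of a language code, an optional four-letter script code, and an optional region code (two letters or three digits), such as `pt-BR` or `zh-Hant-TW`. The function name `parse_language_tag` is hypothetical, and real tags can carry further subtags that this sketch ignores.

```python
def parse_language_tag(tag):
    """Split a simple tag such as 'pt-BR' or 'zh-Hant-TW' into its parts."""
    # Accept both 'pt-BR' and the underscore form 'pt_BR'.
    parts = tag.replace("_", "-").split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            result["script"] = part.title()    # e.g. 'Hant'
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            result["region"] = part.upper()    # e.g. 'BR' or '419'
    return result

print(parse_language_tag("pt-BR"))
print(parse_language_tag("zh-Hant-TW"))
```

Keeping language and region separate lets a product fall back from a specific locale (for example Brazilian Portuguese) to the general language when no regional resources exist.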

M

Machine Translation (also known as automated translation) — Translation carried out exclusively by a machine. Commonly abbreviated MT.
Machine Translation Plus Translation Memory — Workflow and technology process in which terms not found in translation memory are automatically sent to the machine translation software for translation.
Markup Language — The language that uses annotations to indicate how text should be formatted.
Match — Indication that words or sentences are matched, either partially or fully, to previous translations.
Meaning-for-meaning translation — Translation for which the words used in both languages may not be exact equivalents, but the meaning is the same.
Mega-Language — One of the ten most important languages on the web, including Chinese, Dutch, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.
Metadata — Information that describes data.
Mirroring — Process of adjusting the layout of software or a website so that it is aligned to the right, to serve the needs of bidirectional languages that are read from right to left.
Morpheme — Smallest unit of meaning in a language.
Mother Tongue — Native, first-learned language of an individual.
MT — Abbreviation for machine translation.
Multi-Byte Character Set — Character set in which the number of bytes per character varies. Abbreviated MBCS.
Multi-Byte Language — Language that requires the use of a multi-byte character set.
Multiculturalization — Process by which the linguistic and cultural diversity among a group of people increases.
Multi-Language Vendor (MLV) — Language service provider that offers services in multiple language pairs. Abbreviated MLV.
Multilingual Workflow — Automation of business processes related to the development of multilingual products by managing multilingual content, usually through a translation management system, machine translation, and translation memory.
Multinationalization — Process of expanding an organization’s presence into multiple nations. Commonly abbreviated M18N.
MultiTerm — The SDL Trados terminology tool. Latest version SDL MultiTerm 2009 as well as SDL MultiTerm Server 2009.

N

Native Language — First language that a human learns naturally, usually since childhood.
Networking (TM Server) — Networking during translation makes it possible for a group of translators to translate a text together efficiently.
Neural Machine Translation (NMT) — Approach to machine translation that uses a large artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model. [68]
Neutral Spanish (also Universal Spanish) — Spanish that is mutually intelligible by speakers from various parts of the Spanish-speaking world and is not immediately identifiable with any single regional variety of the language. No standards exist for defining neutral Spanish.
Next-Wave Language — One of the languages of growing importance on the web.

O

OLIF — Abbreviation for Open Lexicon Interchange Format.
Ontology — Description of the relationships between concepts, objects, and other entities within a given field.

P

Plain English — Method of writing English that employs a clear and simple style, usually for the purpose of improving readability. Among its features are using only active verbs (no passive voices) and making sure that each word has only one meaning.
PM — Abbreviation for project manager. Individual who carries out management and coordination tasks for a given translation project.
PPW — Abbreviation for price per word.
Post-Editing — Process by which one or more humans review, edit, and improve the quality of machine translation output.
Project Manager — Individual who carries out management and coordination tasks for a given translation project. Commonly abbreviated PM.
Pre-Editing — Process by which a text is edited prior to translation in order to clarify ambiguous terms and increase translatability.
Pre-Translation — Phase of the translation process in which documents are prepared for conversion into another language. This usually includes an automated analysis against translation memories, so that previously translated text is inserted in a file, avoiding rework and associated costs.
Project Setup — Translation preprocessing steps include tasks such as glossary and style guide preparation, project planning, file preparation, content familiarization, and training.
Proofreading — Practice of checking a translated text to identify and correct errors in spelling, grammar, syntax, coherence, and integrity. It is usually, though not necessarily, carried out by a second linguist or translator; proofreading can also be done by editors with no second language.
Pseudo-Localization — The process of faking translation of software or web applications before starting to localize the product for real. It is used to verify that the user interface is capable of containing the translated strings (length) and to discover possible internationalization issues.
Pseudo-Translation — A procedure which simulates how a translated document will look after translation and how much extra DTP or other work will be required before actual translation is done. This can help in setting the appropriate timelines of projects.
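
The Pseudo-Localization entry above can be sketched in a few lines. The example below assumes three common conventions that are not taken from this glossary: vowels are swapped for accented look-alikes, the string is padded by roughly 30% to simulate text expansion, and brackets mark the start and end so truncation is easy to spot. The `{name}` placeholder style and the function name `pseudo_localize` are likewise assumptions.

```python
import re

# Map plain vowels to accented look-alikes so untranslated text stands out.
ACCENTS = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")
# Assumed placeholder style: {name}. Placeholders must survive unchanged.
PLACEHOLDER = re.compile(r"(\{[^{}]*\})")

def pseudo_localize(text, expansion=0.3):
    """Return an accented, padded, bracketed version of a UI string."""
    # Splitting on a capturing group keeps the placeholders in the result.
    pieces = PLACEHOLDER.split(text)
    accented = "".join(
        piece if PLACEHOLDER.fullmatch(piece) else piece.translate(ACCENTS)
        for piece in pieces
    )
    # Pad to simulate longer translations; brackets reveal truncation.
    padding = "~" * round(len(text) * expansion)
    return "[" + accented + padding + "]"

print(pseudo_localize("Save {count} files"))
```

If a pseudo-localized string appears on screen without its closing bracket, the control is too small for expanded text; if a string appears without accents, it is probably hard-coded and was never routed through the localization resources.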

Q

QA — Abbreviation for quality assurance. Process designed to ensure translation quality, with specific steps followed for the purpose of minimizing errors.
QC — Abbreviation for quality control.
QI — Abbreviation for quality improvement. Process designed to ensure translation quality, in which the overall goal is to enhance performance.
Quality Assurance — Process designed to ensure translation quality, with specific steps followed for the purpose of minimizing errors.
Quality Control — Process designed to ensure translation quality, in which the target text is reviewed with the purpose of catching errors.
Quality Improvement — Process designed to ensure translation quality, in which the overall goal is to enhance performance.

R

RBMT — Abbreviation for rules-based machine translation.
Register — Measure of formality of language, dependent upon the tone, terminology, and grammar used.
Repetition — Sentence or phrase that is repeated in the source text, often reported in a translation memory analysis.
Rich Media Content — Synonym for interactive multimedia. A broad range of interactive digital media that exhibit dynamic motion, taking advantage of enhanced sensory features such as video, audio and animation.
ROI — Return on Investment. The performance measure that evaluates the efficiency of an investment.
Roman Numerals — System of numerals that evolved from the system used in classical Rome, often used for purposes such as numbering pages in introductions or prefaces.

S

SAE J2450 — A metric for the evaluation of translation quality, originally developed for the automotive sector. The metric comprises error categorization and severity.
SDK — Abbreviation for software development kit.
Segment — Sentence or phrase that is separated from the rest of a text based on language construction rules such as punctuation.
Segmentation — Its purpose is to choose the most useful translation units. Segmentation is a type of parsing. It is done monolingually using superficial parsing and alignment is based on segmentation.
Simplified Chinese — Contemporary written Chinese language used in mainland China and Singapore.
SimShip — Abbreviation for simultaneous shipment.
Single-Byte Character Set — Character set in which a single 8-bit byte represents a character.
Single Sourcing (also Single Source Publishing) — Process of producing a document in one format and automatically translating or publishing it into multiple formats.
SMT — Abbreviation for statistical machine translation.
Software Development Kit — Documentation and source code that facilitate the process of developing programs that interface with a given product. Commonly abbreviated SDK.
Software Engineering — In a localization context, the process of translating and adapting computer software from one language and culture into another. Also referred to as localization engineering.
Source Code — Code that is compiled to develop a program.
Source Count — Number of words in a text to be translated.
Source File — File that contains the source document in its original form, as opposed to a generated file, and is also required for localization processes.
Source Language — Language of the text that is to be translated.
Source Text — Text that needs translation.
Source Text Analysis — Analysis of the source text prior to translation that provides a better idea of the difficulty of the translation.
SRX — Abbreviation for Segmentation Rules eXchange. The standard is intended to enhance the TMX standard so that translation memory data exchanged between applications can be used more effectively. The ability to specify the segmentation rules that were used in the previous translation may increase the leveraging that can be achieved.
Segmentation Rules eXchange (SRX) — The vendor-neutral standard for describing how translation and other language-processing tools segment text for processing.
Standard Line — Measure of the usual number of keystrokes per line in a certain text, which varies per country, and consists on average of 50 to 60 characters; commonly used for translation projects that are priced on a per line basis.
Statistical Machine Translation — Machine translation solutions that take a probability-based approach to translation through computational analysis of data, treating data as character strings, determining patterns, and leveraging regularities. Commonly abbreviated SMT.
Style Guide — Document that describes the correct grammar, punctuation, spelling, style and numeric formats to ensure consistency and quality in a translated text.
Style Sheet — Document or template that describes the structure and format of a document, with instructions regarding fonts, page size, spacing, margins, paragraph styles and tag markups to ensure consistency and quality in a translated text.
Subtitles (also Captioning) — Subtitles are textual versions of the dialog in films and television programs. They are usually displayed at the bottom of the screen. They can be either a written form of the original language or a translation.
Synonym — Different words with identical or similar meanings, e.g. student and pupil.
Syntax — The study of structure and elements that form grammatical sentences.
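
The Segment, Segmentation, and SRX entries above describe splitting text into segments according to rules such as punctuation. The sketch below shows the general idea with one break rule and a short list of no-break exceptions. The regular expression, the abbreviation list, and the function name `segment` are assumptions for illustration; SRX itself expresses such rules in XML, which is not reproduced here.

```python
import re

# Assumed break rule: sentence-ending punctuation followed by whitespace.
BREAK = re.compile(r"(?<=[.!?])\s+")
# Assumed no-break exceptions: abbreviations that end with a period.
NO_BREAK_BEFORE = ("e.g.", "i.e.", "Dr.", "Mr.", "etc.")

def segment(text):
    """Split text into segments, honouring the no-break exceptions."""
    segments, start = [], 0
    for match in BREAK.finditer(text):
        candidate = text[start:match.start()]
        if candidate.endswith(NO_BREAK_BEFORE):
            continue  # exception rule: do not break after an abbreviation
        segments.append(candidate)
        start = match.end()
    tail = text[start:]
    if tail:
        segments.append(tail)
    return segments

print(segment("Ask Dr. Smith. He knows the rules! Really?"))
```

Two tools that apply different exception lists will cut the same text into different segments, which is why exchanging the segmentation rules along with the translation memory can improve leverage.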

T

Tagging — Marking content in a document with information about its content.
Target Audience — Group of people who receive the information rendered by the interpreter in the target language.
Target Language — Language into which a text is translated.
TBX — Abbreviation for term base eXchange. XML standard for exchanging terminological data.
Technical Translation — Translation of technical texts, such as user or maintenance manuals, catalogs, and data sheets.
TEP — Abbreviation for the Translate – Edit – Proofread process.
Term — Word, phrase, symbol or formula that describes or designates a particular concept.
Term Extraction (also term harvesting) — Selecting terms in a text and placing them in a terminology database for analysis at a later time.
Terminology — Collection of terms.
Terminology Analysis — Process carried out prior to translation in order to analyze the vocabulary within a text and its meaning within the given context, often for the purpose of creating specialized dictionaries within specific fields.
Terminology Database — Electronic repository of terms and associated data.
Term Extraction — Term extraction can take an existing dictionary as input. When extracting unknown terms, it can use parsing based on text statistics. Such statistics are also used to estimate the amount of work involved in a translation job, which is very useful for planning and scheduling the work. Translation statistics usually count the words and estimate the amount of repetition in the text.
Termbase — Termbase is a database containing terminology and related information. Most termbases are multilingual and contain terminology data in a range of different languages.
Termbase Definition and the Structure of Entries — Termbase entries are structured in the following way:

  • Entry Level — Contains system fields and any descriptive fields that apply to the entry as a whole.
  • Index Level — Contains index fields with terms as content and any descriptive fields that apply to all terms in a given language.
  • Term Level — Contains any descriptive fields that apply to a given term.

The termbase definition for a given termbase specifies the number and type of fields that a termbase entry may contain and the entry structure that entries must conform to. The entry structure specifies:

  • The number and type of fields that may exist at each level in the entry.
  • The hierarchical structure of fields within each level; fields nested or not.

Termbase Fields — The different types of field are as follows:

  • Index fields — Contain the terms for each entry. Each index corresponds to one of the termbase languages.
  • Descriptive fields — Contain descriptive information about the entry or language as a whole, or about the individual terms. Each descriptive field has a defined data type. Types of data include text, picklist, number, date, Boolean and multimedia file.
  • Entry class field — Specifies the entry class to which the entry belongs.
  • System fields — Created and maintained by the system. These fields store tracking information for the entry as a whole or for individual fields. System fields in MultiTerm include the Entry Number field and the set of four history fields. The Entry Number field is automatically assigned to each entry at entry level; for more information about history fields, see below.
  • History fields — MultiTerm uses a set of four history fields: Created on, Created by, Modified on and Modified by. History fields are automatically assigned to each entry at entry level, and also to each index at the index level. For all other fields in the termbase, history fields are optional. They require commissioning in the Termbase Wizard. Once assigned, history fields are created and maintained by the system.
  • Term Link — Term Link (formerly TBX Link) is an XML namespace-based notation that enables specific identified terms within an XML document to be linked to an external XML termbase, including those in TermBase eXchange (TBX) format. The purpose of the Term Link specification is to provide a rigorous notation for linking embedded terms in an XML document to their entries in an external termbase. Term Link is not yet an official standard, and its contents and format may change prior to official adoption.
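
The three-level entry structure described above (entry level, index level, term level) maps naturally onto nested records. The following sketch models it with Python dataclasses. It is only an illustration of the hierarchy, not MultiTerm's actual storage format; the class names and the descriptive field names `Subject` and `Status` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    """Term level: one term plus descriptive fields for that term."""
    text: str
    descriptive: dict = field(default_factory=dict)

@dataclass
class Index:
    """Index level: all terms of one language plus language-wide fields."""
    language: str
    terms: list = field(default_factory=list)
    descriptive: dict = field(default_factory=dict)

@dataclass
class Entry:
    """Entry level: system fields plus fields for the entry as a whole."""
    entry_number: int                                 # system field
    indexes: dict = field(default_factory=dict)       # language -> Index
    descriptive: dict = field(default_factory=dict)

    def add_term(self, language, text, **fields):
        """Add a term under its language index, creating the index if needed."""
        index = self.indexes.setdefault(language, Index(language))
        index.terms.append(Term(text, fields))
        return index

entry = Entry(entry_number=1, descriptive={"Subject": "Printing"})
entry.add_term("en", "printer", Status="approved")
entry.add_term("de", "Drucker")
print(entry)
```

Because every entry groups its terms by language, one entry represents one concept, and a lookup in any language leads to the equivalents in all the others.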

Text Memory — The basis of the proposed LISA OSCAR xml:tm standard. Text memory comprises author memory and translation memory.
TBX — TermBase eXchange. This LISA standard was revised and republished as ISO 30042. It allows for the interchange of terminology data including detailed lexical information.
Terminology — Terminology is the study of terms and their use.
Terminology Management — Quality translation relies on the correct use of specialized terms.
Textual Parsing — It is very important to recognize punctuation in order to distinguish, for example, the end of a sentence from an abbreviation. Thus, mark-up is a kind of pre-editing.
Term Extraction Tools — Tools for automatically extracting terms from text to create a termbase. Tools include SDL MultiTerm Extract 2009.
Term Base eXchange — XML standard for exchanging terminological data. Commonly abbreviated TBX.
Terminology Management —  Use of computer software to manage translation resources, create terminology databases for translation projects, and improve productivity and consistency.
Terminology Management Tool — Computer application that facilitates terminology management.
Terminology Manager — Software application that facilitates the process of translation by interacting with a terminology database.
Terminology Software — Data processing tool that allows one to create, edit, and consult text or electronic dictionaries.
Text Expansion — Process that often occurs during translation in which the total number of characters in the target text exceeds that of the source text.
Text Extraction — The process of placing the text from a source file into a word processing file for use by a linguist.
Text Style — Characteristics of terminology, style and sentence formation within a given text.
TMX — Abbreviation for translation memory eXchange. Translation Memory eXchange (TMX) is a standard that enables the interchange of translation memories between translation suppliers.
Traditional Chinese — Original Chinese ideographic character set used in Taiwan, Hong Kong, Macau and some Chinese communities that have not adopted the simplified characters used in the People’s Republic of China.
Transcreation — Development or adaptation of new content for a given target audience instead of merely translating existing material. It may include copywriting, image selection, font changes, and other transformations that tailor the message to the recipient.
Transcription — Process of converting oral utterances into written form.
Translatability — Degree to which a text can be rendered into another language.
Translate-Edit-Proof — Most common set of steps used for linguistic quality assurance in translation production processes. Commonly abbreviated TEP.
Translation — Process of rendering written communication from one language into another, or the output that results from this process.
Translation Capacity — Average number of characters, words, lines, or pages that a professional translator can translate within a given time frame, such as a day, week, or month.
Translation Kit (also Localization Kit) — Set of files and instructions given to an LSP by a client. The purpose of a translation kit is to provide LSPs with expectations regarding the subject matter and target audience, the files and formats needing translation, delivery expectations, and special considerations and instructions.
Translation Management — The management of the translation workflow, often including the content assets as well.
Translation Management System (also TMS) — Program that manages translation and localization cycles, coordinates projects with source content management, and centralizes translation databases, glossaries, and additional information relevant to the translation process.
Translation Memory — Translated text segments stored in a database. A translation memory is a system which scans a source text and tries to match strings (a sentence or part thereof) against a database of paired source and target language strings with the aim of reusing previously translated materials.
Translation Memory eXchange (also TMX) — Standard for converting translation memories from one format to another. Commonly abbreviated TMX.
Translation Memory Plus Machine Translation — Workflow and technology process in which terms not found in translation memory are automatically sent to the machine translation software for translation. The results are then fed back into the translation memory. Commonly abbreviated TMT.
Translation Memory System — Computer-aided translation tool that offers translation suggestions from translation memory.
Translation Portal — Web-based service that enables translation agencies, freelance translators, and customers to contact one another and to exchange services.
Translation Unit — This is the segment of text treated as a single unit of meaning.
Transliteration — Process of converting words from a source text or audio file into a written text that facilitates pronunciation of the words.
TM — Translation Memory, see Translation Memory.
Trados — SDL Trados is a leading Translation Memory Editor used in translation. Latest versions SDL Trados Studio 2009 and SDL Trados TM Server.
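
The Translation Memory and TMX entries above can be tied together with a small example: a translation memory is loaded from a TMX document and then used for an exact lookup. The sketch assumes a minimal TMX 1.4-style document with `tu`, `tuv`, and `seg` elements; the sample content and the function name `load_tm` are hypothetical, and only Python's standard library is used.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-unit TMX document used only for this example.
TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Print the document.</seg></tuv>
      <tuv xml:lang="es"><seg>Imprima el documento.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Close the window.</seg></tuv>
      <tuv xml:lang="es"><seg>Cierre la ventana.</seg></tuv>
    </tu>
  </body>
</tmx>"""

def load_tm(tmx_text, source_lang, target_lang):
    """Build a source -> target dictionary from the translation units."""
    memory = {}
    for tu in ET.fromstring(tmx_text).iter("tu"):
        # Collect the segment text of each language variant in this unit.
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        if source_lang in segs and target_lang in segs:
            memory[segs[source_lang]] = segs[target_lang]
    return memory

tm = load_tm(TMX, "en", "es")
print(tm.get("Print the document."))
```

A dictionary lookup like this only finds exact matches; segments that differ slightly from a stored source would need a similarity score to be retrieved as fuzzy matches.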

U

Unicode — Character set that is capable of encoding the characters of the world’s major language scripts. It was originally designed as a 16-bit character set, but its code space now extends beyond 16 bits.
Unicode Standard — Industry encoding standard that allows computers to represent and manipulate text in most of the world’s writing systems.
Updating TM — A new translation is added to a TM after the translator accepts it. As always when updating a database, there is the question of what to do with the previous contents of the database. A TM can be modified by changing or deleting entries. Some systems allow translators to save multiple translations of the same source segment.
UTF-16, UTF-32, UTF-8 — UTF-16 is the abbreviation for 16-bit Unicode transformation format, UTF-32 for 32-bit Unicode transformation format, and UTF-8 for 8-bit Unicode transformation format.
UTX — Universal Terminology eXchange (UTX). A format standard specifically designed for user dictionaries of machine translation. Also used for general, human-readable glossaries. The purpose of UTX is to accelerate dictionary sharing and reuse by its extremely simple and practical specification.
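
To show how the three transformation formats listed above differ in practice, the short sketch below encodes a few sample characters and reports how many bytes each needs. The sample characters and the function name `encoded_sizes` are chosen for illustration; the little-endian codec variants are used so that no byte order mark is added to the counts.

```python
SAMPLES = ["A", "é", "中", "😀"]

def encoded_sizes(char):
    """Return the byte length of one character in each encoding form."""
    return {
        "utf-8": len(char.encode("utf-8")),
        "utf-16": len(char.encode("utf-16-le")),
        "utf-32": len(char.encode("utf-32-le")),
    }

for char in SAMPLES:
    print(f"U+{ord(char):04X}", encoded_sizes(char))
```

UTF-8 and UTF-16 use a variable number of bytes per character, while UTF-32 always uses four, which is why the number of bytes in a string cannot be assumed to equal its number of characters.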

V

Voice-Over — Technique in which a disembodied voice narrates a film, documentary, or other visual media.

W

Word Count — The total number of words in a text. Typically used to price translation projects.
Word Delimiter — Character, such as a ‘space’ or ‘carriage return,’ that marks a distinction between words in a text.
Workflow Management — Computer or web-based applications used to direct translation and localization work processes.
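
The Word Count and Word Delimiter entries above suggest a simple calculation, sketched here under stated assumptions: words are separated by whitespace delimiters (spaces, tabs, line breaks), and the job is priced per source word. The function names and the rate are hypothetical. Languages written without spaces between words, such as Chinese or Japanese, cannot be counted this way and are often priced by character instead.

```python
def word_count(text):
    """Count words by splitting on whitespace delimiters."""
    return len(text.split())

def price(text, price_per_word):
    """Quote a job from its source word count (price-per-word pricing)."""
    return word_count(text) * price_per_word

source = "Save the file\nbefore closing."
print(word_count(source))   # number of whitespace-delimited words
print(price(source, 12))    # hypothetical rate of 12 cents per word
```

Different tools treat numbers, tags, and hyphenated words differently, so counts for the same file can vary from one tool to another.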

X

XLIFF — XML Localisation Interchange File Format. XLIFF provides a single interchange file format understood by any localization provider. This is the preferred way of exchanging data in XML format in the translation industry.
XML — Abbreviation for eXtensible markup language. Metadata language used to describe other markup languages.
XML Text Memory (xml:tm) — xml:tm (XML-based Text Memory) is the vendor-neutral open XML standard for embedding text memory directly within an XML document using XML namespace syntax.
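
As an illustration of the XLIFF entry above, the sketch below reads the translation units from a small XLIFF document. It assumes the XLIFF 1.2 layout (`file`, `body`, `trans-unit`, `source`, `target`) and its namespace; the sample content and the function name `read_units` are hypothetical, and only Python's standard library is used.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents, bound to the prefix "x" for searches.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical XLIFF 1.2 file with one translated and one untranslated unit.
XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.txt" source-language="en" target-language="fr"
        datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Cancel</source>
        <target>Annuler</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Settings</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""

def read_units(xliff_text):
    """Return (id, source, target) tuples; target is None if untranslated."""
    root = ET.fromstring(xliff_text)
    return [
        (unit.get("id"),
         unit.findtext("x:source", namespaces=NS),
         unit.findtext("x:target", namespaces=NS))
        for unit in root.iterfind(".//x:trans-unit", NS)
    ]

for unit_id, source, target in read_units(XLIFF):
    print(unit_id, source, "->", target)
```

Units whose target is `None` are the ones still waiting for translation, which makes the same file usable both for handing work to a provider and for checking what came back.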

References

  1. "Localization vs. Internationalization". www.w3.org. Retrieved 2018-03-11.
  2. "What is Localization?". GALA Global. 2015-08-05. Retrieved 2018-03-11.
  3. "Localization and Translation Glossary – Localization (l10n)". Phrase. Retrieved 2021-02-26.
  4. Esselink, Bert. “The Evolution of Localization.” www.intercultural.urv.cat/media/upload/domain_317/arxius/Technology/Esselink_Evolution.pdf.
  5. "The History of Localization - Localize". Localize. 2016-11-08. Retrieved 2018-03-12.
  6. Esselink, Bert. A Practical Guide to Localization. Amsterdam; Philadelphia, John Benjamins Publishing Company, 2000.
  7. Schäler, Reinhard. "Localization and translation" Handbook of Translation Studies: Volume 1, Edited by Yves Gambier and Luc van Doorslaer, 2010, pp. 209–214.
  8. "2017 Resource Directory" MultiLingual, January 2017, p. 77.
  9. Microsoft Globalization Development Center
  10. ISO 639 Language Codes. https://www.iso.org/iso-639-language-codes.html
  11. ISO 3166 Country Codes. https://www.iso.org/iso-3166-country-codes.html
  12. the list of Locale ID (LCID). https://docs.microsoft.com/en-us/openspecs/office_standards/ms-oe376/6c085406-a698-4e12-9d4d-c3b0ee3dbc4a
  13. "Transcreation". Wikipedia. 2018-12-27. https://en.wikipedia.org/w/index.php?title=Transcreation&oldid=875536191. 
  14. https://theatacompass.org/2016/04/20/transcreation-translating-and-recreating/
  15. https://www.taus.net/academy/reports/evaluate-reports/taus-transcreation-best-practices-and-guidelines
  16. https://docs.sdl.com/LiveContent/content/en-US/SDL%20WorldServer-v6/GUID-83C3F944-F1C2-4B61-8A7C-310C0849B5E3
  17. https://www.thoughtco.com/what-is-linguistic-variation-1691242
  18. https://www.w3.org/International/questions/qa-i18n
  19. http://msdn.microsoft.com/en-us/library/aa292135(v=vs.71).aspx
  20. http://www.nytimes.com/2016/01/07/world/europe/coca-cola-withdraws-from-social-media-war-over-crimea.html
  21. http://www.dailymail.co.uk/news/peoplesdaily/article-3729719/China-s-fury-Rio-Olympics-used-incorrect-national-flags.html
  22. "Major Indian Languages", Discover India, Archived 1 Jan 2007, Retrieved 2 March 2018.
  23. "iPhone X". Apple. Retrieved 2018-02-24.
  24. "iPhone X". Apple (in Chinese (China)). Retrieved 2018-02-24.
  25. "iPhone X". Apple (Portugal) (in European Portuguese). Retrieved 2018-02-24.
  26. "iPhone X". Apple (Egypt). Retrieved 2018-02-24.
  27. "Language Identifier Constants and Strings (Windows)". msdn.microsoft.com. Retrieved 2018-03-02.
  28. "Multilingual User Interface (Windows)". msdn.microsoft.com. Retrieved 2018-03-02.
  29. "Set Regions and Language Options". msdn.microsoft.com. Retrieved 2018-03-02.
  30. http://uk.practicallaw.com/5-618-1186
  31. https://iapp.org/news/a/is-the-gdpr-a-data-localization
  32. https://www.itic.org/policy/forced-localization/data-localization
  33. https://itif.org/publications/2017/05/01/cross-border-data-flows-where-are-barriers-and-what-do-they-cost
  34. https://www.americanbar.org/content/dam/aba/publications/antitrust_magazine/anti_fall2017_cohen.authcheckdam.pdf
  35. DePalma, Donald A. and Hills, Mimi. "Localization Returns on Investment: Quantifying the Value of Localization in High-Tech", April 2010, p. 1.
  36. Benjamin, B. Sargent. "Calculating ROI in Software Localization", Software Business Magazine, July 2002, p. 2.
  37. DePalma, Donald A. "Making a business case for localization when there’s little or no business to be had". MultiLingual, January 2017, p. 32.
  38. https://www.investopedia.com/ask/answers/031815/what-formula-calculating-profit-margins.asp
  39. https://www.accountingtools.com/articles/2017/5/12/net-sales
  40. https://www.investopedia.com/terms/c/cogs.asp
  41. https://www.investopedia.com/terms/o/operatingmargin.asp
  42. https://www.investopedia.com/terms/g/general-and-administrative-expenses.asp
  43. https://docs.microsoft.com/en-us/globalization/localization/content-localization
  44. https://www.welocalize.com/assessing-in-house-outsourced-localization-models/
  45. https://www.globalme.net/blog/to-outsource-or-not-to-outsource-localization-weighing-the-pros-and-cons.
  46. https://en.wikipedia.org/wiki/Terminology
  47. https://en.wikipedia.org/wiki/Terminology_extraction
  48. "Machine Translation (MT): Everything You Need to Know". Memsource website. Retrieved 2021-07-26.
  49. Mandelin, Clyde (November 2017). Press Start to Translate. Fangamer. pp. 26–28. ISBN 978-1-945908-86-6. https://www.fangamer.com/products/press-start-to-translate-legends-of-localization-book.
  50. Jason Brownlee, Ph.D. "Encoder-Decoder Recurrent Neural Network Models for Neural Machine Translation". https://machinelearningmastery.com/encoder-decoder-recurrent-neural-network-models-neural-machine-translation/
  51. Jason Brownlee. "A Gentle Introduction to Neural Machine Translation". https://machinelearningmastery.com/introduction-neural-machine-translation/
  52. Philipp Koehn. "Statistical Machine Translation, Draft of Chapter 13: Neural Machine Translation". https://arxiv.org/pdf/1709.07809.pdf.
  53. Jason Brownlee, Ph.D. "Encoder-Decoder Recurrent Neural Network Models for Neural Machine Translation". https://machinelearningmastery.com/encoder-decoder-recurrent-neural-network-models-neural-machine-translation/
  54. "Zero-Shot Translation with Google's Multilingual Neural Machine Translation System". Google AI Blog. Retrieved 2019-03-20.
  55. "Google Neural Machine Translation". Wikipedia. 2019-01-24. https://en.wikipedia.org/w/index.php?title=Google_Neural_Machine_Translation&oldid=879951080. 
  56. "Zero-Shot Translation with Google's Multilingual Neural Machine Translation System". Google AI Blog. Retrieved 2019-03-20.
  57. Spacinsky, Denise. "Careers in Localization", MultiLingual, December 2013.
  58. "Multimedia Translation". Wikipedia. 2019-03-30. https://en.wikipedia.org/wiki/Multimedia_translation. 
  59. https://engagemedia.org/help/best-practices-for-online-subtitling/
  60. https://www.esist.org/wp-content/uploads/2016/06/Code-of-Good-Subtitling-Practice.PDF.pdf
  61. Díaz Cintas, Jorge and Aline Remael (2007). Audiovisual Translation: Subtitling. pp. 45–54, 80–99, 172–180.
  62. Szarkowska, Agnieszka (2018). "Report on the results of the SURE Project study on subtitle speeds and segmentation". London.
  63. https://en.wikipedia.org/wiki/Video_game_localization
  64. https://www.honyakuctren.com/topics/pdf/GameLocalizationPamphlet.pdf
  65. http://www.oneskyapp.com/blog/game-localization-mistakes/
  66. Heather M. Chandler & Stephanie O. Deming, The Game Localization Handbook (2012). Jones & Bartlett Learning, 2nd ed.
  67. "Translation Glossary of Terms". Retrieved 2019-02-25.
  68. "Neural machine translation". Retrieved 2019-02-25.

Model Life Cycle Engine