A Network of Excellence forging the
Multilingual Europe Technology Alliance

The English Language in the Digital Age — Executive Summary

In the space of two generations, much of Europe has become a distinct political and economic entity, yet culturally and linguistically Europe is still very diverse. While such diversity adds immeasurably to the rich fabric of life, it nevertheless throws up language barriers. From Portuguese to Polish and Italian to Icelandic, everyday communication between Europe’s citizens, as well as communication in the spheres of business and politics, is inevitably hampered. To take one example, the EU institutions together spend about a billion euros a year on maintaining their policy of multilingualism, i.e., on translation and interpreting services. Moreover, we tend to be shackled and blinkered by our linguistic environment, often without being aware of it: we may search the Web for some piece of information and apparently fail to find it, but what if this information actually exists and is in fact findable, but just happens to be expressed in a language other than ours, one we do not speak? Much has been said about information overload, but here is a case of information overlook that is conditioned entirely by the language issue.

One classic way of overcoming the language barrier is to learn foreign languages. However, the individual rapidly reaches the limits of such an approach when faced with the 23 official languages of the member states of the European Union and some 60 other European languages. We need to find other means to overcome this otherwise insurmountable obstacle for the citizens of Europe and its economy, its capacity for political debate, and its social and scientific progress.

So, how can we alleviate the burden of coping with language barriers? Language technology incorporating the fruits of linguistic research can make a sizable contribution. Combined with intelligent devices and applications, language technology can help Europeans talk and do business with each other, even if they do not speak a common language.

However, given the Europe-wide scale of the problem, a strategic approach is called for. The solution is to build key enabling language technologies. These can then be embedded in applications, devices and services that support communication across language barriers in as transparent and flexible a way as possible. Such an approach offers European stakeholders tremendous advantages, not only within the common European market, but also in trade relations with non-European countries, especially emerging economies. These language technology solutions will eventually serve as an invisible but highly effective bridge between Europe’s languages.

With around 375 million native speakers worldwide, English is estimated to be the third most spoken language in the world, coming behind only Mandarin Chinese and Spanish. Accordingly, since the dawn of work on language technology some 50 years ago, a large amount of effort has been focussed on the development of resources for English, resulting in a large number of high quality tools for tasks such as speech recognition and synthesis, spelling correction and grammar checking. Even today, the language technology landscape is dominated by English resources. Proof of this is evident just by looking at what has been going on in the research sphere: a quick scan of leading conferences and scientific journals for the period 2008-2010 reveals 971 publications on language technology for English, compared to 228 for Chinese and 80 for Spanish. Also, for automated translation, systems that translate from another language into English tend to be the most successful in terms of accuracy.

For many other languages, an enormous amount of research will be required to produce language technology applications that can perform at the same level as current applications for the English language. However, even for English, considerable effort is still needed to bring language technology to the desired level of a pervasive, ubiquitous and transparent technology. As the analysis provided in this report reveals, there is no area of language technology that can be considered to be a solved problem. Even if a large number of high quality software tools exist, problems of maintaining, extending or adapting them to deal with different domains or subjects remain largely unsolved. In addition, whilst the automatic detection of grammatical structure for English can already be carried out to quite a high degree of accuracy, the same cannot yet be said for deeper levels of semantic analysis, which will be required for next generation systems that are able to understand complete sentences or dialogues. In general, systems that can carry out robust, automated semantic analysis, e.g., to generate rich and relevant answers from an open-ended set of questions, are still in their infancy. However, some forerunners of these more intelligent systems are already available, which give a flavour of what is to come. These include IBM’s supercomputer Watson, which was able to defeat the US champion in the game of “Jeopardy”, and Apple’s mobile assistant Siri for the iPhone that can react to voice commands and answer questions.

Automated translation and speech processing tools currently available on the market also still fall short of what would be required to facilitate seamless communication between European citizens who speak different languages. On the face of it, free online tools, such as the Google Translate service, which is able to translate between 57 different languages, appear impressive. However, even for the best performing automatic translation systems (generally those whose target language is English), there is still often a large gap between the quality of the automatic output and what would be expected from an expert translator. In addition, the performance of systems that translate from English into another language is normally somewhat inferior.

The dominant actors in the field are primarily privately owned, for-profit enterprises based in North America. As early as the late 1970s, the European Commission realised the profound relevance of language technology as a driver of European unity, and began funding its first research projects, such as EUROTRA. In the UK, the then Department of Trade and Industry made a substantial co-investment to support UK EUROTRA participants. Many of today’s language technology research centres in the EU exist due to the initial seed funding from that particular project. At the same time, national projects were set up that generated valuable results, but never led to a concerted European effort. In contrast to this highly selective funding effort, other multilingual societies such as India (22 official languages) and South Africa (11 official languages) have recently set up long-term national programmes for language research and technology development.

The predominant actors in language technology today rely on imprecise statistical approaches that do not make use of deeper linguistic methods and knowledge. For example, sentences are often automatically translated by comparing each new sentence against thousands of sentences previously translated by humans, in an attempt to find a match, or a statistically close match. The quality of the output largely depends on the size and quality of the available translated data. While the automatic translation of simple sentences into languages with sufficient amounts of available reference data against which to match can achieve useful results, such shallow statistical methods are doomed to fail in the case of languages with a much smaller body of sample data or, more to the point, in the case of sentences with complex structures. Unfortunately, our complex social, business, legal and political interactions require concomitantly complex modes of linguistic expression.
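The matching idea described above can be illustrated with a deliberately tiny sketch. It is not how any production system works: real statistical translation systems use far richer models. The three-sentence memory, the German translations, and the function name closest_translation are all hypothetical, chosen only for illustration. The sketch looks up the previously translated sentence that is most similar to the input, word by word, and returns its stored translation.

```python
from difflib import SequenceMatcher

# Hypothetical toy memory of human-translated sentence pairs (English -> German).
# A real system would draw on many thousands or millions of such pairs.
MEMORY = [
    ("where is the train station", "wo ist der Bahnhof"),
    ("the meeting starts at noon", "die Besprechung beginnt am Mittag"),
    ("please sign the contract", "bitte unterschreiben Sie den Vertrag"),
]


def similarity(words_a, words_b):
    """Word-level similarity between two token lists, from 0.0 to 1.0."""
    return SequenceMatcher(None, words_a, words_b).ratio()


def closest_translation(sentence, memory=MEMORY):
    """Return (score, matched_source, stored_translation) for the closest match."""
    words = sentence.lower().split()
    # Pick the stored source sentence most similar to the input.
    source, target = max(memory, key=lambda pair: similarity(words, pair[0].split()))
    return similarity(words, source.split()), source, target


if __name__ == "__main__":
    print(closest_translation("Where is the bus station"))
```

With so small a memory, the input "Where is the bus station" is matched to the stored sentence about the train station, and the returned translation is therefore wrong in one word. This is the dependence on the size and quality of the translated data that the paragraph above describes, and a sentence with a more complex structure would fare worse still.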

The European Commission therefore decided to fund projects such as EuroMatrix and EuroMatrixPlus (since 2006) and iTranslate4 (since 2010), which carry out basic and applied research, and generate resources for establishing high quality language technology solutions for all European languages. Building systems to analyse the deeper structural and meaning properties of languages is the only way forward if we want to build applications that perform well across the entire range of European languages.

European research in this area has already achieved a number of successes. For example, the translation services of the European Commission now use the MOSES open source machine translation software, which has been mainly developed through European research projects. In general, Europe has tended to pursue isolated research activities with a less pervasive impact on the market. However, the potential economic value of these activities can be seen in companies such as the UK-based SDL, which offers a range of language technologies, and has 60 offices in 35 different countries.

Drawing on the insights gained so far, it appears that today’s “hybrid” language technology, which mixes deep processing with statistical methods, will help to bridge the significant gaps that exist with regard to the maturity of research and the state of practical usefulness of language technology solutions for different European languages. The assessment detailed in this report reveals that, although English-based systems are normally at the cutting edge of current research, there are still many hurdles to be overcome to allow English language technology to reach its full potential. However, the thriving language technology community that exists in English-speaking countries, both in Europe and worldwide, means that there are excellent prospects for further positive developments to be made. META-NET’s long-term goal is to introduce high-quality language technology for all languages. The technology will help tear down existing barriers and build bridges between Europe’s languages. This requires all stakeholders – in politics, research, business and society – to unite their efforts for the future.

This white paper series complements other strategic actions taken by META-NET (see the appendix for an overview). Up-to-date information such as the current version of the META-NET vision paper and the Strategic Research Agenda (SRA) can be found on the META-NET web site: http://www.meta-net.eu.