A Network of Excellence forging the
Multilingual Europe Technology Alliance

The Italian Language in the Digital Age — Executive Summary

During the last 60 years, Europe has become a distinct political and economic structure. Culturally and linguistically, it is rich and diverse. However, from Portuguese to Polish and Italian to Icelandic, everyday communication between Europe’s citizens, within business and among politicians is inevitably confronted with language barriers. The EU’s institutions spend about one billion Euros a year on maintaining their policy of multilingualism, i.e., translating texts and interpreting spoken communication. The European market for translation, interpretation, software localisation and website globalisation was estimated at 8.4 billion Euros in 2008 and is expected to grow by 10% per annum. Are these expenses necessary and are they even sufficient? Despite this high level of expenditure, the translated texts represent only a fraction of the information that is available to the whole population in countries with a single predominant language, like the USA, China or Japan. Language technology and linguistic research can make a significant contribution to removing the linguistic borders. Combined with intelligent devices and applications, language technology will help Europeans talk and do business together even if they do not speak a common language.

The Italian economy benefits from the European single market, but language barriers can bring business to a halt, especially for SMEs that do not have the financial means to reverse the situation. The only (unthinkable) alternative to a multilingual Europe would be to allow a single language to take a predominant position and replace all other languages in transnational communication. Another way to overcome language barriers is to learn foreign languages. Yet, considering the multitude of European languages, including 23 official languages of the European Union and some 60 other languages, language learning alone is not sufficient to provide for communication, trade and information transfer across all language borders. Without technological support, e.g., machine translation, the European linguistic diversity is an insurmountable obstacle for Europe’s citizens, economy, political debate, and scientific progress.

Language technology is a key enabling technology for sustainable, cost-effective and socially beneficial solutions to language problems. Language technologies will offer European stakeholders tremendous advantages, not only within the common European market, but also in trade relations with non-European countries, especially emerging economies. Language technology solutions will eventually serve as a unique bridge between Europe’s languages. An indispensable prerequisite for their development is first to carry out a systematic analysis of the linguistic particularities of all European languages, and the current state of language technology support for them. As early as the late 1970s, the EU realised the profound relevance of language technology as a driver of European unity, and began funding its first research projects, such as EUROTRA. After a longer period of sparse funding on the European level, the European Commission set up a department dedicated to language technology and machine translation a few years ago. Currently, the EU is supporting language technology projects such as EuroMatrix and EuroMatrix+ (since 2006) and iTranslate4 (since 2010), which, through basic and applied research, generate resources for establishing high quality language technology solutions for all European languages. These selective funding efforts have led to a number of valuable results. For example, the translation services of the European Union now use the Moses open-source machine translation software, which has been mainly developed in European research projects. However, these projects never led to a concerted European effort, where the EU and its member states systematically pursue the common goal of technologically supporting all European languages. Rather than building on the outcomes of its research projects, Europe has tended to pursue isolated research activities with a less pervasive impact on the market.
Thus, an intensive phase of funding has ultimately not led to sustainable results. In many cases, research funded in Europe turned out to bear fruit, but outside of Europe. The winners of this general development include Google and Apple. In fact, many of the predominant actors in the field today are privately owned for-profit enterprises based in North America. Most of their language technology systems rely on imprecise statistical approaches that do not make use of deeper linguistic methods and knowledge. For example, sentences are often automatically translated by comparing each new sentence against thousands of sentences previously translated by humans. The quality of the output largely depends on the size and quality of the available data. While the automatic translation of simple sentences in languages with sufficient amounts of available textual data can achieve useful results, shallow statistical methods are doomed to fail in the case of languages with a much smaller body of sample data or in the case of new sentences with complex structures. Analysing the deeper structural properties of languages is the only way forward if we want to build applications that perform well across the entire range of European languages.
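
The idea of translating a new sentence by comparing it against previously translated ones can be illustrated with a deliberately simplified sketch. It is not how any production system works: it merely looks up the stored sentence that shares the most words with the input and returns that sentence's human translation. The three Italian-English sentence pairs, the function names and the word-overlap (Jaccard) score are all assumptions made for this illustration.

```python
def tokenize(sentence):
    """Lower-case a sentence and split it into a set of words."""
    return set(sentence.lower().split())


def jaccard(a, b):
    """Word-overlap score between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical sentence pairs previously translated by humans.
PARALLEL = [
    ("il gatto dorme", "the cat sleeps"),
    ("il cane dorme", "the dog sleeps"),
    ("la casa è grande", "the house is big"),
]


def translate_by_lookup(sentence, corpus=PARALLEL):
    """Return the stored translation of the most similar source sentence,
    together with the similarity score of that match."""
    query = tokenize(sentence)
    best_source, best_target = max(
        corpus, key=lambda pair: jaccard(query, tokenize(pair[0]))
    )
    return best_target, jaccard(query, tokenize(best_source))


print(translate_by_lookup("il gatto dorme"))        # exact match in the data
print(translate_by_lookup("il gatto nero dorme"))   # close match, "nero" is lost
print(translate_by_lookup("la macchina è veloce"))  # poor match, wrong output
```

With so little data, the second call silently drops the word "nero" and the third returns the translation of an unrelated sentence. This mirrors, in miniature, the point made above: the output of a purely data-driven lookup is only as good as the size and coverage of the available sample.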

Concerning research in Europe, the prerequisites are optimal: through initiatives like CLARIN, META-NET, and FLaReNet, the research community is well-connected; in META-NET and FLaReNet a long-term research agenda is currently evolving, and language technology is slowly but steadily strengthening its role within the European Commission. Still, in some respects, our position is worse than that of other multilingual societies. Despite fewer financial resources, countries like India (22 official languages) and South Africa (11 official languages) have set up long-term national programmes for language research and technology development.

What is missing in Europe is awareness, political will and the courage to strive for an international leading position in this technology area through a concerted funding effort. Drawing on the insights gained so far, today’s hybrid language technology mixing deep processing with statistical methods should be able to bridge the gap between all European languages and beyond.

However, as this series of white papers shows, there is a dramatic difference between Europe’s member states in terms of both the maturity of the research and the state of readiness with respect to language solutions. Italian, as one of the bigger EU languages, is better equipped than many other languages, but further research is needed before truly effective language technology solutions are ready for everyday use, and to keep Italian from lagging behind the much better resourced English language. The percentage of global Internet users who speak Italian can be expected to decrease in the near future. As a consequence, Italian may, in the coming decades, face the problem of being underrepresented on the web, especially compared to English, a problem in which language technologies will play a fundamental role. The capability of a language to be “digitally” present in Internet-based applications and services has become a crucial element in maintaining the cultural vitality of the language itself.

On the other hand, Internet applications and services can be sustained only if adequate infrastructures and technologies are present. In Italy, research on Human Language Technology (HLT) is carried out by more than 15 research labs, with an active and relevant presence in the international research community. Considerable effort has been invested in language technology research in Italy since 1997, when Human Language Technology was designated a national research policy. Unfortunately, funding at the national level is currently very limited, and little usable language technology is built in comparison to the anticipated need.

In spite of the accomplishments in the field of language technologies for Italian, the current state of the technology is not sufficient to guarantee Italian the digital dimension required by the applications and services of the future Internet. In this volume, we will present an introduction to language technology and its core application areas as well as an evaluation of the current situation of language technology support for Italian. This white paper series complements the other strategic actions taken by META-NET (see the appendix for an overview). Up-to-date information such as the current version of the META-NET vision paper and the Strategic Research Agenda (SRA) can be found on the META-NET web site: http://www.meta-net.eu.