A Network of Excellence forging the Multilingual Europe Technology Alliance

The Finnish Language in the Digital Age — Executive Summary

Information technology changes our everyday lives. We typically use computers for writing, editing, calculating, and searching for information; and increasingly for reading, listening to music, viewing photos and watching movies. We carry small computers in our pockets and use them to make phone calls, write emails, get information and entertain ourselves, wherever we are. How does this massive digitization of information, knowledge and everyday communication affect our language? Will our language change or even disappear?

All our computers are linked together into an increasingly dense and powerful global network. The girl in Ipanema, the customs officer in Imatra and the engineer in Kathmandu can all chat with their friends on Facebook, but they are unlikely ever to meet one another in online communities and forums. If they are worried about how to treat earache, they will all look it up on Wikipedia, but even then they won’t read the same article. When Europe’s netizens discuss the effects of the Fukushima nuclear accident on European energy policy in forums and chat rooms, they do so in cleanly separated language communities. What the internet connects is still divided by the languages of its users. Will it always be like this?

In science fiction movies, everyone speaks the same language. Could it be Finnish, even though astronauts rarely pronounce Finnish words as naturally as English ones? Many of the world’s 6,000 languages will not survive in a globalized digital information society. It is estimated that at least 2,000 languages are doomed to extinction in the decades ahead. Others will continue to play a role in families and neighbourhoods, but not in the wider business and academic world. What are the Finnish language’s chances of survival?

With more than 5 million speakers, the Finnish language is fairly well positioned compared to many other languages. There are 4 public television channels with Finnish-language programmes and more than 30 private TV broadcasters. Most international movies have Finnish subtitles. Since Finland became a full member of the EU, the Finnish language has probably strengthened its position and status somewhat.

The status of a language depends not only on the number of speakers or books, films and TV stations that use it, but also on the presence of the language in the digital information space and software applications. Here too, the Finnish language is fairly well placed: all important international software products are available in Finnish versions, the Finnish Wikipedia has more than 290,000 articles, and the Finnish top-level domain .fi is very popular.

In the field of language technology, the Finnish language is moderately well equipped with products, technologies and resources. There are applications and tools for speech synthesis, speech recognition, information retrieval, spelling correction and grammar checking. There are also a few machine translation applications, but these often fail to produce linguistically and idiomatically correct translations, especially when Finnish is the target language. This is partly due to specific characteristics of Finnish, such as its rich inflectional morphology: many relations that English expresses with prepositions are marked in Finnish by case endings on the word itself.
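One reason Finnish is hard for word-based translation is that it expresses with case endings many relations that English expresses with prepositions, so a single noun appears in many different written forms. The toy sketch below, in Python, illustrates the effect; it is not a real morphological analyser. The lexicon, the ending table and the functions `lookup` and `analyse` are hypothetical, only a few back-vowel endings are covered, the English glosses are rough, and vowel harmony and consonant gradation are ignored.

```python
"""Toy illustration (hypothetical, not a real morphological analyser) of why
word-for-word lookup struggles with Finnish case endings."""

from typing import Optional

# Hypothetical base-form lexicon: Finnish lemma -> English noun.
BASE_LEXICON = {"talo": "house", "kirja": "book"}

# A few back-vowel case endings with rough English glosses.
# Front-vowel variants (-ssä, -stä, ...) and consonant gradation are ignored.
CASE_ENDINGS = {
    "ssa": "in",      # inessive: talossa "in the house"
    "sta": "out of",  # elative: talosta "out of the house"
    "lla": "at",      # adessive: talolla "at the house"
    "lta": "from",    # ablative: talolta "from the house"
    "lle": "to",      # allative: talolle "to the house"
}


def lookup(word: str) -> Optional[str]:
    """Naive word-for-word lookup: every inflected form is unknown."""
    return BASE_LEXICON.get(word)


def analyse(word: str) -> Optional[str]:
    """Strip at most one known case ending, then translate the remaining stem."""
    for ending, preposition in CASE_ENDINGS.items():
        stem = word[: -len(ending)]
        if word.endswith(ending) and stem in BASE_LEXICON:
            return f"{preposition} the {BASE_LEXICON[stem]}"
    # No known ending matched: fall back to a plain base-form lookup.
    return BASE_LEXICON.get(word)


if __name__ == "__main__":
    for w in ["talo", "talossa", "kirjasta", "talossani"]:
        print(f"{w:<10} lookup={lookup(w)!s:<6} analyse={analyse(w)}")
```

The plain lookup already fails on *talossa* ("in the house"), and even the suffix-stripping version fails on *talossani* ("in my house"), where a possessive suffix follows the case ending. Handling such stacked suffixes systematically is the kind of work a full Finnish morphological analyser has to do.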

Information and communication technology is now on the threshold of its next revolution. After personal computers, networks, miniaturisation, multimedia, mobile devices and cloud computing, the next generation of technology will feature software that understands not just spoken sounds or written letters but entire words and sentences, and supports users far better because it speaks, knows and understands their language. Forerunners of such developments include the free online service Google Translate, which translates between 57 languages; IBM’s supercomputer Watson, which was able to defeat the US champion in the game of “Jeopardy”; and Apple’s mobile assistant Siri for the iPhone, which can react to voice commands and answer questions in English, German, French and Japanese.

The next generation of information technology will master human language to such an extent that human users will be able to communicate using the technology in their own language. Devices will be able to automatically find the most important news and information from the world’s digital knowledge store in reaction to easy-to-use voice commands. Language-enabled technology will be able to translate automatically or assist interpreters; summarise conversations and documents; and support users in learning scenarios. For example, it will help immigrants to learn the Finnish language and to integrate more fully into the country’s culture.

The next generation of information and communication technologies will enable industrial and service robots (currently under development in research laboratories) to faithfully understand what their users want them to do and then proudly report on their achievements.

This level of performance means going far beyond simple character sets and lexicons, spell checkers and pronunciation rules. The technology must move on from simplistic approaches and start modelling language comprehensively, taking syntax as well as semantics into account to understand the essence of questions and generate rich and relevant answers.

However, there is a yawning technological gap between English and Finnish, and it is currently getting wider. After a very successful research record in the 1980s and 1990s, Finland is now losing its role as a contributor to language technology. During those two decades, basic language technology research was funded at the level of a Centre of Excellence, which resulted in a number of spin-off enterprises based on the technologies developed.

Since that period of basic research funding, Tekes, the Finnish Funding Agency for Technology and Innovation, has provided only small-scale industrial project funding. As a result, Finland (and Europe in general) lost some very promising high-tech innovations to the US, where there is greater continuity in strategic research planning and more financial backing for bringing new technologies to market. In the race for technology innovation, an early start with a visionary concept only ensures a competitive advantage if you actually make it over the finish line. Otherwise all you get is an honorary mention in Wikipedia.

After this decline in basic research funding for language technology in Finland, many experts migrated to various small companies, while US-based companies used their resources to develop technologies into their own industrial-strength products. Nevertheless, Finland still has very high research potential. Apart from internationally renowned research centres and universities, there are a number of innovative small and medium-sized language technology companies that manage to survive through sheer creativity and immense effort, despite the lack of venture capital or sustained public funding.

The early commercial success of Finnish language technology limited the research community’s access to basic tools for processing Finnish, such as parsers and lexicons. As an odd consequence, technology specifically adapted to Finnish played only a marginal role in Finnish research projects, and most research and development prototypes were built for English.

Because of the lack of adequate language resources and basic research funding, Finnish has hardly been present in international technology competitions. This holds true for information extraction from texts, grammar checking, machine translation and a whole range of other applications.

Many researchers believe that these setbacks arise because, for fifty years now, the methods and algorithms of computational linguistics and applied language technology research have focused first and foremost on English. In a selection of leading conferences and scientific journals published between 2008 and 2010, there were 971 publications on language technology for English and only 10 for Finnish. Danish and Swedish were better represented, with 26 and 19 articles respectively, while Norwegian trailed behind with only 2.
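The size of the gap in these figures is easier to see as ratios. A minimal sketch in Python, using only the counts quoted above; the function names are illustrative, and the shares are computed over the five listed languages only, not over all publications in the sample.

```python
"""Compare the 2008-2010 publication counts quoted in the text."""

from typing import Dict

# Counts as quoted in the text (selection of leading conferences and journals).
PUBLICATIONS = {
    "English": 971,
    "Danish": 26,
    "Swedish": 19,
    "Finnish": 10,
    "Norwegian": 2,
}


def ratio_to(reference: str, counts: Dict[str, int]) -> Dict[str, float]:
    """Publications for each language per publication on `reference`."""
    base = counts[reference]
    return {lang: n / base for lang, n in counts.items()}


def share_of_listed(counts: Dict[str, int]) -> Dict[str, float]:
    """Each language's share of the publications on the listed languages only."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


if __name__ == "__main__":
    ratios = ratio_to("Finnish", PUBLICATIONS)
    shares = share_of_listed(PUBLICATIONS)
    for lang in PUBLICATIONS:
        print(f"{lang:<10} {ratios[lang]:6.1f}x Finnish  {shares[lang]:6.1%} of listed")
```

Run as a script, it reports that English had about 97 times as many publications as Finnish and roughly 94.5% of the 1,028 publications on these five languages, against about 1% for Finnish.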

However, other researchers believe that English is inherently better suited to computer processing, and with current methods, languages such as Spanish and French are also considerably easier to process than Finnish. This means that we need a dedicated, consistent and sustainable research effort if we want to use the next generation of information and communication technology in those areas of our private and working lives where we speak and write Finnish.

In summary, despite the prophets of doom, the Finnish language is not in danger, even from the prowess of English-language computing. However, the whole situation could change dramatically when a new generation of technologies really starts to master human languages effectively. Through improvements in machine translation, language technology will help overcome language barriers, but it will only be able to operate between those languages that have managed to survive in the digital world. If adequate language technology is available, it will be able to ensure the survival of languages with small populations of speakers.

The dentist jokingly warns: “Only brush the teeth you want to keep”. The same principle also holds true for research support policies: you can study every language under the sun all you want, but if you really intend to keep them alive, you also need to develop technologies to support them.

META-NET’s vision is high-quality language technology for all languages in order to achieve political and economic unity through cultural diversity. The technology will help tear down existing barriers and build bridges between Europe’s languages. This requires all stakeholders – in politics, research, business, and society – to unite their efforts for the future.

Drawing on the insights gained so far, it appears that today’s ‘hybrid’ language technology, which mixes deep processing with statistical methods, will be able to bridge the gap between all European languages and beyond. As this series of white papers shows, there is a dramatic difference between Europe’s member states in terms of both the maturity of their research and their state of readiness with respect to language solutions.

This white paper series complements other strategic actions taken by META-NET (see the appendix for an overview). Up-to-date information such as the current version of the META-NET vision paper or the Strategic Research Agenda (SRA) can be found on the META-NET web site: http://www.meta-net.eu.