A Network of Excellence forging the
Multilingual Europe Technology Alliance

The Latvian Language in the Digital Age — Executive Summary

Information technology impacts our lives every day. We typically use computers for writing, communicating, calculating, and searching for information, and increasingly for reading, listening to music, viewing photos and watching movies. We carry small computers – smartphones – in our pockets and use them to make phone calls, write e-mails, get information and entertain ourselves, wherever we are. How does this massive digitisation of information, knowledge and everyday communication affect our language? Will our language change or even disappear?

All our computing devices are linked together into an increasingly dense and powerful global network. However, when Europe's citizens discuss the effects of the Fukushima nuclear accident on European energy policy in online forums and chat rooms, they do so in distinctly separate language communities. What the Internet connects is still divided by the languages of its users. Will it always be like this?

Many of the world’s 6,900 languages will not survive in a globalised digital information society. It is estimated that at least 2,000 languages are doomed to extinction in the decades ahead. Many others will continue to play a role in families and neighbourhoods, but not in the wider business and academic world. What are the chances of survival for the Latvian language?

With about 1.5 million native speakers worldwide, the Latvian language is in fact approximately the 150th most spoken language in the world. Latvian is the sole official language of the Republic of Latvia and one of the official languages of the European Union.

Although a relatively high number of books and booklets (2,035) were published in Latvia in 2010, the total number of print copies was only 3.33 million, compared to 28.355 million copies in 1991. Latvian can be heard on numerous radio stations, two Latvian language public television channels and several private TV channels. In addition, many international movies are dubbed into Latvian for TV and theatre viewing.

Latvia is still in the process of recovering from the impact of mass immigration and segregation of its education system by language imposed by the Soviet regime from the 1950s through the 1980s. As a result, Russian is the native language of nearly a third of the Latvian population. During this period Russian was the only language used in a large number of Latvian schools. As an outcome, in 1989 only a fifth of the Russian population considered Latvian their secondary language. The diminished role of Latvian created anxiety about its gradual extinction.

Now Latvian is protected by a national language policy based on the principle that Latvian is the only official language in Latvia and it is the language for coalescing the different ethnic groups living in Latvia. At the same time the national policy ensures the preservation, development, and use of minority languages in different areas. The government is trying to overcome language segregation by encouraging bilingual education and requiring public secondary schools to teach at least 60% of subjects in Latvian.

At the time of writing, more than 75% of native Russian speakers have good or average Latvian skills; among them, almost all (94%) young people (17–25) are more or less proficient in Latvian.

There are concerns in Latvia about the threat of the ever-increasing use of Anglicisms, and fears that the Latvian language will become riddled with English words and expressions. But in spite of extensive and various contacts with other languages (Russian, English, German, Polish, Swedish), Latvian has survived and the language maintains its stability. However, as a result of centuries of foreign domination, in modern Latvian one can trace numerous lexical and morphological influences – loanwords, calques, and borrowed idioms which have been fully assimilated.

One good prescription for cultivating our lovely Latvian words and phrases is to actually use them, frequently and consciously; linguistic polemics about foreign influences and government regulations are usually not helpful. Our main concern should not be the gradual Anglicisation of our language, but its complete disappearance from use in major areas of our personal and public lives.

The status of a language depends not only on the number of speakers or books, films and TV stations that use it, but also on the presence of the language in the digital information space and software applications. Here the Latvian language is not so well placed: Latvian is used on less than 0.1% of the world’s websites, lagging behind languages like Lithuanian or Slovenian. Although several global software products are available in Latvian versions, many users prefer English or Russian versions.

In the field of language technology, the Latvian language is not so well equipped with products, technologies and resources. Although there are applications and tools for spelling and grammar checking, tokenisation and part-of-speech tagging, there are rather big gaps that should be urgently filled, especially with respect to speech technologies and large, high-quality language resources. There are electronic dictionaries and applications for automatic translation from and into Latvian. While these are very useful for getting the general meaning of foreign-language texts, they are not yet able to produce linguistically and idiomatically correct translations, especially when Latvian is the target language, owing to the specific linguistic characteristics of Latvian.

Information and communication technologies (ICT) are now preparing for the next revolution. After personal computers, networks, miniaturisation, multimedia, mobile devices and cloud computing, the next generation of technology will feature software that understands written and spoken sentences, and supports users far better because it speaks, knows and understands their language. Forerunners of such developments are the free online service Google Translate that translates between numerous languages, IBM’s supercomputer Watson that was able to defeat the US champion in the game of “Jeopardy”, and Apple’s mobile assistant Siri for the iPhone that can react to voice commands and answer questions in English, German, French and Japanese.

The next generation of information technology will master human language to such an extent that human users will be able to communicate using the technology in their own language. Devices will be able to automatically find the most important news and information from the world’s digital knowledge store in reaction to easy-to-use voice commands. Language-enabled technology will be able to translate automatically or assist interpreters; summarise conversations and documents; and support users in learning scenarios. For example, it will help local businesses to find customers abroad or immigrants to learn the Latvian language and better integrate into the country’s culture.

The next generation of information and communication technologies will enable industrial and service robots (currently under development in research laboratories) to faithfully understand what their users want them to do and then proudly report on their achievements.

This level of performance means going way beyond simple character sets and lexicons, spell checkers and pronunciation rules. The technology must move on from simplistic approaches and start modelling language in an all-encompassing way, taking syntax as well as semantics into account to understand the drift of questions and generate rich and relevant answers.

Not all European languages are equally well prepared for this future. There is a yawning technological gap between English and Latvian, and it is currently getting wider. We see this gap not only in comparison with larger languages, but also in comparison with some less widely spoken languages that have benefited from systematic national efforts in advancing language technologies.

Language technology has never been a priority research field in Latvia. There is no dedicated language technology programme; development and research activities are fragmented and mostly organised around short-term projects, which complicates the development of larger resources and long-term cooperation between institutions; and only a few courses on language technology are available. However, through state research programmes in ICT and Latvian Studies, several successful projects were carried out in 2005–2009. After this period the field received far less support, resulting in fewer activities in semantics, controlled languages and machine translation. Nevertheless, there is still high research potential at research institutes and universities.

Apart from research centres and universities, there are some remarkable achievements by innovative language technology companies. By focusing on usable applications and leading pan-European industry and research collaboration projects co-funded by the European Commission, strong advances have been achieved in translation technologies.

Every international technology competition tends to show that results for the automatic analysis of English are far better than for other languages, including Latvian. Many researchers reckon that these setbacks are due to the fact that, for fifty years now, the methods and algorithms of computational linguistics and language technology application research have first and foremost focused on English. However, other researchers believe that English is inherently better suited to computer processing. Languages such as Spanish and French are also much easier to process than Latvian using current methods. This means that we need a dedicated, consistent, and sustainable research effort if we want to use the next generation of information and communication technology in those areas of our private and work lives where we live, speak and write Latvian.

The Latvian language is not in immediate danger, even from the prowess of English language computing. However, the whole situation could change dramatically when a new generation of technologies really starts to master human languages effectively. Through improvements in machine translation, language technology will help in overcoming language barriers, but it will only be able to operate between those languages that have managed to survive in the digital world. If there is adequate language technology available, then it will be able to ensure the survival of languages with very small populations of speakers. If not, even ‘larger’ languages will come under severe pressure. If Latvian is to survive as a viable national language in the developed world, it must be able to meet IT demands. Consequently, systematic efforts and investments in language technology must form an essential part of its language preservation policy.