A Network of Excellence forging the
Multilingual Europe Technology Alliance

The Hungarian Language in the Digital Age — Executive Summary

Information technology changes our everyday lives. We typically use computers for writing, editing, calculating and searching for information, and increasingly for reading, listening to music, viewing photos and watching movies. We carry small computers in our pockets and use them to make phone calls, write emails, get information and entertain ourselves, wherever we are. How does this massive digitization of information, knowledge and everyday communication affect our language? Will our language change or even disappear?

All our computers are linked together into an increasingly dense and powerful global network. The girl in Ipanema, the officer in Budapest and the engineer in Delhi can all chat with their friends on Facebook, but they are unlikely ever to meet one another in online communities and forums. If they are worried about how to treat earache, they will all check Wikipedia to find out about it, but even then they won't read the same article. When Europe's netizens discuss the effects of the Fukushima nuclear accident on European energy policy in forums and chat rooms, they do so in cleanly separated language communities. What the Internet connects is still divided by the languages of its users. Will it always be like this?

Many of the world’s 6,000 languages will not survive in a globalised digital information society. It is estimated that at least 2,000 languages are doomed to extinction in the decades ahead. Others will continue to play a role in families and neighbourhoods, but not in the wider business and academic world. What are the Hungarian language’s chances of survival?

With approximately 13 million speakers, Hungarian ranks 12th among the most widely spoken European languages. It is the official language of the Republic of Hungary, where about 97% of the population of 10 million claim Hungarian as their native language. It is also spoken by Hungarian communities in the seven neighbouring countries, the largest being a community of approximately 1.5 million in Romania. Additionally, emigrant communities use it worldwide, primarily in the United States, Canada and Israel.

The Hungarian language is an island in Europe: most European languages belong to the Indo-European language family, but Hungarian does not. It is a Finno-Ugric language, related to Finnish, Estonian and a number of minority languages spoken in the Baltic states and in Russia. It is the most widely spoken non-Indo-European language in Europe, but unlike world languages such as English and Chinese, or more commonly used European languages such as German and French, Hungarian does not play a prominent role on the international scene.

There are plenty of complaints in Hungary about the ever-increasing use of Anglicisms, and some even fear that the Hungarian language will become riddled with English words and expressions. Our study, however, suggests that this fear is misguided. The Hungarian language has already survived the influx of new words and terms from Old Turkic on the steppes and, later, from Slavic languages in the Carpathian Basin. Moreover, much of Hungary was part of the Ottoman Empire for 150 years in the 16th and 17th centuries, and the country then belonged to the Habsburg Empire until the early 20th century. During those periods, Latin and German exerted the strongest influence. One good antidote to losing our lovely little Hungarian words and phrases is to actually use them, frequently and consciously; linguistic polemics about foreign influences and government regulations do not usually help. Our main concern should be not the gradual Anglicisation of our language, but rather its complete disappearance from major areas of our personal lives. We do not mean science, aviation or the global financial markets, but the many areas of life in which it is far more important to be close to a country's citizens than to international partners: domestic policy, administrative procedures, the law, culture and shopping.

The status of a language depends not only on the number of its speakers, but also on the presence of the language in the digital information space and in software applications. The existence of a quite active Hungarian-speaking web community is well demonstrated by the fact that the Hungarian Wikipedia is the 19th largest, ranking higher than those of commonly used European languages such as Turkish, Romanian or Danish, and of world languages such as Arabic or Korean. A few important international software products are available in Hungarian versions. However, due to the special characteristics of Hungarian, adapting English-based applications is quite difficult. Another obstacle to developing expensive technologies for Hungarian is the small size of the Hungarian market.

In the field of language technology, we can be cautiously optimistic about the current state of Hungarian language technology support. There is a viable LT research community in Hungary, which has been supported by national funding in the past and, increasingly, by European funding in recent years. Currently, both of the EU-funded projects in the competitive ICT field that are coordinated by Hungary come from the language technology domain. A number of large-scale resources and state-of-the-art technologies have been produced and distributed for Hungarian. However, the scope of the resources and the range of tools are still very limited compared to those for English, and they are simply not sufficient in quality and quantity to develop the kind of technologies required to support a truly multilingual knowledge society.

Information and communication technology is now preparing for its next revolution. After personal computers, networks, miniaturisation, multimedia, mobile devices and cloud computing, the next generation of technology will feature software that understands not just spoken or written letters and sounds but entire words and sentences, and supports users far better because it speaks, knows and understands their language. Forerunners of such developments include the free online service Google Translate, which translates between 57 languages, and its European counterpart itranslate4.eu (the product of a Hungarian-led consortium); IBM's supercomputer Watson, which defeated the US champion in the game show "Jeopardy"; and Apple's mobile assistant Siri for the iPhone, which can react to voice commands and answer questions in English, German, French and Japanese.

The next generation of information technology will master human language to such an extent that human users will be able to communicate using the technology in their own language. Devices will be able to automatically find the most important news and information from the world’s digital knowledge store in reaction to easy-to-use voice commands. Language-enabled technology will be able to translate automatically or assist interpreters; summarise conversations and documents; and support users in learning scenarios.

The next generation of information and communication technologies will enable industrial and service robots (currently under development in research laboratories) to faithfully understand what their users want them to do and then proudly report on their achievements.

This level of performance means going far beyond simple character sets and lexicons, spell checkers and pronunciation rules. The technology must move on from simplistic approaches and start modelling language in an all-encompassing way, taking syntax as well as semantics into account to understand the gist of questions and generate rich and relevant answers.

However, there is a yawning technological gap between English and Hungarian, and it is currently getting wider. Research and development funding lacks continuity: short-term coordinated programmes tend to alternate with periods of sparse or zero funding. In addition, there is an overall lack of coordination with programmes in other EU countries and at the European Commission level. As a result, Hungary (and Europe in general) has lost several very promising high-tech innovations to the US, where strategic research planning is more continuous and there is more financial backing for bringing new technologies to market. In the race for technology innovation, an early start with a visionary concept will only ensure a competitive advantage if you can actually make it over the finish line. Otherwise all you get is an honorary mention in Wikipedia.

Nevertheless, there is still high research potential in Hungary and the EU. Apart from internationally renowned research centres and universities, there are a number of innovative small and medium-sized language technology companies that manage to survive through sheer creativity and immense effort, despite the lack of venture capital or sustained public funding.

Although Hungary has supported important developments in corpus building and language resource generation, language technology resources and tools for Hungarian clearly do not yet reach the quality and coverage of comparable resources and tools for English, which leads in almost all language technology areas. International technology competitions consistently show that results for the automatic analysis of English are far better than those for Hungarian. This holds true for extracting information from texts, grammar checking, machine translation and a whole range of other applications.

Many researchers believe that this gap exists because, for fifty years now, the methods and algorithms of computational linguistics and language technology application research have focused first and foremost on English. Other researchers, however, believe that English is inherently better suited to computer processing. Languages such as Spanish and French are also much easier to process than Hungarian with current methods. This means that we need a dedicated, consistent and sustainable research effort if we want to use the next generation of information and communication technology in those areas of our private and working lives where we live, speak and write Hungarian.

In summary, despite the prophets of doom, the Hungarian language is not in danger, not even from the prowess of English-language computing. However, the whole situation could change dramatically once a new generation of technologies really starts to master human languages effectively. Although improvements in machine translation will help language technology overcome language barriers, it will only be able to operate between those languages that have managed to survive in the digital world. If adequate language technology is available, it will be able to ensure the survival of even languages with very small populations of speakers. If not, even 'larger' languages will come under severe pressure.

The dentist jokingly warns: "Only brush the teeth you want to keep". The same principle also holds true for research support policies: You can study every language under the sun all you want, but if you really intend to keep them alive, you also need to develop technologies to support them.