Papago learns a thing or two from foreign experts
Because of this exclusivity, machine translation apps in East Asian countries are mostly led by domestic tech firms. What makes the task even more difficult is the relative shortage of language data compared to English and other western languages.
In Korea, Naver has the lead. Its translation app Papago has the largest number of monthly active users among translation services, having surpassed Google Translate in Korea last August. The service currently supports 13 languages.
Machine learning scientists Stephane Clinchant and Vassilina Nikoulina joined Naver Labs’ Papago team in Korea last August. The married couple are the first two visiting researchers from Naver Labs Europe in Grenoble, France, where they worked before its acquisition by the local portal site operator in 2017. They each have more than 10 years of experience in machine learning, data retrieval and machine translation.
In an interview with the Korea JoongAng Daily last week, Clinchant and Nikoulina discussed machine translation for a Korea-based service and their experience working closer to local customers. Below are edited excerpts.
Q. As non-Korean speakers, what kind of advice do you give to the local Papago development team?
Nikoulina: For a long time now, the algorithms in machine translation have been language independent. You take existing texts, for instance the same text in English and in Korean, and you use them to train a mathematical model that learns, based on statistics, how to translate words into a specific combination.
Clinchant: When explaining this I like using the analogy of a car: You have the engine and the gasoline that burns. In AI or machine translation, the algorithm is the engine. It takes data as an input, transforms it and produces movement. The data can be English to French or English to Korean, but whatever it is, after a stage we call “pre-processing” it will be put inside the engine that will consume it. That’s why the engine itself is language independent.
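The engine analogy can be illustrated with a minimal Python sketch. It is not Papago's system: the toy sentence pairs, the whitespace-based `preprocess` step and the Dice co-occurrence score are all assumptions chosen for brevity. The point is only that the same statistical code runs unchanged on English-French and English-Korean data.

```python
from collections import Counter
from itertools import product


def preprocess(sentence):
    """Hypothetical pre-processing: lowercase and split on whitespace."""
    return sentence.lower().split()


def train(pairs):
    """Count how often each source word co-occurs with each target word."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(preprocess(src)), set(preprocess(tgt))
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))
    return src_count, tgt_count, joint


def best_translation(word, model):
    """Pick the target word with the highest Dice association score."""
    src_count, tgt_count, joint = model
    scores = {
        t: 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
        if s == word
    }
    return max(scores, key=scores.get) if scores else None


# The same "engine", fed with two different toy language pairs.
en_fr = [("the cat", "le chat"), ("the dog", "le chien"), ("a cat", "un chat")]
en_ko = [
    ("cat eats", "고양이가 먹는다"),
    ("dog eats", "개가 먹는다"),
    ("cat sleeps", "고양이가 잔다"),
]

print(best_translation("cat", train(en_fr)))
print(best_translation("cat", train(en_ko)))
```

Only the data changes between the two calls. Real systems replace the co-occurrence counts with far larger models, but the separation between pre-processing and a language-independent engine is the same idea.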
How do you overcome the shortage of Korean-related data compared to western languages?
Nikoulina: We mostly use algorithms that have been developed for many other languages. There is a lot of research going on for languages that are rich in data, like English and French. But there’s also a line of research for languages with less data, which we try to apply in the case of Papago. When you work with Asian languages for which you do not have a lot of data available, there are different techniques for treating the data so that you can exploit different sources of information.
Clinchant: For example, you can train the algorithm to speak multiple languages at the same time. Instead of training the system only to translate Korean to Vietnamese, you train it on Korean, English and Vietnamese at the same time so that you get more data. You can always have more data, but I think Papago already has quite a lot. Adjusting the size of the engine to the size of the data is one thing you can do.
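One common way to train a single model on several languages at once is to pool all sentence pairs and prefix each source sentence with a token naming the desired output language. The Python sketch below assumes that approach. The interview does not say which method Papago uses, and the `<2xx>` tag format, function names and toy sentences are hypothetical.

```python
def tag_example(src_text, tgt_text, tgt_lang):
    """Prefix the source with a token naming the desired output language."""
    return (f"<2{tgt_lang}> {src_text}", tgt_text)


def build_multilingual_corpus(corpora):
    """Pool several language pairs into one training set."""
    pooled = []
    for (src_lang, tgt_lang), pairs in corpora.items():
        for src_text, tgt_text in pairs:
            pooled.append(tag_example(src_text, tgt_text, tgt_lang))
            # Each pair is reused in the reverse direction as well.
            pooled.append(tag_example(tgt_text, src_text, src_lang))
    return pooled


# Toy data: Korean-English and English-Vietnamese pairs share one pool.
corpora = {
    ("ko", "en"): [("안녕하세요", "hello")],
    ("en", "vi"): [("hello", "xin chào")],
}

training_set = build_multilingual_corpus(corpora)
for example in training_set:
    print(example)
```

Because every example lands in the same pool, a scarce pair such as Korean-Vietnamese can benefit from the larger Korean-English and English-Vietnamese sets.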
Are there any language-dependent difficulties in developing a translation service for Korean?
Nikoulina: For me, every language has its difficulties, and English is a particularly easy language to test from a grammatical point of view. But Korean is one of the morphologically rich languages, and it has the same challenges as Russian, Turkish or German. You can apply to Korean exactly the same techniques that we apply to German. So there are a lot of things we transfer from research in western languages. But the main distinctive feature of Korean and other Asian languages is honorifics.
You came to Korea last August. Have you encountered any cultural differences working here compared to Naver Labs Europe?
Clinchant: At the beginning, people wouldn’t say no directly. So you don’t say things directly, but you have to grasp the intention, or why the person said a certain thing. We did have prior knowledge, but it took us a while to adapt. […] I think Koreans also have a different perception of time, at least compared to French people. Things here go very fast, both in the way you do things and in the way you project yourself into the future.
As you said, Koreans are not always direct in expressing negation. Do cultural differences like this pose difficulties in machine translation?
Nikoulina: It’s the same when you are in the situation of a human interpreter; when you interpret Korean to English you don’t translate the intention, you translate the words. Interpreting the underlying intention would be possible if you had the data, but no such relevant data is available.
Clinchant: It also depends on whether a concept in one language can be translated into the other language. The fact that you have titles in Korea is something difficult for us to understand.
One day, a colleague told us that when they meet a foreigner, they try to guess what title the person should be called by. This is something a machine cannot do automatically to this day because it’s grounded not only in the language but in your mind, in some of the assumptions you have.
You return to France this August. What was the most valuable lesson you learned during your visit to Korea?
Nikoulina: Learning how to work with people here. That was one of my motivations for coming here. Most people at the European unit are willing to collaborate with people in Korea, but establishing connections was difficult because of the physical distance and cultural differences.
Being able to contribute to the real product was also motivating. The research we do in Europe may be very interesting but it is not all immediately practical.
Clinchant: I think I became more efficient being next to Koreans. I also liked being focused on Papago because in Europe one researcher often has many different projects to work on, but here there was one big team focused on one service.
BY SONG KYOUNG-SON