Thank you, Mr. Chair.
I am not very well versed in procedure, so you will forgive me if I forget to use your proper titles, like Mr. Chair and so on. It could well happen.
I would like to start by giving you a little biography of myself, to justify my appearance before you today.
I know this might not seem to be the case, but I have very close to 50 years of experience in translation. I started very young, around 25 years old, working in translation in areas as varied as banking, heating, and the media. So I have a good idea of what a really intense and varied translation practice can be like.
I have worked as a translator, a terminologist and a translation services manager, which gives me a good overview of the profession as a whole.
After 30 years of that varied practice, I decided to do a PhD, because I wanted to be listened to when I spoke. I have been teaching since that time. I teach translation, the basic element, as well as revision, which, in translation, is quality control. That led me to machine translation. I have also taught machine translation and post-editing, which I will talk to you about later.
I am going to let others deal with the political and sociolinguistic issues. I will not be talking about that at all. That is not my goal today. Instead, my goal is to rehabilitate the translation software. I will probably be one of the few to do so, but that is not the crux of my message.
The crux of my message is to show that present circumstances do not really allow us to use what I consider to be an excellent machine translation system in the best way. That does no favours to Portage, a system of which Canada can be proud. In international competitions, this system regularly places first. It is one of the jewels in the crown of technological innovation.
Contrary to what the media has been saying recently, the problem has absolutely nothing to do with the performance of the machine translation software. I would like to point out that all the examples that I have read in La Presse and in other media are completely inappropriate. Let me give you one. People have been laughing themselves silly about the expression “it's raining cats and dogs”. No one uses that expression in the public service. I do not see why we are hung up on it. It is an old-fashioned expression that I feel has no place in the current language of public servants.
Let me read you a commercial definition of the system that exactly reflects how I feel about it.
Portage is a statistics-based software system that yields far better results than earlier attempts to automate the highly nuanced art of translation.
We are not talking about linguistics. Machine translation software does not translate languages; that is not what it primarily does. You end up with a translation, but the system works by statistical analysis. This is about mathematics, not about language. The system understands nothing. It just understands the data it is given and the data it has already stored. It makes comparisons.
However, I am less in agreement with the words “highly nuanced art”. The software works on binary coding: 1, 0, 1, 0, 1, 0. There is nothing highly nuanced about it. The program depends on machine learning statistics. It really is “garbage in, garbage out”. It absorbs what it is given and it gives back what it is given. If what you give it is not good, then the product it gives you will not be good either. Really, it is no more complicated than that.
Why did I decide to come here? Because, with things as they are at the moment, I was wondering where we are going. This is a three-fold distortion of machine translation.
First, I feel that the use the Translation Bureau itself had in mind was not the generalist role the system currently has. For example, translating emails comes under the heading of general texts, whereas that is not what the system was intended for. Nor is it what its designers intended it for. They have always been conscious of the fact that, just like at the very beginning and all through the 1960s, machine translation of general texts will never reach the quality that humans can achieve. The designers say so too; they are not kidding themselves. There really has to be preparation upstream and downstream.
Upstream, we have what we call the corpus. I have told you that it is a statistical analysis system. It is going to analyze in terms of what it already has in its memory. Then there is the machine translation process, the computerized translation process, which ends up as a linguistic product, a text. Downstream, that text also has to be refined by humans in a process that we call machine text editing, or more commonly, post-editing. So, if there are no humans on both sides, the results are certainly going to be terrible. The software designers recognize that themselves: they never thought that they would end up with quality translations. However, it seems that some other people believe that you can do so, and that is why they want to install the software at all costs.
The other distortion is that, by implementing the system immediately, we are going to deliver a fatal blow to the development of machine translation, in my opinion, because we are harming it a great deal. But, as I was telling you earlier, it is one of the jewels in the crown of the country's technological innovation. So if we want to kill off all the enthusiasm for machine translation at the outset, as a discipline itself, we have found a good way to go about it. If we put the system into operation now, we will certainly end up with gibberish, just as we have read in the press, and that will harm the reputation of Canadian machine translation researchers.
Now I am going to talk to you about post-editing. I am less familiar with corpus development, an area that has much more to do with linguistics, than I am with quality control. Post-editing is a quality control operation. I will not go into the details, but I also feel that, even with post-editing, we are not completely assured of quality machine translation.
There are various reasons for that. Among them is that post-editing as it is presently conceived is all about speed. So, the quicker you work, the less freedom you have, for example, to work with the sentences or to make the texts more idiomatic in the target language. I am not even talking about French. It must be said that current software like Portage is not so bad at translating idiomatic expressions. In fact, the problem is idiomatic language in general, not idiomatic expressions themselves. However, post-editors are working at such a pace that they cannot restore the idiomatic aspects every time.