Evidence of meeting #77 for Official Languages in the 42nd Parliament, 1st Session. (The original version is on Parliament’s site, as are the minutes.)

A recording is available from Parliament.

Also speaking

Marc Hamel  Director General, Census program, Statistics Canada
Jean-Pierre Corbeil  Assistant Director, Social and Aboriginal Statistics Division, Statistics Canada

3:40 p.m.

The Chair Liberal Denis Paradis

Pursuant to Standing Order 108(3), we are continuing our study of the 2016 census language data and the overestimation of the growth of English in Quebec.

Today we are pleased to welcome two Statistics Canada representatives: Jean-Pierre Corbeil, assistant director of the social and aboriginal statistics division, and Marc Hamel, director general of the census program.

Welcome, gentlemen.

I imagine you know how our committee works. As usual, we will give you about 10 minutes to make your presentation. Then we will move on to a period of questions and comments from committee members.

I believe Mr. Hamel will be making the presentation.

Please go ahead, Mr. Hamel.

3:40 p.m.

Marc Hamel Director General, Census program, Statistics Canada

Thank you, Mr. Chair.

First, I want to thank the committee for giving Statistics Canada this opportunity to present the facts concerning an error detected in the 2016 population census language data that it released on August 2.

I believe you have received copies of the presentation we prepared to explain to you what happened. I am going to review that presentation and talk about the various points addressed in it.

As we now know, an error occurred in the 2016 population census findings, and it mainly concerns a few communities in Quebec. The error caused an overestimation of the growth of English as a mother tongue and the language spoken most often at home, mainly in the province of Quebec and in some of its municipalities, and an overestimation of the decline of French. It also resulted in a slight overestimation of the rate of English-French bilingualism in Quebec and the rest of Canada.

The source of that error was a programming problem in an auxiliary data collection procedure. The error occurred during a follow-up step conducted with respondents to fill in incomplete information. The error occurred in the transfer of responses for a subset of French questionnaires. It affected the content of the short form only and concerned approximately 61,000 people.

Responses were miscoded by the system for two language questions: questions 8 a) and 8 b), which concern the language spoken at home, and question 9, which concerns mother tongue. Responses to the “French” and “English” categories were reversed.

In the presentation, you will find a sample paper questionnaire in which those questions appear. As you can see, the response selections are reversed between the English- and French-language versions. In short, the program read the French version of the questionnaire as though it were in English and interpreted the first response, which is "French", as being "English".
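As an illustration only, and not part of the testimony, the reversal described above can be sketched in a few lines of Python. The function names, the 0-based box positions, and the "Other" category are assumptions made for the example; only the swapped order of "French" and "English" between the two versions comes from the description above.

```python
# Hypothetical response layouts. Only the English/French swap between the two
# versions is taken from the testimony; "Other" is an assumed third category.
CATEGORY_ORDER = {
    "en": ["English", "French", "Other"],
    "fr": ["French", "English", "Other"],
}


def decode_response(checked_position, form_language):
    """Translate a checked box (0-based position) into a language category,
    using the layout of the form the respondent actually completed."""
    return CATEGORY_ORDER[form_language][checked_position]


def decode_response_buggy(checked_position, form_language):
    """Reproduce the described fault: the English layout is applied
    regardless of which version of the form was completed."""
    return CATEGORY_ORDER["en"][checked_position]


# A respondent checks the first box on the French form, meaning "French".
first_box = 0
print(decode_response(first_box, "fr"))        # French
print(decode_response_buggy(first_box, "fr"))  # English
```

In this sketch, a category that sits in the same position on both forms decodes identically either way, which is consistent with the statement that only the categories appearing in a different order were affected.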

A comprehensive review of the entire collection and processing chain produced a clear diagnosis of the impact of that error. As I mentioned, approximately 61,000 individuals had their responses incorrectly classified for these three questions. We confirmed that this error affected only the response categories that are in a different order in the English and French questionnaires. As a result, for a subset of questionnaires, the “French” responses were coded as “English” responses. As the problem originally concerned the French version of the questionnaire, the error mainly affected findings in the province of Quebec.

Statistics Canada takes the quality of its data and their importance for users very seriously. Once informed that some results appeared to be hard to explain for certain Quebec communities, we immediately proceeded with a new review of our data production processes. Our presentation provides a timeline of events from the moment we were informed of a potential problem, to the moment we identified the source of the error, and the moment we corrected it.

On August 9, the chief statistician was notified in writing by a data user about inconsistencies in the 2016 census findings for the English language in select communities in the province of Quebec. Statistics Canada then conducted an exhaustive review of the data collection and processing of the 2016 census. We looked for the origin of the problem.

On August 11, we confirmed that there was an error in a computer program and released a statistical announcement to that effect. We immediately informed data users that there was a problem with the data.

From August 12 to 15, Statistics Canada re-ran the entire data processing and analysis process for the language variables.

On August 16, an expert panel assembled by Statistics Canada reviewed the new language data.

On August 17, we released new data and a technical note explaining the nature of the problem and exactly what had been done.

All language data products were thus released as of August 17. All data products initially made available on August 2 were corrected and are now available on the Statistics Canada website.

In the work we did to correct this error, we took a number of steps, including verifications throughout the data processing, with particular attention to records affected by the error. We verified and validated that the error was limited to the language variables only and did not apply to other parts of the questionnaire. We conducted an analysis of the impact of the error at every processing stage and at several geographic levels, and we cross-checked with other data sources to ensure the new findings were valid. Lastly, we conducted a review assisted by an expert panel, as I mentioned earlier.

In view of this error, we have since implemented rigorous mechanisms to determine the sources of variations in numbers and percentages between the 2016 and previous censuses. Data validation methods have been changed to enable us to identify factors that explain the variations down to the level of every municipality in Canada. Our verification process is now vastly more robust as a result. No other production error has been detected for any other data released to date.

That, broadly speaking, covers the events surrounding our release of the language data on August 2, 2017, and the measures Statistics Canada took to uncover the causes of that error, to make the appropriate corrections, and to re-release the data so we could certify for our users that the data could be used without restrictions.

We are now prepared to answer your questions.

3:45 p.m.

The Chair Liberal Denis Paradis

Thank you very much, Mr. Hamel.

We will immediately begin the period of questions and comments by handing the floor over to Mr. Bernard Généreux.

3:45 p.m.

Bernard Généreux Conservative Montmagny—L'Islet—Kamouraska—Rivière-du-Loup, QC

Thank you, Mr. Chair.

Thank you for being here today, Mr. Hamel and Mr. Corbeil.

As you know, gentlemen, when we parliamentarians are required to make decisions, we rely on what are called facts, factual elements. The data we are given enable us to make decisions for Canadians. Consequently, Statistics Canada stakes its credibility on all the data it provides to parliamentarians, institutions, companies, and its entire clientele in the broadest sense.

What happened in August undermined Statistics Canada's credibility to a certain degree, and it was important for us to meet with you today to take stock of the situation. You are here today to defend your institution's credibility, and I am pleased that media people are here too so they can report the matter to Canadians. We will probably be doing the same in an upcoming report.

I do not think we have any grounds to doubt Statistics Canada's credibility. What is certain is that Statistics Canada has been around for quite a long time, and decisions that Canadian parliamentarians from all parties have previously made have been based on facts, information, and data that you have provided. It is fundamentally important and even essential that the information we receive and on which we rely in making decisions be absolutely perfect, and that is particularly true with regard to official languages.

How can this kind of error occur given the number of employees you have, the credibility you enjoy, and the history of your institution? How can this kind of error still occur in 2017? That is the main question in my mind. Furthermore, I would like to know whether this has happened before. Whatever the case may be, do you think that this error, which occurred in 2017, was human or technological in origin? Can the two be separated?

3:50 p.m.

Marc Hamel Director General, Census program, Statistics Canada

The answer I can give is that I absolutely agree with you. Statistics Canada's credibility is always at stake when we use data. We always want to ensure that users can count on valid information.

We are still reviewing all the processes associated with what happened in this instance. The census is a very complex machine, involving hundreds and even thousands of processes, and we release millions of information units. That being said, we also have rigorous and systematic processes for reviewing everything that is released based on the census.

I cannot specifically explain to you the nature of the problem that occurred. Ultimately, a computer system misread the questionnaire, but a computer system is created by human beings.

3:50 p.m.

Bernard Généreux Conservative Montmagny—L'Islet—Kamouraska—Rivière-du-Loup, QC

I entirely agree with you, Mr. Hamel. I know too well how this can happen, having previously been a printer. I witnessed instances in which errors were made, more particularly French-language errors, on ballots and in other printed documents. Printers must redo the work in those cases. When you are a printer, you have to check before you print.

In this case, we are talking about answers to two questions that were reversed in the English and French versions. The information entered in the computer system was therefore incorrect, since the answers to those two questions were not in the same order in both versions of the questionnaire.

The entire questionnaire must be proofread. Was the error solely in the electronic questionnaire, in the paper questionnaire, or in both?

3:50 p.m.

Marc Hamel Director General, Census program, Statistics Canada

In this case, it was in fact a reversal error in a computer system.

3:50 p.m.

Bernard Généreux Conservative Montmagny—L'Islet—Kamouraska—Rivière-du-Loup, QC

Yes, but it was in the questionnaire.

3:50 p.m.

Marc Hamel Director General, Census program, Statistics Canada

The system was supposed to read the French version of the questionnaire and interpret it as the French version. During the conversion, if the first possible answer, “French”, had been checked, the system should have interpreted that answer as “French”. However, it was the English matrix that interpreted the French questionnaire and unfortunately thought the first answer was “English”.

As regards the output of this system, we should have realized that the questionnaire was incorrectly interpreted. We should have made the correction, but that was not done.

3:50 p.m.

Bernard Généreux Conservative Montmagny—L'Islet—Kamouraska—Rivière-du-Loup, QC

I do not think you should use the word “unfortunately” in that sentence. I agree that humans tell the computer what to do, but there must be an absolute correspondence between the questionnaire and the final result. Nothing unfortunate should be able to occur.

The month of November starts tomorrow, and this error occurred in August. You are unable to explain to me exactly what happened, despite the analyses you conducted of the processes to determine the cause of the problem. Three months later, you still do not know what actually happened.

3:50 p.m.

Marc Hamel Director General, Census program, Statistics Canada

I know what happened, but I still do not know how the error escaped us.

When we create a system, it is systematically designed and individually tested. We test the outputs of that system. We verify where in fact the information subsequently goes, which system takes over, and so on. All that is done systematically when we prepare for and conduct the census.

For the moment, I cannot tell you why we did not detect the error when we tested all those systems. However, we take measures and use matrices to test all these processes. Once the data are produced, they are validated. At the validation stage, we saw that changes had occurred, but we did not recognize that the data should have been verified and corrected before they were released.

This type of error is highly unlikely but not impossible.

3:50 p.m.

Bernard Généreux Conservative Montmagny—L'Islet—Kamouraska—Rivière-du-Loup, QC

You just told me in a single answer that the system did not detect the problem but that you noted that something unusual had probably occurred.

3:55 p.m.

Marc Hamel Director General, Census program, Statistics Canada

It was not the system that failed to detect the error. It was the people who tested the system who failed to see it was incorrectly reading the questionnaire.

3:55 p.m.

The Chair Liberal Denis Paradis

Thank you, Mr. Généreux.

Now we will go to Ms. Lapointe.

3:55 p.m.

Linda Lapointe Liberal Rivière-des-Mille-Îles, QC

Thank you, Mr. Chair.

Gentlemen, thank you very much for accepting our invitation.

Like Mr. Généreux, I was very surprised to hear you say there were problems associated with the anglophone population. Earlier you mentioned a few anglophone populations.

What did you mean? Are we talking about Quebec as a whole or only certain places?

3:55 p.m.

Marc Hamel Director General, Census program, Statistics Canada

Where a person who completed the French version of the questionnaire indicated English as the spoken language, the system mistakenly read that as though the person had indicated French as the spoken language. The error could affect certain cases in that way.

3:55 p.m.

Linda Lapointe Liberal Rivière-des-Mille-Îles, QC

I see.

You said that had the effect of overestimating the rate of bilingualism in Quebec and the rest of Canada.

We have been talking about Quebec for a while now, but what did you observe for the rest of Canada? Was the answer the same?

3:55 p.m.

Jean-Pierre Corbeil Assistant Director, Social and Aboriginal Statistics Division, Statistics Canada

As my colleague Mr. Hamel mentioned, there were between 2,000 and 3,000 cases outside Quebec. Since those people should have been identified as francophones but were identified as anglophones, that of course had a slight impact. We are talking about a minor overestimation of the rate of bilingualism. For Canada as a whole, the percentage stated was 18%, but it is actually 17.9%. In Quebec, we are talking about a difference of a few tenths of a percentage point. The figure was 45%, and there was a difference of two or three tenths of a percentage point. If we are talking about anglophones living outside the greater Montreal area, you should know that people in the small municipalities outside that major region are most likely to be bilingual. Since these were francophones instead of anglophones, the result was an overestimation of the rate of bilingualism.

3:55 p.m.

Linda Lapointe Liberal Rivière-des-Mille-Îles, QC

Earlier I think you said that the responses of 31,000 people in Quebec had been incorrectly classified, but the figure in your document is 61,000. Is it 31,000 or 61,000?

3:55 p.m.

Marc Hamel Director General, Census program, Statistics Canada

It is 61,000.

3:55 p.m.

Linda Lapointe Liberal Rivière-des-Mille-Îles, QC

All right. I had understood 31,000 when you made your presentation. I probably misunderstood. I just wanted to verify that it was indeed 61,000.

You will understand my concern after the following comments.

In our proceedings, the committee has often discussed the importance of accurately enumerating anglophone and francophone rights holders under paragraphs 23(1)(a) and (b) and subsection 23(2) of the Canadian Charter of Rights and Freedoms.

In your last appearance before the committee, Mr. Corbeil, you explained that the process involved in asking the right questions and ensuring you cover the right things was a long one. In fact, you did not seem sure that all francophone rights holders in the rest of Quebec could be enumerated. I assume you must have had to conduct some tests to make sure you asked the right questions.

3:55 p.m.

Marc Hamel Director General, Census program, Statistics Canada

The problem was in fact unrelated to the questions. The problem was in the underlying mechanics of those questions and concerned the data production process as a whole. The problem occurred during an auxiliary data collection process when we converted certain responses in order to follow up with respondents. During that conversion, the system read questionnaires completed in French as questionnaires completed in English, as you can see in the example.

3:55 p.m.

Linda Lapointe Liberal Rivière-des-Mille-Îles, QC

Has this kind of misreading problem previously occurred?

3:55 p.m.

Marc Hamel Director General, Census program, Statistics Canada

Not in this case. These are the only questions for which the answers do not appear in the same order in the English and French versions.

3:55 p.m.

Linda Lapointe Liberal Rivière-des-Mille-Îles, QC

In this case, a given population was overestimated or underestimated. Since you work at Statistics Canada, you are aware of the impact this can have in Canada.