Thank you, Mr. Chair.
First, I want to thank the committee for giving Statistics Canada this opportunity to present the facts concerning an error detected in the 2016 population census language data that it released on August 2.
I believe you have received copies of the presentation we prepared to explain to you what happened. I am going to review that presentation and talk about the various points addressed in it.
As we now know, an error occurred in the 2016 population census findings, and it mainly concerns a few communities in Quebec. The error caused an overestimation of the growth of English, both as a mother tongue and as the language spoken most often at home, mainly in the province of Quebec and in some of its municipalities, and an overestimation of the decline of French. It also resulted in a slight overestimation of the rate of English-French bilingualism in Quebec and the rest of Canada.
The source of that error was a programming problem in an auxiliary data collection procedure. It occurred during a follow-up step conducted with respondents to fill in incomplete information, when responses were transferred for a subset of French questionnaires. It affected the content of the short form only and concerned approximately 61,000 people.
Responses were miscoded by the system for three language questions: questions 8 a) and 8 b), which concern the language spoken at home, and question 9, which concerns mother tongue. Responses to the “French” and “English” categories were reversed.
In the presentation, you will find a sample paper questionnaire in which those questions appear. As you can see, the order of the response options is reversed between the English- and French-language versions. In short, the program read the French version of the questionnaire as though it were in English and interpreted the first response, which is “French”, as being “English”.
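To illustrate the mechanism just described, here is a minimal sketch in Python. It is not Statistics Canada's actual program: the function names, the 0-based box positions, and the "Other" category are assumptions made for illustration. Only the reversed order of "English" and "French" between the two versions comes from the account above.

```python
# Hypothetical option orders. Only the English/French reversal is taken
# from the testimony; the "Other" entry is an illustrative assumption.
OPTION_ORDER = {
    "en": ["English", "French", "Other"],
    "fr": ["French", "English", "Other"],
}


def decode(position, form_language):
    """Map a ticked box position (0-based) to a language category,
    using the layout of the form the respondent actually filled in."""
    return OPTION_ORDER[form_language][position]


def decode_buggy(position, form_language):
    """Reproduce the kind of error described: every form is read with
    the English layout, whatever language it was printed in."""
    return OPTION_ORDER["en"][position]


# A respondent ticks the first box on a French-language form.
first_box = 0
correct = decode(first_box, "fr")
miscoded = decode_buggy(first_box, "fr")
```

In this sketch, `correct` is "French" and `miscoded` is "English", while a category in the same position on both versions decodes identically either way. That matches the observation that only categories whose order differs between the two versions were affected.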
A comprehensive review of the entire collection and processing workflow resulted in a clear diagnosis of the impact of that error. As I mentioned, approximately 61,000 individuals had their responses incorrectly classified for these three questions. We confirmed that this error affected only the response categories that are in a different order in the English and French questionnaires. As a result, for a subset of questionnaires, the “French” responses were coded as “English” responses. As the problem originally concerned the French version of the questionnaire, the error mainly affected findings in the province of Quebec.
Statistics Canada takes the quality of its data and their importance for users very seriously. Once informed that some results appeared to be hard to explain for certain Quebec communities, we immediately proceeded with a new review of our data production processes. Our presentation provides a timeline of events from the moment we were informed of a potential problem, to the moment we identified the source of the error, and the moment we corrected it.
On August 9, the chief statistician was notified in writing by a data user about inconsistencies in the 2016 census findings for the English language in select communities in the province of Quebec. Statistics Canada then conducted an exhaustive review of the data collection and processing of the 2016 census. We looked for the origin of the problem.
On August 11, we confirmed that there was an error in a computer program and released a statistical announcement to that effect. We immediately informed data users that there was a problem with the data.
From August 12 to 15, Statistics Canada reran all data processing and analysis for the language variables.
On August 16, an expert panel assembled by Statistics Canada reviewed the new language data.
On August 17, we released new data and a technical note explaining the nature of the problem and exactly what had been done.
All language data products were thus released as of August 17. All data products initially made available on August 2 were corrected and are now available on the Statistics Canada website.
In the work we did to correct this error, we took a number of steps, including verifications throughout the data processing, with particular attention to records affected by the error. We verified and validated that the error was limited to the language variables only and did not apply to other parts of the questionnaire. We conducted an analysis of the impact of the error at every processing stage and at several geographic levels, and we cross-checked with other data sources to ensure the new findings were valid. Lastly, we conducted a review assisted by an expert panel, as I mentioned earlier.
In view of this error, we have since implemented rigorous mechanisms to determine the sources of variations in numbers and percentages between the 2016 and previous censuses. Data validation methods have been changed to enable us to identify factors that explain the variations down to the level of every municipality in Canada. Our verification process is now vastly more robust as a result. No other production error has been detected for any other data released to date.
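As a rough illustration of what a municipality-level check on variations between censuses could look like, here is a minimal sketch in Python. It is not the validation method Statistics Canada implemented: the function name, the five-percentage-point threshold, and the figures are all hypothetical placeholders, not census data.

```python
def flag_large_changes(previous, current, threshold_points=5.0):
    """Return municipalities whose share changed by more than the threshold.

    previous, current: dicts mapping a municipality name to a percentage share.
    threshold_points: hypothetical cut-off, in percentage points.
    """
    flagged = {}
    for name, new_share in current.items():
        if name not in previous:
            continue  # no earlier figure to compare against
        change = new_share - previous[name]
        if abs(change) > threshold_points:
            flagged[name] = round(change, 1)
    return flagged


# Placeholder figures for illustration only, not census data.
share_previous = {"Municipality A": 2.0, "Municipality B": 10.0}
share_2016 = {"Municipality A": 9.5, "Municipality B": 10.4}
to_review = flag_large_changes(share_previous, share_2016)
```

A check of this kind only identifies where to look. Each flagged municipality would still need an analyst to determine whether the variation reflects a real change or a processing problem.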
That, broadly speaking, covers the events surrounding our release of the language data on August 2, 2017, and the measures Statistics Canada took to uncover the causes of that error, to make the appropriate corrections, and to re-release the data so we could certify for our users that the data could be used without restrictions.
We are now prepared to answer your questions.