Evidence of meeting #5 for Access to Information, Privacy and Ethics in the 44th Parliament, 1st Session. (The original version is on Parliament's site, as are the minutes.)

A video is available from Parliament.

Witnesses

Daniel Therrien, Privacy Commissioner of Canada, Office of the Privacy Commissioner of Canada
Martyn Turcotte, Director, Technology Analysis Directorate, Office of the Privacy Commissioner of Canada
Khaled El Emam, Canada Research Chair in Medical Artificial Intelligence, As an Individual

11:55 a.m.

Conservative

The Chair Conservative Pat Kelly

Thank you.

Our last questioner will be Mr. Bains for a five-minute round.

11:55 a.m.

Liberal

Parm Bains Liberal Steveston—Richmond East, BC

Thank you, Mr. Chair.

Thank you to our witnesses for joining us today.

I know a lot of questions have been asked. You said that you highly doubt people know whether their data is protected or unprotected. Last week I raised a similar point.

I look at the number of apps people are using across the country, everything from Google Maps to the other apps on their phones; it's probably in the hundreds. Typically, an app will ask you for access to your information and to your camera. You said that something needs to be done to strengthen this protection. Is that something you feel should be included in this feature?

Noon

Privacy Commissioner of Canada, Office of the Privacy Commissioner of Canada

Daniel Therrien

I'll distinguish two things. I heard you say—or perhaps I misunderstood—that there's a question of knowledge or awareness by Canadians, and a question of protection. As to whether the data of Canadians was adequately protected, that is the subject of our investigation, so I'm not saying it was protected or not protected. That's what we're going to investigate.

In terms of knowledge, yes, I maintain that most users of the Telus services probably did not know that their data would be used that way. We had a look at the privacy policies of Telus, and there is something in these privacy policies, as there often is in privacy policies of companies, informing Canadians that their mobility data, in a de-identified fashion, might be used for what they call “the public good”. They did not define “public good” to mean “used by the government and PHAC”. Be that as it may, we know these privacy policies are not read. They're long, they're complicated, and even lawyers have difficulty understanding them. That's not a particularly good way of informing Canadians of how their data will be used. I think in this case, the government probably did a better job through the COVIDTrends web page to inform Canadians. Be that as it may, I think it's fair to say that Canadians by and large were not aware and that more should be done.

Frankly, it will never be possible to inform people of all the uses that will be made of their information, because there are too many of these uses and many are legitimate or for the public good. If data is to be used for the public good, consent cannot be a precondition for all these public good uses. Consent has a place, and transparency has a place. Improving privacy policies has a place, but the real solution is to have a backstop to the absence of consent where you have objective criteria like legitimate commercial interests, which I agree probably need a bit of definition, or social good, enforced by somebody who can protect the interests of individual Canadians.

It's a complicated area. Let's not lose track of the fact that data can be used for good, but it needs to be better regulated.

Noon

Conservative

The Chair Conservative Pat Kelly

You have time for one last question, Mr. Bains.

Noon

Liberal

Parm Bains Liberal Steveston—Richmond East, BC

What's your standard for adequate protection of data?

Noon

Conservative

The Chair Conservative Pat Kelly

In 10 seconds or less.

February 7th, 2022 / noon

Privacy Commissioner of Canada, Office of the Privacy Commissioner of Canada

Daniel Therrien

Is it for the social good, or for legitimate commercial interests, on one hand? On the other hand, does it violate privacy as a human right? Does it constitute surveillance? You balance these things out, and you determine whether the use of data is adequate in that fashion.

Noon

Conservative

The Chair Conservative Pat Kelly

Thank you very much.

With that, we conclude panel one of today's meeting. I'm sure all members will join me in thanking Commissioner Therrien and Mr. Turcotte.

I would like to proceed immediately to the second panel. I'm going to dispense with the procedural statements, because I think everybody was here, including our witness who was observing.

I'll suspend for a brief moment for a sound check, and then we'll begin panel two.

12:05 p.m.

Conservative

The Chair Conservative Pat Kelly

We're resuming the meeting to begin the second panel.

Without further delay, I invite our witness, Dr. Khaled El Emam, to make his opening statement to a maximum of five minutes, following which we will have a single round of six minutes each.

Go ahead, Dr. El Emam.

12:05 p.m.

Dr. Khaled El Emam Canada Research Chair in Medical Artificial Intelligence, As an Individual

Thank you, Mr. Chair and members of the committee.

The purpose of my remarks is to offer an overview of de-identification. As someone who has worked in this area for close to 20 years in both academia and industry, perhaps this is where I can be helpful to the committee's study. I cannot comment on the specifics of the approach taken by Telus and PHAC because I do not have that information. My focus is on the state of the field and practice.

It's important to clarify terminology. Terms like anonymization, de-identification and aggregation are used interchangeably, but they don't mean the same thing. It's more precise to talk about the risk of re-identification. The objective when sharing datasets for a secondary purpose, as is the case here, is to ensure that the risk of re-identification is very small.

There are strong precedents on the definition of very small risk, which come from data releases by, for example, Health Canada, from guidance from the Ontario privacy commissioner, and from applications by European regulators and health departments in the U.S. Therefore, accepting a very small risk is typically not controversial as we rely on these precedents that have worked quite well in practice.

If we said that the standard is zero risk, then all data would be considered identifiable or considered personal information. This would have many negative consequences for health research, public health, drug development and the data economy in general in Canada. In practice, a very small risk threshold is set, and the objective is to transform data to meet that threshold.

There are many kinds of transformations to reduce the risk of re-identification. For example, dates can be generalized, geographical locations can be reduced in granularity, and noise can be added to data values. We can create synthetic data, which is fake data that retains the patterns and statistical properties of the real data but for which there is no one-to-one mapping back to the original data. Other approaches involving cryptographic schemes can also be used to allow secure data analysis. All that is to say there's a toolbox of privacy-enhancing technologies for sharing individual-level data responsibly, and each of them has strengths and weaknesses.
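
To make the first three transformations concrete, here is a minimal Python sketch of date generalization, geographic coarsening, and noise addition. The record layout, the postal-code rule, and the noise scale are illustrative assumptions, not details from the testimony or from the Telus data.

```python
import numpy as np
from datetime import date

def generalize_date(d: date) -> str:
    """Generalize a full date to a year-month interval."""
    return f"{d.year}-{d.month:02d}"

def coarsen_region(postal_code: str) -> str:
    """Reduce geographic granularity to the forward sortation area
    (the first three characters of a Canadian postal code)."""
    return postal_code[:3]

def add_noise(value: float, scale: float = 5.0) -> float:
    """Perturb a numeric value with zero-mean Laplace noise."""
    return float(value + np.random.laplace(0.0, scale))

# A hypothetical individual-level record before and after transformation.
record = {"date": date(2021, 3, 17), "postal": "K1A0B1", "visits": 42}
deidentified = {
    "month": generalize_date(record["date"]),
    "region": coarsen_region(record["postal"]),
    "visits": add_noise(record["visits"]),
}
print(deidentified)  # e.g. {'month': '2021-03', 'region': 'K1A', 'visits': 44.8}
```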

Instead of sharing individual-level data, it's also possible to share summary statistics only. If done well, this has a very small risk of re-identification. Because the amount of information in summary statistics is significantly reduced, it does not always meet an organization's needs. If it does, it can be a good option, and that's how we tend to define “aggregate data”.
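
As an illustration of sharing summary statistics rather than individual-level data, here is a sketch that aggregates assumed records and suppresses small cells; the minimum cell size of 5 and all data values are assumptions for the example, not a cited standard.

```python
import pandas as pd

# Hypothetical individual-level mobility records.
trips = pd.DataFrame({
    "region": ["K1A", "K1A", "M5V", "M5V", "M5V", "M5V", "M5V"],
    "week": ["2021-W11"] * 7,
    "moved": [1, 0, 1, 1, 0, 1, 1],
})

# Release only per-group summary statistics, never individual rows.
summary = (trips.groupby(["region", "week"])
                .agg(n=("moved", "size"), share_moved=("moved", "mean"))
                .reset_index())

# Suppress cells built from too few people, which could still be revealing.
MIN_CELL = 5
summary.loc[summary["n"] < MIN_CELL, "share_moved"] = None
print(summary)
```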

In practice, for datasets that are not released to the public, additional security, privacy and contractual controls must be in place. The risk is managed by a combination of data transformations and these controls. There are models to provide assurance that the combination of data transformations and controls has a very small risk of re-identification overall.
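
One common way to model this, the 1/k "prosecutor risk" heuristic over groups of records sharing the same quasi-identifiers, can be sketched as follows; the 0.05 threshold and the choice of quasi-identifiers are assumptions for illustration, not values cited in the testimony.

```python
import pandas as pd

# De-identified records; region and age band are treated as the
# quasi-identifiers an attacker could plausibly know.
data = pd.DataFrame({
    "region":   ["K1A", "K1A", "K1A", "M5V", "M5V"],
    "age_band": ["30-39", "30-39", "30-39", "40-49", "40-49"],
})

THRESHOLD = 0.05  # assumed maximum acceptable probability of re-identification

# 1/k heuristic: a record in a group of k look-alikes has risk 1/k,
# so the smallest group determines the worst-case record risk.
group_sizes = data.groupby(["region", "age_band"]).size()
max_risk = 1.0 / group_sizes.min()

if max_risk > THRESHOLD:
    print(f"Max risk {max_risk:.2f} exceeds {THRESHOLD}: transform the data "
          "further or add security, privacy and contractual controls.")
else:
    print(f"Max risk {max_risk:.2f} meets the threshold.")
```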

There are other best practices for responsible reuse and sharing of data, such as transparency and ethics oversight. Transparency means informing individuals about the purposes for which their data are used and can involve an opt-out. Ethics means having some form of independent review of the data-processing purposes to ensure that they are not harmful, surprising, discriminatory, or just creepy. Especially for sensitive data, another approach is a white-hat attack on the data: Someone is commissioned to launch a re-identification attack to test the re-identification risk empirically. This can complement the other methods and provide additional assurance.
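
A white-hat test of the kind described can be sketched as a simple linkage attack; the released data, the attacker's background data, and the exact-match rule below are all assumptions for illustration.

```python
import pandas as pd

# De-identified release (quasi-identifiers only).
released = pd.DataFrame({
    "region": ["K1A", "M5V"],
    "age_band": ["30-39", "40-49"],
})

# Identified background data the white-hat attacker is assumed to hold.
reference = pd.DataFrame({
    "name": ["A. Smith", "B. Jones", "C. Wu"],
    "region": ["K1A", "K1A", "M5V"],
    "age_band": ["30-39", "30-39", "40-49"],
})

# Link on the quasi-identifiers; a record counts as re-identified only
# if it matches exactly one named person in the background data.
links = released.merge(reference, on=["region", "age_band"])
matches = links.groupby(["region", "age_band"])["name"].nunique()
rate = (matches == 1).sum() / len(released)
print(f"Empirical re-identification rate: {rate:.0%}")  # 50% in this toy case
```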

All this is to say that we have good technical and governance models to enable the responsible reuse of datasets, with multiple privacy-enhancing technologies, mentioned above, available to support that reuse.

Is everyone adopting these practices? No. One challenge is the lack of clear, pan-Canadian regulatory guidance or codes of practice for creating non-identifiable information that take into consideration the enormous benefits of using and sharing data and the risks of not doing so. This, and more clarity in law, would reduce uncertainty, provide clear direction for what reasonable, acceptable approaches are, and enable organizations to be assessed or audited to demonstrate compliance. While there are some efforts, for example by the Canadian Anonymization Network, it may be some time before they produce results.

12:10 p.m.

Conservative

The Chair Conservative Pat Kelly

You have one minute, please.

12:10 p.m.

Canada Research Chair in Medical Artificial Intelligence, As an Individual

Dr. Khaled El Emam

I've written a white paper with 10 recommendations for regulating non-identifiable data, which I can share with the committee if the committee wishes to review it.

To conclude, while I have not assessed the measures taken in this situation, I hope my comments can assist the committee's work.

Thank you. I welcome your questions.

12:10 p.m.

Conservative

The Chair Conservative Pat Kelly

Thank you very much.

Before we begin, I'll remind all members that we are going to do a single six-minute round, so if anybody wishes to split their time, please indicate your intention.

With that, I'm going to begin with Mr. Brassard.

12:10 p.m.

Conservative

John Brassard Conservative Barrie—Innisfil, ON

Thank you, Mr. Chair.

Dr. El Emam, I really appreciate your being here today.

Obviously, we're in the process of identifying some of the risks associated with the Public Health Agency of Canada's gathering of mobility data through a couple of organizations. I know you are an expert in the field of re-identifying de-identified and aggregated data. Can you speak to the risks associated with that?

12:10 p.m.

Canada Research Chair in Medical Artificial Intelligence, As an Individual

Dr. Khaled El Emam

If the data is de-identified using known practices, good practices, then the risks can be very small. There are many precedents from reputable organizations in Canada and internationally for what's deemed to be acceptable risk, and we can measure those risks and apply techniques to reduce the risk to be acceptably small. The methodologies have been well established and have been used in practice for some time.

12:10 p.m.

Conservative

John Brassard Conservative Barrie—Innisfil, ON

Can you speak to some of those methods that can be used to de-identify such data?

12:10 p.m.

Canada Research Chair in Medical Artificial Intelligence, As an Individual

Dr. Khaled El Emam

Yes, absolutely.

To de-identify information, there are transformations like reducing the granularity of the geography, to have larger and larger geographic areas, or reducing the granularity of dates to have larger time intervals; instead of days, you can have weeks or longer. You can use synthetic data, which is fake data that looks like the real data but is not about real individuals. You can use cryptographic techniques, where you encrypt the data and do the analysis on the encrypted data.

There are a number of different technologies that have been developed that can be used for this purpose. The choice, of course, will depend on the objectives of the Public Health Agency and what kind of analysis they do, but there are options.
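
As a toy illustration of the "analyze without seeing the raw values" idea behind such cryptographic techniques, here is a minimal additive secret-sharing sketch; it is not a production protocol, and nothing in it reflects what PHAC or Telus actually used.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value: int) -> tuple[int, int]:
    """Split a value into two random-looking shares that sum to it."""
    r = random.randrange(PRIME)
    return r, (value - r) % PRIME

# Each user's (hypothetical) trip count is split between two parties,
# so neither party ever sees a raw individual value.
trip_counts = [12, 7, 30]
pairs = [share(v) for v in trip_counts]
party_a = [a for a, _ in pairs]
party_b = [b for _, b in pairs]

# Each party sums its shares locally; only the combined aggregate is
# ever revealed, never any individual's count.
total = (sum(party_a) + sum(party_b)) % PRIME
print(total)  # 49
```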

12:10 p.m.

Conservative

John Brassard Conservative Barrie—Innisfil, ON

I've seen some studies and reports. There was a European study, and The New York Times did an unbelievable study on how easy it is to re-identify data given only a handful of data points.

Can you speak to those data points and the vulnerability with respect to re-identifying that data?

12:10 p.m.

Canada Research Chair in Medical Artificial Intelligence, As an Individual

Dr. Khaled El Emam

If good methods have been applied, the risk of re-identification can be very small. I think that, in many of those examples, good methods were not applied. They demonstrate the importance of applying good methods and good practices.

As I mentioned, the risk is not going to be zero. There's always some risk. You manage that residual risk by putting in place additional controls, such as additional security controls, privacy controls and contractual controls.

Overall, the risk can be quite small. The approaches work well in practice when they have been applied properly.

12:15 p.m.

Conservative

John Brassard Conservative Barrie—Innisfil, ON

The issue of consent is one that we've heard about as being important throughout this whole process. Often the requirement to provide consent is convoluted, and often people aren't aware that their data is being tracked.

Can you speak to the importance of consent as well?

12:15 p.m.

Canada Research Chair in Medical Artificial Intelligence, As an Individual

Dr. Khaled El Emam

As Commissioner Therrien mentioned, in cases like this it can be impractical to obtain consent a priori. Therefore, the de-identification methods, the additional controls, and the transparency and ethics reviews all provide assurance that the data is no longer identifiable and that it is being used responsibly.

12:15 p.m.

Conservative

John Brassard Conservative Barrie—Innisfil, ON

The other area you've been focused on.... I've read some of your work on synthetic data generation for privacy-preserving sharing of health data. The committee is not just looking at what happened with Public Health, but also looking forward and potentially making recommendations to the government on some changes that are needed in the collection of this data and ensuring that the privacy of individuals is maintained.

If you don't mind, could you just speak a bit more to synthetic data generation?

12:15 p.m.

Canada Research Chair in Medical Artificial Intelligence, As an Individual

Dr. Khaled El Emam

Yes. The idea is that you start with the real data and you build a machine-learning or AI model that learns all the patterns in the real data, and then you generate new data from this model.

The generated data has no mapping to the original data. It has no mapping to real people. It's fake data that's generated from a model, but it maintains the properties and characteristics of the real data. You can do many kinds of analytics and surveillance—in this case, public health surveillance—using the synthetic data, but you have strong privacy protection at the same time.
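
For intuition, here is a deliberately small sketch of that pipeline, with a multivariate Gaussian standing in for the machine-learning models used in practice; all numbers are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: two correlated numeric attributes per person (illustrative).
real = rng.multivariate_normal(
    mean=[40.0, 5.0], cov=[[25.0, 6.0], [6.0, 4.0]], size=1000)

# "Learn the patterns": here, just the means and the covariance matrix.
mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)

# Generate brand-new records from the model; no synthetic row maps back
# to a real person, but the statistical structure is preserved.
synthetic = rng.multivariate_normal(mean, cov, size=1000)

print(np.corrcoef(real, rowvar=False)[0, 1])       # correlation in real data
print(np.corrcoef(synthetic, rowvar=False)[0, 1])  # close in synthetic data
```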

12:15 p.m.

Conservative

John Brassard Conservative Barrie—Innisfil, ON

Is the privacy risk diminished if you use this type of data generation?

12:15 p.m.

Canada Research Chair in Medical Artificial Intelligence, As an Individual

Dr. Khaled El Emam

Yes. The risks will be quite small.