Information & Ethics Committee on April 17th, 2018

Evidence of meeting #99 for Access to Information, Privacy and Ethics in the 42nd Parliament, 1st Session. (The original version is on Parliament’s site, as are the minutes.)

A video is available from Parliament.

On the agenda

Breach of personal information involving Cambridge Analytica and Facebook

Committee Business

MPs speaking

Bob Zimmer

Nathaniel Erskine-Smith

Also speaking

Daniel Therrien Privacy Commissioner of Canada, Office of the Privacy Commissioner of Canada

Chris Vickery Director of Cyber Risk Research, UpGuard, As an Individual

8:45 a.m.

Conservative

The Chair Conservative Bob Zimmer

Welcome back, everybody.

We will call to order the Standing Committee on Access to Information, Privacy and Ethics. This is meeting 99. Pursuant to Standing Order 108(2), we are studying the breach of personal information involving Cambridge Analytica and Facebook.

Today we have some witnesses via teleconference and in person.

In person, we have Daniel Therrien, Privacy Commissioner of Canada, and Barbara Bucknell, Director of Policy and Research.

Via teleconference, we have Chris Vickery, Director of Cyber Risk Research at UpGuard.

Welcome to all.

Mr. Therrien, you have the floor.

8:45 a.m.

Daniel Therrien Privacy Commissioner of Canada, Office of the Privacy Commissioner of Canada

Good morning.

I would like to thank the committee for the invitation today to discuss the privacy implications of online platforms and appropriate legislative responses to the concerns of citizens about how their personal information is being used.

As you are aware, I received a complaint about this matter and announced some weeks ago that my office is conducting a formal investigation into how personal information on Canadians has been impacted by the activities of Facebook and AggregateIQ.

Due to my confidentiality obligations under the law, I'm not in a position to discuss the details of this investigation with you today. I cannot prejudge our findings.

What I can share with you, however, is some perspective on the wider context that may assist you as you begin your study.

Canadians want to enjoy the many benefits of the digital economy, but they rightly expect they can do so without fear that their rights will be violated and their personal information will be used against them. They want to trust that rules, legislation, and government will protect them from harm.

In the recent Facebook matter, what happened, as acknowledged by CEO Mark Zuckerberg, was, quote, a “major breach of trust”. As recognized by the CEO of another giant tech company, Tim Cook of Apple, the situation is so dire that it is now time to develop well-crafted legislation to regulate the digital economy. The time of self-regulation is over.

In Canada, we of course have privacy legislation, but it is quite permissive and gives companies wide latitude to use personal information for their own benefit. Under PIPEDA, organizations have a legal obligation to be accountable, but Canadians cannot rely exclusively on companies to manage their information responsibly. Transparency and accountability are necessary, but they are not sufficient.

To be clear, it is not enough to simply ask companies to live up to their responsibilities. Canadians need stronger privacy laws that will protect them when organizations fail to do so. This was a major conclusion of my annual report to Parliament last year, and a point I made during your recent study of PIPEDA, Canada's private sector privacy law.

Significantly, given the opaqueness of business models and complexity of data flows, the law should allow my office to go into an organization to independently confirm that the principles in our privacy laws are being respected—without necessarily suspecting a violation of the law.

The time has also come to provide my office with the power to make orders and issue financial penalties, helping us to more effectively deal with those who refuse to comply with the law.

Strengthened legislation does not need to be an impediment to innovation. We know that personal information plays a key role in the digital economy, including advances in the field of artificial intelligence, which are necessary for Canada's social and economic development. We need legislation that ensures, as a general rule, that Canadians provide meaningful, informed consent for the collection and use of their personal information. But consent will not always be possible in the world of big data and artificial intelligence, where personal information may be used for multiple purposes not always known when it is collected.

This is why we recommended that Parliament examine exceptions to consent. We believe such exceptions, subject to conditions that would offer other forms of privacy protection, are preferable to relying on an interpretation of consent that is so broad as to become meaningless. We prefer narrower, specific exceptions, but we recognize that one option could be a European-style legitimate interest exception.

I'm of course very pleased that your committee recently issued a report calling for comprehensive changes to the federal private sector privacy law, which included several recommendations I had made but also others that would significantly improve the privacy rights of Canadians. Your report has shown that you are attuned to the issues stemming from the dated state of federal privacy laws in Canada, and you have actively called upon the government to make comprehensive changes.

Many in society, particularly in the last few weeks, are making similar calls. Even leaders of the tech industry now see the need for enhanced regulations.

If there was ever a time for action, I think, frankly, this is it.

Another area ripe for action concerns privacy protections and political parties.

As you are aware, no federal privacy law applies to political parties; British Columbia is the only province with legislation that covers them.

This is not the case in many other jurisdictions. The U.K., much of the EU and New Zealand all cover political organizations with their laws.

In point of fact, in many EU states, information about political views and membership is considered highly sensitive, even within existing data protection regimes, requiring additional protections.

There are also now—in the digital environment—so many more actors involved: data brokers, analytics firms, social networks, content providers, digital marketers, telecom firms and so forth.

So while I am currently investigating commercial organizations such as Facebook and AggregateIQ, I am unable to investigate how political parties use the personal information they may receive from corporate actors.

In my view, this is a significant gap.

Some independent authority needs to have the ability to review the practices of political parties and to assess whether privacy rights are being truly respected by all relevant players.

This gap requires addressing in one statutory form or another, either in privacy laws, in the Canada Elections Act or in a specific statute.

In conclusion, I would again highlight the urgency to act, as well as the stakes involved.

The integrity of our democratic processes—as well as trust in our digital economy—are both clearly facing significant risks.

I cannot think of more relevant questions for legislators to confront, and I applaud you for doing so.

Thank you again for your invitation, and I would welcome your questions.

Thank you.

8:50 a.m.

Conservative

The Chair Conservative Bob Zimmer

Thank you, Mr. Therrien.

We'll go to Chris Vickery, who is in sunny California today.

Mr. Vickery.

8:50 a.m.

Chris Vickery Director of Cyber Risk Research, UpGuard, As an Individual

Good morning.

It is a pleasure to be appearing before you. I am grateful for the opportunity. I believe the matter before us is one of very great importance. Facebook is certainly one of the core elements involved, but I would urge all of you to keep an eye towards the very focused efforts of others who rely on Facebook as a pillar of their operations but not solely on Facebook; others who are tending to cause direct harm to what I believe is the institution of democracy itself as sort of an end goal of what they're working towards here.

In case you don't know anything about me, I am somewhat uniquely situated to speak on the topic. The majority of my work can be described as hunting down data breaches. I openly call myself a “data breach hunter”. Over the last several years, my reputation has grown to be one of a leading authority on the prevalence and causes of data breaches as well as common patterns of incident response by the affected entities. Please note, though, that the data breaches that I locate and secure are not the result of actual computer exploitation or malicious acts. This is just data that has been left out in the open for whatever reason, and nobody realized it until I came along and found it. You may think there probably wouldn't be that much of that, but you'd be surprised. There is quite an epidemic of misconfigurations out on the Internet.

Some examples of data that I've secured stem from Verizon; Viacom; Microsoft; Hewlett-Packard; the United States Department of Defense; the Mexican national institute of elections, the INE; a couple of international terrorism blacklists; as well as the 2016 Trump presidential campaign website. They were leaking a bit of information as well.

The sum total of the efforts I've undertaken has resulted in the safeguarding of nearly two billion records containing private information, so I am well versed in this stuff. I look forward to answering any questions you may have.

More on point, I would like to point out that two data breaches that I came across in December 2015 involved the United States voter registration in its entirety, all 50 states plus DC. The second time, in the December that I found it, they were more enhanced. They had private details about people, with various pieces of personality and behavioural things—whether or not somebody was a gun owner, whether or not they lived a biblical lifestyle.

Six months later, in 2016, I came across another nationwide U.S. voter registration database, this one even more enhanced, having details on whether or not somebody watched NASCAR, whether or not they were anti-abortion sentiment holders, or whether or not they likely owned a gun.

Then another set of nationwide records came to my attention. I downloaded them after finding them in June 2017. This would be the third round of complete U.S. voter registration records that I came across. This was 198 million records, ranking as the largest U.S. voter data breach in known history. I would like to point out that at the time of the discovery, not a single one of these database breaches was protected with even a username or a password. They were simply out in the open. If you knew where to look, anyone in the entire world could find them.

The AggregateIQ situation that brings me here today is one that first started on March 20 of this year—not that long ago. I didn't know who AggregateIQ was until March 20. I was fiddling around on an open public website called GitHub where the developers collaborate and publish open source code.

I saw a reference to @aggregateiq.com in relation to some SCL Group code that was out there and just available to the public. I followed the bread crumbs, figured out what AggregateIQ was, and noticed they had a sub-domain called GitLab. When I viewed gitlab.aggregateiq.com, it occurred to me that the registration was available, and they were in essence inviting the entire world to register for an account on their collaboration portal.

I proceeded to register an account, it let me in, and all of these tools, utilities, credentials, scripts, employee notes and issues, and merge requests were all present before me. I very quickly realized the importance of this and that there would be likely heavy interest from regulators, governments, and the populace of several nations, so I began downloading. Normally, I go to great efforts to protect anybody who may be affected by this type of thing, but the overwhelming public interest in knowing the truth behind what Cambridge Analytica, AggregateIQ, and SCL Group have been doing is a compelling factor in this particular situation. I don't want you to think I just run out there and hand out everyone's dirty laundry when these things are found. This is a different situation.

Again, keep in mind that anyone in the entire world with an Internet connection could have found the same thing, gotten an account the same way I did, and downloaded the exact same things, regardless of what nation they were in or what loyalties they might feel. This was completely exposed with no manner of protection whatsoever. A malicious actor could have taken it a step further in that there were, and are, database passwords, usernames, credentials, keys, and authentication methods documented in these files that I did not take advantage of. I did download them, but I did not go the extra step and use those passwords to access the additional databases.

If it were found by someone else, and they were of the persuasion that would take advantage of it, it could have been, and may be, a much more serious data breach than has been mentioned. They could be completely infiltrated. Every bit of data that has ever crossed through AggregateIQ's hands could be in the hands of anyone who had found this same exposure.

There are a few remaining questions that I have not been able to decipher fully that I believe your investigation should figure out. While I am still looking into quite a bit of the data, I have not come to the exact final conclusion as to what AggregateIQ's relationship is to SCL Group and Cambridge Analytica. The walls of the separation between those entities are very porous. It's clear that code access permissions and data have traversed between the three of them, and other groups, so I would implore you to get to the bottom of that.

The second question is to what extent, if any, restricted political and private data has been utilized by AggregateIQ or AggregateIQ employees for commercial profit-seeking ventures. I have found evidence of ad networks being developed under the same domain, one notably called Ad*Reach network—and there are a few Ad*Reach networks on the Internet, so make sure you're looking at the right one before going after anybody in a questioning manner—as well as aq-reach. One of the employees who was working at AIQ was doing simultaneous work for an ad company called easyAd Group AG, which is based in Switzerland and has subsidiaries in the U.S. and in Russia. I would love to know what work was being done and if any of the data travelling through AIQ was utilized in any of those ad campaigns or set-ups that the employee was working on at the same time.

9 a.m.

Conservative

The Chair Conservative Bob Zimmer

We're at 10 minutes, so if you could wind up your testimony, that would be great.

9 a.m.

Director of Cyber Risk Research, UpGuard, As an Individual

Chris Vickery

Yes. I have one final point.

There is also a cryptocurrency token aspect to this. Exactly one comment within the GitLab commentary section was marked with a flag that I later noticed was a confidential flag. That comment had to do with the Midas token. I looked into it. The Midas token was a project they were working on, and it was tagged to a website selling cryptocurrency at a $10,000 minimum buy-in.

The website has gone down since this was made public, and it feels very fishy to me. If you could figure out why somebody was developing a cryptocurrency on the AggregateIQ GitLab instance, for sale to the public, and why they would possibly not want anyone to know about this, I think it would be worth the investigation.

Thank you. I look forward to answering any questions.

9:05 a.m.

Conservative

The Chair Conservative Bob Zimmer

Thank you, Mr. Vickery.

Just for the committee's knowledge, California is about three hours behind us, so he's up at 5 o'clock in the morning.

Thanks again for appearing.

We'll start with Nathaniel Erskine-Smith.

9:05 a.m.

Liberal

Nathaniel Erskine-Smith Liberal Beaches—East York, ON

Thank you very much.

There are a lot of moving parts to what we've heard. My first question is just to clarify.

Based on everything you've reviewed—obviously, you haven't been able to review everything, just given the sheer volume of the information you've been able to access—it's your view that information that was collected across a number of different campaigns for specifically political and more public purposes has been clearly used for commercial profit-seeking ventures.

9:05 a.m.

Director of Cyber Risk Research, UpGuard, As an Individual

Chris Vickery

It's highly likely that this has occurred. I have the tools. I don't have the ingredients that those tools mixed with, because that would have involved taking the additional step of going into databases and such. From what I see, there's no reason to have these tools in this way, and the documentation as it is, if you are not going to mix the political data for commercial reasons.

9:05 a.m.

Liberal

Nathaniel Erskine-Smith Liberal Beaches—East York, ON

Okay.

Can you give us an example for those of us who are less experienced? You mentioned gun ownership. You mentioned living a biblical life and a few other examples. What's the most personal information you found?

9:05 a.m.

Director of Cyber Risk Research, UpGuard, As an Individual

Chris Vickery

The most personal information specific to the voter data breaches or just generally?

9:05 a.m.

Liberal

Nathaniel Erskine-Smith Liberal Beaches—East York, ON

When you're talking about using different databases to combine a profile for an individual, how detailed a profile are we talking about?

9:05 a.m.

Director of Cyber Risk Research, UpGuard, As an Individual

Chris Vickery

The most detailed message you have sent to a loved one through any chat app could very easily be logged, archived, tied to your name.

9:05 a.m.

Liberal

Nathaniel Erskine-Smith Liberal Beaches—East York, ON

You have come across examples like that?

9:05 a.m.

Director of Cyber Risk Research, UpGuard, As an Individual

Chris Vickery

Yes, but let me clarify. There is a separate Facebook-related incident that has not been reported at all yet—I'm working with a journalist right now to bring it to everyone's attention—that is not involved with Cambridge Analytica, as far as I know, but the number is 48 million people on that one. It does involve messages. The degree of privateness they were sent...is not quite determined yet, but they do get pretty personal.

9:05 a.m.

Liberal

Nathaniel Erskine-Smith Liberal Beaches—East York, ON

You said that a number of different databases are being amalgamated, I guess, in some ways, to create these profiles. Can you give us the significance of these databases? Presumably, some of these are from the election databases you talked about. Are there other examples you can give us?

9:05 a.m.

Director of Cyber Risk Research, UpGuard, As an Individual

Chris Vickery

Yes. In their documentation, AggregateIQ goes into detail about their system. It starts with being bootstrapped by the RNC's Data Trust data vault, which is the Republican National Committee here in the United States. I had actually found the Data Trust database before it was part of the find in June 2017. It's quite extensive. It contains data as they merged with i360, which is a Koch brothers-backed political information company. Data Trust deleted a blog entry where they claim to have merged their data with i360.

There's also L2 Political. They provided data to this whole beast of a machine. That was admitted to on Cambridge Analytica's website recently.

Facebook is obviously part of it. The documentation by AggregateIQ goes on to explain that commercial databases are involved. I know that Experian is one that contributed data toward the RNC Deep Root Analytics data briefs that I found in 2017. I know that because there were Experian IDs being lined up to each voter ID with all the consumer habits being tied onto everybody.

AggregateIQ also states that candidates can bring in their own sources of volunteer and supporter and donator information. They'll aggregate all that into the main “database of truth”, as they call it. State voter files then corroborate what the RNC has on file.

So there's really no end to what they can plug into here.

9:10 a.m.

Liberal

Nathaniel Erskine-Smith Liberal Beaches—East York, ON

You mentioned there are porous walls between some of these entities, like AIQ, Cambridge Analytica, and others. When you access the AIQ information, can you give us an example of what that porous relationship looks like? What are some key examples where you see players across different companies accessing the same information?

9:10 a.m.

Director of Cyber Risk Research, UpGuard, As an Individual

Chris Vickery

Well, one example that is very appropriate, because it illustrates both the original discovery and the whole nature of this relationship, is an employee named Ali Yassine. I usually try not to name people, but I feel that it's important for you to know this for the purposes of looking into it. He was a full stack developer for SCL Group. On his public GitHub page, he had code that came out of AggregateIQ. I know this because I found it within AggregateIQ's code base, and it was marked as being authored by an AggregateIQ employee named Koji. So you have SCL and AggregateIQ that supposedly have no relationship but both working with the same code base. Then, further on down in the code base, there is a field that says “client”, and written in there is “Cambridge Analytica”. Now, I can't see why SCL Group would be saying that Cambridge Analytica is a client of theirs. They basically own Cambridge Analytica. SCL Group is the mother ship on top of that. The only reasonable explanation to me is that AggregateIQ would have been the one putting Cambridge Analytica as the client, then the code being passed to SCL Group, and that just not being changed immediately. There's a little triangle going on there.

I can also tell you that the GitLab logs very clearly show that with the Ripon project, which was primarily developed for Ted Cruz's 2016 campaign, the very initial seeds of it were downloaded from the domain scl.ripon.us, placed in the GitLab, and developed and evolved from there. Scl.ripon.us is a domain underneath Alexander Nix's name. He's the one who is registered under the WHOIS records. That's another example of code flowing from one to the other.

Also, there are examples that Cambridge Analytica has put forward, through their public statements, of data that they used. More recently, I guess they felt pressure to be transparent about where the data came from. They admitted that they got the RNC Data Trust data. The RNC IDs are all over the place in the fields, categories, targeting scripts, and parsers that are present in AggregateIQ's repository as well as in their documentation. So if data [Technical difficulty—Editor] directly from one to the other, they are certainly dealing with the same type of data.

9:10 a.m.

Conservative

The Chair Conservative Bob Zimmer

Thank you, Mr. Erskine-Smith. We'll have another round. We have two hours.

We'll go on to seven minutes for Mr. Kent.

9:10 a.m.

Conservative

Peter Kent Conservative Thornhill, ON

Thank you, Chair.

Thank you, Commissioner Therrien and Mr. Vickery, for being with us today.

Commissioner, I know you can't discuss the specifics of your formal investigation into Facebook and AIQ in the Canadian context, but I wonder whether you could share with us your anticipated timeline for completion of the investigation and an eventual report.

9:15 a.m.

Privacy Commissioner of Canada, Office of the Privacy Commissioner of Canada

Daniel Therrien

It's difficult to say. There are many factors at play here. Under the law, we have one year to complete our investigation. We'll obviously try to do so before then.

When you look at the allegations made, you see a fairly complex web of interactions between a number of players. These we will need to clarify. That could take a bit of time. We're also working in concert with other commissioners or data protection authorities. Of course, we're doing so with the Province of British Columbia, with which we're jointly doing this investigation, but we're also in contact with others, including in the U.K. but not limited to the U.K. There is, then, a bit of coordination.

What I'm saying is that this is somewhat complex, which may add to the time, but we have certainly at the most a goal of doing this within a year, and we'll try to do that before then.

9:15 a.m.

Conservative

Peter Kent Conservative Thornhill, ON

Thank you.

Now, your emphasized remarks again today calling for amendments to the Privacy Act to cover political parties' use of personal information carry significant new weight, given the information before us and the public regarding attempts, and perhaps some successes, to interfere with the democratic process in the recent U.S. election and in the Brexit vote in the United Kingdom. Certainly we have questions waiting for Mr. Wylie regarding his employment by the Liberal Party of Canada under two leaders between 2007 and 2009, his termination for what was described by one of the leaders as invasive elements of the work he was doing or was proposing be used, and then his re-employment by the Liberal research group after the 2015 election, in 2016, and payment of $100,000. Those are questions for another day.

But your request is that political parties be brought under legislation and regulated either under the Privacy Act or under the Elections Act of Canada. Which would you suggest would take priority?

9:15 a.m.

Privacy Commissioner of Canada, Office of the Privacy Commissioner of Canada

Daniel Therrien

I would say probably both, actually. The situation currently is that most federal political parties have privacy policies—internal codes of conduct, so to speak, in their relationship with the people with whom they interact and from whom they collect information. That's a start.

I think, first of all, the substance of these policies could be improved, from what we have seen. One common element missing from the privacy policies of federal parties is the right of individual electors to have access to the information that parties have about them. That's a huge flaw. There is, then, the issue of the substance. But these are voluntary codes, and no one independent of the parties examines whether the parties actually live up to the promise they're making in these policies. That leads me to a very important reason that political parties should be governed by legislation: to ensure that whatever substantive rules exist, hopefully better than what they are now, are verified by an independent third party.

Should that independent third party be the Privacy Commissioner, the Chief Electoral Officer, a third person? That can be discussed, but I think that what this leads to—leads me to, at least—is that there are at least two types of issues at play here. There's the issue of privacy and whether parties treat the personal information of individuals properly, which is a privacy issue that would make me, perhaps, the best person to look at the question. Then, the allegations that we have been seeing in the past few weeks lead to a mix of the use of personal information and privacy on one hand and political purposes on the other, which is more the domain of the Chief Electoral Officer. Ideally, I would say, the two institutions would be able to verify what is happening so that the expertise of each is put in common.

9:20 a.m.

Conservative

Peter Kent Conservative Thornhill, ON

I assume you watched Mr. Zuckerberg's testimony in Washington last week.