Mr. Chris Vickery (Director of Cyber Risk Research, UpGuard, As an Individual) at the Access to Information, Privacy and Ethics Committee

April 17th, 2018 / 8:50 a.m.

Chris Vickery Director of Cyber Risk Research, UpGuard, As an Individual

Good morning.

It is a pleasure to be appearing before you. I am grateful for the opportunity. I believe the matter before us is one of very great importance. Facebook is certainly one of the core elements involved, but I would urge all of you to keep an eye towards the very focused efforts of others who rely on Facebook as a pillar of their operations but not solely on Facebook; others who are tending to cause direct harm to what I believe is the institution of democracy itself as sort of an end goal of what they're working towards here.

In case you don't know anything about me, I am somewhat uniquely situated to speak on the topic. The majority of my work can be described as hunting down data breaches. I openly call myself a “data breach hunter”. Over the last several years, my reputation has grown to be one of a leading authority on the prevalence and causes of data breaches as well as common patterns of incident response by the affected entities. Please note, though, that the data breaches that I locate and secure are not the result of actual computer exploitation or malicious acts. This is just data that has been left out in the open for whatever reason, and nobody realized it until I came along and found it. You may think there probably wouldn't be that much of that, but you'd be surprised. There is quite an epidemic of misconfigurations out on the Internet.

Some examples of data that I've secured stem from Verizon; Viacom; Microsoft; Hewlett-Packard; the United States Department of Defense; the Mexican national institute of elections, the INE; a couple of international terrorism blacklists; as well as the 2016 Trump presidential campaign website. They were leaking a bit of information as well.

The sum total of the efforts I've undertaken has resulted in the safeguarding of nearly two billion records containing private information, so I am well versed in this stuff. I look forward to answering any questions you may have.

More on point, I would like to point out that two data breaches that I came across in December 2015 involved the United States voter registration in its entirety, all 50 states plus DC. The second time, in the December that I found it, they were more enhanced. They had private details about people, with various pieces of personality and behavioural things—whether or not somebody was a gun owner, whether or not they lived a biblical lifestyle.

Six months later, in 2016, I came across another nationwide U.S. voter registration database, this one even more enhanced, having details on whether or not somebody watched NASCAR, whether or not they were anti-abortion sentiment holders, or whether or not they likely owned a gun.

Then another set of nationwide records came to my attention. I downloaded them after finding them in June 2017. This would be the third round of complete U.S. voter registration records that I came across. This was 198 million records, ranking as the largest U.S. voter data breach in known history. I would like to point out that at the time of the discovery, not a single one of these database breaches were protected with even a username or a password. They were simply out in the open. If you knew where to look, anyone in the entire world could find them.

The AggregateIQ situation that brings me here today is one that first started on March 20 of this year—not that long ago. I didn't know who AggregateIQ was until March 20. I was fiddling around on an open public website called GitHub where the developers collaborate and publish open source code.

I saw a reference to @aggregateiq.com in relation to some SCL Group code that was out there and just available to the public. I followed the bread crumbs, figured out what AggregateIQ was, and noticed they had a sub-domain called GitLab. When I viewed gitlab.aggregateiq.com, it occurred to me that the registration was available, and they were in essence inviting the entire world to register for an account on their collaboration portal.

I proceeded to register an account, it let me in, and all of these tools, utilities, credentials, scripts, employee notes and issues, and merge requests were all present before me. I very quickly realized the importance of this and that there would be likely heavy interest from regulators, governments, and the populace of several nations, so I began downloading. Normally, I go to great efforts to protect anybody who may be affected by this type of thing, but the overwhelming public interest in knowing the truth behind what Cambridge Analytica, AggregateIQ, and SCL Group have been doing is a compelling factor in this particular situation. I don't want you to think I just run out there and hand out everyone's dirty laundry when these things are found. This is a different situation.

Again, keep in mind that anyone in the entire world with an Internet connection could have found the same thing, gotten an account the same way I did, and downloaded the exact same things, regardless of what nation they were in or what loyalties they might feel. This was completely exposed with no manner of protection whatsoever. A malicious actor could have taken it a step further in that there were, and are, database passwords, usernames, credentials, keys, and authentication methods documented in these files that I did not take advantage of. I did download them, but I did not go the extra step and use those passwords to access the additional databases.

If it were found by someone else, and they were of the persuasion that would take advantage of it, it could have been, and may be, a much more serious data breach than has been mentioned. They could be completely infiltrated. Every bit of data that has ever crossed through AggregateIQ's hands could be in the hands of anyone who had found this same exposure.

There are a few remaining questions that I have not been able to decipher fully that I believe your investigation should figure out. While I am still looking into quite a bit of the data, I have not come to the exact final conclusion as to what AggregateIQ's relationship is to SCL Group and Cambridge Analytica. The walls of the separation between those entities are very porous. It's clear that code access permissions and data have traversed between the three of them, and other groups, so I would implore you to get to the bottom of that.

The second question is to what extent, if any, restricted political and private data has been utilized by AggregateIQ or AggregateIQ employees for commercial profit-seeking ventures. I have found evidence of ad networks being developed under the same domain, one notably called Ad*Reach network—and there are a few Ad*Reach networks on the Internet, so make sure you're looking at the right one before going after anybody in a questioning manner—as well as aq-reach. One of the employees who was working at AIQ was doing simultaneous work for an ad company called easyAd Group AG, which is based in Switzerland and has subsidiaries in the U.S. and in Russia. I would love to know what work was being done and if any of the data travelling through AIQ was utilized in any of those ad campaigns or set-ups that the employee was working on at the same time.

See context to find out what was said next.