Evidence of meeting #22 for Government Operations and Estimates in the 41st Parliament, 2nd Session. (The original version is on Parliament’s site, as are the minutes.)

A recording is available from Parliament.

Also speaking

Michael Chui  Partner, McKinsey Global Institute, McKinsey and Company
Paul Baker  Chief Executive Officer, Chicago Open Data Institute
Gordon O'Connor  Carleton—Mississippi Mills, CPC

9:10 a.m.

Partner, McKinsey Global Institute, McKinsey and Company

Michael Chui

There are a couple of things. One is that our aim, as part of the McKinsey Global Institute, is to inform the policy discussion, but just to be clear, we actually don't provide direct policy recommendations. That being said, I think the topic you bring up around privacy of individual data is an incredibly important one, one that people care deeply about.

I don't know that there's a blanket ability to solve that problem, because many people disagree about exactly what data should be used for what purpose. I do think one interesting direction to think about regarding policy is that it is increasingly difficult to control the creation of data, and sometimes even the distribution of data, although I think thoughtful policy can be used to regulate that. Often what people find most objectionable is particular uses of data. It's not that the data exists, as you described it; it's when the data is used for particular purposes that people oftentimes find it objectionable.

I think one interesting policy direction to take is to identify the uses of data that we don't want to have happen, and to regulate or legislate those uses, rather than the data itself.

9:15 a.m.

NDP

Anne-Marie Day NDP Charlesbourg—Haute-Saint-Charles, QC

Can you tell us whether the U.S. has that kind of legislation to protect citizens' privacy?

9:15 a.m.

Partner, McKinsey Global Institute, McKinsey and Company

Michael Chui

I can't say that I have a comprehensive perspective, but for instance, there is legislation around the release of video rental records. That was a very specific example of a use of data, and the legislators decided it was not data that should be released. It's not a terribly wide-ranging example, but it is a specific example of where that has happened.

9:15 a.m.

NDP

Anne-Marie Day NDP Charlesbourg—Haute-Saint-Charles, QC

Can you tell us....

9:15 a.m.

Chief Executive Officer, Chicago Open Data Institute

Paul Baker

Could I jump in?

9:15 a.m.

NDP

Anne-Marie Day NDP Charlesbourg—Haute-Saint-Charles, QC

Go ahead.

9:15 a.m.

Chief Executive Officer, Chicago Open Data Institute

Paul Baker

There are a couple of specific laws in the U.S. related to privacy. One is HIPAA, which is on medical privacy. It was a major issue before we had the Affordable Care Act, because if insurance companies found out you had a particular disease, they would bar you from having insurance, or employers wouldn't hire you because you would be more expensive, that type of thing. It's less of an issue now.

There are also very strict laws around student data privacy. With elementary and high school students, you really can't reveal anything about them. That doesn't mean that data should not be shared among people who are authorized to use and see the data. For instance, there's one initiative we're working on now. In school, one of the best indicators that there's a problem is, for example, if a second-grader doesn't show up at school, and they're not sick, chances are, especially in low-income communities, the family has a problem of one sort or another. What will happen is the child won't show up. Someone from a social service agency will visit him. They'll maybe interview the family for 15 or 20 minutes, and make a decision whether the child should stay in the family or not, that type of thing. They have almost no data about the kid. They don't have any attendance data, grades data. They don't know whether the mother is in a drug- or alcohol-treatment program. They don't know whether the father is there, or if the father is there, whether he has post-traumatic stress syndrome from being in Iraq or Afghanistan, or something like that.

Compare that with a situation where you're driving your car with a broken tail light and you get pulled over by a motorcycle policeman. The policeman takes your wallet, walks back to his motorcycle, and checks you against 64 different databases. Why can't we serve children in the same way and share data privately among people dealing with low-income kids to try to improve their lives?

There are situations where you obviously don't want to share that data with the public, but you do want to share it with agencies, with teachers, with principals so they can help families develop.

9:15 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

Thank you very much for your answers.

I will now turn the floor over to Mr. O'Connor for five minutes.

9:15 a.m.

Gordon O'Connor Carleton—Mississippi Mills, CPC

Good morning, gentlemen.

One of the things we wonder about is whether the federal government is in line with other countries or not, whether they're producing enough data. We have a statistic here that says 193,000 data sets are on the open data portal of the Government of Canada. How does this compare to other countries or other areas?

I'll ask both of you to comment on that.

9:15 a.m.

Chief Executive Officer, Chicago Open Data Institute

Paul Baker

The numbers vary. In the City of Chicago everyone always said there were 800 and some data sets, and yesterday I talked to the guy who is in charge of the data sets and he said there are something like 550 data sets. The Obama administration originally, when they announced data.gov, said there were 190,000 data sets or something like that. Now they've reduced that to 54,000 data sets.

Even though the federal government in the U.S. has released lots and lots of data sets, and a lot of data has been released for a very long time (weather data, satellite data, road-related data, geographic data), really going back 20, 30, or 40 years, a recent study says that even with all the open data efforts in the U.S., less than 10% of federal agencies actually release significant amounts of data. Much of that is because they make a fair amount of money by selling data, either to other agencies or within their own agency. It's kind of a net wash. One agency is paying another agency for data, and the other agency is receiving income, which doesn't make much sense when you're within the same organization, but it's part of the accounting rules, and that's creating inefficiency there.

We would like to see lots more data. There is a lot of data all over the world. We're members of the international Open Data Institute. We're the Chicago node. The U.K. has made tremendous strides in releasing data. France is releasing quite a bit more data. We have a fellow in our office right now who's in charge of the open data portals for the Dominican Republic, which didn't have a particularly democratic government for a very long time. Even there, there's an alliance of progressives and conservatives who want to release data for budget accountability, political accountability, so you can have an alliance among both sides of the political spectrum. We've been talking with them and learning a lot about what's going on there.

Things are happening. There's an effort in the Philippines, and even Russia is taking steps towards releasing more data. There are about 17 nodes in the Open Data Institute worldwide now. We were the first node just last October, and between then and now there are 16 more nodes. Many of them are in countries that were not democratic 10, 15, or 20 years ago.

9:20 a.m.

Partner, McKinsey Global Institute, McKinsey and Company

Michael Chui

I would just add a couple of comments.

It is actually not straightforward to compare countries in terms of their “number of data sets”, because you could easily combine two data sets and say it was one. That's probably not a great metric. It is actually difficult to come up with a good metric for it. Certainly, you don't just want to count the amount of data that's out there. Ideally, you'd want to count the impact that data is having on the economy, which is a much more challenging thing to do. That being said, the Open Knowledge Foundation does have at least one benchmark and they try to compare countries. I wouldn't say I necessarily endorse it, but it's interesting to look at.

Finally, if you'd indulge me, I would go back to the privacy question. I think the approach taken in HIPAA versus at least one aspect of the Affordable Care Act in the U.S. does illustrate the difference between legislating the generation or control of data itself, which is HIPAA, versus the Affordable Care Act where you are not allowed to use data in order to discriminate about who is covered by insurance. That's basically the difference, whether or not you're allowed to have data at all or how you use it. I think that's a difference in the way you can legislate policy with regard to privacy.

9:20 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

Thank you, Mr. O'Connor. Your time is up.

I will now give the floor to Mr. Byrne for five minutes.

May 1st, 2014 / 9:20 a.m.

Liberal

Gerry Byrne Liberal Humber—St. Barbe—Baie Verte, NL

The conversation seems to be somewhat dominated by economics and economic capacity and opportunity. I think there are two stovepipes in this discussion. One is the economic opportunity and the other is civil society and their expectations that open data could lead to new transparency and new information being involved.

Does either one of you see a conflict in terms of the progression, the evolution, of open data? Is there one stovepipe over the other that seems to be dominating the evolution of open data? Are there pitfalls or concerns that either one of you has? I say this because you both come at this from somewhat different perspectives, not disharmonious, but different perspectives. I'd love to receive your perceptions on that question.

Mr. Chui, would you like to begin?

9:20 a.m.

Partner, McKinsey Global Institute, McKinsey and Company

Michael Chui

Sure.

First of all, I think the use of open data to create transparency and accountability in our public institutions is incredibly important. We focused our report on the economic potential, partly because we thought that was part of the dialogue that hadn't been explored quite enough. In addition, we focused it because of our expertise as part of a management consulting firm doing economics and business research. Certainly we agree. I don't think there's conflict necessarily. I'm not sure I'd describe them as stovepipes, but rather as different benefits that you can obtain from open data. Certainly we believe that accountability and transparency are incredibly important as well. We don't think that deriving economic value reduces the potential impact on civil society.

9:25 a.m.

Liberal

Gerry Byrne Liberal Humber—St. Barbe—Baie Verte, NL

Thank you.

Mr. Baker.

9:25 a.m.

Chief Executive Officer, Chicago Open Data Institute

Paul Baker

The Open Data Institute's Tim Berners-Lee founded the group, and he invented the web. The slogan he came up with was: knowledge for everybody. The idea is that it's not just the government leaders, not just the corporate leaders, who should know intimately what's going on, but ordinary people should have access to similar levels of information to make democratic decisions, to be informed when they vote or participate in politics. From ODI's point of view, it's both the democratizing aspects and the economic benefit.

The Open Data Institute aggressively supports businesses using open data to create businesses partly for social good. There are some major problems, things like climate change. We're working on a project now to link Arctic researchers so they can share information more quickly, and we can make more progress on researching climate change, sea ice floes. Sea ice is melting very quickly, glaciers are, but if we can accelerate the research process by sharing data, space on the icebreakers.... It costs $50,000 a day to rent an icebreaker. If you can have three or four different teams renting space for a few days at the same time, you can cut down the cost, that type of thing.

There are many areas. Obviously, we've taken a lot of steps backward with the Billionaires United decision in the U.S., where a lot of secret money is now going into politics, and it's a serious problem, but there are many public issues. For instance, there's a lot of controversy around charter schools in the U.S. A lot of groups, like the League of Women Voters, the PTA, are trying to discover data to try to figure out whether charter schools are doing good or bad. The way a lot of the laws are written, charter schools don't have to release as much data as the regular public schools do, so it's very hard to compare apples and oranges. On one side, on the charter school side, you have a big PR effort going on, sponsored by millionaires again, who want to take a certain portion of the public education budget. On the other side, you have teachers and parents who like the public schools and want to keep them. We don't have knowledge for everybody here. The charter school operators know what they're doing. They don't reveal it. We're also involved with Common Cause in another effort to try to uncover some of that data.

I would say both are very important. It's hard to know which is more important.

9:25 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

Thank you, Mr. Byrne.

Your time is up.

We now go to Ms. Ablonczy for five minutes.

9:25 a.m.

Conservative

Diane Ablonczy Conservative Calgary Nose Hill, AB

Thank you, gentlemen, for sharing your expertise with us.

As you can tell, we don't just want to put out a lot of information. That's part of it, but we're struggling with how to focus that into positive results, to value-add for government, for business, for organizations. That's why we're particularly interested in your studies.

Can you boil it down for us? If you wanted us to focus on the best results you think could be garnered from the open data project, and how we could foster them and how we could incentivize people to use them, what would you say are the biggest bangs for the buck?

9:30 a.m.

Partner, McKinsey Global Institute, McKinsey and Company

Michael Chui

As an advocate of using data, I think it would potentially be a mistake for me to throw out ideas without actually doing any analysis. I think the thing you would want to do is to take the type of analysis that we did here globally and apply it to Canada and try to understand what types of data could potentially create the most value in Canada.

As we looked at it we did find, in transportation for instance, tremendous potential. In education we found tremendous potential. What you would need to do is to try to understand, from a Canadian perspective, where the most bang for the buck would be. I think it would be a mistake for me to speculate without having actually done some analysis, given that we actually believe in data as the way to make those decisions.

That being said, I do think the other insight you had was incredibly important, which is that the “field of dreams” approach, where you make the data available and benefits occur, doesn't work by itself. You actually need to create a vibrant ecosystem of users of data. What that means is engaging with people who will develop programs and apps that actually use the data and make it useful to companies, individual citizens, etc.

As I said before, that's almost a marketing game. What we've seen here in the United States is that the government has done things like create events. I know that in Canada, for instance, I was in Toronto for the CODE hackathon. It was one of those events that made it more noticeable to people who write computer programs that in fact there's a vast source of data they can use to create new applications. Events and contests, even advertising, quite frankly, are some of the things that have to happen in order to create an ecosystem of, let's call it loyalty. The customers of the data, who are the developers of applications, have to be, first, aware of it and, second, incented to create applications using open data.

9:30 a.m.

Conservative

Diane Ablonczy Conservative Calgary Nose Hill, AB

Paul, do you have any thoughts on that?

9:30 a.m.

Chief Executive Officer, Chicago Open Data Institute

Paul Baker

Yes. I like to look at the really big issues, which in my opinion are climate change and medicine right now, at least two of the kind of scientifically oriented areas. There's tremendous activity there. In the climate change area the U.S. federal government has mandated that anyone who gets federal money for scientific research has to release their data within a year. That's fairly recent. Informally it has been going on for a couple of years now, but people haven't adhered to that policy particularly well. From now on, I think they're going to adhere, and that is going to accelerate the pace of scientific research, including on climate change which, at least for people who believe climate change is happening, is extremely important.

We see a tremendous amount of energy also in medicine. Silicon Valley is now “disrupting”—an overused word—medicine in many ways. The federal government and even some drug companies are sharing data about their drug studies. A couple of companies have committed to releasing all of their data related to the drug studies they've done, which could help treatments and could also help better figure out what medicines actually work and what medicines don't work.

There's an effort in the U.K. where half a million people have agreed to have their genes analyzed so they can combine the genomic data with their electronic medical record data. Kaiser Permanente in the Bay Area has a million of their patients agreeing to have their genes analyzed and combined with their EMR data. They're opening that data up to qualified researchers who follow privacy procedures. The promise in that area is simply tremendous.

There's also crowd-sourced medical data which is really interesting. For instance, some researchers and doctors in California have created a mobile phone app where, if you have what you think might be a cancerous mole, you can take a picture, and that picture goes into a database and then when the mole is removed and analyzed, you come back and the application tells you whether it was cancerous or not. They've accumulated enough data now that by simply taking a picture of a mole, you can get a pretty good probability as to whether you should have it analyzed or not. They're using artificial intelligence to analyze the colour, the pattern on the outside, the size, all that kind of stuff. They're building up crowd-sourced ways of doing this.

A lot of people have lots of moles, but if you have a mole you're worried about and think you should get it tested, you're much more likely to get it tested. Every time you test, it's another $1,000.

9:35 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

Thank you. I have to stop you here, Mr. Baker.

It is now Mr. Ravignat's turn.

9:35 a.m.

NDP

Mathieu Ravignat NDP Pontiac, QC

I thought the Chicago Open Data Institute projects were especially interesting. You have carried out many of them. Congratulations on your research and your analyses.

I would like to talk to you specifically about the Chicago Lobbyists project. Lobbying at all levels of government is a concern for me.

In your portal, where does the information you have gathered come from?

9:35 a.m.

Chief Executive Officer, Chicago Open Data Institute

Paul Baker

It comes from the City of Chicago. The City of Chicago actually had been collecting lobbying data for quite some time. Under Mayor Daley they just didn't release it, and then when Mayor Emanuel came in, he released it within the first couple of months of his term.

There were initially 14 different data sets, while now there are 17 or 18 different data sets. There was a handful of us who had been advocating for open data for quite a few years. During the Daley administration, we hadn't really been able to get a significant amount, and we were complaining. When Emanuel started releasing stuff, once he actually gave us some data, it was incumbent on us to actually do something with it.

I started looking for data sets we could do something with. I saw these 14 lobbying data sets, but if you look at one data set, no one can learn anything from it, so we basically got volunteers. We got a couple of guys from Groupon together, three people from Webitects, a couple of volunteers. One woman was riding her bicycle across the country, going to work for Code for America in San Francisco. She stopped in the city and Google sponsored a hackathon. We got this team together. We went from seven o'clock in the morning on the hackathon, and by seven o'clock in the evening, we pretty much had it done. Some were really high-quality designers and developers, and we all worked together, but the interesting part—

9:35 a.m.

NDP

Mathieu Ravignat NDP Pontiac, QC

I have to interrupt you to ask another question.

Has the government shown more transparency in that area? Has data been used effectively to protect the public and reduce the number of conflicts of interest?