Evidence of meeting #18 for Government Operations and Estimates in the 41st Parliament, 2nd Session. (The original version is on Parliament’s site, as are the minutes.)

A video is available from Parliament.

Also speaking

David Eaves  Open Data Consultant, As an Individual
Renée Miller  Professor, Department of Computer Science, University of Toronto
Mark Gayler  Technology Strategist, Western Canada Public Sector, Microsoft Canada Inc.
Ginny Dybenko  Executive Director, Stratford Campus, University of Waterloo
Gordon O'Connor  Carleton—Mississippi Mills, CPC

9:35 a.m.

Executive Director, Stratford Campus, University of Waterloo

Ginny Dybenko

If your eventual goal is engagement, then there is nothing like asking someone to contribute to the dialogue in a real way to engage the individual constituents. The accuracy aside, I think it's a process that's well worth pursuing.

9:35 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

Thank you.

Thank you, Monsieur Trottier.

Mrs. Day, the floor is yours for five minutes.

9:35 a.m.

NDP

Anne-Marie Day NDP Charlesbourg—Haute-Saint-Charles, QC

Thank you, Mr. Chair.

My questions are mainly for Mr. Eaves or Ms. Miller.

As you know, our study is about improving the government's open data practices. More specifically, we want to look into how Canadian companies can have better access to high-value information with strong economic potential. So this study is part of an economic perspective. We are trying to find out how all this can be used within an economic vision.

Mr. Eaves, earlier, you talked about cross-sectional data—environmental data on the caribou and on logging potential. That was very interesting. One of our recent witnesses asked questions about that.

Earlier, for fun, I used my iPad to research something as simple as taxes. We are currently in the midst of the tax season. The search engine ranked Government of Canada data 13th, and Revenu Québec data 4th.

A Treasury Board representative was saying that, in terms of open data, Canada was doing fine and was well-positioned compared with other G8 countries. Yet I remain skeptical.

How can we increase data accessibility and people's interest? Why would people go on the Canadian government's website data.gc.ca, instead of using a search tool? How can we position ourselves to ensure that our data is used regularly? We will have very detailed big data, mainly with regard to universities, and research and development. How can Canada become a world leader? We are being told this is already the case, but is it really?

9:35 a.m.

Open Data Consultant, As an Individual

David Eaves

I will answer in English, as I cannot explain all the nuances in French.

There are a few things I would say.

First, I think we have to look at some timelines here that are going to matter. I love the point about students using open data in their research. Prior to the release of the open data portal, you had to pay for StatsCan data. That meant every single student in this country who was doing an undergrad paper or doing research used American data to do all of their work. All of their case studies were American-based, because the American data was free. Up until three years ago, everybody in Canada who did any kind of studies in university tended to gravitate towards American data.

Some of the economic benefits, then, will come from having a population that becomes more and more familiar with Canadian data and what's available. That will be a process of several years, to have students who are going through college and in their studies beginning to familiarize themselves with what's possible and what's available and then entering the workforce and bringing that to the companies where they work. I do want us to make sure that we have some expectations about how long some of the transformation will take.

That would be the first piece. The second is I think to have a really strategic vision about what the industries are that we want to support that we have data around, and what the policy goals are that we think we can pursue that would enhance those industries. One thing we do know is that data, in and of itself, even when it almost never gets used, can have a transformative impact on how industry operates.

One of the best examples of this was the release of the TRI, which is the pollution data in the United States. In Canada we have something similar, called the NPRI data. This is the data that every facility in the country must release about how much pollution it released. The very creation of that dataset caused a huge number of facilities in the United States to lower the amount of pollution they were releasing. They became more efficient and more environmentally sensitive just because they now knew that everybody in the world could come and look at what they were releasing.

As a government, it would be interesting for us to think about what data, if we had it, would enable our economy to become more productive and more effective, and then to pursue how to gather that data and how to share it in a way that industry or community groups could leverage.

In fact there was an article about that just this morning. There's enormous concern about a potential housing bubble in Canada. At the end of the day, as the lead economist on this issue at CIBC said, we don't actually gather data that would allow us to assess whether or not there is a bubble.

So if we're looking at the various industries that are out there and where the deficiencies on data are, economists and industry experts, people in the industry, are already telling us where we're deficient. I think the question we need to be asking ourselves is this: what role does government have in creating those datasets and curating them to help the economy reach its maximum potential?

9:40 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

Thank you, Mrs. Day.

Mr. Aspin, you have five minutes.

9:40 a.m.

Conservative

Jay Aspin Conservative Nipissing—Timiskaming, ON

Thank you, Chair.

Welcome to our witnesses. Obviously we have a wealth of information to help us with our study.

I was rather intrigued, David, by your analogy of the bulls. I liked that analogy that you're in the pack or you're the leader.

If we want to assume a leadership role in Canada in this data argument, what is the number one factor that we should pursue? Maybe I could get a priority from each of you. Or should we in fact be the leader at all?

Maybe we'll start with you, David.

9:40 a.m.

Open Data Consultant, As an Individual

David Eaves

I was really hoping we were going to go in reverse order for once.

I don't know whether this takes us out of the pack. This is going to be boring and technical, but the danger we have with open data right now is that it is the thing we're tacking on at the end of the process. You have a government that creates data, analyzes it, does interesting things, and then at the very end we tack this thing on saying, by the way, you have to make it public to the rest of the world.

As a result, our open data initiative has a compliance problem. It's something ministries do.... It's rather like access to information: they don't really want to be doing it. They have to be doing it because the government has asked, but it doesn't actually support a business need right now at the ministry.

My argument would be that if you want to be a genuine leader and want to be thinking about what a government looks like in the 21st century, you have to stop thinking about the data as being an end product that sits at the top or at the end of the process, but rather as being core infrastructure for running government and as the platform upon which all good decisions and all government rests.

I talk about the term “dogfooding”, which is when you use your own materials. You don't just publish data and hope other people are going to use it; you dogfood it: you create it and then you build your own infrastructure on top of it.

If we expect industry to be using government data, they're only going to start using it and really believing that we're committed to it when we're using it as well and building our own infrastructure on it.

So the number one thing I would do is go from here to there.

9:40 a.m.

Professor, Department of Computer Science, University of Toronto

Dr. Renée Miller

I'm responding to both questions too.

I think we shouldn't worry that the government data is not necessarily what is returned in search engines and so forth. I think what we should do is understand to what extent government data has been taken up by researchers and by industry and made into higher quality data.

David alluded to the fact that most researchers in Canada use data.gov data to do their research, and I can attest to that: my graduate students use data.gov data to do their research. But we republish it as richer data, using what we have done because we have gone in and found data that is interesting to us. In terms of that information flow, we have to both understand what data has been taken up by the community and use that understanding to motivate what additional data we provide through the open data portal.

So we use the expertise of the crowd to come back and say that actually we can improve the data we're putting out, to better spur economic growth.

9:40 a.m.

Conservative

Jay Aspin Conservative Nipissing—Timiskaming, ON

Thank you.

Mark?

9:40 a.m.

Technology Strategist, Western Canada Public Sector, Microsoft Canada Inc.

Mark Gayler

One of the areas I would focus on is data that is locked in siloed government data stores.

I'll give you a very simple example. I was working with the Government of Slovenia and I met with their bureau of statistics. One of the challenges they had is.... I don't know whether anybody here has worked with statistical data, but government statistical data is often locked inside very specific, very narrow, and niche statistics systems and is made available in very strange and, even from a technology point of view, almost impenetrable data formats. What was interesting was that the Slovenian bureau of statistics people were very familiar with the Canadian bureau of statistics, so from one statistics bureau to another they had a relationship and were familiar with each other's work. However, from a broad citizen perspective, the citizens really couldn't get easy access to this data.

I think the point was made before that much of the data that's locked in some of those siloed government data stores is really rich, valuable data for citizens, analysts, researchers, and even private entities. It's worth looking at how we get that data out of locked systems and make it more available to end users, citizens, and consumers using common tools and access methods that they have today.

9:45 a.m.

Conservative

Jay Aspin Conservative Nipissing—Timiskaming, ON

Thank you, Mark.

Now, Ginny?

9:45 a.m.

Executive Director, Stratford Campus, University of Waterloo

Ginny Dybenko

As highlighted in the SSHRC consultation document produced in October of last year, I believe we would benefit hugely in Canada from developing a more forward-looking digital research environment. Specifically, that document calls for the development of a coordinated plan to establish and operate a number of world-class centres specializing in data management. Indeed, I think $3 million was already targeted toward the Open Data Institute in Waterloo in the last budget.

9:45 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

Thank you for your answers.

Mr. Byrne, you have five minutes.

April 3rd, 2014 / 9:45 a.m.

Liberal

Gerry Byrne Liberal Humber—St. Barbe—Baie Verte, NL

Thank you very much, Mr. Chair.

Thank you to our witnesses for giving us a really excellent presentation and perspective on this.

I want to get some feedback on the reconciliation of data integrity within an open data environment.

I think some of the perspective here is that there's a single portal, a single channel, and a single standard for the integrity of the data, so that when you plug into the portal you are getting a set of data that has been tested and that you know to be authentic and for the integrity of which someone is accountable.

Renée, in your presentation you included the aspect of having a discussion about how many beds are available at a homeless shelter. That almost seems more of a blog. If there isn't credibility and authenticity of the data; if it is not tested and someone is not accountable for the data.... Undoubtedly, there will always be mistakes, no matter what standard you create, but there has to be a relatively highly certifiable standard for inclusion into an open data project; otherwise, it could be termed just a blog.

Could I get some perspective on that notion of the single data integrity concept? Governments have one perspective on all of this: they are accountable, or at least they have the capacity to be accountable. A group of community-based organizations with limited funds in a municipal environment has a lesser standard, I think it's fair to say.

Could you give a little bit of perspective on that?

9:45 a.m.

Professor, Department of Computer Science, University of Toronto

Dr. Renée Miller

Sure.

I would take the example of Wikipedia. From a broad community with a broad set of expertise, you can come down to finding good, high-quality information. Is everything in Wikipedia true? Absolutely not.

I think there are certain things you can use the power of the crowd and the aggregate opinions of the crowd for. Allocating resources in real time as to where you see the resources should go is, I think, a good use for that information.

If you're trying to do longitudinal studies, you probably need some oversight over the meaning of the data and need some curation over the data itself. I think we shouldn't, though, dismiss community-provided data just because it's not curated and may not have the same level of integrity, because it can still provide incredibly valuable information for people, particularly for public workers on the ground. It can give them a sense, a signal about where their resources, their information should go.

That is very different from an historian's trying to pin down exactly what happened. I think we have to weigh the differences that exist.

9:50 a.m.

Liberal

Gerry Byrne Liberal Humber—St. Barbe—Baie Verte, NL

Thank you.

David?

I would like to go to our teleconference as well, so just—

9:50 a.m.

Open Data Consultant, As an Individual

David Eaves

I'll try to be brief.

I agree with you. I think there needs to be accountability, especially around datasets that government is using to make decisions.

I am interested in crowdsourcing, but I think there are incredible limits around how to do it. Even in the example of Peer to Patent—it's a wonderful example—there are very tight constraints around what makes it work. It's very easy to use crowdsourcing to disprove things. You may be identifying cases in which something is actually not true, such as identifying patents that are not valid; it's also great to identify datasets that are in error. It's much harder to use it to identify what is actually truthful or is actually a fact.

So one of the nice things is that we should be treating our open data portals as an engagement tool because they're actually a wonderful way to crowdsource errors, not because we want to find errors and make people accountable. There are always going to be errors in the data, so let's surface them more quickly so that we can then get to better quality data faster, so that governments make better decisions with more reliable datasets.

9:50 a.m.

Liberal

Gerry Byrne Liberal Humber—St. Barbe—Baie Verte, NL

Chair, could we go to Ginny and Mark?

9:50 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

I could give one minute to each of you.

9:50 a.m.

Technology Strategist, Western Canada Public Sector, Microsoft Canada Inc.

Mark Gayler

Sure.

The comment I was going to make on this is that first of all it's important that you have attribution: who is accountable for curating a particular dataset? That's very key here.

The second thing I would say is that it's very important to have an agreed feedback loop whereby, if you choose to crowdsource the accuracy of the data and you invite third parties to participate in it, you have a feedback loop that enables them to do it effectively, so that people see that the data gets updated within that authority—the authority of source of that data—and that the more accurate data is then reflected on a timely basis.

If you have that feedback loop, I think you then give people confidence that this is a real and a sustainable thing and that the quality of the data is improving over time. It's not something you can do as a one-off or a “let's try it and see”; I think you have to have that feedback loop and sustainability effort on top of it.

9:50 a.m.

Executive Director, Stratford Campus, University of Waterloo

Ginny Dybenko

Mark said it perfectly. I have nothing to add.

9:50 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

Thank you.

Ms. Ablonczy now has the floor for five minutes.

9:50 a.m.

Conservative

Diane Ablonczy Conservative Calgary Nose Hill, AB

We wish we had a day with each of you because this is a very rich discussion.

As you know, this is part of a G8 initiative, and there has been a commitment by a number of countries to move in the direction that we're talking about, so I'd like each of you to focus on the internationalization or the global collaboration of the open data initiative. Although Canada may or may not be a leader, there is a mastermind principle that we want to tap into of sharing best practices and learning from others. I'd be interested in your observations on how Canada can improve its collaboration on the open data initiative and where we should put the most focus with our partners.

Mark, why don't you start?

9:50 a.m.

Technology Strategist, Western Canada Public Sector, Microsoft Canada Inc.

Mark Gayler

I think one of the things that I would point to, as a way of responding to this question, is that we're starting to see some interesting relationships outside of the traditional, say, government/industry relationship pattern, particularly around open data. One example that I would give you here is how the World Bank is starting to allocate some of its investments in stimulus funding. It now requires countries, nations, and states that it's working with to have an open data policy and to be able to provide evidence that they are being more transparent with their use of data and providing data services to citizens. That's happening sort of globally.

If we look at that as an example, I think Canada can learn from these examples and encourage similar relationships between government and industry participants because the more you join these collaborations together, the more participants you get working together, the richer the data becomes, and I think the impact of the data is more powerful on the community.

9:55 a.m.

Conservative

Diane Ablonczy Conservative Calgary Nose Hill, AB

Ginny.