Evidence of meeting #13 for Government Operations and Estimates in the 41st Parliament, 2nd Session. (The original version is on Parliament’s site, as are the minutes.)

A video is available from Parliament.

Also speaking

Corinne Charette  Chief Information Officer of the Government of Canada, Treasury Board Secretariat
Stephen Walker  Senior Director, Information Management Decision, Chief Information Officer Branch, Treasury Board Secretariat
Gordon O'Connor  Carleton—Mississippi Mills, CPC
Sylvain Latour  Director, Open Government Secretariat, Treasury Board Secretariat

9:15 a.m.

NDP

Pat Martin NDP Winnipeg Centre, MB

Thank you, Chair, and thank you, witnesses.

I think it is useful to take a minute or two now before we proceed to your main presentations so committee members have a better grasp of the scope of what we're trying to achieve. I still have questions. Just as I think I get it, I realize that I don't quite get it. There's more here than meets the eye. You mentioned the meeting in Northern Ireland where the G-8 members committed to the G-8 open government charter. Is that how you phrase it?

My questions are twofold. First of all, the open data initiative, there is great hope throughout the land among the access and privacy communities that maybe this is it. No more will we be frustrated with access to information requests where it takes a thousand days and costs $10,000 to get a tidbit of information out of the government. It will all be there, and we can simply go and look for it. That's the best case scenario that everyone dreams of if it's true open government.

Who gets to decide what is revealed on the open government portals? Who will do the editorializing? Who will do the redacting, when you black things out? You must have a department of redaction that will be carefully redacting everything that the government doesn't want to release now, surely they didn't want to release then. Who, in your world, does the editing?

Second, Mr. O'Connor hit the nail on the head with his first question, this issue of metadata. I see that the third commitment in the Open Data Charter of the G-8 is to contribute to the G-8 metadata mapping exercise. That's where CSEC comes in. There are 2,200 employees in a building worth $1.2 billion doing this metadata tracking. The budget and the scale of this initiative, if you include how many thousands of employees you have in Treasury Board who are engaged in this, what is your total budget? Why does it take 2,200 employees to track the emails that Canadians send to each other as per our obligation to this G-8 metadata mapping? Maybe you could expand a bit on what our commitment is, too. Are we involved in an international tracking of what everybody is saying to everybody else in the world? Is that what's costing so much money?

9:20 a.m.

Chief Information Officer of the Government of Canada, Treasury Board Secretariat

Corinne Charette

One thing that's important to clarify is that the mapping of metadata exercise referred to in the G-8 charter is about standardizing the type of metadata. It has nothing to do with what CSEC does or information that they do or don't collect.

It is about defining standards for describing data sets internationally. For instance, if we were going to define standards for how we would describe a lake and the geospatial qualities of a lake, all jurisdictions would describe those geospatial properties with the following four, five, or six attributes and these would be understood internationally. If we were mashing up data about a lake from data sets in the U.K., the U.S., and Canada, we would be working with the same kind of data but from individual data sets with information from different jurisdictions.
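As a rough illustration of the point above, the sketch below shows why a shared description matters when combining data sets. The attribute names, lake names, and values are all hypothetical and chosen only for illustration; the testimony does not list the actual attributes.

```python
# Hypothetical shared attribute names; the real G-8 metadata mapping is not specified in the testimony.
LAKE_FIELDS = ("name", "jurisdiction", "latitude", "longitude", "area_km2")

# Illustrative records only; none of these values are real measurements.
uk_lakes = [{"name": "Example Lake A", "jurisdiction": "UK",
             "latitude": 54.0, "longitude": -3.0, "area_km2": 10.0}]
us_lakes = [{"name": "Example Lake B", "jurisdiction": "US",
             "latitude": 45.0, "longitude": -90.0, "area_km2": 250.0}]
ca_lakes = [{"name": "Example Lake C", "jurisdiction": "CA",
             "latitude": 60.0, "longitude": -115.0, "area_km2": 900.0}]


def conforms(record):
    """Return True if the record uses exactly the agreed attribute names."""
    return set(record) == set(LAKE_FIELDS)


def mash_up(*data_sets):
    """Combine records from several jurisdictions, largest lake first."""
    merged = []
    for data_set in data_sets:
        for record in data_set:
            if not conforms(record):
                raise ValueError(f"record does not follow the shared standard: {record}")
            merged.append(record)
    return sorted(merged, key=lambda r: r["area_km2"], reverse=True)


combined = mash_up(uk_lakes, us_lakes, ca_lakes)
```

Because every record carries the same attributes, the combined list can be sorted or compared without any per-jurisdiction conversion step.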

9:20 a.m.

NDP

Pat Martin NDP Winnipeg Centre, MB

Let me get that straight, then: you're not mapping my correspondence to Mr. O'Connor during the night, after hours. The things that I say to him back and forth regularly are not what you're trying to track.

9:20 a.m.

Chief Information Officer of the Government of Canada, Treasury Board Secretariat

Corinne Charette

We're not tracking anything. In fact, metadata and this mapping exercise—and you raise a good point that we'll have to be a lot crisper with our terminology, so thank you—are really about aligning the data standards for how we describe these data sets internationally and ensuring that developers in civil society can mash up open data sets internationally because that's really their value. Be it in health, in environment, in economic terms, or whatever, do we describe this kind of data the same way? Because that is key. That's number one.

Number two, in terms of the number of employees we have doing this at Treasury Board Secretariat, I think it's about 10. That's the extent of our department, but I can guarantee you that we are not over-resourced to do so, and we work with colleagues across the departments and agencies in the federal government who, a lot of times, are not dedicated open data specialists. Quite the contrary, they're working in program areas, and they are doing this because they are equally committed to open data.

In the last point you raised, you highlighted the difference between open data and open information. Open data is not about access to information, although it certainly is in support of accountability and transparency and trust in government. Open data is about making the data that we do collect, this machine-readable fundamental data, available. Open information is about publishing documents or reports online, which would eliminate the need for an access request for those documents or reports.

For instance, right now, and also as part of our open government action plan commitments, we committed to publishing searchable summaries of ATI requests. In fact, ATI request summaries have been available now for over a year, I believe, since we started posting the summaries. People can go to the data sets and read through and select some of them and so on. Open information is another stream of activity in the open government action plan, but it is not the open data.

Open data is really about this great machine-reusable data. In the past, you could have received a data set through an access request; however, you certainly don't need an access request. You can go online anywhere in the world, search through the catalogue as we'll shortly see, and download it to your own computer or PC or your own CD, and work with it as you like.

9:25 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

I will give you the time to give your presentation. It will then be possible for committee members to ask other questions.

9:25 a.m.

Senior Director, Information Management Decision, Chief Information Officer Branch, Treasury Board Secretariat

Stephen Walker

I'm going to start with the page that you can see up on the screen now. This is the home page for data.gc.ca. This is our one-stop shop for all of the open data that the Government of Canada makes available at any time to citizens, researchers, voluntary organizations, the private sector, the media. It maintains access and discovery of all open data. It also has some other open government activities, but I'm going to focus mostly on the open data.

The page has been designed with large tiles, as you can see, so that you can easily find what all the features on the site are, so that you can quickly jump to the information that you're looking for. Key, I think, for our conversation today is that tile in the top left-hand corner, the search data.

I'm going to proceed as if I was an average open data user. I click on “search data” and I'm going to pretend that I'm somebody who's looking to buy a new house and I'm interested in what the safety is in the neighbourhood that I'm considering buying a house in. I would type in crime, for example, hit submit, and all the data that is currently made available from the Government of Canada related to crime would come up.

Oftentimes there is a lot of data. We need to be able to help the user filter down those results to a smaller amount, so that they can find what they're looking for a little bit faster. I could reorder the data sets alphabetically or by the date that they were actually created or last modified, and their relevance. I'm going to leave it at relevance for now.

The left-hand side provides a whole variety of filters that can be used to narrow down the search results. I'm going to keep proceeding as if I'm looking for neighbourhood safety information, so I'm going down and see that under subject there's law. I click on law. The number of data sets comes down, still a fair amount, but as I move down the data sets looking for the information I'm looking for, I see crime statistics for Canada, the provinces and the territories, and I know that that's the information that I'm looking for.
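The search, reorder, and filter steps described in the demo can be sketched as below. This is a minimal sketch, not the portal's implementation: the catalogue entries other than the crime statistics title, the dates, and the crude relevance score are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataSet:
    title: str
    subject: str
    publisher: str
    last_modified: date


# Illustrative catalogue; only the first title comes from the demo, and all dates are made up.
CATALOGUE = [
    DataSet("Crime statistics for Canada, the provinces and the territories",
            "law", "Statistics Canada", date(2013, 7, 1)),
    DataSet("Example crime prevention program data",
            "society", "Example Department", date(2013, 9, 1)),
    DataSet("Example lake water quality data",
            "environment", "Example Department", date(2013, 8, 1)),
]


def search(catalogue, keyword, subject=None, order_by="relevance"):
    """Keyword search with an optional subject filter and a choice of ordering."""
    needle = keyword.lower()
    matches = [d for d in catalogue if needle in d.title.lower()]
    if subject is not None:
        matches = [d for d in matches if d.subject == subject]
    if order_by == "title":
        return sorted(matches, key=lambda d: d.title.lower())
    if order_by == "last_modified":
        return sorted(matches, key=lambda d: d.last_modified, reverse=True)
    # Crude stand-in for relevance: how often the keyword appears in the title.
    return sorted(matches, key=lambda d: d.title.lower().count(needle), reverse=True)


all_crime = search(CATALOGUE, "crime")
law_only = search(CATALOGUE, "crime", subject="law")
```

Here `all_crime` holds two data sets and `law_only` narrows that to the single crime statistics entry, mirroring the click on the "law" subject filter.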

I click on that data set and this is the metadata. This slide provides all of the information that we have on that data set, who the publisher is—in this case it's Statistics Canada—what subject it falls under, the date it was last published, and a short description and title of the data set. Those for us are the mandatory metadata fields that must be completed by any department or agency that is making data available.
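A check for the mandatory metadata fields mentioned above might look like the following sketch. The field names are hypothetical labels for the five items named in the testimony (title, description, publisher, subject, date last published), and the sample values are illustrative.

```python
# Hypothetical field names for the five mandatory items named in the testimony.
MANDATORY_FIELDS = ("title", "description", "publisher", "subject", "date_published")


def missing_fields(metadata):
    """Return the mandatory fields that are absent or blank, in a fixed order."""
    missing = []
    for field in MANDATORY_FIELDS:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


complete = {
    "title": "Crime statistics for Canada, the provinces and the territories",
    "description": "Illustrative description text.",
    "publisher": "Statistics Canada",
    "subject": "law",
    "date_published": "2013-07-01",  # made-up date for illustration
}
incomplete = {"title": "Example data set", "publisher": "Example Department",
              "subject": "law", "description": "   "}

complete_gaps = missing_fields(complete)
incomplete_gaps = missing_fields(incomplete)
```

A department's submission would be accepted only when the returned list is empty; for `incomplete` the check reports the blank description and the absent publication date.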

Before I go into the specific information related to this data set, because I want to point out some of the features for each data set, I want to highlight the licence, which is right here. This licence is a significant point of progress for us working within open data.

As Corinne mentioned, it used to be that individual departments would make their information and data available under a variety of licences, most of which were several pages long and written in legalese that was very difficult to understand by the layman and oftentimes there were specific restrictions on the reuse of that data. For example, most often you couldn't reuse that data commercially.

Our new licence is written in plain language. It's extremely simple. It's based on best practices for open licensing internationally. We are sharing this licence with other jurisdictions within Canada, promoting adoption on a pan-Canadian level, so that data users will be able to bring data together from multiple jurisdictions within Canada at both the provincial and the jurisdictional level, and that can be combined and mashed together.

I'll just go back to the data set and point out a couple more features.

Back before we launched the most recent version of data.gc.ca this past summer, we held a series of round tables across the country with the open data community to hear what it was they would most like to see in the revised, revamped open data portal.

They wanted the ability to rate the data sets themselves and tell us what they thought of the data. They wanted to be able to provide individual comments on that data in the hopes that we could perhaps improve that data. They wanted to be able to share the data easily with others.

All of those features have been incorporated into the new data.gc.ca. You'll see up here on the right side that you can rate the data. It's a five maple-leaf scale. You would simply pick the rating that you're giving the data. You can provide individual comments below and then submit, and it becomes part of the ongoing consumer rating of that data. You can share the data via Facebook, Google, or Twitter, and you can provide comment on the data and share those comments with all other users of that data.

If I want to download the data, I simply click on one of these buttons. The data is made available in different file formats to ensure flexibility of use by the individual users. The data sets are made available in French and English, and there is additional supporting documentation to help the users use the data, and all of these are one-button downloads. Press the button and the data downloads—I won't do that right now.

I should just mention before I leave this page that there's an openness rating down at the bottom. We've incorporated the use of an international openness scale that's used by other jurisdictions to indicate the level of openness of the data sets. It's based on the five-star scale. Most of the data that we hold is three stars and above. This speaks to whether or not the data is being made available in a well-structured format, whether or not you require proprietary software in order to be able to open the data instead of an open software program, and we, the U.S., the U.K., and a variety of other jurisdictions, are using this scale.
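The testimony does not name the scale or give its scoring rules. The sketch below assumes it resembles the widely used five-star open data scheme, in which each star requires all the earlier ones; the function and parameter names are hypothetical.

```python
def openness_stars(openly_licensed, structured, open_format,
                   uses_uris=False, linked=False):
    """Count stars cumulatively: a criterion only counts if all earlier ones are met.

    Assumed criteria, in order:
      1. available online under an open licence
      2. structured, machine-readable data
      3. non-proprietary file format
      4. items identified by URIs
      5. linked to other data
    """
    stars = 0
    for criterion in (openly_licensed, structured, open_format, uses_uris, linked):
        if not criterion:
            break
        stars += 1
    return stars


csv_rating = openness_stars(openly_licensed=True, structured=True, open_format=True)
spreadsheet_rating = openness_stars(openly_licensed=True, structured=True, open_format=False)
```

Under these assumptions an openly licensed CSV file rates three stars, while the same data in a proprietary spreadsheet format rates two, which matches the two questions raised in the testimony: is the data well structured, and does opening it require proprietary software?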

If I go back to the search page and pretend that I wasn't able to find the data set that I was looking for, there's a button that says “Can't find what you're looking for?” up at the top. We're very keen to get feedback and information from potential open data users on what data they'd like to see that we haven't yet made available. That helps us to prioritize our work, by working with individual departments to have that data made available. If you didn't have the data and you clicked on that, you'd see a variety of data sets that have already been requested. So you'd look at that first to see if the data that you're looking for has already been requested, and if you see it, you could add your voice to those who have already requested that data.

Behind the scenes what we do is we work with this list and individual departments to find that data and to try to make that data available, and then we update here, on this page, when we've been able to make that data available.

So, for example, here, with the national household survey released in May 2013, now when you click on that it would take you to the actual data set. If you couldn't find the data that you're looking for, you could submit a new data set and it would become part of this list, and again, other individuals would be able to come in after you and add their thumbs-up or their support for getting that data set as well.

Now I'll go back to the home page to show you a couple of other features specific to open data. I'll start with the showcase. We use this area of the site to provide examples and illustrate the use and the utility of open data. We keep a whole section called open data in action, which provides information on specific projects within the Government of Canada, most of which are collaborative, working with other jurisdictions, that use open data specifically to inform a particular policy area.

The oil sands monitoring portal is a joint initiative between Environment Canada here within the federal government and the Alberta government. It specifically focuses on open data. Together the two jurisdictions make more open data available out to the academic world to support greater research.

Also available through the showcase is an apps gallery, which provides access to a comprehensive listing of the apps that have been made available and developed by the Government of Canada using open data. These apps are downloadable for mobile devices.

If I click on the left-hand side, for example, to find a specific app for my phone, I can click on “mobile” and see all of the apps that are currently made available by the Government of Canada for download into mobile phones. “Recalls and safety alerts”, for example, is an app using open data that can be downloaded. I can download the app straight from the site.

I'll give you just a couple more features of the site. About data.gc.ca, I want to point out that a variety of the information resources that are put on this site are for the open data layperson, designed to get them interested in open data and explain what can be done.

We talk about the licence, making it clear that the data that is available can be released and reused on an unrestricted basis. There's a section on frequently asked questions. There's also “Open Data 101”, a handbook on open data to get people who have yet to start really using open data off the ground with the basics of what open data is, how it can be used, and how to work with open data.

At the other end of the scale, the site has a developers' corner, which is really for the open data user who has some experience already. These are potential developers, for the most part, people who are interested in building applications using federal data or federal data combined with data from either the private sector or other public sector jurisdictions.

Here we have a little bit more sophisticated information around working with data sets; how to use an application programming interface, a software tool that makes access to data that changes frequently within the federal government easier to use if you're building an app that will want to access that data on an ongoing basis; information about our metadata element set; and then Open Data 101.

That really brings me to the end of the tour.

9:35 a.m.

Chief Information Officer of the Government of Canada, Treasury Board Secretariat

Corinne Charette

With that, before we embark on the third and concluding part of our presentation, are there any questions about the demo?

9:35 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

Ms. Ablonczy, do you have any specific questions about this presentation?

9:35 a.m.

Conservative

Diane Ablonczy Conservative Calgary Nose Hill, AB

Yes.

You mentioned on page 15 that since the launch of the open data portal in June of 2013, there have been 88,000 downloads.

I'm wondering if you triaged the kinds of downloads that have been made and if you can tell us what kind of data is in hottest demand.

9:35 a.m.

Senior Director, Information Management Decision, Chief Information Officer Branch, Treasury Board Secretariat

Stephen Walker

I should have shown this to you before, because there is a page that provides information on the current top 25 downloaded data sets. There's also information on how much traffic the site is getting on a month-to-month basis, the total number of departments that are contributing data sets, and how many data sets from each department, and so forth.

This page is updated on a monthly basis because it continues to change. Data sets that are very popular can move up and down that scale from month to month.

9:40 a.m.

Conservative

Diane Ablonczy Conservative Calgary Nose Hill, AB

Thank you.

9:40 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

Ms. Day, you have the floor.

9:40 a.m.

NDP

Anne-Marie Day NDP Charlesbourg—Haute-Saint-Charles, QC

Thank you, Mr. Chair.

I have two supplementary questions. First of all, I checked the word “mappage” and it does indeed exist in French. It refers to cartography, or mapping in English. I did not know this word, but I will try to use it in the future.

Do you have connections with the private sector, universities and research centres? Are links created with these groups in order to constantly improve the open data file?

Secondly, if an average person types a word into a search engine, such as Google or Bing, will this appear in their choices to obtain the information that they need? Obviously, if it is Statistics Canada, it is a given. If the words are “delinquency”, “situation”, “environment” or any other word for which one might wish to obtain data, is the site well placed to give this information and can it be accessed easily?

9:40 a.m.

Chief Information Officer of the Government of Canada, Treasury Board Secretariat

Corinne Charette

As for links with the private sector, we do not have any formal links as such. We were happy to see the Open Data Institute, which is found in Canada, the United Kingdom, and the United States, mentioned in the last budget. This is a growing movement. It is a non-profit that essentially brings together the private and academic sectors as well as all stakeholders who are interested in working in a concerted way throughout Canada.

So we will have our Open Data Institute located in Canada. It will be created by this organization that will bring people together. We will work with them in cooperation with the provinces and municipalities. They also have links with the academic sector and with other sectors.

As for the private sector, we do not necessarily have direct links. These are more indirect. Private sector businesses support this movement. To that effect, there are also certain enterprises such as ESREA. It is a longstanding organization that promotes tools and software that allow the best possible use of scientific data. So this is really informal cooperation, but across a wide range of sectors to provide in the best way possible more open data for everyone's benefit.

As for your question about Google, Mr. Latour can answer it.

9:40 a.m.

Sylvain Latour Director, Open Government Secretariat, Treasury Board Secretariat

I can answer this part of the question.

Yes, we ensure that all information contained in the portal and everything that can be directly searched is also available for search engines like Bing and Google. That means it is possible to get exactly the same results from the Google web page by using the same keywords.

When we talk about mapping with other governments, the advantage is that it makes it easier for Google to find data from Canada and also from the United States and England when making a search. By using the same metadata, we are making it possible for these private search engines to search on a level playing field. This increases our ability to make information available to citizens.

9:40 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

Thank you.

Mr. Trottier, you have the floor.

9:40 a.m.

Conservative

Bernard Trottier Conservative Etobicoke—Lakeshore, ON

Thank you, Mr. Chair.

I appreciate that all the data on the open data portal is provided free of charge. I was looking at the app gallery. Presumably all the applications there are also provided free of charge. Are there any mechanisms whereby, say, a private application developer develops a great app, the government buys that application, and then in turn provides that to citizens free of charge?

9:40 a.m.

Senior Director, Information Management Decision, Chief Information Officer Branch, Treasury Board Secretariat

Stephen Walker

It hasn't happened yet, but I suppose it is possible. Our biggest wish is that the data made available by data.gc.ca will trigger new development external to government and that some of those tools, some of those new apps, will be useful to citizens. Most of the time, I think they'll want to distribute those themselves, but I certainly will be watching carefully, as we will be with the results of the CODE contest that just finished this week. We expect to be looking at over 100 new apps. I think one of the things that we're really interested in is whether or not any of those apps would be good for use by the federal government.

9:40 a.m.

Conservative

Bernard Trottier Conservative Etobicoke—Lakeshore, ON

If I can just add to that question, if there were a great private application developed so that there could be a transaction between a user of the app and the app developer, are you allowed to promote that application on the website? Really, you'd be promoting a private interest.

9:45 a.m.

Chief Information Officer of the Government of Canada, Treasury Board Secretariat

Corinne Charette

The apps on our website are free apps. Developers that create apps using our data and anybody else's data are free to decide to sell those apps and put those apps on the Apple store or any other app gallery out there and actually charge money for them, but we would not necessarily promote those or put them on our app gallery.

9:45 a.m.

Senior Director, Information Management Decision, Chief Information Officer Branch, Treasury Board Secretariat

Stephen Walker

No, not yet for sure.

9:45 a.m.

Conservative

Bernard Trottier Conservative Etobicoke—Lakeshore, ON

Thank you.

9:45 a.m.

NDP

The Chair NDP Pierre-Luc Dusseault

Mr. Martin, you have the floor.

9:45 a.m.

NDP

Pat Martin NDP Winnipeg Centre, MB

I appreciate all of this, and it's helping to give some definition as to what type of information will be put up for open.... If the default is to be openness, that's an important directive. Currently the default seems to be secrecy. It's like pulling teeth to get sensitive information out of the government through the access to information regime. Even though you say this is not set up to replace or to do the job of ATI, you mention in the opening page of your website, data.gc.ca, that this is really an extension of the spirit, if not the letter, of the Access to Information Act.

I'm still suspicious and I'm still interested, but you didn't answer my question as to who screens. Who ultimately gets to say what goes up and what stays down in terms of the portal? Is it the minister, is it the government of the day, or is there some overarching, independent authority, such as the Information Commissioner, who says that cutting the hair of Afghan detainees should be public information and should go up on the portal, and that you shouldn't have to wait a thousand days and go to court to find out whether or not you cut the hair of the Afghan detainees?

Who is your boss who says what goes up and what does not go up?

9:45 a.m.

Chief Information Officer of the Government of Canada, Treasury Board Secretariat

Corinne Charette

Well, there are two questions there.

First, in terms of what data sets are made available for publication, the first thing that's important is that what's on the open data portal is not classified information. Clearly it is not information that is classified secret or confidential. It is information that is public, so that the data sets themselves should be made freely available.

Now, within departments there is a wealth of data and of data sets that are not yet published. While we're very proud of the 200,000 data sets we have on the portal today, we believe there are a lot more.... That is the focus of the directive, to require that departments conduct an inventory of the data sets and then prioritize the data sets with our help. We work with them to identify high-value data sets and to help them make them available.

Ultimately, the departments have to be in a position to maintain and assure the integrity of the data and ensure that the data they are promoting—