Thank you for inviting me. My apologies: there were some pictures to go with this, which you will eventually get.
The driving of the last spike in 1885 was the culmination of a nearly two-decade effort to bring certainty to a nation. It forged a national identity in steel and steam, in iron and timber. More than 130 years later, Canada, a prosperous nation with strong linkages to the south and opening markets in Asia and Europe, seeks certainty in the development and delivery of energy data.
Why is data important? I guess you've had a lot of presentations on why it's important, and I'm sure you know from your daily lives.
For the next 10 minutes, I'll outline why an agency that can provide timely, reliable, and transparent energy data is necessary. I'll discuss the necessary elements of data management, acquisition, and sharing; define the leadership gaps in transitioning to data-driven decision-making; and outline the steps to greater energy certainty, not only as a national policy but as a national imperative.
In preparing for this presentation, I came across a May 2017 Economist briefing that drew a striking analogy between data and oil. “Data”, the authors propose, “are to this century what oil was to the last...[the] driver of growth and change.” They continued:
The new economy is...about analysing rapid real-time flows of often unstructured data...photos and videos generated by users of social networks, the reams of information produced by commuters on their way to work, the flood of data from hundreds of sensors in a jet engine.
The statement resonates with me as someone with a lifetime of experience working in and around the mining and energy industries, first as a research scientist and oil production worker, later as a communications professional and atomic radiation worker, and, for the last 16 years, as an investigator of railway and pipeline accidents.
Why is data important? Or, more importantly, why is a national data agency necessary?
As the article hints at, and as a colleague of mine recently told me, the fundamental problem is that we're not getting snapshots of information about energy fast enough to make informed decisions about things such as energy planning and environmental impacts. It might be easier to ask: what are the costs of not having a national energy data agency in Canada? I think we're all living that.
If you had my pictures right now, you'd see a picture of the Mackenzie Valley, where a pipeline was first proposed as a joint venture partnership. A new effort came forward 27 years after that, which I think finally overcame the barriers related to Aboriginal and industrial co-operation. As most of us probably also know, in December 2017, after a six-year-long process to reach approvals, the partners walked away—again.
We're losing opportunities, whether these are the right opportunities or not. I think the persons responsible for that.... I think there was a comment to this effect: “I don't know what the problems are, but a process that should take two years in a business cycle can't take six”. The fundamental economics change so quickly that they can't do what they'd like to do.
My story also takes us to September 15, 2008, when a meeting of the leaders of the financial world was convened in New York to discuss the bankruptcy of Lehman Brothers. When these business leaders were asked what their exposure to Lehman Brothers was, nobody knew. Once again, it was a great crisis of data.
What you need to know is that to manage, you need to measure. To measure, you need to audit. To audit, you need data. In today's day and age, to know what's happening and what is going to happen, you need to know what is happening now; you need real-time data. The EDM Council, the Enterprise Data Management Council, which grew out of efforts to teach the finance industry how to do this, described a holy trinity of data management.
You need unique and precise identification of things. You need unified views of meaning across organizations, locations, linkages, and interconnections. The procedure is actually the reverse of what you might think: you have to start with what your business practices are, what you are trying to do, and work back, reverse-engineering to what you need in order to do that, the critical data elements that are necessary for those processes. Then, once you've identified those critical data elements, you need to identify them clearly and uniquely, with a taxonomy and an ontology, so you can actually work with them and everybody is working from the same understanding.
You need to establish a unified view across organizations.
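To make that concrete, here is a minimal sketch, in Python, of what a shared registry of critical data elements might look like. The element identifiers, definitions, and units are hypothetical illustrations of the idea, not an actual EDM Council specification.

```python
from dataclasses import dataclass

# A minimal sketch of a shared registry of critical data elements.
# Every identifier, definition, and unit below is a hypothetical example.

@dataclass(frozen=True)
class CriticalDataElement:
    element_id: str   # unique, precise identification
    definition: str   # the shared meaning across organizations
    unit: str         # the agreed unit of measure

# Reverse-engineered from a business process ("report daily crude throughput"),
# then registered once so every organization resolves the same concept.
REGISTRY = {
    "CDE-0001": CriticalDataElement(
        element_id="CDE-0001",
        definition="Daily crude oil throughput at a pipeline receipt point",
        unit="cubic metres per day",
    ),
    "CDE-0002": CriticalDataElement(
        element_id="CDE-0002",
        definition="Pipeline receipt point location",
        unit="latitude/longitude (WGS 84)",
    ),
}

def resolve(element_id: str) -> CriticalDataElement:
    """Any organization, in any location, looks up the meaning the same way."""
    return REGISTRY[element_id]

if __name__ == "__main__":
    cde = resolve("CDE-0001")
    print(cde.definition, "-", cde.unit)
```

The point of the sketch is the order of operations: the business need comes first, the critical data elements are derived from it, and only then are the identifiers and definitions fixed so that everyone shares one view of meaning.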
I'd like to just note that many of us are dealing with something called Phoenix. Most people see Phoenix as a data processing problem, and it is in fact not a data processing problem. It's a data meaning problem. They started with the process, and then tried to make.... I guess the analogy might be that they started with the pants and tried to make the person fit the pants.
We are now in a situation where we don't have the data we need, but we also don't have a common understanding. Furthermore, with the focus on modern data storage and large storage models, we all know the concept of the data lake. A lot of the discussions I heard when I was first sitting here, half an hour ago, were about structured data, and that's the least of our problems. In fact, our biggest problem may be that we're waiting until the data is structured before we actually take it, use it, and apply it. By then it's too late. It has already had the data tax applied, and the data tax is one of the things you are discussing: you can't get the data fast enough to make meaningful decisions. You lose the opportunity while you're waiting for something to happen.
There's another thing that we're lacking. I spent the last two years in a program at Columbia University, a master's in applied analytics, and I went there because it was a program focusing on the leadership skills you need to run data processes. I was one of the people who worked on Lac-Mégantic, one of the investigators, and when we came through that process, there were 18 causal factors to the accident. Standing back as an individual, as I sit before you, what I saw was 18 missed opportunities to intervene. When you looked at it, there was more than enough data. There was tons of data, but the data wasn't being prepared and provided in a timely manner, and analyzed in such a way that you could take action to prevent the catastrophe.
Interestingly enough, the data that is currently out there in the public domain is mostly data from the media, and the media get involved when something is newsworthy. So the focus in the resource sector is typically on the low-frequency, catastrophic events, which have horrific results. We're failing on two levels: first, because we're not getting the data to prevent or mitigate those events, and second, because we're not sharing information about what is happening.
One of the things that I've been trying to push forward is that we can't just say.... This institute is important, but you can't just think of it in isolation. Data cannot function in isolation. We have lots of silos in government where we've collected wonderful information, and I sometimes call it hoarding. I come from a family of great hoarders, so I understand a little bit about it. The reality is that we collect an enormous amount of data—and this is not unique to government. Data is dirty, it's messy, it's hard to work with, it's frustrating, it's inconsistent, and we don't want to give it out until we know we've got it right, and that's not how it works in today's day and age.
We need to get that data out of its silos. We need to get it into a process where we can actually access and use it, and the data lake and the modern data principles don't care what format it's in, as long as we've identified what it is and we know where to get it, and we'll process it when we use it, right? I call that schema on read.
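To illustrate the schema-on-read idea, here is a minimal sketch in Python. The directory layout and field names are hypothetical; the point is that raw records land in the lake exactly as received, and structure is applied only at the moment the data is read to answer a question.

```python
import json
from pathlib import Path

# A minimal schema-on-read sketch. The landing zone and field names below
# are hypothetical illustrations, not a real system's layout.
LAKE = Path("data_lake/raw")

def ingest(record: dict, name: str) -> None:
    """Write the record exactly as received: no upfront schema, no cleanup."""
    LAKE.mkdir(parents=True, exist_ok=True)
    (LAKE / f"{name}.json").write_text(json.dumps(record))

def read_throughput() -> list:
    """Apply structure at read time: reconcile field names only when asked."""
    values = []
    for path in LAKE.glob("*.json"):
        raw = json.loads(path.read_text())
        # Different sources name the same quantity differently; map it on read.
        value = raw.get("throughput_m3", raw.get("volume_m3"))
        if value is not None:
            values.append(float(value))
    return values

if __name__ == "__main__":
    ingest({"throughput_m3": 1250.0, "sensor": "A"}, "sensor_a")
    ingest({"volume_m3": 980.5, "station": "B"}, "station_b")
    print(read_throughput())   # structure was imposed only at read time
```

The design choice is the one described above: nothing waits for a canonical structure before it is stored, so the data is available immediately, and each question pays only for the interpretation it needs.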
What the program at Columbia was designed to do was to create a—