Thank you for the invitation and for this opportunity to speak about the artificial intelligence bill and its application to data.
I am not going to reiterate what Mr. Gambs discussed earlier. However, I would like to come back to certain points in the bill that are, in my opinion, not entirely clear or that should be clarified, particularly when we are talking about biased output. This is one of the things that caught my attention: what is a biased output, and how is a biased output arrived at?
Artificial intelligence will never give 100% accurate output. It is always based on learning, and that learning is what determines whether the system gives a recommendation or a decision, or generates new information, new data.
If a person is the subject of biased output, is the business or organization that created the bias responsible? Is a certain amount of bias normal? A machine learning system might have a certain success rate, 90% or 97%, for example; today, artificial intelligence will never be 100% accurate. What really caught my attention is the definition of biased output.
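To make this concrete, here is a minimal, purely illustrative sketch in Python. The volumes, group labels and error counts are invented for the example and do not come from any real system; they only show how a high overall success rate can coexist with a biased output.

```python
# Purely illustrative numbers: a system that is "97% accurate" overall
# can still concentrate its errors on one group of people.
decisions_by_group = {"group_a": 80_000, "group_b": 20_000}
errors_by_group = {"group_a": 500, "group_b": 2_500}

total_decisions = sum(decisions_by_group.values())   # 100,000 decisions
total_errors = sum(errors_by_group.values())         # 3,000 errors
print(f"Overall success rate: {1 - total_errors / total_decisions:.0%}")

for group, errors in errors_by_group.items():
    rate = errors / decisions_by_group[group]
    print(f"{group}: error rate {rate:.1%}")

# Output:
#   Overall success rate: 97%
#   group_a: error rate 0.6%
#   group_b: error rate 12.5%
# Both groups sit inside the same advertised success rate, yet one of
# them fails twenty times more often: one way a biased output can hide
# behind a good overall score.
```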
I want to draw attention to the learning and the data. Learning takes place using data, but the business is entirely able to fragment that data among various organizational structures. A piece of data, of information, can even be transformed. The bill provides that there would have to be information about how data is managed and how it is anonymized.
There is also anonymous or de-identified data, as was mentioned. But how can we make sure that the business has not fragmented that data in such a way that it could retrace it? That information cannot be found in an audit. This is a very important factor to consider in terms of enforceability. I can present you with an entire manual that shows that I have properly anonymized my data and how I manage it, but you cannot be certain that what I used for the learning was that anonymized data. Even if we can go back to find out a bit about the data that was used, as Mr. Gambs said, that is always going to be a difficult and relatively complex job to do.
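To illustrate the difficulty, here is a hypothetical sketch. The table names, row keys and internal mapping are all invented; a real business would hold something far more complex, which is precisely what makes the auditor's job so hard.

```python
# What an auditor sees: two "anonymized" fragments stored in different
# organizational structures, with no apparent link between them.
demographics = [{"row": "r1", "age_band": "30-39", "region": "QC"}]
behaviour = [{"row": "x9", "purchases": 42, "defaults": 1}]

# What only the business holds: an internal mapping that lets it
# retrace and rejoin the fragments into a re-identifiable record.
internal_link = {("r1", "x9"): "customer_8841"}

for (demo_row, behav_row), customer in internal_link.items():
    demo = next(d for d in demographics if d["row"] == demo_row)
    behav = next(b for b in behaviour if b["row"] == behav_row)
    print(customer, {**demo, **behav})

# Without internal_link, nothing in the audited fragments proves which
# data actually fed the learning; with it, the business can retrace
# everything.
```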
The last point I would like to address concerns what you define as a high-impact system. We can say that it is the loss of confidentiality, integrity or availability of data that may have serious or catastrophic consequences for certain individuals or certain entities. If the business defines its system as having a 97% success rate, that means it will always have a 3% failure rate.
So does the case you are looking at fall into that 3%? How can we determine that we are in one of those situations, where a prejudice or bias against a very specific person is created, even though the learning was done correctly?
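A back-of-the-envelope sketch shows the scale of the question; the annual volume here is assumed purely for illustration.

```python
# Hypothetical volume: even a small failure rate translates into a
# concrete number of affected individuals.
success_rate = 0.97
decisions_per_year = 1_000_000   # assumed, for illustration only

failures = decisions_per_year * (1 - success_rate)
print(f"Expected failures: {failures:,.0f} per year")   # 30,000

# For any one of those 30,000 cases, nothing in the system's overall
# statistics tells us whether the person was harmed by a bias or simply
# landed in the expected failure margin, even when the learning itself
# was done correctly.
```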
There are therefore a number of challenges relating to the data that is used: how do you make sure that it is anonymous, and that it has not been fragmented or modified? The business will be entirely able to retrace the data, but an auditor who wanted to do the same thing would find the job very complicated and very complex.
Even when things are done very properly, what is a bias and what is a biased output? How do we make sure that a biased output, one that fails and harms an individual, is not simply written off as part of the 3% failure rate in the learning?
Thank you. I am available to answer your questions, in English and French.