Hello and thank you for inviting me and offering me the opportunity to address you.
I am going to give my presentation in French, but then I will be able to answer questions in English or French. In these five minutes, I am going to try to focus on the concepts of privacy, explainability and fairness in artificial intelligence.
First, there is an important element that does not seem to be addressed in the bill. When you train a learning model, it essentially summarizes the personal data that was used to train it. An assessment of the privacy-related factors will therefore have to be done, taking state-of-the-art attacks into account. In my research community, for example, we try to show that training data can be reconstructed from a learning model, or "black box", such as a neural network.
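To illustrate the kind of attack described above, here is a deliberately simplified sketch of a membership inference attack, one of the basic ways researchers show that models leak their training data. Everything in it is hypothetical: instead of a real overfit model, it simulates the signature such a model exhibits, namely that records used for training receive systematically lower loss than records that were not, which an attacker can detect with a simple threshold.

```python
import random

random.seed(0)

# Hypothetical simulation of an overfit model: members of the training
# set get low loss, unseen records get higher loss. Real attacks exploit
# exactly this gap; the specific distributions here are invented.
member_losses = [random.gauss(0.1, 0.05) for _ in range(100)]
non_member_losses = [random.gauss(0.8, 0.20) for _ in range(100)]

THRESHOLD = 0.4  # the attacker guesses "member" below this loss


def is_member(loss: float) -> bool:
    """Attacker's rule: low loss suggests the record was in training."""
    return loss < THRESHOLD


correct = (sum(is_member(l) for l in member_losses)
           + sum(not is_member(l) for l in non_member_losses))
accuracy = correct / (len(member_losses) + len(non_member_losses))
print(f"membership-inference accuracy: {accuracy:.2f}")
```

The point is not the numbers, which are fabricated, but the mechanism: a model that memorizes personal data can reveal, to anyone who can query it, whether a given person's record was used, which is itself a privacy breach.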
In addition, a challenge that we will have in the future, and that we have now, is that most learning models that people develop are built on pre-trained models that were themselves trained on personal data whose origin we do not necessarily know. I would therefore say that there are going to be very considerable challenges in this regard, particularly in the case of high-impact artificial intelligence systems.
We can also see that there are going to be difficulties regarding the creators and users of the models. For example, section 39 of the Artificial Intelligence and Data Act in the bill says that people are responsible for the use of a learning model, but foundation models, which are the basis of tools like ChatGPT, can be used for a great many things. It is therefore difficult for the creator of a model to predict all the beneficial or harmful uses that could be made of it, so in practice we have to distinguish between the person who created the model and the use made of it in a particular case.
Regarding explainability, the second important subject: apart from explaining to people the reason for a prediction, they also have to be given a clear explanation of what data was collected, the final result, and the impact on them. It is particularly necessary to be transparent in these regards, and to provide a comprehensible explanation in the case of high-impact artificial intelligence systems, so that people have remedies. Without a good explanation, they essentially cannot question the decision made by the algorithm, because they do not understand it. In the case of high-impact systems that affect people, they should also be able to contact a human being, somewhere in the process, who can have the decision reviewed. This concept is missing from the bill.
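One concrete form such a comprehensible explanation can take is a counterfactual: telling the person the smallest change that would have reversed the decision, which gives them something actionable to contest or correct. The sketch below is entirely hypothetical; the "model" is an invented one-line scoring rule, not any real system, and serves only to show the shape of a counterfactual explanation.

```python
# Hypothetical loan rule (invented for illustration): approve when
# income minus twice the debt is at least 50 (figures in thousands).
def approve(income: float, debt: float) -> bool:
    return income - 2 * debt >= 50


def counterfactual_income_increase(income: float, debt: float) -> float:
    """Smallest income increase that would flip a refusal to approval."""
    shortfall = 50 - (income - 2 * debt)
    return max(shortfall, 0)


income, debt = 60, 10
print("approved:", approve(income, debt))
print("income increase needed:", counterfactual_income_increase(income, debt))
```

An explanation of this kind is exactly what lets an affected person exercise a remedy: they can see why they were refused and what would have to change, instead of facing an opaque "no".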
Overall, therefore, an impact analysis has to be done that takes into account not only privacy-related factors but also these ethical issues. I have not mentioned fairness, but that is also an important point. Apart from the law, another challenge we are going to encounter will be to adopt standards based on the fields of application, in order to define the correct fairness indicator to incorporate into artificial intelligence systems, and the right form of explanation to offer. It will not be the same in the medical field as in banking, for example. The protection mechanisms to put in place in each context will have to be defined.
I would like to conclude my presentation by talking about the risk associated with fairwashing, an issue I have done some work on. Essentially, avoiding it requires concrete standards that define the fairness indicator to be used in a particular context, because there are many different definitions of fairness. Debates have already arisen between companies that built artificial intelligence systems and researchers over whether a system was discriminatory: the company argued that the right indicator had not been used. Without precise standards put in place by the stakeholders, companies could therefore cheat and claim that their model does not discriminate, when they have chosen a fairness indicator that works to their advantage. It is also very easy to come up with explanations that seem plausible but in no way reflect everything the "black box" does.
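To make the fairwashing risk concrete, here is a small invented example in which the same system passes one common fairness indicator, demographic parity (equal selection rates across groups), while failing another, equal opportunity (equal true positive rates among qualified people). The cohort data is entirely fabricated for illustration; the two metric definitions are the standard ones.

```python
# Hypothetical cohort: tuples of (group, truly_qualified, accepted).
records = (
    [("A", True,  True)] * 4 + [("A", True,  False)] * 1 + [("A", False, False)] * 5
    + [("B", True,  True)] * 4 + [("B", True,  False)] * 4 + [("B", False, False)] * 2
)


def selection_rate(group: str) -> float:
    """Demographic parity compares this rate across groups."""
    rows = [r for r in records if r[0] == group]
    return sum(r[2] for r in rows) / len(rows)


def true_positive_rate(group: str) -> float:
    """Equal opportunity compares this rate across groups."""
    qualified = [r for r in records if r[0] == group and r[1]]
    return sum(r[2] for r in qualified) / len(qualified)


for g in ("A", "B"):
    print(g, f"selection rate={selection_rate(g):.2f}",
          f"TPR={true_positive_rate(g):.2f}")
```

Both groups are selected at the same rate (0.40), yet qualified people in group A are accepted far more often than qualified people in group B (0.80 versus 0.50). A company could truthfully report "no disparity" under the first indicator while the second reveals discrimination, which is precisely why the indicator must be fixed by standards rather than chosen by the party being audited.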
I would therefore say that the fairwashing issue could become apparent when the bill is put into effect. We have to think about ways of avoiding this and adopt concrete standards, which will not necessarily be in the legislation itself but will be defined afterward, to avoid the legal uncertainty surrounding fairness indicators and the forms of explanation relating to privacy issues.
Finally, if I have 30 seconds left, I would like to address one last point regarding privacy. The distinction between anonymized data and de-identified data has always been difficult for me, because, as a privacy researcher, I know there is no perfect method of anonymizing data.
The bill refers to anonymized data, where the process is irreversible, and de-identified data, where the process could someday be reversed. In fact, I think there really is no perfect method. Even when we are told that data is anonymized, there are generally residual risks that the person can be re-identified by cross-referencing with other data or other systems. The difference between these two definitions could, and in any event should, be clarified with additional explanations.
I hope I have not gone too far over my speaking time.