Part of the reason they're becoming more biased is that, typically, these models are being fed increasingly large amounts of data. For certain models, having a lot of data is advantageous for performance. But the more data you give the model, the more likely it is to ingest some data that is not ideal.
We saw this in the report with the model CLIP, which is a multimodal vision-language model. The model was shown a photograph of the American astronaut Eileen Collins and asked, in effect, "What is this image of?" It assigned a higher probability to this being a photograph of a smiling housewife in an orange jumpsuit with the American flag than of an astronaut with the American flag.
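To make that kind of probe concrete, here is a minimal sketch of how one can ask CLIP to score an image against candidate captions, assuming the publicly released checkpoint via the Hugging Face transformers library; the image file name and the exact caption wording are illustrative assumptions, not the paper's actual inputs.

```python
# Minimal sketch: CLIP zero-shot caption scoring (assumptions noted above).
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("astronaut_portrait.jpg")  # hypothetical image file
captions = [
    "a photo of an astronaut with the American flag",
    "a photo of a smiling housewife in an orange jumpsuit with the American flag",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds the image's similarity to each caption;
# softmax converts those similarities into relative probabilities.
probs = outputs.logits_per_image.softmax(dim=1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.3f}  {caption}")
```

Because the model simply picks whichever caption its training data made more plausible, a dataset full of stereotyped image-text pairs shows up directly in these probabilities.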
That's not our finding. It's a finding from a paper by Birhane et al. (2021). It's illustrative of the fact that when you give these models a lot of data, which might be required for higher performance, they can pick up some conspiratorial and biased data. If we're not filtering that data proactively, it is very likely that these models will behave in toxic and problematic ways.
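To give a sense of what "filtering that data proactively" can mean at its simplest, here is a toy sketch of a caption blocklist applied to image-text pairs before training; the blocked terms and sample records are hypothetical, and real pipelines rely on far more sophisticated classifiers than a word list.

```python
# Toy sketch: blocklist filtering of image-caption pairs (hypothetical data).
BLOCKLIST = {"blocked_term_a", "blocked_term_b"}  # placeholder terms

def is_clean(caption: str) -> bool:
    """Keep only pairs whose caption contains no blocked terms."""
    return set(caption.lower().split()).isdisjoint(BLOCKLIST)

dataset = [
    {"url": "https://example.com/img1.jpg", "caption": "an astronaut beside a rocket"},
    {"url": "https://example.com/img2.jpg", "caption": "a blocked_term_a meme"},
]

filtered = [row for row in dataset if is_clean(row["caption"])]
print(f"kept {len(filtered)} of {len(dataset)} pairs")
```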