Most large language models are largely trained on the open web. That's true of models from Canadian researchers at the University of Toronto, commercial competitors or other academic models out there. It's difficult to know exactly what was in or what was out, but the training is very broad.
On April 20th, 2023.