The technology can be set up to crawl along, as you mentioned. Some technologies might be doing that already, crawling the Internet, trying to find whatever they can out there and learn from it.
Whether a company is going to take those documents, make copies, bring them in-house, and train on its own systems depends on what that company considers lawful. A lot of them will probably refrain from doing that, given the uncertainties in the law right now.
Conceivably, from a technology perspective, that kind of system can be created.