It's actually not effective. It's not that it's too complex; it just isn't necessarily an effective way to evaluate whether a system operates appropriately for that use case.
What we do for customers is provide good, clear information about recommended use cases for the models we offer. The only way to evaluate whether a model performs appropriately for your use case is to test it.
In that testing process with your data, you're going to be able to evaluate whether it's performing appropriately for your use case. Just throwing up a bunch of models is not going to be that effective in giving us that information.