I think there's inherently an issue that models will literally never be able to handle, which is that when somebody comes along with a new way of doing something that's really excellent, the models will not recognize it. They only know how to recognize excellence when they can measure it somehow.
