See:
“All models are wrong but some are useful” by James Clear
“The map is not the territory” by Farnam Street
“All evals are fake but some are useful” by Anthropic’s Logan Graham
What is the fashion style for ChatGPT, o1, Claude, Qwen, or Figure? What does fashion signal about the underlying model? :)