Examining the Inadequate Insights from the Majority of AI Benchmarks
![Gettyimages 176980461](https://techgroundnews.com/wp-content/uploads/sites/4/2024/03/GettyImages-176980461-768x511.jpg)
On Tuesday, startup Anthropic released a family of generative AI models that it claims achieve best-in-class performance.
The reason, or rather the problem, lies with the benchmarks AI companies use to quantify a model's strengths and weaknesses.
"Many benchmarks used for evaluation are three-plus years old, from when AI systems were mostly just used for research and didn't have many real users. In addition, people use generative AI in many ways — they're very creative."

It's not that the most-used benchmarks are totally useless.
However, as generative AI models are increasingly positioned as mass-market, "do-it-all" systems, old benchmarks are becoming less applicable.