Model evaluation and the human on the other side
For months, some developers have been trying to one-up each other with performance evaluation metrics. But some companies may opt for a softer touch.
![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd676e885-1962-458a-9170-4c84c0d1c0cd_1024x1024.png)
No new open-source model released these days is complete without a suite of scores to evaluate its performance.
But as time goes on and companies begin to find ways to implement language models (through APIs or otherwise), there’s a growing recognition amo…