Supervised


Model shelf life and the AI ouroboros

If leaderboard turnover is any indicator, enterprises won't get "state of the art" open source models in production.

Matthew Lynley
Aug 11, 2023

A woman in a lab coat staring at a colossal sign on a wall showing the rankings of team members, sign is 20 times her size, Blade Runner 2049 aesthetic — Midjourney

Author’s note: Now that I’m about three months into the development of Supervised, I’m going to continue my work to make this a sustainable independent journalism publication—with the hope of one day becoming a newsroom, and not just a newsletter.

Beginning August 25, two of the three weekly issues of Supervised will be available to paid subscribers only, while free readers will receive a short preview of each post. The “Still on my Radar” section will now appear only on Fridays, and it will be available exclusively to paid subscribers.

Thank you to all of my readers, and please continue to send feedback and suggestions (and tips)! You can reach me via the email address or Signal number at the bottom of every post.

In addition, due to some travel next week, Tuesday’s issue will be moved to Thursday. There also won’t be an issue on Wednesday, thanks to a semi-overloaded schedule.


Turnover in open source “state of the art”

The performance of open source models is increasing at a rapid pace now that Llama 2 has hit commercial-ish availability. You can see it just from the sheer level of turnover on the Hugging Face open LLM leaderboard.

That speed of development poses an interesting challenge for companies exploring the deployment of open source models. There are plenty of reasons to prefer open source models over APIs, including, more recently, concerns around the reliability and uptime of GPT-4. But the turnover in what’s considered “state of the art,” if we’re talking existing benchmarks and the Open LLM leaderboard, is incredibly high.

For larger enterprises with longer adoption cycles, by the time a model is actually in production (either internally or within a product) it’s probably already been beaten on the Hugging Face open LLM leaderboard by another open source model. Or, more likely, it’s been beaten several times over.

Essentially since the launch of Llama 2, we’ve seen leaderboard toppers with a “shelf life” of less than a week, going off the initial commit dates of the models evaluated on the Hugging Face Open LLM leaderboard and their scores from the leaderboard evaluation. And these models are all within less than a point of each other upon release.
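As a rough sketch of what that shelf-life calculation looks like: given each model’s initial commit date and leaderboard score, a topper’s shelf life is simply the number of days until a higher-scoring model shows up. The model names, dates, and scores below are hypothetical placeholders for illustration, not actual Open LLM leaderboard data.

```python
from datetime import date

# Hypothetical example entries: (model name, initial commit date, leaderboard score).
# Illustrative placeholders only — not real leaderboard results.
entries = [
    ("model-a", date(2023, 7, 20), 66.8),
    ("model-b", date(2023, 7, 24), 67.3),
    ("model-c", date(2023, 7, 28), 67.9),
    ("model-d", date(2023, 8, 5), 68.1),
]

def shelf_lives(entries):
    """For each model that topped the leaderboard on release, return how many
    days passed before a higher-scoring model appeared."""
    entries = sorted(entries, key=lambda e: e[1])  # order by release date
    lives = []
    best = float("-inf")
    current = None  # (name, date) of the reigning leaderboard topper
    for name, day, score in entries:
        if score > best:
            if current is not None:
                lives.append((current[0], (day - current[1]).days))
            best, current = score, (name, day)
    return lives

for name, days in shelf_lives(entries):
    print(f"{name} topped the leaderboard for {days} days")
```

With these placeholder numbers, every topper is dethroned within about a week, and each successive score improvement is well under a point, which is the pattern the leaderboard has been showing.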

This post is for paid subscribers