December in AI: What's old is new, and what I've learned covering AI in 2023
Plus: we're back to arguing over how much data is worth, but this time Content is involved.
While it hasn’t quite been a full year yet since the launch of Supervised, the first year in the new era of modern AI is now officially in the books.
Last year started off with a flurry of activity, with the birth of open source model methodology and the launch of GPT-4. As we settle into 2024, we’re going to start seeing AI move from the kind of experimental, fun toy phase into one where real companies are built on top of real tools. Last year was about laying the groundwork and infrastructure for what will be built and launched this year, and it will definitely be a weird year—perhaps even more so than 2023.
In addition to a recap of what happened in December, I thought I would also share what it’s been like covering the space as it’s emerged. Or, more specifically, some things I’ve learned from what I’ve covered and what kind of response I’ve gotten from readers.
My goal with Supervised was—and still is—to try to understand everything just a step deeper than what you typically might read elsewhere. And while I’m a user of a lot of these tools, I’m obviously not in the PhD-slash-expert category in the space. As a result, I’ve tried to keep everything focused on a broader industry perspective, serving companies and developers scrambling to get this stuff into production in the first place.
So, let’s get to it! For the December recap, here’s what came up most often and the biggest themes from last month:
Governance tooling on the rise: As we start moving into that “prod” phase of AI, more and more of the companies I talk to are taking a much closer look at traditional data paradigms—again, and again, and again. That has ranged from observability, to lineage, to the data pipelines themselves. But it all broadly falls under the umbrella of governance, which I think will be one of the biggest themes of 2024.
A tale of two launches: December saw the announcement of Google’s long-awaited Gemini model family, of which only the middle “Pro” tier is actually available. The launch itself was quickly panned as a marketing stunt—particularly its benchmark comparison. Then Mistral casually dropped a link to one of the biggest advances in open source AI model development.
The data sets take focus: The New York Times is going after OpenAI for using its data in training its GPT-series models. We’re about to enter one of the weirder phases of the AI Discourse over what data is, and is not, fair game for using in models—but it also opens up a whole set of other questions about what data itself is worth in the first place.
Like I said, this industry gets weirder every month, and everything old is new again. Let’s go ahead and start with what’s on the horizon—the growing prominence of data governance.
What governance offers as a path forward for AI
For practically the entire year, the whole idea of actually getting AI into enterprises seemed very far off in the distance. That, though, is changing amongst the companies I talk to—many of whom are actually close to getting model tooling into production, if they can check a few final boxes.
Unsurprisingly, those check boxes are pretty classic data engineering problems that we already ran into with the emergence of the modern data stack in the early 2020s. More specifically, they’re problems concerning data governance, which is essentially a catch-all term for “what is my data doing, where.”
The two that come up most often lately are lineage and observability. The former is understanding what data is going into a model (training or inference) and what’s coming out; the latter is understanding how the model behaves in action and how to evaluate its performance.
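To make that distinction concrete, here’s a minimal sketch of what capturing both might look like around a single model call. Everything below is illustrative and hypothetical—the function and field names aren’t any particular vendor’s API—but the split is the point: the lineage record tracks which data and model versions were involved, while the observability record tracks how the call actually behaved.

```python
import json
import time
from datetime import datetime, timezone


def call_model(prompt: str) -> str:
    """Stand-in for a real model endpoint; returns a canned completion."""
    return "example completion"


def logged_inference(prompt: str, model_version: str, dataset_version: str) -> str:
    """Wrap a model call and emit both a lineage record and an observability record."""
    start = time.time()
    output = call_model(prompt)
    elapsed = time.time() - start

    # Lineage: which data and model versions fed this result.
    lineage = {
        "model_version": model_version,       # e.g. an internal fine-tune tag
        "training_dataset": dataset_version,  # which data snapshot the model was trained on
        "prompt": prompt,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

    # Observability: how the call actually behaved.
    observability = {
        "latency_seconds": round(elapsed, 4),
        "output_length_chars": len(output),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

    # In a real deployment these records would flow to a governance or
    # monitoring system; printing keeps the sketch self-contained.
    print(json.dumps({"lineage": lineage, "observability": observability}, indent=2))
    return output


if __name__ == "__main__":
    logged_inference("Summarize last month's AI news.", "model-v1", "training-data-2023-12")
```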
Abstraction layers like Databricks and Snowflake are the main candidates to stitch together the governance tooling required by those larger enterprises. They come with the benefit that they already host a company’s data, having spent years wooing contracts from the companies with stricter governance requirements. And some startups looking to go platform-level (like a Clarifai, for example) could try to center their product around governance tooling for AI.
That’s essentially put Databricks, Snowflake, and others in a position where they can offer a viable path forward for even the holdout companies, assuming the tooling is in place to satisfy all those data governance requirements. Datadog, meanwhile, has long owned the observability space, and while the general observability elements of AI—which do differ slightly from classic observability—feel up for grabs, there hasn’t been an obvious non-platform candidate to take it just yet. (In fact, amongst people I talk to, there hasn’t even been an obvious candidate to go directly after Datadog.)
On top of that, there’s been a proliferation of competitive and cheap model endpoints well beyond OpenAI at this point, particularly with the launch of Mistral’s new model, and the cost of GPUs (particularly A100s) continues to drop.
What seemed very far off in the distance—actually getting AI into production, because it had to be leashed behind a VPC—now looks increasingly viable. But that pathway looks like it will run through an extensive set of governance tooling, and it’s still not clear who exactly will be swooping in to fill those needs.
Speaking of Mistral…
It’s Mistral’s world, and we’re all living in it
Google made the most Google-y unveiling possible for its latest AI model family, Gemini—complete with promotional materials and all. But the demo video was edited to make it look better than it was. And Google’s performance benchmark comparison wasn’t a direct comparison against its rival GPT-4—instead, it compared a modified version of a score OpenAI provided to… the score OpenAI provided.
Anyway, as all this was happening, Mistral tweeted out a link to a new open source model—a mixture of experts, like GPT-4. More importantly, it was competitive with GPT-3.5 Turbo—and companies were serving the model at rates as low as half the price of GPT-3.5 Turbo. The release of Mixtral made it possible for startups like Anyscale, Perplexity, and Together AI to become direct threats to one of OpenAI’s core businesses practically overnight.
Yes, the Llama-series models were already available as endpoints, but running Llama 2 70B cost roughly the same as OpenAI’s models. Together AI was already pushing the envelope with Llama 2 70B at $0.90 per million tokens, but serving Mixtral has given those startups a way to offer a viable and performant alternative to GPT-3.5 Turbo.
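For a rough sense of the economics, here’s a back-of-the-envelope sketch of how per-million-token rates translate into a monthly serving bill. The rates and token volume below are illustrative placeholders, not the actual prices any of these providers charge.

```python
def monthly_cost(tokens_per_month: int, price_per_million: float) -> float:
    """Dollar cost of a monthly token volume at a per-million-token rate."""
    return tokens_per_month / 1_000_000 * price_per_million


# Illustrative placeholder rates (USD per million tokens), not actual vendor pricing.
rates = {
    "hosted GPT-3.5-class endpoint": 1.00,
    "hosted open source endpoint at half that rate": 0.50,
}

tokens_per_month = 500_000_000  # hypothetical workload: 500M tokens per month

for name, rate in rates.items():
    print(f"{name}: ${monthly_cost(tokens_per_month, rate):,.2f} per month")
```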