Batch processing and the rise of CPUs
Plus: the death of pre-training seems greatly exaggerated.
Both issues for the week are coming together in this one! First we’ll talk about how enterprises are actually using language models in prod, and then how pre-training still seems alive and well.
When ChatGPT launched, CEOs at pretty much every company you talk to gave a top-down mandate to build AI into their products.
And to be clear, at the time, no one knew what that looked like. But the extreme hype around language models—along with the fact that they, in a lot of ways, worked out of the box—made them impossible to ignore. The fever dreams of replacing entire work functions were pervasive. And the common refrain was along the lines of, “if not me, then one of my competitors will do it.”
That hype cycle has ended, thankfully, and language models specifically have started to find a home within larger companies. But the reality among the enterprises and platform providers I talk to is that the use cases aren’t as epically transformative as we might have originally predicted.
Actually, they’re (at face value) pretty straightforward. Instead of complex reasoning tasks and full-on automation that replaces entire teams, many enterprises I talk to are either using, or planning to use, language models for batch data processing.
That can include summarization, classification, entity extraction, or even cleaning up data. And it shouldn’t exactly come as a shock: in a lot of use cases, that’s what companies were already doing with BERT, one of the earliest open source language models. The emergence of the Llama-series models and Mistral’s models has just jacked up the quality of those results and reduced the barrier to entry to the point that they can be more efficient and (a little more) trustworthy.
“All of our customers have this dream of the full-on automation, but the steps to get there feel a little more incremental than a massive shift to all of our chats and emails and SMS handled by something that’s perhaps a black box,” Ben Gleitzman, CTO and co-founder of Replicant, a call center automation tool, told me. Replicant is backed by Salesforce Ventures, Norwest, Atomic, and others.
The “batch” part here refers to analyzing large amounts of data on a non-urgent timeline. The tasks are usually simple and straightforward, not the kind of “write an email for me in the style of X” requests that require the quality level of, say, a GPT- or Claude-series model. Instead, you’re more or less on a fact-finding mission within troves of proprietary company data.
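In practice, the whole job can be a loop over rows with a small instruct model. Here’s a minimal sketch in Python using Hugging Face’s transformers pipeline; the model ID, prompt, and tickets are illustrative rather than from any particular deployment:

```python
from transformers import pipeline

# Hypothetical batch sentiment pass over support tickets.
# The model ID is illustrative; any small instruct model works the same way.
classify = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",
)

TEMPLATE = (
    "Classify the sentiment of this support ticket as POSITIVE, NEGATIVE, "
    "or NEUTRAL. Answer with one word.\n\nTicket: {ticket}\nSentiment:"
)

tickets = [
    "The export button has been broken for two days and nobody responded.",
    "Thanks for the quick fix, everything works now!",
]

for ticket in tickets:
    prompt = TEMPLATE.format(ticket=ticket)
    out = classify(prompt, max_new_tokens=5, do_sample=False)
    # The pipeline returns the prompt plus the completion; keep only the new text.
    label = out[0]["generated_text"][len(prompt):].strip()
    print(f"{label:>8} | {ticket[:50]}")
```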
While you could do it through one of OpenAI’s APIs, open source models like Mistral’s smaller 7B model are actually well-suited to the problem (particularly for enterprises) because they’re efficient, cheap, and in many cases already plugged into data abstraction layers. Databricks and Snowflake have both announced access to Mistral’s open source models through endpoints, and it’s an easy jump to a fine-tune because the data’s already there.
“We see a lot of batch processing on Snowflake, from summarization and the sentiment analysis of support tickets, to batch data extraction from SEC filings or the competitive analysis of lost deals,” Baris Gultekin, Head of AI at Snowflake, told me. “Large language models increase the productivity of analysts substantially, enabling them to quickly extract insights from text by using something as simple as the English language, all in a cost-effective way across millions of rows of data.”
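If the data already lives in a warehouse, the batch job can literally be a query. Here’s a rough sketch of that pattern using Snowflake’s Python connector and the Cortex COMPLETE function; the connection details, table, and column names are made up for illustration:

```python
import snowflake.connector

# Hypothetical connection details; in practice these come from your environment.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db", schema="public",
)

# CORTEX.COMPLETE runs the model next to the data, so summarizing a batch
# of tickets is a single query; `support_tickets` and `body` are illustrative.
cur = conn.cursor()
cur.execute("""
    SELECT ticket_id,
           SNOWFLAKE.CORTEX.COMPLETE(
               'mistral-7b',
               'Summarize this support ticket in one sentence: ' || body
           ) AS summary
    FROM support_tickets
    LIMIT 100
""")
for ticket_id, summary in cur:
    print(ticket_id, summary)
```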
More importantly, these smaller models don’t necessarily need the extreme power of a cluster of H100 GPUs. It’s even to the point that Apple’s Awni Hannun, who is behind the company’s excellent MLX framework for Mac development and inference, and llama.cpp creator Georgi Gerganov are competing over who can get the highest throughput for Mistral models on a Mac.
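To give a sense of how little ceremony that local path involves, here’s a sketch using the llama-cpp-python bindings over a quantized Mistral checkpoint; the file path is a placeholder for whatever GGUF build you’ve downloaded:

```python
from llama_cpp import Llama

# Placeholder path to a quantized Mistral 7B GGUF file; runs on CPU by default.
llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",
    n_ctx=2048,
    verbose=False,
)

resp = llm(
    "Extract the company names from this sentence: "
    "'Databricks and Snowflake both announced Mistral endpoints.'\nCompanies:",
    max_tokens=32,
    temperature=0,  # deterministic output, which batch jobs usually want
)
print(resp["choices"][0]["text"].strip())
```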
As these models continue to improve, and can run on low-power hardware, they’re becoming a very attractive option for companies looking to remove a lot of the “eye tests” in workflows that might be offloaded to support centers. And while we wait for a “killer” use case for language models (if one ever emerges), it turns out that some enterprises have already found a reason to implement them, even if it’s a bit of a snoozer.
Where the advantage of smaller models comes in
Anyone who’s worked in sales, research, marketing, or customer service has done their time sitting in on dozens of calls, listening to what customers or prospects are saying and asking for, independent of what the data shows. Jeff Bezos gave a particularly good summary of why in an extensive interview with Lex Fridman:
When the data and anecdotes disagree, the anecdotes are usually right. It doesn’t mean you just slavishly follow the anecdotes; it means you go examine the data. It’s usually not that the data is being mis-collected, it’s usually that you’re not measuring the right thing.
Now, there are a lot of ways to slice that statement. But one part of it is that understanding what customers need or are looking for comes down to a human with empathy and experience eyeballing something, and getting some insight that can’t be mechanically extracted with traditional machine learning. Or, at least, something you can’t extract at a level that’s fully trustworthy and error-proof.
While it’s still early, modern language models offer a chance to capture some of that so-called “anecdata” by trying to understand, semantically, what information is coming in. Many companies that would benefit from using them for batch processing have spent years collecting information from, for example, customer support resolutions. And the results are usually pretty binary: did the customer get their issue resolved and have a good experience, or not?
From there, you could extract flags that suggest which direction to send a ticket, or what the largest pain points customers have had recently. We’ve been trying for years to automate this with machine learning, but the advent of language models, even smaller ones like Mistral 7B, makes the process much more approachable thanks to their flexibility and ease of implementation. And those language models are able to deliver real results in exactly the cases where you’d otherwise want to eyeball the data.
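As a sketch of what that flag extraction might look like: the snippet below asks a small model for a routing label as JSON and falls back to human review when the output doesn’t parse, since smaller models still misfire on format. The `generate` callable stands in for any of the setups sketched above, and the queue names and fields are hypothetical:

```python
import json

TRIAGE_PROMPT = (
    "Read the support ticket below and respond with JSON only, in the form "
    '{"queue": "billing|bug|account|other", "frustrated": true|false}.\n\n'
    "Ticket: "
)

def triage(generate, ticket: str) -> dict:
    """`generate` is any text-in/text-out callable backed by a small model."""
    raw = generate(TRIAGE_PROMPT + ticket)
    try:
        # Tolerate stray tokens around the JSON object.
        return json.loads(raw[raw.index("{"): raw.rindex("}") + 1])
    except ValueError:
        # Smaller models occasionally break the format; route to a human instead.
        return {"queue": "human_review", "frustrated": None}

# Hypothetical usage:
# triage(my_generate, "I was double-charged again this month and no one replies.")
# -> {"queue": "billing", "frustrated": True}
```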