This issue was finished on a mobile device so I apologize ahead of time for any typos and formatting errors.
OpenAI’s sprawling portfolio problem
While OpenAI’s latest model, o1, is clearly a massive improvement in performance, it’s also creating a new challenge: product sprawl.
Its latest products, o1 and o1-mini, essentially give users and customers a tradeoff: you can wait longer and, for now, spend more, but the output should be much better. It’s not the kind of API you’d plug into a call center, but it does fill a new niche for a company that already covers a whole range of niches beyond just a “one thing that does almost everything well.”
OpenAI’s o1 use case is basically the “you have to think about it for a second” problem. Box CEO Aaron Levie gives an extremely good enterprise example: finding a very specific parameter of a contract, in this case the date of the final signature, and thus the date the contract effectively went live. This is a really crisp “you had to think about it for a second” problem: an otherwise straightforward task with an added layer of complexity to parse, here the possibility that people signed the contract on different days.
That’s also a kind of newish use case for OpenAI. Historically, solving it might have meant a long set of API calls and begging the models to do what you need, whether through prompt tuning or retrieval-augmented generation (RAG). Instead, that whole process could, hypothetically, get compressed into a single call or two and simplify some architectures.
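To make that concrete, here’s a minimal sketch of what the “single call” version of Levie’s contract problem might look like with the OpenAI Python SDK. The model name reflects what was available at launch, and the file and prompt are purely illustrative, not OpenAI’s recommended pattern:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

contract_text = open("contract.txt").read()  # hypothetical contract document

# Instead of a chunk-retrieve-prompt pipeline spread across several calls,
# hand the whole document and the edge case to a reasoning model at once.
# (At launch, o1-preview accepted user messages only, with no system prompt.)
response = client.chat.completions.create(
    model="o1-preview",  # slower and pricier, but it "thinks" first
    messages=[
        {
            "role": "user",
            "content": (
                "Here is a contract. The parties may have signed on different "
                "days. What is the date of the final signature, i.e., the date "
                "the contract effectively went live?\n\n" + contract_text
            ),
        }
    ],
)
print(response.choices[0].message.content)
```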
But at this point OpenAI has been undergoing a kind of “SaaS-ification” as it matures into a real business. A number of executives and co-founders have continued to make their way out the door: co-founder and chief scientist Ilya Sutskever left to found a new safety-focused AI startup that raised $1 billion, and co-founder John Schulman left to join Anthropic. As of this week, that list also includes CTO Mira Murati.
As OpenAI matures into a real business, it’s running head-first into the challenge of avoiding product creep at a time when its product portfolio (or in this case, model portfolio) is getting more and more complex. Larger product portfolios are inherently more difficult to manage, and the biggest challenge companies at this phase face is keeping them from becoming unwieldy and making their value hard to communicate to customers. And with model quality constantly improving, even a handful of distractions could let one of the other frontier model companies usurp its position.
So, just quickly, let’s recap OpenAI’s now very large portfolio of APIs:
GPT-4o: a multi-modal, expensive (though not immune to price cuts) model that’s supposed to be Good and General Purpose.
GPT-4o mini: a less powerful version of GPT-4o that’s designed to be a successor to its workhorse GPT-3.5 Turbo model to satisfy a large number of simpler use cases.
Fine-tuned versions of the above to serve enterprise-specific needs, though they run through an API, which might turn off some of the more security-conscious companies.
Batch versions of the above with a 24-hour completion window, in exchange for a 50% price discount (see the sketch after this list).
o1: a model that trades speed and price for quality by giving it additional time to “reason” about an answer. Basically, that “stop and think about it” question.
o1-mini: like 4o mini, a smaller version of o1 designed for… the same “stop and think about it” question. But we’ll just run under the assumption there’s a set of problems this is really good for.
Whisper: arguably the best speech-to-text model on the market, and one most assuredly built to generate the training data OpenAI needs.
Text embeddings: a not-quite-the-best embeddings product whose advantage generally seems to be that it’s provisioned alongside other OpenAI products, which reduces procurement headaches.
Text-to-speech: an API that you could, hypothetically, slot into something like a call center, assuming the latency works. It comes in both a normal and an HD version.
Advanced Voice assistant: a technical marvel of a product that lets you have a live conversation within ChatGPT, though the killer use case still isn’t super clear.
ChatGPT: OpenAI’s “productized” version of all of the above in one enterprise-friendly frontend wrapper.
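Since the batch tier is the least self-explanatory item on that list, here’s a rough sketch of the workflow based on how OpenAI documented its Batch API around this time (the prompts and file name are placeholders, and the details may have shifted since):

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Batch jobs are submitted as a JSONL file, one request per line.
requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["First placeholder prompt", "Second one"])
]
with open("requests.jsonl", "w") as f:
    for r in requests:
        f.write(json.dumps(r) + "\n")

# Upload the file, then create a batch with the 24-hour completion window.
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll client.batches.retrieve(batch.id) for results
```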
Amid all of this, Meta has added to the chaos by releasing updated models in almost every single one of those categories, except theirs are open-ish source and power a very different array of products even though they exist in the same “bucket.” With o1, OpenAI is essentially taking another shot at category creation, but it’s also risking a ballooning portfolio.
Here’s how all the pricing ends up playing out, with a handful of competitors as a point of reference:
And these costs are probably nowhere close to where things will land after o1 has been out for a while. OpenAI has already started incrementally increasing rate limits, with o1-preview recently going up from 30 per week to 50 and o1-mini going from 50 per week to 50 per day.
While OpenAI’s appeal has always been in some sweet spot between convenience, price, and performance, its ballooning portfolio poses as much of a challenge as an opportunity. I seriously doubt the prices will remain this high, as its next GPT model will be ready at some point. But for now, it at least gets something into developers’ hands, aggressive price tag and all, at a time when it’s trying to raise a colossal round of funding.
The challenge here, though, is the same one any maturing company starts to face over time: product creep. While OpenAI technically has “two” products in the form of its APIs and ChatGPT, those products have a ton of branches that serve a very wide range of use cases. The APIs also go well beyond chat completion and text generation, covering a whole variety of modalities. And its voice product is probably the most awkward part of the portfolio, seemingly more a “wow” feature of ChatGPT than a product in its own right.
And product creep is a Known Problem in startup-land, if you can even call OpenAI a startup any more. As a startup matures, and its user and customer base grows, it has to develop directionally, anticipating the handful of use cases that satisfy the most customers without building everything for everyone. Or, more succinctly: do a few things, but do them well.
Offering this wide array of use cases gives OpenAI the ability to funnel users toward some kind of steady state where it isn’t necessarily making money, but at least it isn’t losing money. On the API front, that has traditionally meant pushing users to its workhorse models (particularly fine-tuned versions), but the equivalent is a little less clear with ChatGPT.
ChatGPT and finding efficiencies in inference
While the new models are rate-limited in ChatGPT, that is also an extremely important part of its business beyond the APIs. OpenAI COO Brad Lightcap told staff that OpenAI has more than 10 million paying subscribers for ChatGPT and an additional 1 million subscribers for businesses, according to The Information. (Bloomberg earlier reported that OpenAI has 1 million paid business users for ChatGPT.)
We can of course do a lot of napkin math here, but we’d arrive at the same conclusion: ChatGPT’s enterprise business is, and will be, a massive part of OpenAI’s business beyond the APIs.
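(The roughest version of that math, assuming most of those 10 million subscribers are on the $20-per-month Plus plan: 10 million × $20 × 12 comes out to roughly $2.4 billion a year before the business tiers, whose per-seat pricing OpenAI doesn’t fully publish, even enter the picture.)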
But it’s not like revenue from ChatGPT will scale up directly with o1 usage the way it would with the API. The cost of actually running it, whether via the API or within ChatGPT, is probably going to change as compute resources shift from pre-training toward inference.
Fortunately for OpenAI, this is an area where there’s already a lot happening. One approach getting a ton of extra attention right now among the people in the industry I talk to is Monte Carlo Tree Search, a way to narrow the amount of compute needed to generate a high-quality result. And distillation, another way of “shrinking” larger models, is also gaining a lot of momentum as interest shifts to managing the inference costs of high-performance models.
“That’s the sweet spot, blending strategies from traditional predictive machine learning with years of work that went in there with modern techniques,” Sri Ambati, co-founder and CEO of model developer and platform h2o.ai, told me. “Tree search is an absolutely genius trick, it’s very easy low-hanging fruit that’s combined with the brilliance of LLMs.”
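To be clear, nobody outside OpenAI knows exactly how (or whether) tree search factors into o1, so treat the following as a toy illustration of the general idea rather than anyone’s production method: spend a fixed search budget expanding the most promising partial generations, score them with some value function, and back those scores up the tree. The propose() and value() functions here are stand-ins for what would really be an LLM’s sampler and a learned verifier:

```python
import math
import random

# Toy stand-ins for a model: propose() suggests candidate next tokens and
# value() scores a sequence. In a real system these would be an LLM's
# sampler and a learned reward/verifier model.
VOCAB = ["the", "contract", "went", "live", "on", "June", "3"]

def propose(seq):
    return random.sample(VOCAB, k=3)

def value(seq):
    # Hypothetical scorer: reward sequences that mention the "answer" tokens.
    return sum(1 for w in seq if w in ("went", "live", "June")) / (len(seq) + 1)

class Node:
    def __init__(self, seq, parent=None):
        self.seq, self.parent = seq, parent
        self.children, self.visits, self.total = [], 0, 0.0

    def ucb(self, c=1.4):
        # UCB1: trade off a child's average score against how unexplored it is.
        if self.visits == 0:
            return float("inf")
        return self.total / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )

def mcts(root_seq, budget=200, max_len=6):
    root = Node(root_seq)
    for _ in range(budget):                  # fixed compute budget
        node = root
        while node.children:                 # 1. select the most promising leaf
            node = max(node.children, key=Node.ucb)
        if len(node.seq) < max_len:          # 2. expand it with candidates
            node.children = [Node(node.seq + [w], node) for w in propose(node.seq)]
            node = random.choice(node.children)
        score = value(node.seq)              # 3. evaluate the new sequence
        while node:                          # 4. back the score up the tree
            node.visits += 1
            node.total += score
            node = node.parent
    best = max(root.children, key=lambda n: n.visits)
    return " ".join(best.seq)

print(mcts(["the"]))
```

The specific heuristics don’t matter; the point is that search lets you spend a controllable, fixed amount of extra compute to squeeze a better answer out of the same model.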
The other bit that comes up most often in conversations with experts and sources is Noam Brown’s work at OpenAI. Brown is widely considered a premier expert in game theory and many have wondered how his work would be applied to OpenAI’s products. Brown was also a co-author of a paper that in part examined applications of Monte Carlo Tree Search in developing human-like agents.
The challenge OpenAI has for its APIs is that, based on most of the enterprises and platforms I’ve talked to lately, cost is either number one or number two on the list of considerations when building an AI app for production use cases. But OpenAI has certainly shown a willingness to drop its prices over time to remain competitive.
And it’s also telling that ChatGPT’s enterprise business seems to be the significant driver for OpenAI, making all this development essentially in service of that enterprise kit. The alternative, building something bespoke internally using really cheap off-the-shelf components, is increasingly compelling for companies with more advanced governance and cost requirements.
The cynical take here is that OpenAI is trying to say something along the lines of “hey, see, we’re still building extremely advanced stuff, don’t ignore our fundraising calls.”
The flip side of that argument, though, is that OpenAI, with what appears to be a massively oversubscribed funding round according to CNBC, has already matured into a company better suited to enterprise products. Now OpenAI just has to make sure it isn’t confusing enterprises on sales calls with a colossal portfolio of models.
An opening in Datadog’s armor
Datadog has enjoyed an incredibly enviable position of power and a relatively pristine track record of launches, one that makes more than a few adjacent companies nervous every time something new comes out.
Startups, investors, and developers in the rapidly burgeoning AI space had long expected Datadog to storm into building tools for evaluating the performance and success of LLMs this year. It was already a considerably difficult space, one that has been trying to move beyond a “vibes”-based approach, but if any large company were well positioned to do it, it would likely have been Datadog.
But Datadog’s annual conference came and went in June, and what those startups (and their investors) got instead was a bit of a sigh of relief.