The Observability Game
No, not just about board observers—we're talking about the software. Plus, OpenAI's colossal pending headache in the enterprise.
Today we’ll be talking about observability: the software, not the board observers. But first, a quick rundown of the latest developments in the OpenAI saga.
OpenAI, even if Sam Altman returns, has created a whole new class of headaches
There’s now an “agreement in principle” for Sam Altman to return to OpenAI. And to be clear, “in principle” does not mean definitive, and the whole situation changes on a daily basis. But we’re also starting to get the drip feed of what was happening on the periphery of the decision to oust Altman.
If Altman returns to OpenAI, it obviously restores a lot of stability to the company. The 95+% of OpenAI’s employees rallying around Altman get what they wanted, Microsoft protects its investment, and one of the most popular AI tools continues to live on and develop in its current form.
The instability in the company, however, has done it zero favors if it hopes to grow its API business into larger enterprises. The biggest hangup among large potential corporate customers, in addition to reliability and cost, is that its GPT-series models are still accessed only via API. And the chance that a company that keeps its data inside a VPC would ship that data out to OpenAI after this debacle is even lower than it was prior to Altman’s ouster.
Here’s the thing: most companies and executives I speak with expected OpenAI to crack this problem in some form or another. Maybe it would be a shard of GPT-4 available behind a firewall, or some technical achievement that helped OpenAI successfully woo the companies hesitant about using an API. For the most part it was just a waiting game until the right product arrived and someone had pressure-tested it.
Getting that working at a technical level is one problem. Getting an organization to trust you with its data in the first place, regardless of where that data lives, is a completely different problem. That problem revolves around sales and business development, and is typically predicated on the idea that a company is competent and will continue to operate in a similar manner for the foreseeable future. No surprises.
OpenAI did have a peculiar governance structure that made it uniquely vulnerable to this exact scenario—which is something OpenAI and Microsoft could throw up as a disclaimer. But it was such a fringe possibility that when it did happen, it shocked the entire industry. The Information also reported that more than 100 companies contacted Anthropic over the weekend as the OpenAI saga unfolded. And we also got a very well-timed fundraising extension announcement from AI21 Labs.
The appeal of Snowflake and Databricks in this situation is that they’ve spent quarters, if not years, winning over customers with concerns around privacy. With that data already on their platforms, they can offer a language model service on top of it. Snowflake and Databricks could offer customization through fine-tuning or LoRA, or, in the case of Databricks, a flat-out pre-train with MosaicML. (The latter says you can get to GPT-3 quality for around $450,000, though the whole point of emerging techniques is to drive that number even lower.)
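To make the fine-tune/LoRA option concrete, here’s a minimal sketch using Hugging Face’s open-source peft library. The base model and every hyperparameter below are illustrative assumptions, not Snowflake’s or Databricks’ actual recipe:

```python
# A minimal LoRA fine-tuning setup sketch, assuming an open-weight causal LM.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"  # placeholder; any open-weight base model
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA trains small low-rank adapter matrices instead of the full weights,
# which is why this route is so much cheaper than a from-scratch pre-train.
config = LoraConfig(
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,                        # scaling factor for adapter updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of total weights
```

From here the wrapped model trains like any other Transformers model; only the adapter weights update, so the customer’s data never has to leave the platform it already sits on.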
When looking at custom OpenAI models, the company says to expect a lead time of several months with a minimum spend in the millions of dollars. OpenAI may have a temporary advantage in technical expertise (and maybe RLHF data) from its development of GPT-4. But the same company interested in a custom model from OpenAI is probably also evaluating a pre-train from scratch using something like Databricks’ MosaicML.
And then there are companies like Anyscale, Replicate, and Together AI, which are inching their way closer to offering a turnkey option for running inference on a fine-tuned or pre-trained model within a company’s walls. Again, for now this isn’t truly serverless, as a company still has to manage its own inference setup, but most companies I talk to expect it to get there at some point. (All this also leaves aside whatever Hugging Face ends up doing in this space.)
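For a sense of what “managing your own inference setup” means in practice, here’s a hedged sketch using Ray Serve, the open-source serving framework behind Anyscale, to stand a model up inside a company’s own infrastructure. The model, class names, and resource settings are placeholder assumptions, not any vendor’s turnkey product:

```python
# Sketch of self-hosted inference with Ray Serve; the team still owns
# scaling, upgrades, and hardware, which is the gap turnkey offerings target.
from ray import serve
from starlette.requests import Request
from transformers import pipeline

@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 1})
class Generator:
    def __init__(self):
        # In practice this would point at a fine-tuned checkpoint on
        # internal storage rather than a public demo model.
        self.pipe = pipeline("text-generation", model="gpt2")

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        out = self.pipe(payload["prompt"], max_new_tokens=64)
        return {"completion": out[0]["generated_text"]}

app = Generator.bind()
# serve.run(app)  # exposes the model over HTTP inside the VPC
```

The point of the sketch is the operational surface area: everything below that HTTP endpoint is the customer’s problem today, and “getting to serverless” means making most of this file disappear.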
Increasingly, the competition OpenAI faces in the enterprise seems to be coalescing around companies that already have those relationships (and the data to go with them) in the form of Databricks and Snowflake, and to a certain extent MongoDB. It also includes companies trying to rethink the serving and inference problem from scratch.
The damage here to OpenAI’s potential enterprise motion is going to be pretty long-lasting, even if a new governance structure all but guarantees something similar won’t happen again. But that’s always the cost of working with a startup and building on top of an API.
An open field in observability
If you buy into a future where modern workflows are powered by a network of semi-autonomous AI bots working in concert (as in agent networks), there inevitably needs to be a monitoring layer on top of all of that.
This is actually a pretty classic problem in cloud developer operations (and, to a certain extent, machine learning operations) that has its own name: observability. But one of the peculiar parts of the “observability” now emerging around language models is that it looks quite different from classic observability.
The standard metrics you’d monitor with observability (like product uptime or model drift) still apply, but it’s inherently difficult to gauge what “good” means for a language model. And from the companies and investors I talk to, it’s proving to be a somewhat weird problem emerging in AI: there are already companies selling observability, but who, and what, handles the language models?
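To make that gap concrete, here’s a hypothetical sketch of what an LLM trace might capture. The schema and every name in it are my own illustration, not any vendor’s product: latency and token counts fall straight out of classic observability, while the quality score is the part nobody has a settled answer for.

```python
# Illustrative LLM trace: classic metrics are easy, "good" is not.
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class LLMTrace:
    prompt: str
    response: str = ""
    latency_ms: float = 0.0               # classic observability: measurable
    prompt_tokens: int = 0
    completion_tokens: int = 0
    quality_score: Optional[float] = None  # the open question for LLMs
    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def traced_call(model_fn: Callable[[str], str], prompt: str) -> LLMTrace:
    """Wrap a model call and record what classic observability can see."""
    start = time.perf_counter()
    response = model_fn(prompt)
    trace = LLMTrace(
        prompt=prompt,
        response=response,
        latency_ms=(time.perf_counter() - start) * 1000,
        prompt_tokens=len(prompt.split()),      # crude stand-in for a tokenizer
        completion_tokens=len(response.split()),
    )
    # Uptime and latency are solved problems. Filling in quality_score
    # usually means another model, heuristics, or human review downstream,
    # and that's exactly the piece of the stack still up for grabs.
    return trace
```

Every field except `quality_score` has an obvious owner in today’s observability stacks. Whoever figures out how to fill in that last field credibly is competing for the open field this section is about.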