OpenAI clears out some developer debt
Plus: a big move for text-to-speech startup ElevenLabs as OpenAI enters the race.
Author’s note: In a prior version of this post that was sent out to subscribers, I wrote that OpenAI’s text-to-speech API is priced at $0.015 per input character. It is actually priced starting at $0.015 per 1,000 input characters; that earlier, un-edited version of the issue went out erroneously. I felt this was a pretty serious misfire and am re-sending a new, corrected version of the story. Please disregard the prior one!
Update: Due to the long weekend and still catching up on some reporting, Friday’s issue is going to be moved to Monday.
OpenAI has lately spent a lot of time focusing on its consumer products. But with its developer day this week, it’s starting to broadcast a continuing focus on its API tools—particularly trying to address developers’ ongoing issues with GPT-4.
GPT-4 is getting some updates, starting with a doubling of its tokens-per-minute rate limit, currently one of the Achilles’ heels for developers trying to deploy GPT-4 in production. OpenAI is also dropping GPT-4’s prices, though it remains in a considerably expensive tier.
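In the meantime, the standard way developers cope with those per-minute limits is retrying with exponential backoff. Here’s a minimal sketch, assuming the current v1-style OpenAI Python SDK; the model name, retry count, and delays are just illustrative:

```python
import time

from openai import OpenAI, RateLimitError

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def chat_with_backoff(messages, model="gpt-4", max_retries=5):
    """Retry a chat completion with exponential backoff when rate limited."""
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)
            delay *= 2  # 1s, 2s, 4s, ...


reply = chat_with_backoff([{"role": "user", "content": "Summarize today's announcements."}])
print(reply.choices[0].message.content)
```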
OpenAI also announced a series of relatively expected agent-based tools. Its more consumer-y custom assistant (slash agent precursor), called GPTs, puts it in a kind of marketplace position for developers. It also launched an Assistants API, an apparent chain-like tool that serves as a primitive for agent-based workflows.
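As a rough sketch of what that primitive looks like in practice, here’s the assistant → thread → run flow as OpenAI described it at launch, again assuming the v1 Python SDK; this is still a beta, so exact names and parameters may shift:

```python
import time

from openai import OpenAI

client = OpenAI()

# A reusable assistant: instructions, built-in tools, and a model.
assistant = client.beta.assistants.create(
    name="Math helper",
    instructions="Answer math questions, writing and running code when helpful.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-1106-preview",
)

# Each conversation lives in a thread that accumulates messages.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="What is 2**32, minus one?"
)

# A run executes the assistant against the thread; poll until it finishes.
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# Messages come back newest-first; the assistant's reply is at the top.
print(client.beta.threads.messages.list(thread_id=thread.id).data[0].content)
```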
It’s also expanding its available context windows more broadly, raising GPT-3.5 Turbo to around 16,000 tokens and providing a comically large 128,000-token window for GPT-4. The latter is a little curious given that research seems to suggest you get the best results by providing less, and more relevant, context rather than jamming an entire book into a prompt.
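If anything, the practical question is how much of that window you should actually fill. A small sketch of budgeting context tokens before a call, using tiktoken’s cl100k_base encoding; the ranking here is just a stand-in for whatever retrieval step you already have:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # the encoding GPT-4 and GPT-3.5 Turbo use


def trim_context(ranked_chunks, budget_tokens=8_000):
    """Keep only as many relevance-ranked chunks as fit within a token budget."""
    kept, used = [], 0
    for chunk in ranked_chunks:  # assumes chunks are sorted most-relevant-first
        n = len(encoding.encode(chunk))
        if used + n > budget_tokens:
            break
        kept.append(chunk)
        used += n
    return kept


# Even with 128k tokens available, a tighter, more relevant prompt often does better.
ranked_chunks = ["Most relevant passage...", "Next most relevant passage...", "And so on..."]
print(trim_context(ranked_chunks, budget_tokens=8_000))
```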
Again, a lot of predictable stuff to try to address developer concerns and increasing competition from a variety of vectors. Two points from the developer day really stood out to me: the lack of a new embeddings model to update ada-002, and OpenAI’s new move into text-to-speech.
The former comes at a time when there is increasing competition for embeddings models from a lot of different angles. The latter, meanwhile, comes at a time when a rather substantial round seems to be happening in that exact space.
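For a sense of what that move looks like on the API side, here’s a minimal sketch of the new speech endpoint as announced, again assuming the v1 Python SDK; the model, voice, filename, and the stream_to_file helper are illustrative and may change, and the cost line simply applies the corrected pricing noted above:

```python
from openai import OpenAI

client = OpenAI()

text = "OpenAI is now in the text-to-speech race, too."

# Back-of-envelope cost at $0.015 per 1,000 input characters (the corrected tts-1 pricing).
print(f"Estimated cost: ${len(text) / 1000 * 0.015:.5f}")

# Generate audio and write it to disk.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=text)
speech.stream_to_file("speech.mp3")
```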
OpenAI’s embedding competition
OpenAI’s lack of an updated embeddings model is pretty understandable given the popularity of ada-002, its current text embedding model. But it does feel like there’s a lot of emerging competition when it comes to providing an API for embeddings—or, potentially, an endpoint inside a VPC that would give enterprises the peace of mind that their data isn’t getting shipped out to OpenAI.
Cohere last week announced it was launching version 3 of its embeddings model at the same price point as OpenAI’s ada-002 ($0.10 per million tokens). That model scored better on the standard evaluation benchmark for embeddings, which OpenAI has not led for a considerable time now. And it provides a simple, potentially higher-performing API that could serve as a drop-in replacement for ada-002.
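To make the drop-in idea concrete, here’s a rough sketch of what the two calls look like side by side, assuming the OpenAI v1 and Cohere Python SDKs and the model names each company documents:

```python
import cohere
from openai import OpenAI

texts = ["embedding models turn text into vectors", "useful for search and retrieval"]

# OpenAI's ada-002 embeddings.
openai_client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
resp = openai_client.embeddings.create(model="text-embedding-ada-002", input=texts)
openai_vectors = [item.embedding for item in resp.data]  # 1,536-dimensional vectors

# Cohere's embed v3 (v3 models require an input_type hint).
co = cohere.Client("YOUR_COHERE_API_KEY")
resp = co.embed(texts=texts, model="embed-english-v3.0", input_type="search_document")
cohere_vectors = resp.embeddings  # 1,024-dimensional vectors
```

One wrinkle for anyone actually swapping: the vectors come back at different dimensions, so an existing ada-002 index has to be re-embedded rather than mixed and matched.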
The other emerging player here is Voyage AI, an embeddings-focused startup that counts the Stanford trio of Chris Ré (Together AI), Fei-Fei Li, and Christopher Manning as academic advisors. Voyage AI founder Tengyu Ma announced it more formally on LinkedIn last week.