The embedding model market is heating up
Plus: investors face an old foe in next-generation AI startups, and dbt formalizes the semantic layer into something production-grade.
Before we get started, a note to all readers:
When I started, I set several goals for myself to ensure this would be a viable business going forward. I told everyone I talked to that I wanted this eventually to be a newsroom, and not just a newsletter.
For the past few weeks, I’ve been dealing with a back injury that has prevented me from working at full capacity. This has been a really unfortunate setback, and one of a scale I didn’t anticipate—both in terms of how much work I can do and the quality of that work.
It’s been a very harsh reality check on the challenges of building a sustainable independent operation in San Francisco. The Bay Area is punishingly expensive, but also absolutely critical to be in to effectively cover one of the fastest-moving industries in history.
So, to make up for lost time and try to ensure this remains viable, there will be some changes going forward. Some of these are ones I have been considering for several months now based on feedback from many places, but I felt this was as good a time as any to institute them:
Existing subscribers will receive a comped month for Supervised, which roughly aligns with the number of paid issues missed due to recovery.
All issues for Supervised through the end of the year will be paid issues, which will come out twice a week. I will re-assess when to implement a third issue in January, and whether it will be free or paid.
Subscription prices will increase to $17 per month, or $170 per year, for new subscribers only beginning next Friday. This is based on extensive conversations I’ve had with sources, colleagues, and friends over the past several months. This means you can subscribe between now and Friday to lock in a subscription for $10 per month, or $100 per year.
If you’re an organization interested in a group subscription, please reach out to me directly to work out the details.
I am evaluating sponsorship opportunities for Supervised issues. If you’re interested, reach out to me directly for more information.
I’m also evaluating paid consultation and speaking engagements. If you’re an organization interested in that, please reach out to me directly.
All of this, again, is subject to change as needed and based on feedback—but I figured I should do it all at once rather than roll these out incrementally. In addition, I’ll be taking next week off completely from publishing to try to accelerate recovery and catch up on reporting, and will be publishing one issue the following week. This is all baked into the full comped month for existing subscribers.
I deeply appreciate all of the support everyone has given, as well as everyone’s patience, as I try to work through all this.
Now, on to the actual issue!
Embedding models come to the forefront
Earlier this week, Hugging Face CTO Julien Chaumond surfaced a new package from Hugging Face: text-embeddings-inference.
Embeddings play a critical role in AI, and stand to become even more important thanks to the increased popularity of retrieval augmented generation (or RAG). RAG has emerged as a mechanism both to provide a layer of governance over the information that goes into a prompt, by tracing it back to its source, and to help combat hallucinations in model outputs.
That requires companies to embed information from unstructured data (like documents) into a vector database in some searchable format that can then get pushed into a prompt. That embedded information provides examples of how to do something, or information to retrieve for use in the final prompt. And it’s been a somewhat under-appreciated potential business, given that the focus has largely been on inference APIs like GPT-4 and orchestration tools like LangChain.
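The retrieval loop described above is simple at its core: embed the documents, embed the query, find the nearest match, and stuff it into the prompt. Here’s a minimal sketch of that flow in Python. Note the assumptions: the `embed` function below is a toy bag-of-words stand-in for a real embedding model (like ada-002 or one served via text-embeddings-inference), and the in-memory list stands in for an actual vector database.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words vector.
    # In production this would be an API call to an embedding service.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Embed unstructured documents and store them (the "vector database").
docs = [
    "Refunds are processed within five business days.",
    "Our API rate limit is 100 requests per minute.",
]
index = [(doc, embed(doc)) for doc in docs]

# 2. At query time, embed the question and retrieve the closest document.
question = "How fast are refunds processed?"
best_doc, _ = max(index, key=lambda pair: cosine(embed(question), pair[1]))

# 3. Push the retrieved source into the final prompt, so the model's answer
#    can be traced back to it -- the governance/anti-hallucination benefit.
prompt = f"Context: {best_doc}\n\nQuestion: {question}"
print(prompt)
```

Swapping the toy `embed` for a hosted model and the list for a real vector store is the part of this pipeline that companies are now paying for, which is where the embedding-as-a-service angle comes in.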
But there’s growing momentum here. And if RAG ends up being a necessary part of model inference, every company will probably end up using an embedding tool in some fashion. The new TEI is yet another push by the company behind the most popular open source AI packages to make embeddings more of a first-class citizen.
In fact, one of the openings many sources I’ve talked to over the past few months have raised is whether a startup would (or could) actually snap up the market for production-grade embeddings, in a similar fashion to how model inference startups like Replicate or Together have grown in popularity. (Replicate, for example, offers some embedding APIs with a price-per-second-of-GPU approach.)
The most popular embedding model is OpenAI’s ada-002. It’s priced at $0.0001 per 1,000 tokens, following a drastic 75% price cut in June of this year. (The timing was kind of funny, given that just a few days earlier I had talked with someone who commented that they loved ada but wished it weren’t so expensive.)
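To put that price in perspective, here’s the back-of-envelope arithmetic at the listed rate of $0.0001 per 1,000 tokens. The 10-million-token figure is just an illustrative corpus size, not anything from OpenAI’s docs.

```python
# ada-002 pricing after the June cut: $0.0001 per 1,000 tokens.
PRICE_PER_1K_TOKENS = 0.0001

def embedding_cost(tokens: int) -> float:
    # Cost in dollars to embed a corpus of the given token count.
    return tokens / 1000 * PRICE_PER_1K_TOKENS

# Embedding a 10-million-token corpus costs one dollar.
print(f"${embedding_cost(10_000_000):.2f}")
```

At that price, embedding even a large document corpus is cheap relative to generation-side inference, which helps explain why the business has been easy to overlook.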