Let's check back in on the vector databases
Long context windows provide an alternative to RAG, but many companies still have an appetite for retrieval.
Author’s note: this is a longer-than-usual issue so this will be the only issue for this week while I catch up on reporting for next week’s issues. See everyone Monday!
In the past few months it feels like two schools of thought have emerged in the online discourse: gazillion-token context windows will fix everything and make language models more accurate and efficient; and retrieval augmented generation (or RAG) will fix everything and make language models more accurate and efficient.
There are merits on both sides, and the reality as usual is probably somewhere in the middle. But the RAG case—the path of least resistance for most enterprises, for a bunch of reasons we’ll get into in a bit—necessitates making all the information enterprises want available to language models in a different format. Specifically, that data, like documents or Slack messages, has to be converted to a unified vector format with an embedding model and stored in a convenient place for retrieval.
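To make that concrete, here’s a minimal sketch of the embed-and-retrieve loop, assuming OpenAI’s Python SDK and numpy; the documents, query, and model choice are invented for illustration, and at production scale a vector database takes the place of the in-memory array:

```python
# A minimal sketch of the embed-and-retrieve loop behind RAG, assuming
# OpenAI's Python SDK and numpy. The documents, query, and model choice
# are invented for illustration; at scale, a vector database replaces
# the in-memory array.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

documents = [
    "Q3 planning doc: the new billing flow ships in October.",
    "Slack thread: the on-call rotation changes next sprint.",
]

# Convert each document to a fixed-length vector with an embedding model.
resp = client.embeddings.create(model="text-embedding-3-small", input=documents)
doc_vectors = np.array([d.embedding for d in resp.data])

# At query time, embed the question and fetch the nearest document by
# cosine similarity -- the lookup a vector database performs at scale.
query = "When does the billing flow ship?"
q_vec = np.array(
    client.embeddings.create(model="text-embedding-3-small", input=[query])
    .data[0]
    .embedding
)

scores = doc_vectors @ q_vec / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
)
print(documents[int(scores.argmax())])  # the snippet that feeds the prompt
```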
That’s led to the growing importance of vector databases. The growing need for the format to feed info-hungry prompts has blossomed into a whole ecosystem of startups and larger companies bolting vector search onto their key products, and into one of the most competitive and fascinating races in the story arc of AI.
One of my first issues after launching Supervised was about how vector databases had gathered an absurd amount of hype on the back of the launches of ChatGPT and, eventually, Llama 1. Now that companies are settling into figuring out what AI looks like in actual production (with a deceptively large focus on batch data processing), it feels like as good a time as any to see how the space has evolved—particularly where vector databases fit in a future where RAG and massive context windows both, at least hypothetically, serve the same end.
“While we do expect the length of context windows to continue to increase, that won’t nullify the need for RAG,” Brittany Walker, general partner at CRV, told me. “We believe RAG and long-context windows complement each other rather than compete with each other. RAG is efficient and performant, and retrieval helps the LLM focus on the right information. Long-context windows enable the LLM to process more context for one particular query.”
Vector databases have, in some ways, always been part of the whole “feature versus product” debate since they became popular around the launch of ChatGPT—a debate that has only become more acute with the emergence of the pgvector extension for Postgres as well as vector search products from MongoDB, Snowflake, and Databricks. But there’s also an argument to be made that AI is a once-in-a-generation technology that necessitates a new way of thinking about everything underneath it.
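For a sense of what the “feature” side of that debate looks like in practice, here’s a minimal sketch using the pgvector extension for Postgres via psycopg2. The connection string, table, and toy three-dimensional vectors are invented; real embeddings run to hundreds or thousands of dimensions:

```python
# A minimal sketch of the "vector search as a database feature" pattern,
# using the pgvector extension for Postgres via psycopg2. The connection
# string, table, and three-dimensional vectors are invented for
# illustration.
import psycopg2

conn = psycopg2.connect("dbname=app")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute(
    "CREATE TABLE IF NOT EXISTS docs "
    "(id serial PRIMARY KEY, body text, embedding vector(3))"
)
cur.execute(
    "INSERT INTO docs (body, embedding) VALUES (%s, %s), (%s, %s)",
    ("billing doc", "[0.9, 0.1, 0.0]", "on-call doc", "[0.0, 0.2, 0.9]"),
)

# <=> is pgvector's cosine-distance operator; an ORDER BY ... LIMIT
# query is the entire "vector search".
cur.execute(
    "SELECT body FROM docs ORDER BY embedding <=> %s LIMIT 1",
    ("[0.8, 0.2, 0.1]",),
)
print(cur.fetchone()[0])
conn.commit()
```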
“I strongly believe that every application in the future will have AI in it,” Weaviate CEO Bob Van Luijt told me. “Some applications will have AI sprinkled over them, some will have it at the heart of the application, if you take the AI out it doesn’t exist anymore. If you build a web app and you want to sprinkle some AI over it, great, use MongoDB, especially if you’re already using it. We’re saying, if you want to build an AI-native application that has AI at the heart of the application, that’s when you should consider Weaviate.”
Weaviate is now one of five vector database startups that have gained a lot of buzz among the developers, investors, and experts I’ve talked to over the past several months. And these startups continue to gain momentum as smaller, more efficient open source models pop up and embedding models become cheaper. A year after the launch of GPT-4, highly competitive open source models—like Mistral’s Mixtral—have emerged for cases that would specifically benefit from RAG. These models don’t need GPT-4’s level of quality, but they do need accurate and up-to-date information.
Whether or not long context window models actually create the kind of tension they could (if you believe the online Discourse) is still up in the air. As Walker told me, the two could very well complement each other in the end by excelling at different use cases.
But the chaotic growth of the AI ecosystem, which in many ways mirrors the emergence of the modern data stack, shows that there’s room for optimizations and improvements at every step of the process. And that means there’s room for a whole variety of startups—even ones in the same category.
So, let’s check in with what’s happening among vector databases, particularly two that seem to be on the upswing.
The growing case for vector database startups
RAG is generally the first stop for enterprises trying to improve language model performance. The next step is fine-tuning, though even with companies like Together AI the process is still complex. And pre-training remains a kind of last resort, better suited to highly specialized models, such as those in biotechnology.
Over the past few months, I’ve been routinely asking developers and investors what they think about each vector database startup—particularly what each is good at. Here’s a very abridged, nowhere-close-to-comprehensive version of that commentary:
Pinecone: It’s probably the “safest” bet: a mature vector database platform that has also shipped a serverless product. It’s not an open source tool, and there may be some questions about how its performance at scale compares to what Qdrant offers. But Notion, Gong, and Plaid are just a few examples of companies using Pinecone.
Chroma: It offers a great developer experience and is the first touch point most developers will have with a vector database. It works great for single-purpose uses and is really accessible, particularly for LangChain users. Chroma is an Apache 2.0 open source project. And Chroma Cloud offers an interesting promise if the code doesn’t really change and is just adapted for a hosted version. (A minimal example of that developer experience follows this list.)
Qdrant: The most performant of the vector databases, Qdrant is known for its ability to scale up for applications that require much more throughput. It’s written in Rust and is also Apache 2.0-licensed.
LanceDB: On the stealthier side, people refer to LanceDB as a kind of “new Parquet.” It’s also written in Rust and also Apache 2.0-licensed.
Weaviate: Funnily enough, people are pretty split on Weaviate because it does a little bit of everything well but doesn’t have the distinct highs or lows that developers zero in on with Chroma or Qdrant. But Weaviate has a very software engineer-first feel, which might make it better suited to fitting into existing workflows. It’s written in Go.
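As for that Chroma quickstart feel, here’s a minimal sketch assuming the chromadb package, with invented documents and query:

```python
# A minimal sketch of the Chroma quickstart, assuming the chromadb
# package; the documents and query are invented. With no embedding
# function configured, Chroma falls back to a default local model.
import chromadb

client = chromadb.Client()  # in-memory; PersistentClient(path=...) keeps data
collection = client.create_collection("notes")

collection.add(
    ids=["1", "2"],
    documents=[
        "Meeting notes from the launch review.",
        "Draft blog post about vector databases.",
    ],
)

results = collection.query(
    query_texts=["what did we decide at the launch review?"], n_results=1
)
print(results["documents"][0][0])
```

The appeal developers keep describing is that this is more or less the whole setup: no cluster to stand up and no schema to define before the first query works.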
The argument most incumbents make when an emerging database technology comes around—like time series or graph—is that, well, there will be some use cases where a purpose-built database makes sense for a certain task, but not most of them. And at first, most vector database startups were just riding the hype wave of AI, starting in March last year with the explosive funding round for Pinecone on its relatively small revenue. (To be clear, many of these companies existed for longer, but the importance of the format in the future of workflows wasn’t necessarily apparent until the launch of ChatGPT.)
While some startups (like Neeva, the web-scale search company Snowflake acquired in June last year) were working with RAG architectures at the beginning of the year, at the time it was considered very cutting edge. By June, what had seemed like a band-aid to manage hallucinations was en route to becoming a core part of the language model pipeline. And today, it’s pretty much secured a permanent home and even given rise to a number of startups that exist to the “left” of a vector database.
There are now a number of unstructured data ETL companies, like Menlo-backed Unstructured.io and General Catalyst-backed Datavolo. (Datavolo announced its $21 million funding round this week.) Chunking, which essentially breaks data into smaller snippets to pull into a prompt, has its own startups as well, like Y Combinator-backed Reducto. And the embeddings layer has Voyage AI.
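Mechanically, chunking is simple enough to sketch in a few lines; this naive fixed-size chunker with overlap is an illustration of the idea, not any of these startups’ actual approaches:

```python
# A naive fixed-size chunker with overlap: a sketch of the mechanics of
# chunking, not any particular vendor's approach. The window and overlap
# sizes are arbitrary.
def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows so a fact cut at one boundary
    still appears intact in a neighboring chunk."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, len(text), step)]

sample = "RAG pipelines break documents into snippets before embedding. " * 40
pieces = chunk(sample)
print(len(pieces), "chunks")  # each chunk is embedded and stored on its own
```

Each chunk gets embedded and stored as its own row, so retrieval can pull just the relevant snippet into a prompt rather than a whole document.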
Two companies are also focused on building a more comprehensive suite around models optimized for RAG. Cohere (which counts sentence-transformers creator Nils Reimers on staff) recently launched Command-R (and today announced Command R+). And there’s also Contextual AI, a startup that has lately attracted very close attention from investors.