The data stack sprawl clears the way for Databricks
As the specter of consolidation in the modern data stack returns, platforms like Databricks stand to come out on top.
Author’s note: Due to the high volume of conferences over the next few weeks, there will be one paid issue next week (Friday), and two the following week. In addition, there will only be one paid issue for the first week of October as I take my first real(ish) break from my laptop after surviving a few weeks of conference hell. See y’all at Disrupt!
Databricks this week announced a new mega-funding round, bringing it into near-alignment with one of its semi-rivals, Snowflake.
That funding round also comes at a time when a specter is re-emerging now that the hype around AI is starting to fade into the background: the new data stack is getting too damn big, and it’s going to get worse with AI. And Databricks, positioning itself as a robust platform player that offers something for everyone, stands to emerge from it as one of the biggest winners of what could be a looming consolidation and shake-out of the modern data stack.
That sprawl continues to be a growing theme among the investors, customers, and experts I've spoken with lately. The early 2020s created what dbt Labs CEO Tristan Handy called at the time “Cambrian Explosion II” for the data stack. One or more startups emerged for pretty much every possible step of the data ingestion, processing, management, and usage process. And with another Cambrian explosion happening in AI, it stands to get even more tangled.
The companies that adopted a lot of these tools are now applying more scrutiny to their stacks, looking at where they can optimize by consolidating and removing steps. And it isn’t just Databricks that stands to gain from that scrutiny: any of the major platform providers trying to offer a full-stack experience can make a compelling pitch, given how much the modern data stack has ballooned since the early 2020s.
Modern Data Stack, which provides a really handy guide to who’s using what, has literally 30 categories to go with the current data stack. Each of those categories has at least a handful of startups, with 22 of those categories having more than ten options.
With a $43 billion valuation, Databricks is increasingly converging on the market cap of Snowflake (which already flirted with sub-$40 billion in the last year). The two have emerged as both the key players and the natural acquirers in the data layer, and inevitably find themselves overlapping.
Databricks and Snowflake have done this weird dance, periodically stepping onto each other’s turf before going in different directions. Since July, it certainly feels like they are starting to put some distance between themselves from a strategy perspective, with Snowflake opting to partner closely with other companies. While Databricks has its own vector search capability, Snowflake opted to partner with Pinecone.
Snowflake’s focus on working with partners and its work on container services doesn’t necessarily suggest it’s fully betting on that daisy chain, though. It could probably build a lot of these features, like vector search, internally with little trouble, but it could also simply wait it out and see which ones end up consolidated. Waiting could still leave its products behind Databricks’, which by then would have months (or years) of user feedback baked in.
They aren’t explicitly rivals, because many customers use both: Databricks offers great Lake-based products while Snowflake offers great Warehouse-based products. Though, to be sure, Databricks’ bet is that the two will converge in a Lakehouse, a notion partly adopted by Snowflake with its bet on Iceberg Tables.
But all this essentially primes Databricks to play out what looks like a strategy that was already extremely successful for one company: Amazon Web Services.