What's next for Supervised
I've made the extremely difficult decision to end my full-time job with the newsletter, which will shift to more of a hobby. Plus, is the thinning out in the modern data stack finally here?

If you haven’t heard already, a few months ago I made the very, very difficult decision to wind down my full-time job writing this newsletter due to a back injury. Now that I’m feeling better post-surgery and getting eager to participate in all the exciting stuff going on in this space, I’m going to start writing again — for free — until I find a full-time gig.
If you want my thoughts on the modern data stack, skip ahead. If you want the full story of my newsletter decision and my takeaways from the experience, here it is:
A lower back injury causing extreme pain left me barely able to write, much less actively report, which involves a lot of meetings, events, and being able to sit for long periods of time and focus on conversations. I spent six months, unsuccessfully, trying to rehab my injury and treat it through physical therapy, acupuncture and lifestyle changes. That unfortunately didn’t work, and I’m now writing this on the other end of back surgery.
Before launching the newsletter, I set several ambitious growth targets—ones that would make it viable to run a newsletter as a full-time gig in the Bay Area. As a result of the injury, I wasn’t able to hit them.
Momentum is everything when you are running your own thing. This was growing quickly for a year, and then came to a crashing halt once I stopped publishing and turned off billing as I tried to rehab my injury.
Launching my own project has always been a dream of mine, and I had hoped if it were to come to an end, it would be because of a series of my own mistakes and errors. Unfortunately, I massively underestimated the role your physical health plays in the actual health (and survival) of your product.
For now, I finally have some reprieve from the pain (and some creative energy) to write for you while I recover. But I’m now going to have to do it as a hobby while I figure out what to do next. (If anyone has ideas, I’m all ears!)
Words cannot express my gratitude if you’ve been a paid supporter of Supervised—especially those who were there from day one. You all made it possible to chase that dream, even if it ended in devastating fashion. I hope I was able to deliver on the value and promises I made when you came along for the ride.
To everyone who was even a casual reader—I know your inbox is crowded! Thank you for taking even a little look at what I was working on over here.
Finally, I want to say this over, and over, and over again for others that are running their own thing (or thinking of starting one): take care of your health. I know I did my best work when I was fully healthy.
Some final logistics to address:
Billing will remain off for the foreseeable future. This also means only existing paid subscribers will be able to see what is below a paywall in older posts.
I’ll continue to write as opportunities arise, though on a much more limited publishing schedule.
If you’re a paid annual subscriber and you are interested in a pro-rated refund, please reach out to me directly at m@supervised.news. I’ll do my best to process them in the coming months.
Again, my deepest thanks to everyone: my readers, my sources, and all the companies and startups that took the time to talk to me—especially right as I was getting off the ground. As always, you can reach out to me at any point for now, well, anything. My line will remain open, and I want to know about your life too!
Early signs of a consolidation of the modern data stack
(Full disclosure: I spoke with a handful of the companies in the broader modern data stack and MLops stack about potential roles. That has no impact whatsoever on my analysis here, which is based on historical reporting and prior newsletters.)
The looming consolidation of the modern data stack has been a constant theme (and somewhat of a joke) for the better part of a decade. Venture money flooded the ecosystem going back to the early 2020s, sending companies that were rewriting parts of the analytics and data science stack to multibillion-dollar valuations. In the same way that AI is incredibly crowded, the modern data stack has been incredibly crowded for a significantly longer period of time.
But in the last few months, the largest players in the industry have indeed finally started picking off smaller companies that made sense as part of a larger company, and not necessarily as full-fledged public companies. We’re starting to see signs of it in both the modern data stack and the machine learning ops stack, which you could even argue are kind of interchangeable by now.
While not exactly part of that consolidation, my general observation here is that this kicked off in earnest with Dbt Labs acquiring SDF in January this year. SDF was considered in some circles a potential challenger to parts of Dbt as a kind of pre-compiled SQL tool, and it wasn’t solely interesting for being in the category of “what if X, but Rust?” (Tobiko, which in June last year raised $21.8 million from Theory Ventures and Unusual Ventures, as well as from MotherDuck CEO Jordan Tigani, Fivetran CEO George Fraser, and Census CEO Boris Jabes, is also in this conversation.)
Since then Weights & Biases, basically the main option in model experiment and evaluation—and in some ways now observability—was acquired by CoreWeave. The Information reported that earlier talks between the companies landed around $1.7 billion. (I originally reported that Weights & Biases held talks for a $2 billion funding round around the beginning of March 2023, and that Weights & Biases had passed $30 million in recurring revenue in April last year.)
Databricks also acquired Neon, a serverless Postgres, for around $1 billion. Neon’s (correct) bet was that Postgres was never going to go away, and there was an opportunity to potentially extend it to AI applications with embedding and retrieval tools like pgvector. It also bet early that, though there would be a need for some vector format, the massive scale of independent vector databases like LanceDB wasn’t going to be necessary for most companies. Indeed, Databricks largely positioned the announcement as important for agentic workflows.
The list goes on—both in the form of acquisitions, but also in plenty of conversations about acquisitions.
Fivetran, the largest provider for ETL tools, acquired Sequoia-backed reverse ETL startup Census. The reverse ETL category contained not one, but three companies that touched it in some form or another: Census, Hightouch, and Rudderstack.
OpenAI acquired Cursor competitor Windsurf for $3 billion, bringing a proper IDE into the fold rather than plugging a key into Cursor (or having extensive back-and-forths with Claude Sonnet). You could call this an app on top of AI, but really, this is potentially so deeply integrated into workflows it’s simply part of the stack.
Observability colossus Datadog acquired feature flagging and experimentation startup Eppo, which Alex Konrad over at his new publication Upstarts Media reported was for $220 million. This is after Datadog first whiffed on its AI model evaluation last year while a number of startups were gaining momentum, and maybe signals an openness to buy into areas rather than build.
MongoDB acquired Voyage AI, a specialist in embedding products, for $220 million in February. This also came after a lot of hand-wringing last year over whether MongoDB would launch its own embedding model after playing the field for so long.
The Information reported that Snowflake held talks to acquire Redpanda, a streaming data service. This is also part of a potential shift (and ongoing joke) about all data coming in through streaming data pipelines, rather than batch data processing. (I’ll let you all argue amongst yourselves the difference between streaming and micro-batch.)
The very crowded space that was, by some (most?) accounts, drastically over-funded has begun an early thinning out process through acquisitions. And this in some ways is in service to two impending changes that are intertwined: data lakes en route to being an increasingly attractive option for inexpensive storage; and the fight over owning the data control plane for a company’s data. (For the former, look no further than the battle for Tabular, which Databricks snatched from Snowflake by more than doubling an early offer around $600 million that I first reported.)
Late last year (before I had to begin my rehab attempt), a number of categories entered the conversation for ownership of a more complete data control plane. Dbt largely owned the data transformation layer. Atlan, Alation, and Collibra were dueling for ownership of the data catalog covering lineage and governance (for which Snowflake released its own open source product). Fivetran owned classic ETL and (still) stands to grow into AI workflows, especially as lakes become a preferred option. The customer data pipeline was (and seemingly still is) up for grabs, with Hightouch (recently valued at $1.2 billion in a round that included Bain Capital Ventures, Y Combinator, and Amplify Partners) and Rudderstack still going after it.
(One outcome I’m keeping my eye on right now is what happens with Informatica and whether it is able to modernize in a way that it can truly make a run for ownership of the data control plane. As one source once joked to me, “I don’t want to have to deal with five logins.”)
The rise of Snowflake and Databricks gave birth to all of these micro-categories that are seemingly consolidating out to a smaller number of broader categories that make sense with expanded ownership of a general data pipeline. And while it doesn’t seem like there will be a collapse, there are a number of clear leaders emerging—all with slight overlap with one another, but largely playing nicely.
(To just also toot my own horn here, the majority of companies here were on my 2023 startups watch lists for AI and Big Data! I’ll revisit that in the next issue and provide a few updates.)
The difference now is that all of these companies have since adjusted their branding-slash-products to fit the mold of modern AI. And this whole process of the modern data stack extending itself into AI has been a constant theme since the launch of ChatGPT. Enterprises have stricter requirements like robust data lineage and governance, which also feed directly into systems like those using RAG. And good AI is predicated on good data, after all, whether that’s flowing into model customization or data retrieval.
As a final note, one thread I am following very closely is the second coming of query engines like Starburst, another startup that was in my early watch list. Starburst (whose investors include Andreessen Horowitz, Altimeter Capital, Coatue, and Index) raised an enormous amount of funding that valued it at $3.35 billion in the mania around the modern data stack. But the shift in focus towards lakes with query engines on top of them has the potential to enormously benefit Starburst.
I suspect we are nowhere close to the end of this thinning out, which has been a very long time coming. The bigger question I have is whether this is going to be a thinning out that leads to significant outcomes for firms and companies, or whether we’re talking about dollar-in-dollar-out acquisitions.
If you did indeed make it all the way down here, I want to say it one more time: thank you, from the bottom of my heart, for being a supporter and reader of my work as I went on this insane adventure. I’m forever in your debt.
As always, if you have any complaints, ideas, or thoughts you’d like to share, feel free to reach out to me. You can find me at m@supervised.news or on Signal at (415) 690-7086.


