The emerging reasoning stack
AI’s latest hot problem now has an ambiguous bucket for building out an inference pipeline. Plus: LangChain starts to make sense.

Author’s note: due to the high volume of conferences in the coming weeks, there will be two paid issues coming out the final week of September. In addition, there will only be one paid issue coming out the first week of October while I shut-ish my laptop for my first break since I started working on this.
We’ve come through about a month of tech conferences in San Francisco now and pretty much all of them—including Salesforce—have had AI as the core theme. (Also, hello to the RIP San Francisco crowd in the back.)
There’s a kind of meme that’s grown louder in that period along the lines of “what happens when everything’s already done training?” Realistically that’s just an awkward way of saying “oh yeah we should figure out how to use AI in some practical form instead of just fiddling with models to do some cool stuff.”
Alongside that, we’re starting to see the emergence of new scaffolding around basically that problem: a kind of “stack” for running inference on those models. We saw that pretty aggressively at each of those conferences, and you feel it come up over and over in discussions these days. And it’s all getting thrown under what’s considered the prime problem du jour in AI: Reasoning.
More specifically, reasoning translates to whether an AI model can emulate some train of thought to logically arrive at a correct conclusion, rather than just getting an answer right from the get-go (or “zero-shot”). These language models have proven incredibly capable in a lot of zero-shot scenarios, but once the problems get more intricate and complex, or require recent information, they can go off the rails.
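To make that distinction concrete, here’s a minimal sketch using the OpenAI Python client: the same question asked zero-shot versus with an explicit nudge to work through the steps first. The question, model name, and prompts are my own placeholders, not anything from the companies mentioned here.

```python
# Minimal sketch: zero-shot vs. "show your reasoning" prompting.
# Assumes the OpenAI Python client (v1) and an OPENAI_API_KEY in the environment;
# the question and model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

question = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls, "
    "each with 3 balls. How many tennis balls does he have now?"
)

# Zero-shot: ask for the answer directly and hope the model nails it.
zero_shot = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": question}],
)

# Reasoning-style prompt: ask the model to lay out its train of thought,
# which tends to help on multi-step problems.
reasoned = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": question + " Work through the steps before giving a final answer.",
    }],
)

print(zero_shot.choices[0].message.content)
print(reasoned.choices[0].message.content)
```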
Sequoia’s Pat Grady and Sonya Huang (whose firm is an investor in LangChain) lay that problem out as one of the primary points of development in what they call generative AI’s “Act Two.” They also include a number of others, including retrieval-augmented generation, reinforcement learning from human feedback, and what I’d generally refer to as serverless GPU compute, which we’ll get to in a second.
But two things have come up lately that seem to hammer home a few of these. One comes in the form of Anyscale, the developers behind Ray, trying to ease the burden of model serving and inference. And the other is incredibly long overdue: LangChain actually starting to make sense beyond just being a prototyping framework for emulating reasoning in the form of agents.
“I think the reasoning capabilities, there's still a long way to go there,” Anyscale CEO Robert Nishihara told me. “Even just knowing if the stuff you're saying is like self consistent or is right or wrong, there’s still some huge gaps there.”
Anyscale this week put out two versions of an easily accessible model endpoint: a publicly accessible API for Llama-series models, and an endpoint tool that sits on-premise for enterprise-built language models. The former is practically a drop-in replacement for other foundation model calls (like GPT-4 or Claude 2) and is managed by Anyscale, while the latter isn’t necessarily a serverless product and requires hardware managed by the customer.
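A rough sketch of what “drop-in” usually means in practice: the same OpenAI-style client, pointed at a different base URL and model name. The URL and Llama model identifier below are assumptions on my part, so check Anyscale’s own docs for the real values.

```python
# Sketch of the "drop-in replacement" idea: an OpenAI-compatible client pointed
# at a hosted Llama endpoint instead of OpenAI. The base_url and model name are
# assumptions for illustration; consult Anyscale's documentation for the real ones.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",  # assumed endpoint URL
    api_key="ANYSCALE_API_KEY",                        # placeholder credential
)

response = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",  # assumed Llama-series model name
    messages=[{"role": "user", "content": "Why is model serving hard at scale?"}],
)

print(response.choices[0].message.content)
```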
Meanwhile, LangChain—coming off its incredible “LangChain is pointless” moment—is starting to show signs of where it actually fits in the reasoning stack beyond just orchestrating language model flow for agents. And its answer came out less than two weeks after that viral moment.
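For reference, this is roughly what “orchestrating language model flow for agents” has looked like in LangChain to date: a ReAct-style agent that loops between the model’s reasoning and tool calls. It’s a minimal sketch against the 2023-era langchain API, with a made-up tool, meant to show the baseline the project is now building beyond.

```python
# Minimal ReAct-style agent sketch using the 2023-era LangChain API.
# The tool below is made up for illustration; assumes an OPENAI_API_KEY is set.
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)

def word_count(text: str) -> str:
    """Toy tool: count the words in a piece of text."""
    return str(len(text.split()))

tools = [
    Tool(
        name="word_count",
        func=word_count,
        description="Counts the number of words in a piece of text.",
    )
]

# The agent loops: the model "reasons" about which tool to call, observes the
# result, and decides whether it has enough to give a final answer.
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

agent.run("How many words are in the phrase 'the emerging reasoning stack'?")
```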