AI in September: Reasoning engines, OpenAI goes consumer, and the data quality dilemma
Plus: what happens to data engineering in all this?
Author’s note: Happy October! I’m still dealing with an extremely unpleasant injury that’s making it a bit difficult to work, so I’ll be dropping back to two paid issues per week for the next two weeks unless something absurd like Apple buying Cohere happens. I’ll provide an update once I’m back on the regular schedule in mid-October. I’ll be back from a recovery break next Tuesday and will return to answering emails/DMs/etc. then.
The fall frenzy of AI is in full effect, with a glut of announcements coming out of the biggest companies like OpenAI and Meta, and LangChain’s Harrison Chase showing up at what feels like at least a half-dozen events in the past four weeks. (Supervised will do a formal count on that at a later time.)
Last time we checked in, we started to see some of the hype around AI fade into the background as companies decided they needed to figure out how to make some practical use of AI instead of throwing everything at the wall. It also probably helped that we all got GPT-4 rate-limited to oblivion.
We started to see some of the largest companies build out more formal strategies around offering model gardens, providing access to Llama 2 alongside foundation model APIs like OpenAI, Cohere, and Anthropic. And we also started to see some of the platforms take advantage of what feels like a pending stack sprawl for AI.
That was, of course, four-ish weeks ago. Since then, a whole new set of stuff has blown up, the biggest of which is an increased focus on the tech behind building a stack for inferencing all the models that are finally readily available and showing a lot of potential.
Here’s a quick overview of some of the biggest themes in AI this month that we’ll cover, both in terms of the scale of the announcements and what everyone seems to be talking about at all these conferences and events.
AI’s gaze turns to reasoning. LangChain and LlamaIndex are finally maturing into more potent technologies, and retrieval augmented generation (RAG) is picking up momentum. Now experts see an opportunity to build agent-like systems to solve complex problems.
OpenAI goes consumer. While we did get a substantial API update with the GPT-3.5 Instruct model, OpenAI’s September was largely spent rolling out new consumer products attached to ChatGPT. OpenAI also now seems to be on a collision course with Meta, given its slew of launches in AI.
Data quality rears its head. With companies emphasizing the effectiveness of fine-tuning smaller, more readily available models (particularly Llama 7B), they now have to figure out how to assemble the right data sets.
Bonus round: what the hell happens to data engineers in all this?
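For readers newer to the first theme: the RAG pattern that LangChain and LlamaIndex package up boils down to "retrieve relevant documents, then stuff them into the prompt." Here's a minimal, dependency-free sketch of that idea. The relevance scoring is a toy word-overlap count purely for illustration; real systems use embedding similarity against a vector store, which is exactly the plumbing those frameworks provide.

```python
# Toy sketch of retrieval augmented generation (RAG): retrieve the most
# relevant documents for a query, then assemble a grounded prompt for a
# language model. Word-overlap scoring stands in for real embedding search.

def score(query: str, doc: str) -> int:
    """Count how many query words appear in the document (toy relevance)."""
    doc_words = set(doc.lower().split())
    return sum(1 for w in set(query.lower().split()) if w in doc_words)

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring documents for the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Put retrieved context ahead of the question so the model answers
    from the documents rather than from its parametric memory."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Llama 2 is available through several model gardens.",
    "Fine-tuning smaller models depends on data quality.",
    "RAG grounds model answers in retrieved documents.",
]
prompt = build_prompt("How does RAG ground answers?", docs)
```

The agent-like systems people are now excited about wrap a loop around this same step: the model decides when to retrieve, reads the result, and retrieves again until it can answer.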
With that, let’s get started!