Google's weird Gemini launch and the framework abstraction dilemma
Google's marquee model makes an odd comparison to its primary competition.
We’ll be covering two topics today: a quiet machine learning framework release from Apple which could have some substantial ripple effects; and Google’s Gemini launch. We’re going to start with the Apple framework, which the developers just threw up on GitHub unceremoniously.
Before we get to that, though, here’s the one-line summary from everyone I reached out to about Google’s launch of its Gemini language model this week: it was weird.
More specifically, everyone is generally excited that Google is coming out with what looks like a multimodal model that can compete with GPT-4. It’s the first serious challenger to OpenAI from a performance perspective, and it sweeps most of the available benchmarks (yes, we’re doing this again!).
But Gemini Ultra isn’t available, we don’t have any pricing, and the only real exposure we have to it so far is a fine-tuned version of Gemini Pro (its mid-tier model) integrated into Bard. The results seem pretty mixed. And Google didn’t do itself any favors with one of the comparison benchmarks it released.
So while there’s a lot of interest in the model, the consensus is that the announcement felt weird. And we’ll get to that later. But, first…
Abstracting out a fragmented machine learning development world, one startup at a time
We’ve talked a lot before about how Apple is in an interesting position to insert itself into the broader development cycle of AI models. There are already non-Apple packages, like llama.cpp and Ollama, for running inference on quantized versions of open source models, though their appeal extends more broadly to edge devices, not just Macs.
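To give a sense of how lightweight that tooling is, here’s a minimal sketch of quantized inference using the llama-cpp-python bindings for llama.cpp. The model filename is a placeholder; any quantized GGUF checkpoint would work:

```python
from llama_cpp import Llama

# Load a 4-bit quantized model; the path is a placeholder for any GGUF file
llm = Llama(model_path="./llama-2-7b.Q4_K_M.gguf")

# Run a short completion entirely on-device
out = llm("Explain quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```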
Macs, though, are beefy. The latest MacBook Pro tops out at a comically large 128GB of memory, while the Mac Studio with the M2 Ultra goes all the way up to 192GB. And the rest of the hardware is pretty good! The tricky part is that a lot of this work is still relegated to the CPU, with not-great support for the GPU, thanks to AI’s broader addiction to CUDA.
This week, Apple quietly released a new framework called MLX. The framework takes a very JAX-y approach to tapping the actual power in a Mac, with what developers I’ve spoken with so far consider a relatively clean interface. More importantly, and surprisingly given it’s Apple, it’s actually open source under the MIT license.
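To give a flavor of what “JAX-y” means here, this is roughly what MLX code looks like, going off the examples in Apple’s repo; a minimal sketch, not gospel:

```python
import mlx.core as mx

# Arrays live in the Mac's unified memory; operations are lazy
a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))
c = a @ b
mx.eval(c)  # forces the computation to actually run

# The JAX-flavored part: composable function transformations
def loss_fn(w, x):
    return mx.mean((x @ w) ** 2)

grad_fn = mx.grad(loss_fn)  # a new function that computes d(loss)/dw
```

The unified-memory angle matters: arrays don’t need to be copied between the CPU and a discrete GPU, which is exactly the kind of Mac-specific advantage that frameworks designed around Nvidia hardware weren’t built to exploit.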
Apple’s unique architecture (it uses Metal Performance Shaders, or MPS, to execute machine learning tasks on its custom GPU hardware) doesn’t adapt well to PyTorch, the most popular AI framework. And the biggest challenge Apple faces is fitting effectively into the existing developer workflow, which is largely PyTorch-first and built on Nvidia’s CUDA, rather than trying to craft a new one where developers have to work around its limitations.
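PyTorch does expose Apple’s GPU through an mps device, but in practice that support has lagged CUDA’s, which is why Mac-side PyTorch code tends to be littered with availability checks and CPU fallbacks. A sketch:

```python
import torch

# Fall back to the CPU when the Metal backend isn't available
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.randn(2048, 2048, device=device)
y = x @ x  # dispatched to the Apple GPU via Metal Performance Shaders
```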
Effectively, that’s what Apple seems to be attempting with MLX. But another startup is trying to abstract the problem away at a much broader scale than just making things easier on a MacBook Pro. Modular, run by Apple and Google alumnus Chris Lattner, develops the Mojo machine learning framework and most recently raised $100 million. Modular is backed by General Catalyst, GV, SV Angel, Greylock, and Factory.
Mojo tries to address the problem data scientists and machine learning engineers face in developing and optimizing high-powered machine learning frameworks without the baggage of a whole extra skill set: namely, Nvidia’s CUDA (and, to a certain extent, OpenAI’s Triton framework).
“There’s probably ten times the number of people that know Python than those that know C++, and there’s probably ten times fewer people that know CUDA than C++,” Chris Lattner, CEO of Modular, told me. “So there’s this very, very big pyramid towards super specialization and narrow niches. That’s actually a huge industry problem, because everybody knows Python and you’re saying there’s this capability that you have to learn Rust for or CUDA for, and there’s no contiguous path from my kid that knows Python to becoming a CUDA engineer. You have to stop, restart, and relearn a completely different universe—that’s what Mojo solves for.”
It’s pretty hard to overstate the challenges that come with shifting from a Pythonic mindset to C++ development for a more casual developer like me (as in, not someone trained in classical computer science). It’s part of the reason so much work has gone into abstracting away the complexities of C++ in as many hack-y ways as possible to make them accessible to Python developers. And data scientists, by and large, live and breathe Python.
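For a taste of what those hack-y bridges look like, here’s one of the classics: calling into a compiled C/C++ routine from Python with ctypes. The shared library and its dot function are hypothetical, purely for illustration:

```python
import ctypes

# Hypothetical compiled library exposing: double dot(double*, double*, int)
lib = ctypes.CDLL("./libfastmath.so")
lib.dot.restype = ctypes.c_double
lib.dot.argtypes = [
    ctypes.POINTER(ctypes.c_double),
    ctypes.POINTER(ctypes.c_double),
    ctypes.c_int,
]

def dot(a, b):
    # Marshal Python lists into C arrays before crossing the boundary
    n = len(a)
    c_array = ctypes.c_double * n
    return lib.dot(c_array(*a), c_array(*b), n)

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

Every one of those marshaling steps is a place for a segfault to hide, which is exactly the gap Mojo is pitching itself to close.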
Modular hosted a summit this week, ModCon, where it unveiled a suite of announcements under a platform it calls MAX. That platform essentially reaches further down the deployment pipeline from the actual development itself, extending a Mojo developer’s reach into model serving and inference. A variety of companies already play in the model serving space, including Anyscale, but these companies are all largely PyTorch born and bred.
(JAX, at least anecdotally, has yet to gain any major traction beyond Google and a handful of very forward-thinking machine learning startups. I once spoke to an executive at a very plugged-in AI company who told me they’d “add JAX support when my customers actually ask for it.”)
If there’s any indicator of the demand for a toolset like this, and for a community building up around it, it’s probably the hundred-plus people who showed up at the event.