Revisiting that old Google AI memo
A few things have changed since a Google researcher sounded the alarm on the risk open source AI posed to Google in a leaked memo last year.
Hope everyone had a productive few weeks! I’m back and resuming publishing, though slowly ramping back up as I get back in touch with people, so there may be fewer issues for the next two weeks or so. Thanks everyone for your patience!
A bit more than a year ago, at the dawn of the comically short period that we’ll call “modern AI,” a memo from a Google researcher found its way onto SemiAnalysis that essentially served as a warning of the risks open source AI presented to foundation model providers like OpenAI and Google. (DeepMind CEO Demis Hassabis also confirmed its authenticity to The Verge.)
The original Llama model from Meta had recently leaked, and it became pretty clear that the developer community was going to run off with it and develop a lot of new techniques to improve the performance of open source models, many of which were actually adopted by Apple for its forthcoming AI suite, Apple Intelligence. But at the time the catchphrase was that Google “had no moat, and neither does OpenAI.”
A few things have changed since then! And it seems like as good a time as any to look at where Google actually sits in this whole mess, as its latest version of Gemini Pro, its own foundation model, and its latest micro-model, Gemma 2 2B, each sit atop yet another leaderboard for models in their class.
Google has since run a kind of multi-track approach to AI: its own Gemini-series foundation models, built with the expectation that you’ll run up a million tokens per prompt, alongside a platform of smaller models (including its own Gemma series) available through Google Vertex. And while it’s been pretty easy to dunk on Google for not having a real AI strategy, given its AI search products seem to have not gone so well, there are some more subtle signals that Google might not be floundering as much as it seems on the surface.
While OpenAI invests heavily in creating a suite of foundational technology across a wide variety of modalities, it relies on powerful models that require powerful hardware, and it doesn’t work with the open source community. Meta, meanwhile, has invested heavily in training a wide variety of open-ish source models. Meta can deploy those models in its products and fold in the best learnings from outside users, but it carries the risk of everyone having access to the same base technology it has. And in both cases, they’re still largely reliant on hardware provided by Nvidia.
Google has found a way to do both, and has the potential to do so without heavy reliance on external hardware providers. Having a powerful foundational model product effectively gives it a SaaS business to monetize all that work around training and development for current and potential future use cases. (A Google spokesperson noted that while they did use TPUs for internal workloads, they also run internal workloads on GPUs.)
And ingratiating itself with the developer community gives it a direct line to the rapid experimentation happening there, as well as an opportunity to nudge the development arc of AI in directions that could potentially benefit Google. It’s much the same way Meta was able to benefit from the deep learning community largely adopting PyTorch, taking some of the best learnings and applying them to Meta products.
Google has quietly assembled a kind of comprehensive stack that (while it obviously has invested in making Nvidia hardware available) could give it a level of autonomy that many of the other AI developers and providers don’t necessarily have. At a time when everyone is reliant on Nvidia, the years of development on this immense stack on Google Cloud are paying off, even if Google has yet to convert all this into a compelling consumer product. In its most recent operating quarter, Google said its Cloud business for the first time passed $10 billion in quarterly revenue and $1 billion in operating profit.
That certainly includes hardware with its TPUs, but Google also owns a software stack that has been quietly picked up by others, with both xAI and Apple disclosing they used JAX to develop their Grok and Apple Intelligence models respectively.
In that memo, the Google researcher flagged a handful of signals that could threaten Google’s (and OpenAI’s) dominance in AI from the open source community, and by extension from companies that would use open source technology. But Google has essentially covered the majority of them while continuing to develop and release a larger, more powerful product that competes more directly with Anthropic and OpenAI.
When ChatGPT launched in November 2022, Google was essentially caught flat-footed and it had to scramble to figure something out. We ended up getting a series of poorly-received products like its attempts at an AI-powered search engine. But setting aside that consumer strategy, Google has clearly created this kind of top-to-bottom developer experience that it can use either internally or serve externally.
Google was quietly throwing its resources behind JAX for internal use cases several years ago, even before language models became a big focus in modern AI. And while PyTorch still remains a preferred deep learning framework (part of that thanks to Google’s own history with TensorFlow), JAX has seemingly started to make its way into larger organizations. (And let’s not forget Google created what was practically the original major open source language model with BERT.)
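For the unfamiliar, here’s roughly what writing JAX looks like: numpy-style Python that can be differentiated and compiled down to CPUs, GPUs, or TPUs. This is a generic toy example for illustration, not anything from Google’s, xAI’s, or Apple’s actual codebases:

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Squared-error loss for a simple linear model y ≈ x @ w.
    pred = x @ w
    return jnp.mean((pred - y) ** 2)

# grad builds a function that returns d(loss)/dw; jit compiles it with XLA,
# so the same code runs on CPU, GPU, or TPU.
grad_fn = jax.jit(jax.grad(loss))

w = jnp.zeros(3)
x = jnp.ones((8, 3))
y = jnp.ones(8)
print(grad_fn(w, x, y))  # gradient with respect to w
```

That composability (write plain array code, then transform it) is a big part of why it has appeal for large training workloads.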
And while search continues to be Google’s primary cash engine, one that (and this deserves extreme emphasis) could end up disrupted by AI, its products are all intricately linked to its own developer stack. That was true with its research breakthroughs like MapReduce and TensorFlow, and it seems like it will continue with its development of AI-based language models.
What was called correctly in that memo, and what Google has done
At the time the original memo leaked, Google had launched Bard just a few months earlier, and would launch its PaLM 2 model shortly after that while also teasing the existence of Gemini. Bard was, well, not great, and PaLM 2 and Gemini demonstrated an emphasis on larger models, and on chasing OpenAI. (And, to be fair, Bing’s search was also widely panned.)
The whole sequence of events was comical enough that people were essentially wondering if Google was facing an existential threat as the hype around AI absolutely exploded. That hype led to colossal funding rounds in a lot of emerging companies, some of which have been effectively acquihired (and acquilicensed, I guess). And Google’s emphasis on larger models was one of a number of potential problems flagged in the memo, as Meta’s original and much smaller Llama model captivated the developer ecosystem.
In rough, broad strokes, this is what the researcher covered back when the memo leaked in May last year, much of which turned out to be directionally where the industry was heading in the run-up to Apple’s detailing of its own on-device AI suite, Apple Intelligence:
On-device model inference: While on-device turned out to be an extremely popular outcome, this can essentially be abstracted out to low-power inference—like on a CPU—which is predicated on quantized versions of smaller models. Google ended up moving into this with Gemini Nano.
Fine-tuning for personalization: Low-rank adaptation, at the time one of a handful of experimental techniques the open source community was running off with, turned out to be a preferred approach for customization, including being deployed by Apple. (A rough sketch of the idea follows this list.)
The quality gap for large foundation models was closing: If the leaderboardification of AI has taught us anything, it’s that all these larger models developed by Google, Anthropic, OpenAI, and others are very close to each other in quality and the one-upsmanship doesn’t look like step function improvements.
Focusing on larger foundation models was slowing Google down: At the time, the hype was around building a ChatGPT competitor that could do a whole lot of stuff all at once. But Google essentially showed it could do a bit of everything and do it pretty well, though it’s not clear exactly what kind of payoff its Gemma series models will deliver on the other end.
People might not pay for a restricted model behind an API if free open source models are available: We’ll get to this one in a second, but it did not end up entirely correct, though that could still change in the future. We still did, indeed, see a rapid race to the bottom on price.
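To make the fine-tuning bullet a bit more concrete, here’s a minimal sketch of low-rank adaptation with made-up dimensions (this isn’t Apple’s or anyone else’s implementation): the pretrained weight stays frozen and only a small pair of low-rank matrices gets trained, so the personalized piece you store or ship is tiny.

```python
import jax
import jax.numpy as jnp

def lora_linear(x, w_frozen, a, b, scale=1.0):
    # The frozen pretrained weight does the heavy lifting; the correction is the
    # low-rank product (x @ a) @ b, and only a and b are trained.
    return x @ w_frozen + scale * (x @ a) @ b

d_in, d_out, rank = 64, 64, 4
k1, k2 = jax.random.split(jax.random.PRNGKey(0))

w_frozen = jax.random.normal(k1, (d_in, d_out))   # pretrained, never updated
a = jax.random.normal(k2, (d_in, rank)) * 0.01    # trainable down-projection
b = jnp.zeros((rank, d_out))                      # trainable up-projection, starts at zero

x = jnp.ones((2, d_in))
print(lora_linear(x, w_frozen, a, b).shape)       # (2, 64): same shape as the base layer
```

The adapter here is d_in × rank plus rank × d_out parameters instead of d_in × d_out, which is why it’s such a natural fit for per-user or per-task customization.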
At the time, companies were hoovering up as much Nvidia hardware as they possibly could, which led to an overall shortage of GPUs. Meanwhile, we started to see projects like llama.cpp (and eventually Ollama) demonstrate the capabilities of smaller versions of models, produced through a process called quantization, that were able to run on edge devices like laptops.
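The core trick behind quantization is simple enough to sketch: store the weights in fewer bits and keep a scale factor around to recover approximate float values. This is a toy symmetric int8 quantizer for illustration, not the more elaborate block-wise formats llama.cpp actually ships:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: one float scale plus int8 weights,
    # roughly a 4x reduction in size versus float32.
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights at inference time.
    return q.astype(np.float32) * scale

w = np.random.randn(512, 512).astype(np.float32)
q, scale = quantize_int8(w)
print(np.max(np.abs(dequantize(q, scale) - w)))  # small reconstruction error
```

Shrink the weights enough and a model that needed a data center GPU starts fitting in laptop or phone memory, which is what made the edge-device demos possible.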