Everyone is talking about Nvidia’s jaw-dropping earnings results, up a whopping 265% from a year ago. But don’t sleep on Groq, the Silicon Valley-based company creating new AI chips for large language model (LLM) inference (running existing models to produce output, as opposed to training them). Last weekend, Groq suddenly enjoyed the kind of viral moment most startups only dream of.
Sure, it wasn’t as big a social media splash as even one of Elon Musk’s posts about the totally unrelated large language model Grok. But I’m certain the folks at Nvidia took notice after Matt Shumer, CEO of HyperWrite, posted on X about Groq’s “wild tech” that is “serving Mixtral at nearly 500 tok/s” with answers that are “pretty much instantaneous.”
Shumer followed up on X with a public demo of a “lightning-fast answers engine” showing “factual, cited answers with hundreds of words in less than a second.” Suddenly, it seemed like everyone in AI was talking about and trying out Groq’s chat app on its website, where users can choose from output served up by Llama and Mistral LLMs.
This was all on top of a CNN interview over a week ago where Groq CEO and founder Jonathan Ross showed off Groq powering an audio chat interface that “breaks speed records.”
While no company can challenge Nvidia’s dominance right now (it holds over 80% of the high-end chip market and just reported $22 billion in fourth-quarter revenue, while AI chip startups like SambaNova and Cerebras have yet to make much headway, even in AI inference), Groq CEO and founder Jonathan Ross told me in an interview that the eye-watering costs of inference make his startup’s offering a “super-fast,” cheaper option specifically for LLM use.
In a bold claim, Ross told me that “we are probably going to be the infrastructure that most startups are using by the end of the year,” adding that “we are very favorable towards startups — reach out and we’ll make sure that you’re not paying as much as you would elsewhere.”
Groq LPUs vs. Nvidia GPUs
Groq’s website describes its LPUs, or ‘language processing units,’ as “a new type of end-to-end processing unit system that provides the fastest inference for computationally intensive applications with a sequential component to them, such as AI language applications (LLMs).”
By contrast, Nvidia GPUs are optimized for parallel graphics processing, not LLMs. Because Groq’s LPUs are specifically designed to handle sequences of data, like code and natural language, they can serve up LLM output faster than GPUs by sidestepping the two bottlenecks that GPUs and CPUs struggle with on this workload: compute density and memory bandwidth.
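To see why memory bandwidth matters so much, note that autoregressive decoding reads roughly every active model weight once per generated token, so single-stream speed is capped by how fast weights can move out of memory. Here is a back-of-envelope sketch in Python; the figures are illustrative assumptions (Mistral’s published ~12.9B active parameters per token for Mixtral 8x7B, fp16 weights, and roughly 2 TB/s of HBM bandwidth for an A100-class GPU), not numbers from Groq.

```python
# Rough, batch-1 decode ceiling for a memory-bandwidth-bound GPU.
# Assumed figures (not from the article): Mixtral 8x7B activates
# ~12.9B parameters per token, weights are stored in fp16
# (2 bytes/param), and HBM bandwidth is ~2 TB/s (A100-class).
active_params = 12.9e9        # parameters read per generated token
bytes_per_param = 2           # fp16
hbm_bandwidth = 2.0e12        # bytes per second

bytes_per_token = active_params * bytes_per_param    # ~25.8 GB
ceiling = hbm_bandwidth / bytes_per_token            # tokens/second
print(f"~{ceiling:.0f} tok/s single-stream ceiling")  # ~78 tok/s
```

GPUs routinely beat that single-stream ceiling by batching many requests together, but the sketch shows why per-user latency, the quality behind Shumer’s “pretty much instantaneous” observation, is the hard case that a sequence-oriented design goes after.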
In addition, Ross claims that Groq’s chat interface differentiates it from companies like OpenAI: because Groq does not train models, it doesn’t need to log any data and can keep chat queries private.
With ChatGPT estimated to run more than 13 times faster if it were powered by Groq chips, would OpenAI be a potential Groq partner? Ross would not say specifically, but the demo version of a Groq audio chat interface told me it’s “possible that they could collaborate if there’s a mutual benefit. OpenAI may be interested in leveraging the unique capabilities of LPUs for their language processing projects. It could be an exciting partnership if they share similar goals.”
Are Groq’s LPUs really an AI inference game-changer?
I had been meaning to speak with Ross for months, ever since the company’s PR rep reached out to me in mid-December calling Groq the “US chipmaker poised to win the AI race.” I was curious, but never had time to take the call.
But now I made time: I wanted to know whether Groq is just the latest entrant in the fast-moving AI hype cycle, where “PR attention is all you need.” Are Groq’s LPUs really an AI inference game-changer? And what has life been like for Ross and his small 200-person team (they call themselves “Groqsters”) over the past week, after their sudden moment of tech hardware fame?
Shumer’s posts were “the match that lit the fuse,” Ross told me on a video call from a Paris hotel, where he had just had lunch with the team from Mistral — the French open source LLM startup that has enjoyed several of its own viral moments over the past couple of months.
He estimated that over 3,000 people reached out to Groq asking for API access within 24 hours of Shumer’s post, but laughed, adding that “we’re not billing them because we don’t have billing set up. We’re just letting people use it for free at the moment.”
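For the curious, here is a minimal sketch of what querying that API looks like. It assumes Groq’s OpenAI-compatible Python SDK (the `groq` package), the `mixtral-8x7b-32768` model ID Groq listed at the time, and a `GROQ_API_KEY` environment variable; treat all three as details from Groq’s public docs rather than anything confirmed in this article.

```python
# Minimal sketch: time a single completion against Groq's hosted Mixtral.
# The `groq` SDK, model ID, and env var below are assumptions based on
# Groq's public docs at the time, not details confirmed in this article.
import os
import time

from groq import Groq  # pip install groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

start = time.perf_counter()
resp = client.chat.completions.create(
    model="mixtral-8x7b-32768",  # illustrative Groq model ID
    messages=[{"role": "user", "content": "Explain LPUs in two sentences."}],
)
elapsed = time.perf_counter() - start

tokens = resp.usage.completion_tokens
print(resp.choices[0].message.content)
print(f"{tokens} tokens in {elapsed:.2f}s = {tokens / elapsed:.0f} tok/s")
```

Pointed at a conventional GPU-backed endpoint instead, the same script makes for an easy apples-to-apples throughput comparison against the tok/s figures Shumer reported.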
But Ross is hardly green when it comes to the ins and outs of running a startup in Silicon Valley — he has been beating the drum about the potential of Groq’s tech since it was founded in 2016. A quick Google search unearthed a Forbes story from 2021 which detailed Groq’s $300 million fundraising round, as well as Ross’s backstory of helping invent Google’s tensor processing unit, or TPU, and then leaving Google to launch Groq in 2016.
At Groq, Ross and his team built what he calls “a very unusual chip, because if you’re building a car, you can start with the engine or you can start with the driving experience. And we started with the driving experience — we spent the first six months working on a compiler before we designed the chip.”
Feeding the hunger for Nvidia GPU access has become big business
As I reported last week, feeding the widespread hunger for access to Nvidia GPUs, which was the top gossip of Silicon Valley last summer, has become big business across the AI industry.
It has minted new GPU cloud unicorns (Lambda, Together AI and CoreWeave), while former GitHub CEO Nat Friedman announced yesterday that his team had even created a Craigslist for GPU clusters. And, of course, there was the Wall Street Journal report that OpenAI CEO Sam Altman wants to meet the demand by reshaping the world of AI chips, with a project that could cost trillions and has a complex geopolitical backdrop.
Ross claims that some of what is going on now in the GPU space is actually in response to things that Groq is doing. “There’s a little bit of a virtuous cycle,” he said. For example, “Nvidia has found sovereign nations are a whole thing they’re doing, and I’m on a five-week tour in the process of trying to lock down some deals here with countries…you don’t see this when you’re on the outside, but there’s a lot of stuff that’s been following us.”
He also pushed back boldly on Altman’s effort to raise up to $7 trillion for a massive AI chip project. “All I’ll say is that we could do it for 700 billion,” he said. “We’re a bargain.”
He added that Groq will also contribute to the supply of AI chips, with plenty of capacity.
“By the end of this year, we will definitely have 25 million tokens a second of capacity, which is where we estimate OpenAI was at the end of 2023,” he said. “However, we’re working with countries to deploy hardware which would increase that number. Like the UAE, like many others. I’m in Europe for a reason — there’s all sorts of countries that would be interested in this.”
But meanwhile, Groq also has to tackle mundane current issues — like getting people to pay for the API in the wake of the company’s viral moment last week. When I asked Ross if he planned on figuring out Groq’s API billing, Ross said “We’ll look into it.” His PR rep, also on the call, quickly jumped in: “Yes, that will be one of the first orders of business, Jonathan.”