The Hardware Lottery: How Technology Shapes Machine Learning Research
 

A conversation with Sara Hooker, Research Scholar at Google Brain

Introduction

The success of ideas in machine learning isn't determined purely by their merit. As Sara Hooker argues in her thought-provoking paper "The Hardware Lottery," breakthroughs often depend on having the right hardware available at the right time. In a recent interview with Machine Learning Street Talk, Sara discussed how hardware constraints have shaped AI research, the challenges of model interpretability, and the hidden biases that emerge when we compress neural networks.

Sara Hooker is a research scholar on the Google Brain team, focusing on training models that go beyond test-set accuracy to satisfy multiple criteria: interpretability, compactness, fairness, and robustness. Before joining Google Brain, she founded Delta Analytics, a Bay Area nonprofit; she also co-hosts the "Underrated ML" podcast with her brother Sean Hooker.



The Hardware Lottery Explained


"The Hardware Lottery, at its core, is really about how science progresses and what in the marketplace of ideas causes inertia," Sarah explains. "Inertia is all the things except for the value of an idea that may amplify it or set it back."


She argues that in computer science, a field less than a century old, the biggest swings in progress have been tied to ideas that happened to be compatible with the hardware and software available at the time. The lottery metaphor highlights how factors independent of an idea's intrinsic quality can determine its success or failure.

When asked what makes hardware particularly special in this lottery, Sara draws a compelling analogy:

"It's not necessarily that hardware is the only thing that determines what comes next... Within the framework of computer science, hardware has arguably determined at several juncture points which thesis of ideas survive and which thesis of ideas are neglected."

Looking at AI history, she points out how the general-purpose CPUs developed from 1969 onward disadvantaged connectionist approaches (neural networks) because they couldn't efficiently handle the matrix multiplications that turned out to be critical to these networks' success. It wasn't until the early 2000s, when the right hardware (GPUs) was combined with the right software ecosystem (frameworks like Torch, Theano, and later TensorFlow and PyTorch), that deep neural networks could finally break through.
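To make concrete why matrix multiplication matters so much here, note that a dense neural network layer's forward pass is essentially one large matrix product. The NumPy sketch below is purely illustrative (the shapes are arbitrary assumptions, not from the interview); it shows the kind of massively parallel arithmetic that GPUs, built for graphics workloads, happen to execute extremely well and that CPUs handle far less efficiently.

```python
import numpy as np

# A dense (fully connected) layer: output = activation(inputs @ weights + bias).
# For a batch of 256 inputs with 1,024 features mapped to 4,096 hidden units,
# this single call performs roughly a billion multiply-adds.
rng = np.random.default_rng(0)
x = rng.standard_normal((256, 1024))   # batch of inputs
W = rng.standard_normal((1024, 4096))  # layer weights
b = np.zeros(4096)                     # layer bias

hidden = np.maximum(0, x @ W + b)      # matmul + ReLU: one layer's forward pass
print(hidden.shape)                    # (256, 4096)
```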



The Evolution of Hardware for Deep Learning

The rise of GPUs for deep learning was, in Sara's words, "almost a fluke." GPUs were developed for gaming, not AI research, and researchers essentially repurposed this technology through "software engineering magic."

This illustrates a key challenge: the hardware space is extremely small and driven primarily by commercial end-use cases rather than research needs. While Google has developed TPUs (Tensor Processing Units) to help deploy neural networks in practical applications, the hardware community in general remains far more closed.

"Hardware communities don't publish, and there are very few examples of open-source hardware design," Sarah notes. "The machine learning community is exceptional because the pace of iteration is very fast... whereas for hardware, the timeline is typically two to three years for development, costing $85-130 million, and the cost of being scooped is extremely high."

This difference in culture makes co-design between hardware and algorithms difficult. Machine learning researchers have largely given up on influencing hardware because the two communities operate on different timelines, in ecosystems that don't even share a common language.



The Long Tail Problem and Model Compression

One of Sara's most fascinating insights relates to how neural networks allocate their parameters. When discussing her work on pruning (removing weights from neural networks), she explains:

"What we find is that what you give up [when pruning] is performance on the long tail... The majority of parameters in a network are used to encode, in a very costly way, low-frequency instances in a distribution."

In other words, most of a model's capacity is spent memorizing rare examples rather than learning general patterns. This raises an important question: "Why don't we just treat the long tail better? Why don't we leverage that knowledge to use less capacity?"

Sara suggests that instead of treating all examples equally during training, we could be much smarter about how we handle rare cases, potentially saving enormous computational resources.
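To make "pruning" concrete, here is a minimal sketch using PyTorch's built-in pruning utilities to zero out the 90% of weights with the smallest magnitudes. The model and sparsity level are assumptions for illustration, not the exact setup from Sara's experiments.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for a trained model (in practice you would load real weights).
model = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

# Collect the weight tensors we want to prune.
parameters_to_prune = [
    (m, "weight") for m in model.modules() if isinstance(m, nn.Linear)
]

# Global magnitude pruning: zero out the 90% of weights with the smallest |value|.
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.9,
)

# Measure overall sparsity. Aggregate accuracy often changes little at this level;
# the damage concentrates on rare, long-tail examples, which is Sara's point.
total = sum(m.weight.nelement() for m, _ in parameters_to_prune)
zeros = sum(int((m.weight == 0).sum()) for m, _ in parameters_to_prune)
print(f"sparsity: {zeros / total:.1%}")
```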

This insight connects to her work on sparse training—starting with fewer parameters rather than pruning a large network after training. The challenge is that our current hardware, software frameworks, and optimization methods are all designed for dense networks. As she points out: "If we think more about what are the choices we've made based on a distribution that is dense, can we rethink this formulation now for a distribution that is sparse?"
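On today's dense-oriented stack, sparse training is usually only simulated: a binary mask is fixed up front and re-applied after every optimizer step so that pruned weights never come back. The sketch below illustrates that idea with a placeholder model and random data; it is an assumption-laden illustration of the general technique, not Sara's specific method.

```python
import torch
import torch.nn as nn

model = nn.Linear(784, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Fix a random sparsity pattern up front: keep only 10% of the weights.
mask = (torch.rand_like(model.weight) < 0.1).float()
with torch.no_grad():
    model.weight.mul_(mask)

for step in range(100):
    x = torch.randn(32, 784)         # placeholder batch
    y = torch.randint(0, 10, (32,))  # placeholder labels

    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Re-apply the mask so pruned weights stay at zero throughout training.
    # On dense hardware this saves no compute (the zeros are still multiplied),
    # which is exactly the mismatch between sparse ideas and dense infrastructure.
    with torch.no_grad():
        model.weight.mul_(mask)
```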



Bias and Fairness in Compact Models

In her paper "Characterizing and Mitigating Bias in Compact Models," Sarah investigates how compression techniques like pruning might affect model fairness. 

"Fairness often coincides with treatment of the long tail," she explains. "That's mainly because fairness is partly concerned with when protected attributes end up underrepresented in the dataset."

Her research shows that when models are compressed, performance disproportionately degrades on underrepresented groups. For example, in the CelebA dataset, "blonde males" was the most underrepresented group and was disproportionately impacted by pruning.

What makes this research particularly valuable is that it doesn't require explicit labeling of protected attributes, which is often challenging or even illegal to collect. Instead, by observing which images are most impacted by compression, researchers can automatically surface subsets of data that need further auditing.

"By looking at what images are impacted, you actually come up with an unsupervised way at test time to surface the subset of data points that need further auditing by the human."



The Future of Hardware and Machine Learning

Sara sees the field approaching a cliff where simply adding more parameters to models is becoming unsustainable:

"GPT-3 just came out and that cost $12 million for a single training. The writing is on the wall. The question is, what do we do about it and why are we not talking about it more?"

She contrasts our current neural network approaches with human intelligence:

"Our own intelligence, because of staying local and also not processing all examples equally, is already far more efficient. Our brain runs on the energy equivalent of an electric shaver, whereas if you compare it to the regime of parameters we're currently using in networks, it's bonkers. We're using the energy equivalent of thousands of flights to train a single model."

Looking forward, Sara suggests that new directions in hardware research are needed, but acknowledges the challenge of allocating resources effectively:

"The cost of exploring [new hardware directions] is enormous, and so when particularly governments have to decide how to allocate R&D budgets, it's very difficult to allocate when your hypotheses are so initial."

She believes that software needs to step up first, providing better feedback loops and easier deployment across different hardware types, which would help researchers form better hypotheses about which hardware directions are most promising.


Conclusion: Stepping Stones and Serendipity

When asked whether the hardware lottery is simply part of the universal story of innovation, in which great ideas must wait for the right stepping stones, Sara acknowledges that the phenomenon is universal but emphasizes its particular importance in computer science:

"I don't think there's going to be a time where we eliminate the hardware lottery because really, at its core, it's about inertia in the uptake of ideas... What I do argue is that the cost of friction is very high in computer science right now."

She draws a connection to the Anna Karenina principle, referencing Tolstoy's observation that "happy families are all alike; every unhappy family is unhappy in its own way": in technology development, success requires many things to go right, while failure can happen in countless ways.

As Sara poetically concludes, the path of innovation is rarely straightforward, and the right combination of hardware, software, and algorithms might lead us to breakthroughs we can't yet imagine. The challenge lies not just in developing new ideas, but in creating ecosystems where those ideas have a fair chance to flourish.



This blog post was adapted from Sara Hooker's appearance on Machine Learning Street Talk. Sara is a research scholar at Google Brain focusing on interpretability, compression, fairness, and robustness in machine learning models.
