Richard Sutton, a leading figure in the field of Artificial Intelligence, has profoundly influenced how we approach AI development. In his seminal 2019 blog post, “The Bitter Lesson,” Sutton argues that the decisive factor in improving AI performance is the amount of computation we can bring to bear: general methods that leverage growing hardware power ultimately win out.
The more powerful the hardware, the more capable and general the resulting algorithms become. This stands in stark contrast to the pursuit of novel methodological solutions aimed at pushing the boundaries of generalization, which often build human domain knowledge into the model. Such knowledge is costly and difficult to obtain, and it also biases the model, making it less general and prone to failure in conditions its designers did not anticipate, since defining rules that capture the full variability of real-world problems is inherently hard. In general, human knowledge helps find solutions that need less computation and less data, but those solutions end up overly specific and less general. Conversely, massive amounts of data coupled with substantial computational resources, as seen in the recent era of large language models, have produced impressive results and emergent generalization capabilities.
This lesson is bitter because researchers often invest years in algorithms that encode how humans think problems should optimally be solved. While this can lead to impressive and satisfying results, such methods tend to break down at scale, whereas approaches that simply leverage more computation keep improving. A prime example is how computer vision has been transformed by deep learning: the field moved from hand-engineered features (such as SIFT and HOG descriptors) to learned representations that we barely understand, yet that outperform the hand-designed ones by a wide margin.
The primary takeaway messages from Sutton’s “The Bitter Lesson” are:
- Scaling computational power is a more fruitful direction than building ever more human bias and domain knowledge into AI algorithms. Algorithm development should therefore proceed on the assumption that available computation will continue to grow exponentially.
- We should reconsider our assumptions about how the human brain works and processes information. The actual contents of minds are enormously complex, and hand-designed heuristics that try to emulate human cognition have so far fallen short of capturing that complexity.
However, it is important to acknowledge that computation alone does not solve every problem. Current technologies still benefit significantly from great human inventions inspired by how we process information. The most notable examples are the Convolutional Neural Network (CNN), which efficiently builds translational invariance into learned representations, and the Transformer, which places the attention mechanism at its core; a small sketch of both ideas follows.
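To make these two inductive biases concrete, here is a minimal NumPy sketch. It is purely illustrative, not how production CNNs or Transformers are implemented: a one-dimensional convolution whose feature map shifts when the input shifts (translation equivariance), and a scaled dot-product attention step over a toy sequence. The function names and toy values are our own, not from Sutton's post.

```python
import numpy as np

def conv1d(x, k):
    """Valid 1-D cross-correlation: the same kernel slides over every
    position, the weight sharing that gives CNNs their translational bias."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def attention(q, k, v):
    """Scaled dot-product attention: every position attends to every other,
    weighting the values by query-key similarity."""
    scores = q @ k.T / np.sqrt(k.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Translation equivariance: shifting the input shifts the feature map.
x = np.array([0., 0., 1., 2., 1., 0., 0., 0.])
kernel = np.array([1., -1.])
print(conv1d(x, kernel))              # [ 0. -1. -1.  1.  1.  0.  0.]
print(conv1d(np.roll(x, 1), kernel))  # [ 0.  0. -1. -1.  1.  1.  0.]

# Attention over a toy sequence of 4 tokens with 3-dimensional embeddings.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 3))
print(attention(tokens, tokens, tokens).shape)  # (4, 3)
```

Note how the convolution's second output is the same pattern shifted by one position: that structure is baked into the architecture, whereas attention imposes no positional structure at all and leaves such regularities to be learned from data.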