It seems that the more groundbreaking deep learning models become, the more massive they grow. This summer’s most talked-about natural language processing model, GPT-3, is a perfect example. To reach the accuracy and speed needed to write like a human, the model required 175 billion parameters, 350 GB of memory, and $12 million to train (think of training as the model’s “learning phase”). But beyond the cost, big AI models like this one also have a huge energy problem.
Researchers at UMass Amherst found that the computing power required to train a large AI model can produce over 600,000 pounds of CO2 emissions – five times the lifetime emissions of a typical car! And these models often require even more energy once they are processing data in real production environments (also known as the inference phase). NVIDIA estimates that 80–90 percent of the cost of running a neural network model comes from inference rather than training.
It is widely believed that making further advances in AI will require a big environmental compromise. That’s not the case. Large models can be scaled down to run on an everyday workstation or server without sacrificing accuracy or speed. But first, let’s look at why machine learning models got so big in the first place.
Today: doubling computing power every 3.4 months
Just over a decade ago, researchers at Stanford University discovered that the processors that power the complex graphics in video games, called GPUs, could be repurposed for deep learning models. This discovery set off a race to build increasingly powerful dedicated hardware for deep learning applications. In turn, the models created by data scientists grew larger and larger. The logic was that bigger models would produce more accurate results, and more powerful hardware would let those models run faster.
Research by OpenAI shows how widespread this assumption has become in the field. Between 2012 and 2018, the computing power used for deep learning models doubled every 3.4 months. That means that over a six-year period, the computing power used for AI grew a shocking 300,000-fold. As mentioned above, this power goes not only into training algorithms, but also into running them in production settings. More recent research from MIT suggests that we may hit the upper limits of computing power sooner than we think.
What’s more, resource constraints have limited the use of deep learning algorithms to those who can afford it. When deep learning can be applied to everything from detecting cancer cells in medical imaging to stopping hate speech online, we cannot afford to restrict access. At the same time, we cannot afford the environmental consequences of building infinitely larger, ever more power-hungry models.
The future is getting small
Fortunately, researchers have found a number of new ways to shrink deep learning models and to reuse training datasets through smarter algorithms. This allows large models to run in production settings with less power while still achieving the desired results for the use case.
These techniques have the potential to democratize machine learning for more companies that don’t have millions of dollars to invest in training algorithms and move them to production. This is especially important for “edge” use cases where larger, specialized AI hardware is physically impractical. Think tiny devices like cameras, dashboards, smartphones, and more.
Researchers downsize models by removing some of the unnecessary connections in a neural network (pruning) or by making some of its mathematical operations cheaper to compute (quantization). These smaller, faster models can run anywhere with accuracy and performance similar to their larger counterparts. That means we no longer have to race to the top of computing power, causing even more environmental damage along the way. Making big models smaller and more efficient is the future of deep learning.
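To make these two ideas concrete, here is a minimal toy sketch in NumPy (not any particular framework’s implementation; the function names and the 50 percent sparsity target are illustrative assumptions). Magnitude pruning zeroes out the smallest-magnitude weights, and symmetric int8 quantization replaces each 32-bit float weight with an 8-bit integer plus a single shared scale factor:

```python
import numpy as np

def prune_weights(weights, sparsity=0.5):
    """Magnitude pruning: zero out the fraction of weights with the
    smallest absolute values (the 'unnecessary connections')."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize_int8(weights):
    """Symmetric linear quantization: store weights as int8 plus one
    float scale, shrinking storage roughly 4x versus float32."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale  # approximate originals via q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)  # toy weight matrix

pruned = prune_weights(w, sparsity=0.5)          # half the weights become 0
q, scale = quantize_int8(w)                      # 8-bit weights + one scale
recovered = q.astype(np.float32) * scale         # close to w, within one step
```

In real deployments the same principles apply to whole networks: pruned weights can be skipped entirely by sparse kernels, and int8 arithmetic is both faster and far more energy-efficient than float32 on most hardware.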
Another important issue is repeatedly training large models on new datasets for different use cases. A technique called transfer learning can help prevent this problem. Transfer learning uses pretrained models as a starting point: the model’s knowledge can be “transferred” to a new task using only a limited dataset, without retraining the original model from scratch. This is a critical step toward reducing the computational power, energy, and money required to train new models.
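The pattern can be illustrated with a toy NumPy sketch (a hypothetical stand-in, not a real pretrained network): the “pretrained” feature extractor is frozen, and only a small new classification head is trained on a tiny task-specific dataset:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a pretrained feature extractor. In practice this would be a
# large backbone (e.g. a ResNet or BERT) with weights loaded from disk.
W_pretrained = rng.normal(size=(16, 8))

def extract_features(x):
    """Frozen pretrained layer: its weights are never updated below."""
    return np.maximum(x @ W_pretrained, 0.0)  # ReLU activation

# Tiny task-specific dataset -- the point of transfer learning is that
# this can be far smaller than the original training corpus.
X = rng.normal(size=(32, 16))
y = (X.sum(axis=1) > 0).astype(np.float64)

def mean_log_loss(p, y):
    eps = 1e-9
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

# Only the new head (8 weights + 1 bias) is trained -- a tiny fraction of
# the parameters that retraining the full model would touch.
head, bias, lr = np.zeros(8), 0.0, 0.1
feats = extract_features(X)  # computed once; the backbone never changes
initial_loss = mean_log_loss(1.0 / (1.0 + np.exp(-(feats @ head + bias))), y)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(feats @ head + bias)))  # sigmoid
    grad = p - y                                      # logistic-loss gradient
    head -= lr * feats.T @ grad / len(X)
    bias -= lr * float(grad.mean())
final_loss = mean_log_loss(1.0 / (1.0 + np.exp(-(feats @ head + bias))), y)
```

Because the frozen features are computed once and only a handful of parameters are updated, fine-tuning like this costs a vanishingly small amount of compute compared with training the backbone from scratch.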
The bottom line? Wherever possible, models can (and should) be scaled down to consume less computing power, and knowledge can be recycled and reused instead of starting the deep learning training process from scratch. Ultimately, finding ways to reduce model size and the associated computational power (without sacrificing performance or accuracy) will be the next great opportunity for deep learning. That way, anyone can run these applications in production at lower cost, without a massive environmental compromise. Anything is possible if we think small about big AI – even the next application to stop the devastating effects of climate change.
Published on March 16, 2021 – 18:02 UTC