When I first started working in machine learning, the models were small, the datasets were limited, and, compared to today, computers were far less powerful.
Recent breakthroughs in this field can be attributed to improved algorithms, including the introduction of ReLUs, CNNs, LSTMs, and Transformers. Just as important has been the dramatic growth in training dataset sizes and in the computational resources available.
In the past, I could train most models on a personal computer and, later, on workstations with dedicated GPUs. Nowadays, training large models from scratch demands enormous computational resources, which translates into significant cost.
To provide a concrete example, training a language model as massive as Llama 2 70B takes approximately 1,720,320 GPU hours on Nvidia A100-80GB GPUs. In other words, training this behemoth on 5,000 GPUs takes around 14 days.
If you were to rent A100 80GB GPUs at a rate of 1.5 Euros per GPU-hour, the total would come to roughly 2.6 million Euros.
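As a rough sanity check, the arithmetic above can be reproduced in a few lines of Python. This is only a back-of-envelope sketch: the GPU count and the hourly rental rate are the assumptions stated above, not fixed facts.

```python
# Back-of-envelope estimate of training time and rental cost for Llama 2 70B,
# using the figures assumed in the text (illustrative only).
gpu_hours = 1_720_320        # reported A100-80GB GPU hours for Llama 2 70B
num_gpus = 5_000             # assumed cluster size
rate_eur_per_gpu_hour = 1.5  # assumed rental price per A100-80GB hour

wall_clock_days = gpu_hours / num_gpus / 24
total_cost_eur = gpu_hours * rate_eur_per_gpu_hour

print(f"Wall-clock time: {wall_clock_days:.1f} days")  # ~14.3 days
print(f"Rental cost:     {total_cost_eur:,.0f} EUR")   # ~2,580,480 EUR
```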
This poses a challenge for smaller companies and universities, which struggle to compete with larger companies that have access to extensive computational resources.