Researchers develop method that reduces AI energy needs by 95%
BitEnergy AI has unveiled a revolutionary technique, Linear-Complexity Multiplication (L-Mul), which could drastically lower the energy consumption of artificial intelligence models, cutting power usage by as much as 95% without sacrificing quality. L-Mul replaces energy-hungry floating-point multiplications with simpler integer additions in AI computations, providing a more efficient way of handling the very large and very small numbers that models represent in binary floating-point form.
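To get a feel for the trick, consider the long-known integer-addition approximation of floating-point multiplication that addition-based schemes like L-Mul build on. The Python sketch below is illustrative only, not BitEnergy AI's published kernel: adding the raw IEEE-754 bit patterns of two positive floats adds their exponents exactly, while the sum of their mantissa fields stands in for the mantissa product.

```python
import struct

def add_as_int_mul(x: float, y: float) -> float:
    """Approximate x * y for positive floats by integer-adding their
    IEEE-754 bit patterns (the classic idea that addition-based
    multiplication schemes such as L-Mul refine)."""
    xi = struct.unpack("<I", struct.pack("<f", x))[0]
    yi = struct.unpack("<I", struct.pack("<f", y))[0]
    # Adding bit patterns double-counts the exponent bias, so subtract
    # one bias (127, shifted into the 8-bit exponent field).
    zi = (xi + yi - (127 << 23)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", zi))[0]

print(add_as_int_mul(3.0, 5.0))  # 14.0, versus the exact product 15.0
```

The gap between 14.0 and 15.0 comes from dropping the product of the two mantissa fractions; according to the paper, L-Mul adds a small constant offset term to compensate for exactly that missing piece.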
L-Mul's approach to AI energy consumption
The increasing energy requirements of AI have become a major concern, with models like ChatGPT using an estimated 564 MWh per day, enough to power 18,000 American homes. The Cambridge Centre for Alternative Finance estimates the AI industry could consume 85-134 TWh per year by 2027. L-Mul tackles this problem by simplifying how AI models perform calculations: instead of carrying out complex floating-point multiplications, it approximates them with integer additions, yielding faster calculations that consume less energy while preserving accuracy.
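A quick way to sanity-check the accuracy side of that claim is to measure the relative error of the integer-addition approximation over many random operands. The snippet below is illustrative (and repeats the helper from above so it runs on its own); it suggests the raw trick already lands within a few percent of the exact product, an error that L-Mul's correction term is designed to shrink further.

```python
import random
import struct

def add_as_int_mul(x: float, y: float) -> float:
    # Same integer-addition approximation as in the sketch above.
    xi = struct.unpack("<I", struct.pack("<f", x))[0]
    yi = struct.unpack("<I", struct.pack("<f", y))[0]
    zi = (xi + yi - (127 << 23)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", zi))[0]

random.seed(0)
errors = []
for _ in range(100_000):
    x, y = random.uniform(0.1, 10.0), random.uniform(0.1, 10.0)
    errors.append(abs(add_as_int_mul(x, y) - x * y) / (x * y))
print(f"mean relative error: {sum(errors) / len(errors):.2%}")  # a few percent
```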
Promising results and potential applications of L-Mul
The researchers at BitEnergy AI write: "Applying the L-Mul operation in tensor processing hardware can potentially reduce 95% energy cost by element wise floating point tensor multiplications and 80% energy cost of dot products." This means a model using the technique would need far less energy for both the element-wise multiplications and the dot products that dominate its computation. Beyond energy savings, L-Mul also improves performance in some cases, beating existing 8-bit standards by delivering higher precision with less bit-level computation.
L-Mul's integration and operational advantages in AI models
L-Mul can be seamlessly integrated into transformer-based models, which are the backbone of large language models like ChatGPT. Tests on popular models such as Llama, Mistral, and Gemma have even demonstrated small accuracy gains on certain vision tasks with the algorithm. At an operational level, L-Mul is markedly more efficient than traditional methods: multiplying two float8 numbers takes 325 operations with conventional floating-point arithmetic, while L-Mul needs only 157, less than half.
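That operation-count gap is easier to believe once you see what an L-Mul-style product actually computes. The sketch below decomposes each operand as (1 + m) * 2^e, truncates the mantissas to three bits as an fp8-like format would, and replaces the mantissa multiply with additions plus a constant offset. The parameter names and the 2^-4 offset are illustrative assumptions for this sketch, not BitEnergy AI's exact design, but they show why the mantissa path no longer needs a multiplier at all.

```python
import math

def lmul_style(x: float, y: float, mant_bits: int = 3, offset_bits: int = 4) -> float:
    """Hedged sketch of an L-Mul-style product: mantissas and exponents are
    combined with additions only. mant_bits and the 2**-offset_bits correction
    are illustrative choices, not the paper's exact parameters."""
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = -1.0 if (x < 0.0) != (y < 0.0) else 1.0
    mx, ex = math.frexp(abs(x))        # abs(x) = mx * 2**ex with mx in [0.5, 1)
    my, ey = math.frexp(abs(y))
    mx, ex = 2.0 * mx - 1.0, ex - 1    # rewrite as (1 + mx) * 2**ex, mx in [0, 1)
    my, ey = 2.0 * my - 1.0, ey - 1
    q = 1 << mant_bits                 # truncate mantissas as an fp8 format would
    mx, my = math.floor(mx * q) / q, math.floor(my * q) / q
    # Key step: (1 + mx) * (1 + my) is approximated by
    # 1 + mx + my + 2**-offset_bits, so no mantissa multiplication occurs.
    return sign * (1.0 + mx + my + 2.0 ** -offset_bits) * 2.0 ** (ex + ey)

print(lmul_style(3.0, 5.0))  # 14.5, close to the exact 15.0
```

Because the mantissa path here consists of short additions rather than a multiplier array, the hardware needs far fewer primitive operations per product, which is the kind of saving behind the 157-versus-325 comparison.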
Future prospects and challenges for L-Mul implementation
Despite its potential, the L-Mul technique faces a major challenge: it needs specialized hardware for optimal performance. The researchers at BitEnergy AI are working on hardware that natively supports L-Mul calculations, and they plan to "implement the L-Mul and L-Matmul kernel algorithms on hardware level and develop programming APIs for high-level model design." This could pave the way for a new generation of fast, accurate, and cost-effective AI models, making energy-efficient AI a tangible reality.