Researchers develop method that reduces AI energy needs by 95%
BitEnergy AI has unveiled a revolutionary technique, Linear-Complexity Multiplication (L-Mul), which could drastically lower the energy consumption of artificial intelligence models, cutting power usage by as much as 95% without sacrificing quality. L-Mul replaces energy-hungry floating-point multiplications with simpler integer additions in AI computations, providing a more efficient way of handling the very large and very small numbers that models represent in binary floating-point form.
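To get a feel for the trick, consider the long-known integer-addition approximation of floating-point multiplication that addition-based schemes like L-Mul build on. The Python sketch below is illustrative only, not BitEnergy AI's published kernel: adding the raw IEEE-754 bit patterns of two positive floats adds their exponents exactly, while the sum of their mantissa fields stands in for the mantissa product.

```python
import struct

def add_as_int_mul(x: float, y: float) -> float:
    """Approximate x * y for positive floats by integer-adding their
    IEEE-754 bit patterns (the classic idea that addition-based
    multiplication schemes such as L-Mul refine)."""
    xi = struct.unpack("<I", struct.pack("<f", x))[0]
    yi = struct.unpack("<I", struct.pack("<f", y))[0]
    # Adding bit patterns double-counts the exponent bias, so subtract
    # one bias (127, shifted into the 8-bit exponent field).
    zi = (xi + yi - (127 << 23)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", zi))[0]

print(add_as_int_mul(3.0, 5.0))  # 14.0, versus the exact product 15.0
```

The gap between 14.0 and 15.0 comes from dropping the product of the two mantissa fractions; according to the paper, L-Mul adds a small constant offset term to compensate for exactly that missing piece.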
L-Mul's approach to AI energy consumption
The increasing energy requirements of AI have become a major concern, with models like ChatGPT using an estimated 564 MWh per day, enough to power 18,000 American homes. The Cambridge Centre for Alternative Finance estimates the AI industry could consume 85-134 TWh per year by 2027. L-Mul tackles this problem by simplifying how AI models perform calculations: instead of carrying out complex floating-point multiplications, it approximates them with integer additions, yielding faster calculations that consume less energy while preserving accuracy.
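A quick way to sanity-check the accuracy side of that claim is to measure the relative error of the integer-addition approximation over many random operands. The snippet below is illustrative (and repeats the helper from above so it runs on its own); it suggests the raw trick already lands within a few percent of the exact product, an error that L-Mul's correction term is designed to shrink further.

```python
import random
import struct

def add_as_int_mul(x: float, y: float) -> float:
    # Same integer-addition approximation as in the sketch above.
    xi = struct.unpack("<I", struct.pack("<f", x))[0]
    yi = struct.unpack("<I", struct.pack("<f", y))[0]
    zi = (xi + yi - (127 << 23)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", zi))[0]

random.seed(0)
errors = []
for _ in range(100_000):
    x, y = random.uniform(0.1, 10.0), random.uniform(0.1, 10.0)
    errors.append(abs(add_as_int_mul(x, y) - x * y) / (x * y))
print(f"mean relative error: {sum(errors) / len(errors):.2%}")  # a few percent
```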
Promising results and potential applications of L-Mul
The researchers at BitEnergy AI write: "Applying the L-Mul operation in tensor processing hardware can potentially reduce 95% energy cost by element wise floating point tensor multiplications and 80% energy cost of dot products." This means a model using the technique would need far less energy for both the element-wise multiplications and the dot products that dominate its computation. Beyond energy savings, L-Mul also improves performance in some cases, beating existing 8-bit standards by delivering higher precision with less bit-level computation.
L-Mul's integration and operational advantages in AI models
L-Mul can be seamlessly integrated into transformer-based models, which are the backbone of large language models like ChatGPT. Tests on popular models such as Llama, Mistral, and Gemma have even demonstrated small accuracy gains on certain vision tasks with the algorithm. At an operational level, L-Mul is markedly more efficient than traditional methods: multiplying two float8 numbers takes 325 operations with conventional floating-point arithmetic, while L-Mul needs only 157, less than half.
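That operation-count gap is easier to believe once you see what an L-Mul-style product actually computes. The sketch below decomposes each operand as (1 + m) * 2^e, truncates the mantissas to three bits as an fp8-like format would, and replaces the mantissa multiply with additions plus a constant offset. The parameter names and the 2^-4 offset are illustrative assumptions for this sketch, not BitEnergy AI's exact design, but they show why the mantissa path no longer needs a multiplier at all.

```python
import math

def lmul_style(x: float, y: float, mant_bits: int = 3, offset_bits: int = 4) -> float:
    """Hedged sketch of an L-Mul-style product: mantissas and exponents are
    combined with additions only. mant_bits and the 2**-offset_bits correction
    are illustrative choices, not the paper's exact parameters."""
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = -1.0 if (x < 0.0) != (y < 0.0) else 1.0
    mx, ex = math.frexp(abs(x))        # abs(x) = mx * 2**ex with mx in [0.5, 1)
    my, ey = math.frexp(abs(y))
    mx, ex = 2.0 * mx - 1.0, ex - 1    # rewrite as (1 + mx) * 2**ex, mx in [0, 1)
    my, ey = 2.0 * my - 1.0, ey - 1
    q = 1 << mant_bits                 # truncate mantissas as an fp8 format would
    mx, my = math.floor(mx * q) / q, math.floor(my * q) / q
    # Key step: (1 + mx) * (1 + my) is approximated by
    # 1 + mx + my + 2**-offset_bits, so no mantissa multiplication occurs.
    return sign * (1.0 + mx + my + 2.0 ** -offset_bits) * 2.0 ** (ex + ey)

print(lmul_style(3.0, 5.0))  # 14.5, close to the exact 15.0
```

Because the mantissa path here consists of short additions rather than a multiplier array, the hardware needs far fewer primitive operations per product, which is the kind of saving behind the 157-versus-325 comparison.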
Future prospects and challenges for L-Mul implementation
Despite its potential, the L-Mul technique faces a major challenge: it needs specialized hardware for optimal performance. The researchers at BitEnergy AI are working on hardware that natively supports L-Mul calculations, and they plan to "implement the L-Mul and L-Matmul kernel algorithms on hardware level and develop programming APIs for high-level model design." This could pave the way for a new generation of fast, accurate, and cost-effective AI models, making energy-efficient AI a tangible reality.