Sakana AI and Nvidia Achieve Up to 30% Faster H100 Inference by Skipping 80% of Wasted Computations

Sakana AI and Nvidia have open-sourced TwELL, a sparse data format that lets H100 GPUs skip 80% of the wasted computations in large language models without sacrificing accuracy. It delivers up to 30% faster inference and 24% faster training on H100s while also reducing peak memory usage. In tests on a 1.5-billion-parameter model, lightweight regularization during training cut the share of active neurons to below 2%, with no performance degradation across seven downstream tasks.
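
To make the idea of skipping work on inactive neurons concrete, here is a minimal pure-Python sketch. It is not TwELL or the released GPU kernels, whose layout is not described here. It only illustrates the general principle that when most activations are exactly zero, a projection can visit the non-zero entries alone and still produce the same result. The function names, the sizes, and the roughly 2% activity rate are assumptions chosen for illustration.

```python
import random


def dense_down_projection(h, w_down):
    """Multiply activation vector h (length n) by w_down (n x d), visiting every row."""
    d = len(w_down[0])
    out = [0.0] * d
    for i, a in enumerate(h):
        for j in range(d):
            out[j] += a * w_down[i][j]
    return out


def sparse_down_projection(h, w_down):
    """Compute the same product, but visit only rows whose activation is non-zero."""
    d = len(w_down[0])
    out = [0.0] * d
    # Keep (index, value) pairs for active neurons only; zeros are skipped entirely.
    active = [(i, a) for i, a in enumerate(h) if a != 0.0]
    for i, a in active:
        row = w_down[i]
        for j in range(d):
            out[j] += a * row[j]
    return out, len(active)


random.seed(0)
n, d = 1000, 8  # hypothetical hidden width and output width

# Hypothetical activations: each neuron is non-zero with probability 0.02.
h = [random.uniform(0.1, 1.0) if random.random() < 0.02 else 0.0 for _ in range(n)]
w_down = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

dense = dense_down_projection(h, w_down)
sparse, n_active = sparse_down_projection(h, w_down)

print(f"rows visited: dense={n}, sparse={n_active}")
print("results match:", all(abs(x - y) < 1e-9 for x, y in zip(dense, sparse)))
```

The sparse version does work proportional to the number of active neurons rather than to the full hidden width. Turning that saving into real speedups on a GPU depends on a storage format and kernels designed for the hardware, which is the role the article attributes to TwELL.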