Tilde Research Discovers Muon Optimizer Kills 25% of Neurons; Aurora Alternative Achieves 100x Data Efficiency Gain

According to Tilde Research, the Muon optimizer, adopted by leading AI models including DeepSeek V4 and Kimi K2.5, has a hidden flaw: it causes more than 25% of MLP-layer neurons to die permanently early in training. The team designed an alternative optimizer, Aurora, and open-sourced it. A 1.1B-parameter model trained with Aurora on only 100B tokens reportedly matched Qwen3-1.7B, which was trained on 36T tokens, on language-understanding benchmarks such as HellaSwag and Winogrande; the team presents this as a roughly 100x improvement in data efficiency. Aurora adds 6% computational overhead compared with Muon and can serve as a drop-in replacement.
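The token and parameter figures quoted above can be sanity-checked with simple arithmetic. The sketch below uses only the numbers reported in this article; the variable names are illustrative and not taken from Tilde Research. The raw token ratio works out to 360x, so the "roughly 100x" figure presumably rests on a different accounting that this summary does not spell out.

```python
# Figures as reported in the article; names are illustrative only.
aurora_tokens = 100 * 10**9   # 100B training tokens (1.1B-parameter model)
qwen_tokens = 36 * 10**12     # 36T training tokens (Qwen3-1.7B)

aurora_params = 1.1e9         # 1.1B parameters
qwen_params = 1.7e9           # 1.7B parameters

# Raw ratio of training tokens between the two models.
token_ratio = qwen_tokens / aurora_tokens

# Ratio of model sizes, since the comparison is not size-matched.
param_ratio = qwen_params / aurora_params

print(f"Raw token ratio: {token_ratio:.0f}x")
print(f"Parameter ratio: {param_ratio:.2f}x")
```

The comparison also spans two different model sizes, which is one reason a single headline multiplier should be read as approximate.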