According to Andon Labs' latest Vending-Bench 2 evaluation, GLM 5.2 ranked second in a long-term business simulation test. The benchmark simulates 365 days of operating a vending machine company: each day the model makes inventory and pricing decisions based on financial data, which tests whether its decisions stay coherent over a long-horizon task.
The GLM series showed consistent, roughly linear growth, with an average monthly profit improvement of about $1,000 (GLM 5 averaged $4,432; GLM 5.1 reached $5,634). In contrast, Kimi K2.7 Code underperformed K2.6, while Minimax M3 improved significantly over M2.5 but remained well below both the Kimi and GLM series in overall profitability.
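
The gap between the two quoted GLM averages can be checked with a few lines of arithmetic. The sketch below uses only the two figures given above; the helper name `version_gain` is hypothetical and introduced purely for illustration.

```python
# Average scores quoted in the text (USD). No other figures are assumed.
scores = {"GLM 5": 4432, "GLM 5.1": 5634}


def version_gain(old: float, new: float) -> tuple[float, float]:
    """Return the absolute and percentage gain from `old` to `new`."""
    absolute = new - old
    percent = absolute / old * 100
    return absolute, percent


absolute, percent = version_gain(scores["GLM 5"], scores["GLM 5.1"])
print(f"GLM 5 -> GLM 5.1: +${absolute:,.0f} ({percent:.1f}%)")
# prints: GLM 5 -> GLM 5.1: +$1,202 (27.1%)
```

The $1,202 step between the two versions is somewhat above the "about $1,000" figure, which is stated as an average.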