According to BlockBeats, Coinbase CEO Brian Armstrong stated on June 27 that the key to keeping AI costs stable while token usage grows exponentially is not to restrict usage but to use better default models and caching mechanisms. Coinbase's LLM gateway now defaults to open-weight models such as GLM 5.2 and Kimi 2.7, while engineers are still encouraged to select the appropriate model for each task. The company noted that 91% of employees never hit their usage caps, so rather than lowering quotas, it switched to lower-cost default models.
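The default-model approach described above can be sketched as follows. This is only an illustration under stated assumptions, not Coinbase's actual gateway: the model identifiers, the `Request` type, and the `resolve_model` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model identifiers; not Coinbase's actual configuration.
DEFAULT_MODEL = "open-weight-default"
ALLOWED_MODELS = {"open-weight-default", "open-weight-coder", "frontier-large"}


@dataclass
class Request:
    prompt: str
    model: Optional[str] = None  # None means "use the gateway default"


def resolve_model(request: Request) -> str:
    """Return the explicitly requested model if allowed, else the default."""
    if request.model is None:
        # No explicit choice: fall back to the low-cost default.
        return DEFAULT_MODEL
    if request.model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {request.model}")
    # An explicit, valid choice always wins over the default.
    return request.model
```

The point of this design is that cost savings come from what happens when nobody chooses, while an engineer who needs a specific model for a task can still request it.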
Coinbase has also implemented cache-aware request handling and smart model routing based on cache hit rates. For example, after its cache implementation was optimized, LibreChat's cache hit rate rose from 5% to 60%. Through these practices, Coinbase has cut AI spending by nearly half even as token usage continues to grow.
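To see why the cache hit rate matters for cost, here is a rough sketch of the arithmetic and of one possible way to route on it. Everything in it is an assumption made for illustration: the `Route` type, the function names, the prices, and in particular the `cached_discount` of 0.1 (cached input tokens billed at 10% of the normal price). The report does not say how Coinbase's routing or pricing actually works.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Route:
    name: str               # hypothetical route/model label
    price_per_mtok: float   # assumed price per million input tokens
    hit_rate: float         # observed cache hit rate, between 0.0 and 1.0


def effective_input_cost(price_per_mtok: float, hit_rate: float,
                         cached_discount: float = 0.1) -> float:
    """Blended input price: misses pay full price, hits pay a fraction."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    uncached = (1.0 - hit_rate) * price_per_mtok
    cached = hit_rate * price_per_mtok * cached_discount
    return uncached + cached


def cheapest_route(routes: List[Route], cached_discount: float = 0.1) -> Route:
    """Pick the route with the lowest blended input price."""
    return min(
        routes,
        key=lambda r: effective_input_cost(r.price_per_mtok, r.hit_rate,
                                           cached_discount),
    )


# Blended price at the two hit rates mentioned in the report,
# for an assumed list price of 1.0 per million input tokens.
before = effective_input_cost(1.0, 0.05)
after = effective_input_cost(1.0, 0.60)
```

Under these assumed numbers, raising the hit rate from 5% to 60% lowers the blended input price from about 0.955 to about 0.46 of the list price. That is an illustration of the mechanism only, not a reconstruction of Coinbase's reported savings, which also depend on output tokens and model choice.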