xAI announced on July 1 the launch of the Voice Agent Builder Beta, a fully no-code platform for building AI voice agents. According to xAI, users can build enterprise-grade voice agents in 2 minutes using natural language prompts. The platform uses an end-to-end Speech-to-Speech single voice path tightly coupled with Grok Voice, which xAI says outperforms OpenAI's GPT Realtime 1.5 on the τ-voice Bench benchmark.
(Source: xAI website)
According to xAI's official release, Grok Voice Think Fast 1.0 ranks first on the τ-voice Bench voice benchmark leaderboard, directly surpassing Google Gemini 3.1 Flash Live and OpenAI GPT Realtime 1.5 in both response speed and reasoning capability.
xAI explained that Grok Voice is trained on real call scenarios chosen to be "the most difficult," covering low-quality phone noise, strong accents, user interruptions, and ambiguous commands, and that it natively supports over 25 languages.
xAI explained that traditional enterprise AI voice customer service requires connecting three independent systems: Speech-to-Text (STT), a Large Language Model (LLM), and Text-to-Speech (TTS). This assembled architecture adds multi-hop latency and increases error rates and operational costs.
Voice Agent Builder instead uses an end-to-end Speech-to-Speech single voice path tightly coupled with Grok Voice, so the entire voice pipeline runs without handoffs between separate stages. The aim is to reduce latency and minimize the errors that arise when stages are chained together.
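The cost of chaining stages can be illustrated with a rough model. The sketch below is not from xAI: the function names are hypothetical, the numbers are placeholders rather than measurements of any real system, and it assumes stages run strictly one after another (real cascaded pipelines often stream and overlap stages, so this is a simplification).

```python
from math import prod

def pipeline_latency_ms(stage_ms, hop_ms=0.0):
    """Total latency when stages run back to back, with one hop between each pair."""
    hops = max(len(stage_ms) - 1, 0)
    return sum(stage_ms) + hops * hop_ms

def pipeline_success_rate(stage_rates):
    """Chance that every stage handles a turn correctly, assuming independent stages."""
    return prod(stage_rates)

# Placeholder values for illustration only (STT, LLM, TTS), not benchmark results.
cascaded_ms = pipeline_latency_ms([300, 500, 200], hop_ms=50)
cascaded_ok = pipeline_success_rate([0.95, 0.95, 0.95])

print(cascaded_ms)            # 1100
print(round(cascaded_ok, 3))  # 0.857
```

Under these assumptions, each added stage contributes both its own processing time and a hop, and per-stage error rates multiply, which is the effect the single-path design is meant to avoid.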
According to xAI's official feature description, the four core functional modules of Voice Agent Builder are as follows:
Knowledge Base: Supports uploading Word, Excel, PDF, JSON, and other formats, which can be organized into shared Collections across agents, ensuring consistency in product specifications and policies.
Tools & Connectors: Built-in Google/Outlook Calendar, web search, X (Twitter) search, and Notion; supports transfer to human agent, end call, and real-time team notifications.
Voice & Telephony: Offers over 80 built-in voices; supports brand voice cloning from just 2 minutes of audio; users can obtain a free phone number provided by xAI or connect existing PBX systems via SIP.
Transparent Pricing: Compute API fee is $0.05 per minute, with no additional platform fee; when using a phone number provided by xAI, an additional communication fee of $0.01 per minute is charged.
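As a rough illustration of the published rates, the sketch below estimates a bill from total call minutes. Only the two per-minute rates come from xAI's announcement; the function name is hypothetical, and the sketch assumes billing is linear in minutes, since the announcement as quoted here does not state rounding or billing granularity.

```python
from decimal import Decimal

COMPUTE_RATE = Decimal("0.05")    # USD per minute, per xAI's announcement
TELEPHONY_RATE = Decimal("0.01")  # USD per minute, only with an xAI-provided number

def estimate_cost(minutes, uses_xai_number=False):
    """Estimate cost in USD, assuming simple per-minute linear billing."""
    rate = COMPUTE_RATE
    if uses_xai_number:
        rate += TELEPHONY_RATE
    return Decimal(str(minutes)) * rate

# 10,000 call minutes in a month on an xAI-provided number
print(estimate_cost(10_000, uses_xai_number=True))  # 600.00
```

With the same volume routed through an existing PBX over SIP, only the compute rate applies and the estimate falls to $500, excluding whatever the customer's own carrier charges.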
According to xAI's official announcement, Voice Agent Builder includes built-in monitoring mechanisms (Observability) and guardrails for enterprise users: each call is automatically recorded with a generated transcript; administrators can view the tools used by the AI during calls at any time; and strict dialogue boundaries can be set, such as prohibiting the AI from reading out customer credit card numbers or discussing off-topic political subjects.
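xAI has not described how these guardrails are implemented. Purely as an illustration of the kind of rule involved, the sketch below masks likely payment card numbers in a text transcript. The pattern, the function names, and the use of a Luhn checksum to reduce false positives are all assumptions made for this example, not xAI's method.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def redact_card_numbers(text: str) -> str:
    """Replace Luhn-valid card-like digit runs with a placeholder."""
    def _replace(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return "[REDACTED]" if luhn_valid(digits) else match.group()
    return CARD_PATTERN.sub(_replace, text)

# 4111 1111 1111 1111 is a widely used test card number, not a real account.
print(redact_card_numbers("Your card 4111 1111 1111 1111 is on file."))
# Your card [REDACTED] is on file.
```

A production guardrail for a voice agent would also need to handle numbers spoken as words and to act before audio is generated, which this text-only sketch does not attempt.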
xAI stated in the official announcement: "Listening with your ears is more accurate than looking at benchmarks—build an agent and call with your most difficult workflow to try it out."
According to xAI's official announcement, the Voice Agent Builder Beta is now live on the xAI Console and open for trial.