Grok Voice Agent API

Build interactive voice conversations with Grok models over a WebSocket connection. The Grok Voice Agent API accepts audio and text input and produces text and audio responses in real time.

WebSocket Endpoint: wss://api.x.ai/v1/realtime

Authentication

You can authenticate WebSocket connections with either an xAI API key or an ephemeral token.

Important: Use ephemeral tokens for client-side authentication. An API key embedded directly in client-side code can be read by anyone who inspects that code.

Fetching Ephemeral Tokens

Set up a server endpoint to fetch ephemeral tokens from xAI:

Endpoint: POST https://api.x.ai/v1/realtime/client_secrets
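
As a rough illustration of the server-side step, the sketch below builds the POST request with Python's standard library. Only the endpoint URL comes from this guide. The bearer-style Authorization header, the JSON content type, the empty default body, and the helper names build_client_secret_request and fetch_client_secret are assumptions made for the example, and the response shape is not specified here.

```python
import json
import urllib.request
from typing import Optional

CLIENT_SECRETS_URL = "https://api.x.ai/v1/realtime/client_secrets"


def build_client_secret_request(api_key: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request for an ephemeral token."""
    # Assumption: the endpoint accepts a JSON body; an empty object is sent by default.
    payload = json.dumps(body or {}).encode("utf-8")
    return urllib.request.Request(
        CLIENT_SECRETS_URL,
        data=payload,
        method="POST",
        headers={
            # Assumption: bearer-token authentication with the xAI API key.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def fetch_client_secret(api_key: str) -> dict:
    """Send the request and return the decoded JSON response as-is."""
    request = build_client_secret_request(api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Keeping this call on your own server means the browser or mobile client only ever receives the short-lived token, never the API key itself.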

Voice Options

The Grok Voice Agent API offers five voices:

Voice   Type      Tone                    Description
Ara     Female    Warm, friendly          Default voice, balanced and conversational
Rex     Male      Confident, clear        Professional and articulate
Sal     Neutral   Smooth, balanced        Versatile voice
Eve     Female    Energetic, upbeat       Engaging and enthusiastic
Leo     Male      Authoritative, strong   Decisive and commanding
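
Selecting a voice is done through a session.update event. The sketch below validates a voice name against the five listed here and serializes the event. The event type and the voice names come from this guide. The "session" wrapper and the "voice" key are assumed field names, and voice_update_event is a hypothetical helper.

```python
import json

# The five voices listed in this guide.
VOICES = ("Ara", "Rex", "Sal", "Eve", "Leo")


def voice_update_event(voice: str) -> str:
    """Return a JSON session.update message that selects the given voice."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {VOICES}")
    # Assumption: the configuration lives under a "session" object with a "voice" key.
    return json.dumps({"type": "session.update", "session": {"voice": voice}})
```

Rejecting unknown names locally gives a clear error before anything is sent over the socket.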

Audio Format

Supported Audio Formats

Format       Encoding                  Sample Rate
audio/pcm    Linear16, little-endian   Configurable (8000-48000 Hz)
audio/pcmu   G.711 μ-law               8000 Hz
audio/pcma   G.711 A-law               8000 Hz

Default Audio Settings

  • Sample Rate: 24 kHz
  • Channels: Mono
  • Encoding: Base64
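
To make the default settings concrete, the following sketch converts floating-point samples to 16-bit little-endian PCM (Linear16) and base64-encodes the result. It uses only the standard library. The helper names and the 440 Hz test tone are illustrative choices, not part of the API.

```python
import base64
import math
import struct

SAMPLE_RATE_HZ = 24_000  # default sample rate from this guide; audio is mono


def sine_wave(freq_hz: float, duration_ms: int, rate: int = SAMPLE_RATE_HZ) -> list:
    """Generate a mono test tone as floats in the range [-1.0, 1.0]."""
    count = rate * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(count)]


def float_to_pcm16(samples) -> bytes:
    """Clip samples to [-1.0, 1.0] and pack them as little-endian signed 16-bit integers."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def encode_chunk(pcm: bytes) -> str:
    """Base64-encode raw PCM bytes for transport inside a JSON message."""
    return base64.b64encode(pcm).decode("ascii")


pcm = float_to_pcm16(sine_wave(440.0, 100))  # 100 ms of audio
chunk = encode_chunk(pcm)
```

At 24 kHz mono with 2 bytes per sample, 100 ms of audio is 2,400 samples, or 4,800 bytes before base64 encoding.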

Client Events

Event                       Description
session.update              Update session configuration (voice, audio format, instructions)
input_audio_buffer.append   Append base64-encoded audio chunks
conversation.item.commit    Create user message from audio buffer
conversation.item.create    Create user message with text
response.create             Request assistant response (manual VAD mode)
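
One plausible ordering of these events for a single audio turn in manual VAD mode is: append each chunk, commit the buffer, then request a response. The sketch below only builds the JSON messages and does not open a connection. The event type names are taken from the table above. The "audio" key on the append event is an assumed field name, and manual_turn is a hypothetical helper.

```python
import base64
import json


def append_audio_event(pcm: bytes) -> str:
    """Wrap one raw audio chunk in an input_audio_buffer.append message."""
    # Assumption: the base64 payload is carried in an "audio" field.
    encoded = base64.b64encode(pcm).decode("ascii")
    return json.dumps({"type": "input_audio_buffer.append", "audio": encoded})


def manual_turn(pcm_chunks) -> list:
    """Return the ordered messages for one user turn in manual VAD mode."""
    messages = [append_audio_event(chunk) for chunk in pcm_chunks]
    # Turn the buffered audio into a user message, then ask for a reply.
    messages.append(json.dumps({"type": "conversation.item.commit"}))
    messages.append(json.dumps({"type": "response.create"}))
    return messages
```

Each returned string would be sent as one WebSocket text frame, in order.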

Server Events

Event                                    Description
session.updated                          Session configuration acknowledged
conversation.created                     Conversation session created
input_audio_buffer.speech_started        VAD detected speech start
input_audio_buffer.speech_stopped        VAD detected speech end
response.output_audio.delta              Audio stream chunk
response.output_audio_transcript.delta   Transcript chunk
response.done                            Response completed
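
A client typically dispatches on each incoming message's type. The sketch below accumulates audio and transcript chunks until response.done arrives. The event names come from the table above. The assumption that each delta event carries its payload in a "delta" field (base64 for audio, plain text for the transcript) is not confirmed by this guide, and ResponseAccumulator is a hypothetical helper.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class ResponseAccumulator:
    """Collects the streamed pieces of one assistant response."""

    audio: bytearray = field(default_factory=bytearray)
    transcript: list = field(default_factory=list)
    done: bool = False

    def handle(self, raw: str) -> None:
        """Process one server message received as a JSON string."""
        event = json.loads(raw)
        kind = event.get("type")
        if kind == "response.output_audio.delta":
            # Assumption: "delta" holds a base64-encoded audio chunk.
            self.audio.extend(base64.b64decode(event["delta"]))
        elif kind == "response.output_audio_transcript.delta":
            # Assumption: "delta" holds a piece of transcript text.
            self.transcript.append(event["delta"])
        elif kind == "response.done":
            self.done = True
        # Other event types (session.updated, speech_started, ...) are ignored here.

    def text(self) -> str:
        """Return the transcript received so far as one string."""
        return "".join(self.transcript)
```

In a real client, handle would be called once per message read from the socket, and playback could start as soon as the first audio chunk arrives rather than waiting for response.done.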

Using Tools

The Voice Agent supports:

  • Collections Search (file_search) - Search document collections
  • Web Search (web_search) - Search the web
  • X Search (x_search) - Search X posts
  • Custom Functions - Define function tools with JSON schemas
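
As a rough sketch of how tools might be declared, the example below defines one custom function with a JSON schema and places it next to two built-in tools in a session.update message. The tool type names (web_search, x_search) come from the list above. The "tools" key, the layout of each tool entry, and the get_weather function are assumptions for illustration only.

```python
import json


def function_tool(name: str, description: str, parameters: dict) -> dict:
    """Describe a custom function tool; `parameters` is a JSON Schema object."""
    # Assumption: function tools use a flat {"type", "name", "description", "parameters"} layout.
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": parameters,
    }


# Hypothetical custom function used only for this example.
get_weather = function_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
)

# Assumption: tools are configured under session.tools in a session.update event.
tools_update = json.dumps({
    "type": "session.update",
    "session": {"tools": [{"type": "web_search"}, {"type": "x_search"}, get_weather]},
})
```

The JSON schema tells the model which arguments the function expects, so a precise schema with required fields tends to produce more usable calls.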

For complete API details, see the Voice API documentation.