Grok Voice Agent API

Build interactive voice conversations with Grok models using WebSocket. The Grok Voice Agent API accepts audio and text inputs and creates text and audio responses in real-time.

WebSocket Endpoint: wss://api.x.ai/v1/realtime

Authentication

You can authenticate WebSocket connections using the xAI API key or an ephemeral token.

Important: Use ephemeral tokens for client-side authentication. If you use the API key directly in client-side code, it may be exposed.

Fetching Ephemeral Tokens

Set up a server endpoint to fetch ephemeral tokens from xAI:

Endpoint: POST https://api.x.ai/v1/realtime/client_secrets

Voice Options

The Grok Voice Agent API supports 5 different voice options:

Voice	Type	Tone	Description
Ara	Female	Warm, friendly	Default voice, balanced and conversational
Rex	Male	Confident, clear	Professional and articulate
Sal	Neutral	Smooth, balanced	Versatile voice
Eve	Female	Energetic, upbeat	Engaging and enthusiastic
Leo	Male	Authoritative, strong	Decisive and commanding

Audio Format

Supported Audio Formats

Format	Encoding	Sample Rate
`audio/pcm`	Linear16, Little-endian	Configurable (8000-48000 Hz)
`audio/pcmu`	G.711 μ-law	8000 Hz
`audio/pcma`	G.711 A-law	8000 Hz

Default Audio Settings

Sample Rate: 24kHz
Channels: Mono
Encoding: Base64

Client Events

Event	Description
`session.update`	Update session configuration (voice, audio format, instructions)
`input_audio_buffer.append`	Append base64-encoded audio chunks
`conversation.item.commit`	Create user message from audio buffer
`conversation.item.create`	Create user message with text
`response.create`	Request assistant response (manual VAD mode)

Server Events

Event	Description
`session.updated`	Session configuration acknowledged
`conversation.created`	Conversation session created
`input_audio_buffer.speech_started`	VAD detected speech start
`input_audio_buffer.speech_stopped`	VAD detected speech end
`response.output_audio.delta`	Audio stream chunk
`response.output_audio_transcript.delta`	Transcript chunk
`response.done`	Response completed

Using Tools

The Voice Agent supports:

Collections Search (file_search) - Search document collections
Web Search (web_search) - Search the web
X Search (x_search) - Search X posts
Custom Functions - Define function tools with JSON schemas

For complete API details, see the Voice API documentation.

Voice Overview