Reasoning
The grok-4-fast-non-reasoning variant is based on grok-4-fast-reasoning with reasoning disabled.
The presencePenalty, frequencyPenalty, and stop parameters are not supported by reasoning models; including them in a request results in an error.
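As a hypothetical illustration (the REST equivalents are presence_penalty, frequency_penalty, and stop; the exact error class surfaced by the SDK is an assumption), a request like the following would be rejected:

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.x.ai/v1",
    api_key=os.getenv("XAI_API_KEY"),
)

try:
    client.chat.completions.create(
        model="grok-3-mini",
        presence_penalty=0.5,  # unsupported sampling parameter for reasoning models
        messages=[{"role": "user", "content": "What is 101*3?"}],
    )
except Exception as exc:
    # The API rejects the request; the exact error type and message may vary.
    print("Request rejected:", exc)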
Key Features
- Think Before Responding: Thinks through problems step-by-step before delivering an answer.
- Math & Quantitative Strength: Excels at numerical challenges and logic puzzles.
- Reasoning Trace: The model’s thoughts are available via the reasoning_content or encrypted_content field in the response completion object (see example below).
You can access the model’s raw thinking trace via message.reasoning_content on the chat completion response. Only grok-3-mini returns reasoning_content; grok-3, grok-4, and grok-4-fast-reasoning do not. These models may optionally return encrypted reasoning content instead.
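For example, with the OpenAI-compatible SDK the trace can be read from the message object (a minimal sketch; grok-3-mini is used because it is the model documented to return reasoning_content):

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.x.ai/v1",
    api_key=os.getenv("XAI_API_KEY"),
)

completion = client.chat.completions.create(
    model="grok-3-mini",  # only grok-3-mini returns reasoning_content
    messages=[{"role": "user", "content": "What is 101*3?"}],
)

message = completion.choices[0].message
print("Answer:", message.content)
# Raw thinking trace; not returned by grok-3, grok-4, or grok-4-fast-reasoning.
print("Reasoning:", message.reasoning_content)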
Encrypted Reasoning Content
For grok-4, the reasoning content is encrypted by us and sent back if use_encrypted_content is set to true. You can send the encrypted content back to provide more context to a previous conversation. See Stateful Response with Responses API for more details on how to use the content.
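A minimal sketch of opting in, assuming use_encrypted_content is accepted as an extra request field on the chat completions endpoint and that the encrypted trace comes back on the message (both field placements are assumptions; see the Responses API guide for the supported stateful flow):

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.x.ai/v1",
    api_key=os.getenv("XAI_API_KEY"),
)

completion = client.chat.completions.create(
    model="grok-4",
    messages=[{"role": "user", "content": "What is 101*3?"}],
    # extra_body forwards fields the OpenAI SDK does not model natively.
    extra_body={"use_encrypted_content": True},  # assumed request placement
)

# Assumed response placement: keep the opaque blob and send it back with a
# later request to restore the context of this conversation.
encrypted = getattr(completion.choices[0].message, "encrypted_content", None)
print("Encrypted reasoning returned:", encrypted is not None)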
Control how hard the model thinks
reasoning_effort is not supported by grok-3, grok-4, or grok-4-fast-reasoning; specifying the reasoning_effort parameter returns an error. Only grok-3-mini supports reasoning_effort.
The reasoning_effort parameter controls how much time the model spends thinking before responding. It must be set to one of these values:
- low: Minimal thinking time, using fewer tokens for quick responses.
- high: Maximum thinking time, leveraging more tokens for complex problems.
Choosing the right level depends on your task: use low for simple queries that should complete quickly, and high for harder problems where response latency is less important.
Usage Example
Here’s a simple example using grok-3-mini to multiply 101 by 3.

Python (xAI SDK):
import os
from xai_sdk import Client
from xai_sdk.chat import system, user
client = Client(
api_key=os.getenv("XAI_API_KEY"),
timeout=3600, # Override default timeout with longer timeout for reasoning models
)
chat = client.chat.create(
model="grok-3-mini",
reasoning_effort="high",
messages=[system("You are a highly intelligent AI assistant.")],
)
chat.append(user("What is 101*3?"))
response = chat.sample()
print("Final Response:")
print(response.content)
print("Number of completion tokens:")
print(response.usage.completion_tokens)
print("Number of reasoning tokens:")
print(response.usage.reasoning_tokens)

Python (OpenAI SDK):

import os
import httpx
from openai import OpenAI
messages = [
{
"role": "system",
"content": "You are a highly intelligent AI assistant.",
},
{
"role": "user",
"content": "What is 101*3?",
},
]
client = OpenAI(
base_url="https://api.x.ai/v1",
api_key=os.getenv("XAI_API_KEY"),
timeout=httpx.Timeout(3600.0), # Override default timeout with longer timeout for reasoning models
)
completion = client.chat.completions.create(
model="grok-3-mini",
reasoning_effort="high",
messages=messages,
)
print("Final Response:")
print(completion.choices[0].message.content)
print("Number of completion tokens:")
print(completion.usage.completion_tokens)
print("Number of reasoning tokens:")
print(completion.usage.completion_tokens_details.reasoning_tokens)

JavaScript (OpenAI SDK):

import OpenAI from "openai";
const client = new OpenAI({
apiKey: "<api key>",
baseURL: "https://api.x.ai/v1",
timeout: 360000, // Override default timeout with longer timeout for reasoning models
});
const completion = await client.chat.completions.create({
model: "grok-3-mini",
reasoning_effort: "high",
messages: [
{
"role": "system",
"content": "You are a highly intelligent AI assistant.",
},
{
"role": "user",
"content": "What is 101*3?",
},
],
});
console.log("\\nFinal Response:", completion.choices[0].message.content);
console.log("\\nNumber of completion tokens (input):", completion.usage.completion_tokens);
console.log("\\nNumber of reasoning tokens (input):", completion.usage.completion_tokens_details.reasoning_tokens);import { xai } from '@ai-sdk/xai';
import { generateText } from 'ai';
const result = await generateText({
model: xai('grok-4'),
system: 'You are a highly intelligent AI assistant.',
prompt: 'What is 101*3?',
});
console.log('Final Response:', result.text);
console.log('Number of completion tokens:', result.totalUsage.completionTokens);
console.log('Number of reasoning tokens:', result.totalUsage.reasoningTokens);

cURL:

curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -m 3600 \
-d '{
"messages": [
{
"role": "system",
"content": "You are a highly intelligent AI assistant."
},
{
"role": "user",
"content": "What is 101*3?"
}
],
"model": "grok-3-mini",
"reasoning_effort": "high",
"stream": false
}'

Sample Output
Final Response:
The result of 101 multiplied by 3 is 303.
Number of completion tokens:
14
Number of reasoning tokens:
310

Notes on Consumption
When you use a reasoning model, reasoning tokens are added to your final consumption amount on top of completion tokens. Reasoning token consumption typically increases at higher reasoning_effort settings.
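To see the effect directly, you can compare usage across effort levels (a minimal sketch with grok-3-mini, the model documented to support reasoning_effort):

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.x.ai/v1",
    api_key=os.getenv("XAI_API_KEY"),
)

for effort in ("low", "high"):
    completion = client.chat.completions.create(
        model="grok-3-mini",
        reasoning_effort=effort,
        messages=[{"role": "user", "content": "What is 101*3?"}],
    )
    usage = completion.usage
    # Reasoning tokens are billed on top of the visible completion tokens.
    print(
        f"effort={effort}: completion={usage.completion_tokens}, "
        f"reasoning={usage.completion_tokens_details.reasoning_tokens}"
    )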