Create Extended Thinking Chat

POST

/v1/messages

Anthropic Messages API Extended Thinking endpoint
Enable and control the thinking token budget via thinking.type: enabled and budget_tokens
Set stream: true for streaming output
Official docs: Extended Thinking

Authorizations

bearer

Type

HTTP (bearer)

Request Body

application/json

object

ID of the model to use. See the model endpoint compatibility table for details on which models work with the Chat API.

System prompt to set assistant behavior.

object[]

Required

List of messages comprising the conversation so far. Python code example.

Sampling temperature between 0 and 2. Higher values (e.g. 0.8) make output more random; lower values (e.g. 0.2) make it more focused and deterministic. We generally recommend changing this or top_p, but not both.

Nucleus sampling alternative to temperature: the model considers tokens whose cumulative probability mass is within top_p. So 0.1 means only the top 10% probability mass. We generally recommend changing this or temperature, but not both.

Default 1
How many chat completion choices to generate for each input message.

Set to true for this Extended Thinking endpoint to stream thinking and reply content via SSE.

Defaults to null. Up to 4 sequences where the API stops generating further tokens.

Default inf
Maximum number of tokens to generate in the chat completion.

Total length of input and generated tokens is limited by the model context length. Python code example for counting tokens.

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they have appeared in the text so far, increasing the likelihood of new topics. More on frequency and presence penalties.

Default 0. Number between -2.0 and 2.0. Positive values penalize new tokens based on their frequency in the text so far, reducing repetition. More on frequency and presence penalties.

Modify the likelihood of specified tokens appearing in the completion.
Accepts a JSON object mapping token IDs (from the tokenizer) to bias values from -100 to 100. Bias is added to logits before sampling; effect varies by model. Values between -1 and 1 should decrease or increase selection likelihood; -100 or 100 should disable or exclusively select the token.

Unique identifier for your end user. Helps OpenAI monitor and detect abuse. Learn more.

Object specifying the format the model must output. Setting { "type": "json_object" } enables JSON mode so the model message is valid JSON. Important: with JSON mode you must also instruct the model to produce JSON via system or user message; otherwise the model may stream whitespace until the token limit. If finish_reason="length", content may be truncated when generation exceeds max_tokens or context length.

Beta feature. If specified, the system will attempt deterministic sampling so repeated requests with the same seed and parameters return the same result. Determinism is not guaranteed; use the system_fingerprint response parameter to monitor backend changes.

List of tools the model may call. Currently only functions as tools are supported. Provide functions the model can generate JSON input for.

Controls which function (if any) the model calls. none means no function call, only a message. auto lets the model choose between message and function. Force a function with {"type": "function", "function": {"name": "my_function"}}. Defaults to none if no functions; auto if functions exist.

object

Extended thinking configuration; applies when type is enabled.

OpenAI Official Format

Chat Mode

Unified Standard API Format

Unified Standard Format

Chat Mode

OpenAI Format

Unified Standard API

OpenAI-Compatible Format

Replicate Official Format

OpenAI Compatible Format

Create Extended Thinking Chat

Authorizations

Request Body

Responses

Playground

Samples

Create Extended Thinking Chat​

Authorizations​

Request Body​

Responses​

Playground​

Samples​

Create Extended Thinking Chat

Authorizations

Request Body

Responses

Playground

Samples