Gemini OpenAI-Compatible Format Guide

Fast-Token lets you call Google Gemini models through the OpenAI API protocol. If you already use the OpenAI SDK or /v1/chat/completions, you can usually switch to Gemini by changing only base_url and model, without rewriting your app for the native Gemini API.

This page focuses on how to use the compatible format. For request bodies, response fields, and interactive debugging, see the ChatGPT docs and the links in the capability table below.

Two ways to integrate

Item	OpenAI-compatible format (this page)	Native Gemini API (other docs in this section)
Typical paths	`POST /v1/chat/completions`, `POST /v1/embeddings`	`POST /v1beta/models/{model}:generateContent`, etc.
Request body	OpenAI fields such as `messages`, `model`, `stream`	Gemini fields such as `contents`, `generationConfig`
Best for	Existing OpenAI clients, unified multi-model gateways, quick migration	Gemini-only features (`thinkingConfig`, Imagen-specific params, etc.)
Documentation	This page + Chat section	Endpoint pages under Chat, Images, Files, etc. in this section

Both approaches share the same API Key and gateway URL. Billing follows the corresponding model in the Model Catalog.
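
To make the difference in request bodies concrete, here is a rough sketch that maps an OpenAI-style messages list onto the native contents layout. The function name to_gemini_body is hypothetical, and the field mapping (assistant to the model role, system text to systemInstruction) follows the public Gemini REST format rather than anything this page specifies; the gateway's own conversion may differ.

```python
def to_gemini_body(messages: list) -> dict:
    """Convert OpenAI-style text messages to a Gemini-style body (sketch only)."""
    contents = []
    system_parts = []
    for msg in messages:
        text = msg["content"]
        if msg["role"] == "system":
            # Gemini carries system text outside the list of turns.
            system_parts.append({"text": text})
        else:
            # OpenAI's "assistant" role corresponds to Gemini's "model" role.
            role = "model" if msg["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [{"text": text}]})
    body = {"contents": contents}
    if system_parts:
        body["systemInstruction"] = {"parts": system_parts}
    return body


openai_messages = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
]
gemini_body = to_gemini_body(openai_messages)
print(gemini_body)
```

The sketch only handles plain-text content; multimodal parts would need their own mapping.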

Setup

Gateway and authentication

Base URL: https://fast-token.com/v1 (same as Getting Started)
Auth: header Authorization: Bearer <Fast-Token_API_KEY>
Model name: copy a model ID containing gemini from the Model Catalog into the model field
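
As a minimal sketch of the settings above, the following builds (but does not send) an authenticated request with Python's standard library. The helper name build_chat_request is an illustrative assumption, and the key is a placeholder.

```python
import json
import urllib.request

BASE_URL = "https://fast-token.com/v1"
API_KEY = "<Fast-Token_API_KEY>"  # placeholder; substitute your own key


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the chat completions path (hypothetical helper)."""
    body = {
        "model": model,  # copy the exact ID from the Model Catalog
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("gemini-2.5-pro", "Introduce yourself in one sentence")
print(req.get_method(), req.full_url)
# To send it, pass req to urllib.request.urlopen and JSON-decode the response body.
```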

Using the OpenAI SDK (recommended)

Point the official SDK base_url at Fast-Token; everything else works like OpenAI:

python

from openai import OpenAI

client = OpenAI(
    base_url="https://fast-token.com/v1",
    api_key="<Fast-Token_API_KEY>",
)

completion = client.chat.completions.create(
    model="gemini-2.5-pro",  # check the Model Catalog for the exact ID
    messages=[
        {"role": "user", "content": "Introduce yourself in one sentence"},
    ],
)
print(completion.choices[0].message.content)

javascript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://fast-token.com/v1",
  apiKey: "<Fast-Token_API_KEY>",
});

const completion = await client.chat.completions.create({
  model: "gemini-2.5-pro",
  messages: [{ role: "user", content: "Introduce yourself in one sentence" }],
});
console.log(completion.choices[0].message.content);

For streaming, set stream: true in the request. See Create chat completion (streaming).
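
A streaming call with the Python SDK might look like the sketch below. stream_reply and collect_text are illustrative helper names; the chunk layout (choices[0].delta.content) is the standard OpenAI streaming shape, which the gateway is described as following. The demo at the end uses stand-in objects so it runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_text(chunks: Iterable) -> str:
    """Concatenate the text deltas from an iterable of chat completion chunks."""
    pieces = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        delta = chunk.choices[0].delta
        if delta.content:  # role-only or final chunks have no content
            pieces.append(delta.content)
    return "".join(pieces)


def stream_reply(client, model: str, prompt: str) -> str:
    """Send one streaming request and return the full text (not called here)."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_text(stream)


# Offline demo with stand-in chunk objects instead of a real response stream.
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])
    for text in ("Hel", None, "lo")
]
print(collect_text(fake_chunks))
```

With the client configured as in the Python example above, stream_reply(client, "gemini-2.5-pro", "...") would return the assembled reply once the stream ends.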

Chat-compatible capabilities overview

All of the following use the OpenAI-compatible paths. In the console or Apifox they are often grouped under “chat-compatible format”. In practice most scenarios use the same POST /v1/chat/completions (embeddings use POST /v1/embeddings), distinguished by model and message structure.

Capability	Description	Reference
Gemini image creation	Generate or edit images from text (and optional reference images)	Create chat image (non-streaming)
Chat	Standard multi-turn text; streaming and non-streaming	Non-streaming, Streaming
Chat — thinking 1	Dialogue with model “thinking” output (variant 1)	Streaming (`extra_body.enable_thinking`)
Chat — thinking 2	Dialogue with model “thinking” output (variant 2)	Same as above; exact models depend on the catalog
Vision (image understanding)	Upload images for description or Q&A	Vision (streaming), Vision (non-streaming)
Chat + file reading	Attach documents in chat for analysis	See “Files and multimodal” below
Text embeddings	Text to vectors	Create embeddings

Model selection

Use the model ID listed in the Model Catalog for each scenario. Choose a Gemini model labeled for the capability you need (chat, vision, image generation, or embeddings); if the gateway returns “model not found” or an unsupported-capability error, try another Gemini entry labeled for that scenario.

Usage notes by scenario

Standard chat

Endpoint: POST /v1/chat/completions
Use a messages array for multi-turn chat; role supports system / user / assistant (same as OpenAI)
Non-streaming: set stream to false or omit it; streaming: set stream to true
Common parameters (temperature, max_tokens, top_p, etc.) behave like OpenAI; see Create chat completion (non-streaming)
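
The points above can be sketched as a small payload builder. build_chat_payload is a hypothetical helper, and the conversation text and parameter values are arbitrary examples.

```python
import json


def build_chat_payload(model, history, user_text, stream=False, **sampling):
    """Append a user turn to the history and return a request body."""
    messages = list(history) + [{"role": "user", "content": user_text}]
    payload = {"model": model, "messages": messages}
    if stream:
        payload["stream"] = True  # leaving the field out means non-streaming
    payload.update(sampling)  # e.g. temperature, max_tokens, top_p
    return payload


history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is an API gateway?"},
    {"role": "assistant", "content": "A single entry point that routes API calls."},
]
payload = build_chat_payload(
    "gemini-2.5-pro", history, "Give one example.", temperature=0.2, max_tokens=256
)
print(json.dumps(payload, indent=2))
```

Keeping the history as a plain list makes it easy to append each assistant reply before the next turn.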

Thinking mode (thinking 1 / thinking 2)

Some Gemini thinking models expose thinking output in streaming requests via an extension field:

json

{
  "model": "gemini-2.5-pro",
  "messages": [{ "role": "user", "content": "Explain the core idea of relativity" }],
  "stream": true,
  "extra_body": {
    "enable_thinking": true
  }
}

Thinking 1 and thinking 2 map to different models or routes on the gateway, which can differ in thinking depth, how the thinking output is displayed, and so on. Choose models marked for “thinking” in the catalog and test each
For full thinkingConfig control (e.g. thinking token budget), use the native Gemini API docs in this section

Vision (image understanding)

In a user message, use a multimodal array in content: text + image_url (URL or Base64).

json

{
  "model": "gemini-2.5-pro",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/photo.jpg" }
        }
      ]
    }
  ],
  "stream": true
}

For Base64 images see Create chat vision (streaming) Base64. For richer native params (inlineData, etc.) see Image understanding.
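
For local images, the OpenAI convention is to place a Base64 data URL in image_url.url. The sketch below builds such a content part. image_part_from_bytes and vision_message are hypothetical helpers, and whether the gateway accepts this exact data-URL form for a given model is an assumption to check against the Base64 reference page.

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as an OpenAI-style image_url content part."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_message(question: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a user message that pairs a text question with one image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            image_part_from_bytes(image_bytes, mime_type),
        ],
    }


# Stand-in bytes; in practice read the file in binary mode, e.g. open(path, "rb").
msg = vision_message("What is in this image?", b"\xff\xd8\xff")
print(msg["content"][1]["image_url"]["url"])
```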

Image creation

Use the chat endpoint with image-capable Gemini models for text-to-image and reference-based editing
Describe the task in natural language in messages; add text and image_url in content when you need a reference image
Request/response shape: Create chat image (non-streaming)
For fine-grained aspect ratio and resolution, also see native Image generation

Chat + file reading

Under the OpenAI-compatible format you can include documents, PDFs, etc. as part of multimodal input (supported MIME types and size limits are in the model catalog):

Prefer passing files in messages[].content using OpenAI multimodal conventions (image_url, or platform-supported file URL / Base64)
For large files or complex layouts, use native Document understanding (fileData / inlineData) and merge results into your chat flow in application code

Text embeddings

Endpoint: POST /v1/embeddings
Body: model + input (string or array of strings), same as OpenAI Embeddings
Examples and fields: Create embeddings; object shape: Embedding object
For Gemini-only options (taskType, output_dimensionality, etc.) use Gemini native text embeddings
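
As a sketch of the request and response handling, the helpers below build an embeddings body and pull vectors out of an OpenAI-shaped response. The model ID is a placeholder (this page does not name an embedding model), sample_response is made-up data used only to exercise the parser, and both helper names are hypothetical.

```python
def build_embeddings_body(model, texts):
    """Build the request body; texts may be one string or a list of strings."""
    return {"model": model, "input": texts}


def extract_vectors(response: dict) -> list:
    """Return the embedding vectors ordered by their index field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


body = build_embeddings_body("<gemini-embedding-model-id>", ["hello", "world"])

# Hand-written stand-in for a response, deliberately out of order.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 0.0]},
    ],
}
vectors = extract_vectors(sample_response)
print(len(vectors), vectors[0])
```

Sorting by index keeps each vector aligned with the position of its input string.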

Choosing compatible vs native Gemini API

Your need	Recommendation
Fast integration, reuse OpenAI SDK / existing code	OpenAI-compatible format (this page)
Streaming thinking, `generationConfig`, Google Search Grounding	Native API (e.g. Text generation + thinking (stream), Google Search)
Imagen image gen, TTS, video/audio understanding	Native API sections
Chat + vision + image gen + embeddings with an OpenAI client	OpenAI-compatible format covers the main path

FAQ

Q: Why doesn’t this match the OpenAI docs exactly?
A: The compatibility layer aligns request/response with OpenAI while Gemini runs underneath. Some OpenAI-only parameters may be ignored; follow what your model supports.

Q: What should I put in model?
A: Use the full model ID from the catalog (usually including a gemini prefix or suffix), not shorthand names.

Q: What is the streaming response format?
A: SSE with data: {...} lines and data: [DONE] at the end, same as OpenAI streaming Chat Completions; see Chat completion chunk object.
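
If you read the stream without an SDK, the format described in this answer can be parsed along these lines. iter_sse_text is a hypothetical helper, and the sample lines are hand-written to mimic OpenAI-style chunks, not captured gateway output.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_text(sample)))
```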

Q: Where is the full response JSON documented?
A: See Chat completion object.
