Gemini OpenAI-Compatible Format Guide

Fast-Token lets you call Google Gemini models through the OpenAI API protocol. If you already use the OpenAI SDK or /v1/chat/completions, you can usually switch to Gemini by changing only base_url and model—no need to rewrite your app for the native Gemini API.

This page focuses on how to use the compatible format. For request bodies, response fields, and interactive debugging, see the ChatGPT docs and the links in the capability table below.

Two ways to integrate

| Item | OpenAI-compatible format (this page) | Native Gemini API (other docs in this section) |
| --- | --- | --- |
| Typical paths | POST /v1/chat/completions, POST /v1/embeddings | POST /v1beta/models/{model}:generateContent, etc. |
| Request body | OpenAI fields such as messages, model, stream | Gemini fields such as contents, generationConfig |
| Best for | Existing OpenAI clients, unified multi-model gateways, quick migration | Gemini-only features (thinkingConfig, Imagen-specific params, etc.) |
| Documentation | This page + Chat section | Endpoint pages under Chat, Images, Files, etc. in this section |

Both approaches share the same API Key and gateway URL. Billing follows the corresponding model in the Model Catalog.

Setup

Gateway and authentication

  • Base URL: https://fast-token.com/v1 (same as Getting Started)
  • Auth: header Authorization: Bearer <Fast-Token_API_KEY>
  • Model name: copy a model ID containing gemini from the Model Catalog into the model field
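
For clients that do not use an SDK, the three settings above map onto a plain HTTP request. The following is a minimal sketch using only the Python standard library; it builds the request without sending it. `build_chat_request` is a hypothetical helper name, and the model ID is a placeholder to be replaced with one from the Model Catalog.

```python
import json
import urllib.request

BASE_URL = "https://fast-token.com/v1"  # gateway base URL listed above


def build_chat_request(api_key, model, prompt):
    """Build (but do not send) a POST /v1/chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("<Fast-Token_API_KEY>", "gemini-2.5-pro", "Hello")
print(req.get_method(), req.full_url)

# Sending the request needs a valid API key:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```
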

Point the official SDK base_url at Fast-Token; everything else works like OpenAI:

python
from openai import OpenAI

client = OpenAI(
    base_url="https://fast-token.com/v1",
    api_key="<Fast-Token_API_KEY>",
)

completion = client.chat.completions.create(
    model="gemini-2.5-pro",  # check the Model Catalog for the exact model ID
    messages=[
        {"role": "user", "content": "Introduce yourself in one sentence"},
    ],
)
print(completion.choices[0].message.content)

javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://fast-token.com/v1",
  apiKey: "<Fast-Token_API_KEY>",
});

const completion = await client.chat.completions.create({
  model: "gemini-2.5-pro",
  messages: [{ role: "user", content: "Introduce yourself in one sentence" }],
});
console.log(completion.choices[0].message.content);

For streaming, set stream: true in the request. See Create chat completion (streaming).
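
As a rough illustration of consuming a streamed reply with the Python SDK, the sketch below concatenates the text deltas from the chunk iterator. `collect_stream_text` is a hypothetical helper, and it assumes the gateway's chunks follow the usual OpenAI shape (`choices[0].delta.content`).

```python
def collect_stream_text(chunks):
    """Concatenate the text deltas from an iterable of streamed chat chunks."""
    parts = []
    for chunk in chunks:
        # Some chunks (for example a trailing usage chunk) may carry no choices.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:  # role-only and finish chunks have no text content
            parts.append(text)
    return "".join(parts)


# Usage with the client created above (needs a valid API key):
# stream = client.chat.completions.create(
#     model="gemini-2.5-pro",
#     messages=[{"role": "user", "content": "Introduce yourself in one sentence"}],
#     stream=True,
# )
# print(collect_stream_text(stream))
```

To show output as it arrives, print each delta inside the loop instead of collecting the parts.
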

Chat-compatible capabilities overview

All of the following use the OpenAI-compatible paths. In the console or Apifox they are often grouped under “chat-compatible format”. In practice most scenarios use the same POST /v1/chat/completions (embeddings use POST /v1/embeddings), distinguished by model and message structure.

| Capability | Description | Reference |
| --- | --- | --- |
| Gemini image creation | Generate or edit images from text (and optional reference images) | Create chat image (non-streaming) |
| Chat | Standard multi-turn text; streaming and non-streaming | Non-streaming, Streaming |
| Chat (thinking 1) | Dialogue with model “thinking” output (variant 1) | Streaming (extra_body.enable_thinking) |
| Chat (thinking 2) | Dialogue with model “thinking” output (variant 2) | Same as above; exact models depend on the catalog |
| Vision (image understanding) | Upload images for description or Q&A | Vision (streaming), Vision (non-streaming) |
| Chat + file reading | Attach documents in chat for analysis | See “Files and multimodal” below |
| Text embeddings | Text to vectors | Create embeddings |

Model selection

Use the model ID from the Model Catalog for each scenario. Pick a Gemini model labeled for the capability you need (chat, vision, image generation, or embeddings). If a request returns “model not found” or an unsupported-capability error, pick another Gemini entry for that scenario.
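
One way to shortlist candidates programmatically is to filter the output of GET /v1/models. The sketch below assumes the gateway returns the OpenAI-style list shape (`{"data": [{"id": ...}]}`), which this page does not confirm; `gemini_model_ids` is a hypothetical helper and the IDs in the sample are placeholders.

```python
def gemini_model_ids(models_response):
    """Return the sorted model IDs containing "gemini" from a parsed /v1/models body."""
    return sorted(
        item["id"]
        for item in models_response.get("data", [])
        if "gemini" in item.get("id", "").lower()
    )


# Placeholder response body; real IDs come from the Model Catalog.
sample = {
    "object": "list",
    "data": [
        {"id": "gpt-4o"},
        {"id": "gemini-2.5-pro"},
        {"id": "gemini-2.5-flash"},
    ],
}
print(gemini_model_ids(sample))
```

The filter only narrows the list by name; which capability each model supports still has to be read from the catalog labels.
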

Usage notes by scenario

Standard chat

  • Endpoint: POST /v1/chat/completions
  • Use a messages array for multi-turn chat; role supports system / user / assistant (same as OpenAI)
  • Non-streaming: stream: false or omit stream; streaming: stream: true
  • Common parameters (temperature, max_tokens, top_p, etc.) behave like OpenAI; see Create chat completion (non-streaming)
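
The points above can be sketched as a small payload builder that keeps the conversation history and appends the next user turn. Both helper names are hypothetical, and the default values for temperature and max_tokens are arbitrary illustrations rather than recommended settings.

```python
def append_turn(messages, user_text, assistant_text):
    """Return a new history with one user/assistant exchange appended."""
    return messages + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]


def build_chat_payload(model, history, next_user_text,
                       temperature=0.7, max_tokens=512):
    """Build a non-streaming request body for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": history + [{"role": "user", "content": next_user_text}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": False,
    }


history = [{"role": "system", "content": "You are a concise assistant."}]
history = append_turn(history, "Hi", "Hello! How can I help?")
payload = build_chat_payload("gemini-2.5-pro", history, "What can you do?")
print([m["role"] for m in payload["messages"]])
```

Because the endpoint is stateless, the full history is sent on every call; the assistant's reply has to be appended by the client before the next turn.
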

Thinking mode (thinking 1 / thinking 2)

Some Gemini thinking models expose thinking output in streaming requests via an extension field:

json
{
  "model": "gemini-2.5-pro",
  "messages": [{ "role": "user", "content": "Explain the core idea of relativity" }],
  "stream": true,
  "extra_body": {
    "enable_thinking": true
  }
}

  • Thinking 1 and thinking 2 map to different models or routes on the gateway, which differ in aspects such as thinking depth and how the thinking output is displayed. Choose models marked for “thinking” in the catalog and test each one
  • For full thinkingConfig control (e.g. thinking token budget), use the native Gemini API docs in this section
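
With the OpenAI Python SDK, extension fields are normally passed through the `extra_body` argument, which the SDK merges into the JSON body it sends. The sketch below builds the keyword arguments for such a call. `thinking_request_kwargs` is a hypothetical helper, and this page does not settle whether the gateway reads `enable_thinking` at the top level of the body or under a nested `extra_body` key as in the JSON above, so compare the request actually sent with the Streaming reference before relying on it.

```python
def thinking_request_kwargs(model, prompt, enable_thinking=True):
    """Keyword arguments for client.chat.completions.create(...)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # this page describes thinking output for streaming requests
        # The SDK adds these keys to the JSON body of the request.
        "extra_body": {"enable_thinking": enable_thinking},
    }


kwargs = thinking_request_kwargs(
    "gemini-2.5-pro", "Explain the core idea of relativity"
)
print(sorted(kwargs))

# With a configured client (needs a valid API key):
# stream = client.chat.completions.create(**kwargs)
# for chunk in stream:
#     print(chunk)  # inspect the chunks to see where thinking output appears
```
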

Vision (image understanding)

In a user message, use a multimodal array in content: text + image_url (URL or Base64).

json
{
  "model": "gemini-2.5-pro",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/photo.jpg" }
        }
      ]
    }
  ],
  "stream": true
}

For Base64 images see Create chat vision (streaming) Base64. For richer native params (inlineData, etc.) see Image understanding.
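
For local images, a common OpenAI-style convention is to embed the bytes as a `data:` URL in the same `image_url.url` field. The sketch below assumes the gateway accepts that form; the Base64 reference page is the authority on the exact format. Both helper names are hypothetical.

```python
import base64


def bytes_to_data_url(data, mime_type):
    """Encode raw image bytes as a data URL, e.g. data:image/png;base64,...."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def vision_message(question, image_url):
    """Build a user message holding one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Reading a local file (the path is a placeholder):
# with open("photo.jpg", "rb") as f:
#     url = bytes_to_data_url(f.read(), "image/jpeg")
url = bytes_to_data_url(b"abc", "image/png")
message = vision_message("What is in this image?", url)
print(message["content"][1]["image_url"]["url"])
```

Base64 inflates the payload by about a third, so a hosted URL is usually the lighter option for large images.
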

Image creation

  • Use the chat endpoint with image-capable Gemini models for text-to-image and reference-based editing
  • Describe the task in natural language in messages; add text and image_url in content when you need a reference image
  • Request/response shape: Create chat image (non-streaming)
  • For fine-grained aspect ratio and resolution, also see native Image generation

Chat + file reading

Under the OpenAI-compatible format you can include documents, PDFs, etc. as part of multimodal input (supported MIME types and size limits are in the model catalog):

  1. Prefer passing files in messages[].content using OpenAI multimodal conventions (image_url, or platform-supported file URL / Base64)
  2. For large files or complex layouts, use native Document understanding (fileData / inlineData) and merge results into your chat flow in application code

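Text embeddings

  • Endpoint: POST /v1/embeddings
  • Use an embedding-capable Gemini model ID from the Model Catalog
  • Request/response shape: Create embeddings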
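
A minimal sketch of the embeddings flow on POST /v1/embeddings, assuming the gateway mirrors the OpenAI request body (`model`, `input`) and response shape (`data[i].index`, `data[i].embedding`). All three helper names are hypothetical, and the sample response is made-up data used only to exercise the code.

```python
import math


def build_embeddings_payload(model, texts):
    """Build a request body for POST /v1/embeddings."""
    return {"model": model, "input": list(texts)}


def extract_vectors(response_body):
    """Return the embedding vectors from a parsed response, in input order."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up response body standing in for a real gateway reply.
fake_response = {
    "data": [
        {"index": 1, "embedding": [0.0, 1.0]},
        {"index": 0, "embedding": [1.0, 0.0]},
    ]
}
first, second = extract_vectors(fake_response)
print(cosine_similarity(first, second))
```
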

Choosing compatible vs native Gemini API

| Your need | Recommendation |
| --- | --- |
| Fast integration, reuse OpenAI SDK / existing code | OpenAI-compatible format (this page) |
| Streaming thinking, generationConfig, Google Search Grounding | Native API (e.g. Text generation + thinking (stream), Google Search) |
| Imagen image gen, TTS, video/audio understanding | Native API sections |
| Chat + vision + image gen + embeddings with an OpenAI client | OpenAI-compatible format covers the main path |

FAQ

Q: Why doesn’t this match the OpenAI docs exactly?
A: The compatibility layer aligns request/response with OpenAI while Gemini runs underneath. Some OpenAI-only parameters may be ignored; follow what your model supports.

Q: What should I put in model?
A: Use the full model ID from the catalog (usually including a gemini prefix or suffix), not shorthand names.

Q: What is the streaming response format?
A: SSE with data: {...} lines and data: [DONE] at the end, same as OpenAI streaming Chat Completions; see Chat completion chunk object.
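
For clients that read the stream without an SDK, the format described in this answer can be parsed roughly as follows. This is a sketch under that description only; `iter_sse_payloads` is a hypothetical helper and the sample lines are made up.

```python
import json


def iter_sse_payloads(lines):
    """Yield the decoded JSON object from each `data:` line until [DONE]."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        yield json.loads(data)


sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "!"}}]}',
    "data: [DONE]",
]
text = "".join(
    payload["choices"][0]["delta"].get("content", "")
    for payload in iter_sse_payloads(sample_lines)
)
print(text)
```
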

Q: Where is the full response JSON documented?
A: See Chat completion object.

Further reading

  • Getting Started: first call and API Key
  • API quick start: Base URL and client setup
  • List models: GET /v1/models for available models
  • Other pages in this section: native Gemini REST API parameters and examples