Gemini OpenAI-Compatible Format Guide
Fast-Token lets you call Google Gemini models through the OpenAI API protocol. If you already use the OpenAI SDK or /v1/chat/completions, you can usually switch to Gemini by changing only base_url and model—no need to rewrite your app for the native Gemini API.
This page focuses on how to use the compatible format. For request bodies, response fields, and interactive debugging, see the ChatGPT docs and the links in the capability table below.
Two ways to integrate
| Item | OpenAI-compatible format (this page) | Native Gemini API (other docs in this section) |
|---|---|---|
| Typical paths | POST /v1/chat/completions, POST /v1/embeddings | POST /v1beta/models/{model}:generateContent, etc. |
| Request body | OpenAI fields such as messages, model, stream | Gemini fields such as contents, generationConfig |
| Best for | Existing OpenAI clients, unified multi-model gateways, quick migration | Gemini-only features (thinkingConfig, Imagen-specific params, etc.) |
| Documentation | This page + Chat section | Endpoint pages under Chat, Images, Files, etc. in this section |
Both approaches share the same API Key and gateway URL. Billing follows the corresponding model in the Model Catalog.
Setup
Gateway and authentication
- Base URL: https://fast-token.com/v1 (same as Getting Started)
- Auth: header Authorization: Bearer <Fast-Token_API_KEY>
- Model name: copy a model ID containing gemini from the Model Catalog into the model field
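As a rough sketch of how the base URL, auth header, and model field fit together, the snippet below builds (without sending) a chat completion request using only the Python standard library. The helper name build_chat_request is hypothetical, and the Content-Type header is an assumption based on ordinary JSON APIs rather than something this page specifies.

```python
import json
import urllib.request

BASE_URL = "https://fast-token.com/v1"  # gateway base URL from this page


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, not stated on this page
        },
        method="POST",
    )


req = build_chat_request("<Fast-Token_API_KEY>", "gemini-2.5-pro", "Hello")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req), then json.load() the response.
```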
Using the OpenAI SDK (recommended)
Point the official SDK base_url at Fast-Token; everything else works like OpenAI:
Python:

from openai import OpenAI

client = OpenAI(
    base_url="https://fast-token.com/v1",
    api_key="<Fast-Token_API_KEY>",
)

completion = client.chat.completions.create(
    model="gemini-2.5-pro",  # use the ID listed in the Model Catalog
    messages=[
        {"role": "user", "content": "Introduce yourself in one sentence"},
    ],
)
print(completion.choices[0].message.content)

JavaScript:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://fast-token.com/v1",
  apiKey: "<Fast-Token_API_KEY>",
});

const completion = await client.chat.completions.create({
  model: "gemini-2.5-pro",
  messages: [{ role: "user", content: "Introduce yourself in one sentence" }],
});
console.log(completion.choices[0].message.content);

For streaming, set stream: true in the request. See Create chat completion (streaming).
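To illustrate what a streaming client has to do with the response, here is a minimal sketch that reads SSE lines of the form data: {...} until data: [DONE] and joins the text deltas. It assumes the chunks follow the OpenAI Chat Completions chunk shape (choices[].delta.content), as the FAQ on this page states. The function names and the sample lines are made up for illustration and are not real gateway output.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON chunks from SSE 'data:' lines, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the delta.content pieces of every streamed chunk."""
    parts = []
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Hand-written sample stream, for illustration only.
sample_lines = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print(collect_text(sample_lines))
```

With the OpenAI SDK this parsing is handled for you when stream is enabled; the sketch is mainly useful for raw HTTP clients or for debugging.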
Chat-compatible capabilities overview
All of the following use the OpenAI-compatible paths. In the console or Apifox they are often grouped under “chat-compatible format”. In practice most scenarios use the same POST /v1/chat/completions (embeddings use POST /v1/embeddings), distinguished by model and message structure.
| Capability | Description | Reference |
|---|---|---|
| Gemini image creation | Generate or edit images from text (and optional reference images) | Create chat image (non-streaming) |
| Chat | Standard multi-turn text; streaming and non-streaming | Non-streaming, Streaming |
| Chat — thinking 1 | Dialogue with model “thinking” output (variant 1) | Streaming (extra_body.enable_thinking) |
| Chat — thinking 2 | Dialogue with model “thinking” output (variant 2) | Same as above; exact models depend on the catalog |
| Vision (image understanding) | Upload images for description or Q&A | Vision (streaming), Vision (non-streaming) |
| Chat + file reading | Attach documents in chat for analysis | See “Files and multimodal” below |
| Text embeddings | Text to vectors | Create embeddings |
Model selection
Use the model ID from the Model Catalog for each scenario. Pick a Gemini model labeled for the capability you need (chat, vision, image generation, or embeddings); if the gateway returns “model not found” or an unsupported-capability error, pick another Gemini entry for that scenario.
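If you prefer to discover candidate IDs programmatically, the sketch below filters a model list for IDs containing gemini. It assumes GET /v1/models (listed under Further reading) returns the OpenAI list shape, an object with a data array of entries that each carry an id. The helper name and the sample response are hypothetical, and the second model ID is a placeholder.

```python
from typing import Any


def gemini_model_ids(models_response: dict[str, Any]) -> list[str]:
    """Return sorted IDs containing 'gemini' from a /v1/models-style response."""
    return sorted(
        item["id"]
        for item in models_response.get("data", [])
        if "gemini" in item.get("id", "").lower()
    )


# Hypothetical sample response; real IDs come from the gateway.
sample = {
    "object": "list",
    "data": [
        {"id": "gemini-2.5-pro", "object": "model"},
        {"id": "example-other-model", "object": "model"},
    ],
}
print(gemini_model_ids(sample))
```

The list only tells you which IDs exist; capability labels (vision, image generation, embeddings) still have to be checked in the Model Catalog.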
Usage notes by scenario
Standard chat
- Endpoint: POST /v1/chat/completions
- Use a messages array for multi-turn chat; role supports system / user / assistant (same as OpenAI)
- Non-streaming: stream: false or omit stream; streaming: stream: true
- Common parameters (temperature, max_tokens, top_p, etc.) behave like OpenAI; see Create chat completion (non-streaming)
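The points above can be sketched as a small body builder: an optional system message, prior turns, the new user turn, and any common sampling parameters. The function name build_chat_body and the sample conversation are hypothetical; the field names are the OpenAI ones this page refers to.

```python
from typing import Any, Optional


def build_chat_body(
    model: str,
    history: list[dict[str, str]],
    user_text: str,
    system: Optional[str] = None,
    stream: bool = False,
    **params: Any,
) -> dict[str, Any]:
    """Assemble a chat completion body for a multi-turn conversation."""
    messages: list[dict[str, str]] = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.extend(history)  # earlier user / assistant turns, oldest first
    messages.append({"role": "user", "content": user_text})
    body: dict[str, Any] = {"model": model, "messages": messages, **params}
    if stream:
        body["stream"] = True  # omitted entirely for non-streaming requests
    return body


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
body = build_chat_body(
    "gemini-2.5-pro",
    history,
    "Summarize our chat so far",
    system="You are concise.",
    temperature=0.2,
    max_tokens=256,
)
print([m["role"] for m in body["messages"]])
```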
Thinking mode (thinking 1 / thinking 2)
Some Gemini thinking models expose thinking output in streaming requests via an extension field:
{
  "model": "gemini-2.5-pro",
  "messages": [{ "role": "user", "content": "Explain the core idea of relativity" }],
  "stream": true,
  "extra_body": {
    "enable_thinking": true
  }
}

- Thinking 1 and thinking 2 map to different models or routes on the gateway (depth, display, etc.). Choose models marked for “thinking” in the catalog and test each
- For full thinkingConfig control (e.g. thinking token budget), use the native Gemini API docs in this section
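A minimal sketch of toggling the extension field on an existing request body, reproducing the JSON shape shown in this section. The helper name with_thinking is hypothetical. One caveat worth testing: the OpenAI Python SDK's extra_body keyword merges its keys into the top level of the JSON body rather than sending a literal extra_body object, so whether the gateway expects the nested or the flattened form is an open assumption to check against the streaming reference.

```python
import copy
import json


def with_thinking(body: dict, enabled: bool = True) -> dict:
    """Return a copy of a chat body with streaming on and the thinking flag set."""
    out = copy.deepcopy(body)  # leave the caller's body untouched
    out["stream"] = True  # this page describes thinking output for streaming requests
    out.setdefault("extra_body", {})["enable_thinking"] = enabled
    return out


base = {
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Explain the core idea of relativity"}],
}
print(json.dumps(with_thinking(base), indent=2))
```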
Vision (image understanding)
In a user message, use a multimodal array in content: text + image_url (URL or Base64).
{
  "model": "gemini-2.5-pro",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/photo.jpg" }
        }
      ]
    }
  ],
  "stream": true
}

For Base64 images see Create chat vision (streaming) Base64. For richer native params (inlineData, etc.) see Image understanding.
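For local images, a common OpenAI-style convention is to embed the bytes as a data URL in image_url.url. The sketch below assumes the gateway accepts that form; the Base64 reference page mentioned above is the authority on the exact format. The helper names are hypothetical, and the three-byte payload is a stand-in for real image data.

```python
import base64
import mimetypes


def image_part_from_bytes(data: bytes, filename: str) -> dict:
    """Encode raw image bytes as an image_url content part using a data URL."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


def vision_message(question: str, image_part: dict) -> dict:
    """Build a user message combining a text part and one image part."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": question}, image_part],
    }


# In real use: data = open("photo.png", "rb").read()
part = image_part_from_bytes(b"abc", "photo.png")
message = vision_message("What is in this image?", part)
print(message["content"][1]["image_url"]["url"])
```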
Image creation
- Use the chat endpoint with image-capable Gemini models for text-to-image and reference-based editing
- Describe the task in natural language in messages; add text and image_url in content when you need a reference image
- Request/response shape: Create chat image (non-streaming)
- For fine-grained aspect ratio and resolution, also see native Image generation
Chat + file reading
Under the OpenAI-compatible format you can include documents, PDFs, etc. as part of multimodal input (supported MIME types and size limits are in the model catalog):
- Prefer passing files in messages[].content using OpenAI multimodal conventions (image_url, or platform-supported file URL / Base64)
- For large files or complex layouts, use native Document understanding (fileData / inlineData) and merge results into your chat flow in application code
Text embeddings
- Endpoint: POST /v1/embeddings
- Body: model + input (string or array of strings), same as OpenAI Embeddings
- Examples and fields: Create embeddings; object shape: Embedding object
- For Gemini-only options (taskType, output_dimensionality, etc.) use Gemini native text embeddings
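As a sketch of the round trip, the snippet below builds an embeddings body, pulls vectors out of an OpenAI-shaped response (data[].embedding, ordered by index), and compares two of them with cosine similarity. The model ID is a placeholder to be replaced from the Model Catalog, the helper names are hypothetical, and the tiny two-dimensional response is invented purely for illustration.

```python
import math
from typing import Union


def embeddings_body(model: str, texts: Union[str, list[str]]) -> dict:
    """Body for POST /v1/embeddings: model plus a string or list of strings."""
    return {"model": model, "input": texts}


def vectors_from_response(response: dict) -> list[list[float]]:
    """Extract embedding vectors in input order from an OpenAI-shaped response."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


body = embeddings_body("<gemini-embedding-model-id>", ["first text", "second text"])

# Invented response with toy 2-D vectors; real embeddings are much longer.
fake_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 0.0]},
    ],
}
first, second = vectors_from_response(fake_response)
print(cosine(first, second))
```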
Choosing compatible vs native Gemini API
| Your need | Recommendation |
|---|---|
| Fast integration, reuse OpenAI SDK / existing code | OpenAI-compatible format (this page) |
| Streaming thinking, generationConfig, Google Search Grounding | Native API (e.g. Text generation + thinking (stream), Google Search) |
| Imagen image gen, TTS, video/audio understanding | Native API sections |
| Chat + vision + image gen + embeddings with an OpenAI client | OpenAI-compatible format covers the main path |
FAQ
Q: Why doesn’t this match the OpenAI docs exactly?
A: The compatibility layer aligns request/response with OpenAI while Gemini runs underneath. Some OpenAI-only parameters may be ignored; follow what your model supports.
Q: What should I put in model?
A: Use the full model ID from the catalog (usually including a gemini prefix or suffix), not shorthand names.
Q: What is the streaming response format?
A: SSE with data: {...} lines and data: [DONE] at the end, same as OpenAI streaming Chat Completions; see Chat completion chunk object.
Q: Where is the full response JSON documented?
A: See Chat completion object.
Further reading
- Getting Started: first call and API Key
- API quick start: Base URL and client setup
- List models: GET /v1/models for available models
- Other pages in this section: native Gemini REST API parameters and examples