FAQ
Check this page first for common questions.
1. Model usage and security
Does Fast-Token store user API request content?
Fast-Token does not store any request content you send through the API, nor does it log response content returned by models. Fast-Token acts only as a relay channel to securely forward your requests to the appropriate model providers and return their responses to you unchanged.
Why do the official Claude, GPT, and Qwen products give different results from the API?
The underlying models are the same; the official products add extra engineering on top, such as built-in system prompts.
- The web UI is like a fully furnished home with search, memory, calculators, system prompts, and more built in.
- API calls are like an unfinished shell: core capability only; developers must configure context and tools themselves.
Why is the GPT-5 family not recommended for translation tools?
GPT-5 models are reasoning models built for complex reasoning and structured generation, not for high-frequency, real-time tasks.
Reasons
- Slower calls (more reasoning steps).
- Higher token usage (longer system prompts and reasoning context).
- Translation plugins may trigger safety policies unintentionally.
For translation or chat, prefer lightweight models such as GPT-4o mini or Gemini for faster, more stable responses.
Why does GPT-5 sometimes say "I am GPT-4" when asked who it is?
This is a language-model hallucination: the model may incorrectly describe its base model, origin, or capabilities. With GPT-4, GPT-5, Claude, and similar LLMs, developers may see confident but wrong self-identification.
Notes
- This is not the platform deliberately altering or swapping model output; it is normal LLM behavior.
- GPT-5 was not given the name "GPT-5" during training; the name was defined by the vendor after training.
- The model does not know its own name or knowledge cutoff; the OpenAI web product answers correctly because of built-in system prompts. We provide the official API, not the web product.
- Asking identity via the API can yield random, inaccurate answers because the model has no true self-awareness.
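If your application needs the model to name itself reliably, you can do what the web products do and state the identity in a system prompt. A minimal sketch, assuming the OpenAI-style message format; the helper name `with_identity` and the prompt wording are hypothetical illustrations, not a platform feature.

```python
def with_identity(messages: list[dict], model_name: str) -> list[dict]:
    """Prepend a system message telling the model which name to use for itself.

    The model cannot know its own product name, so the caller supplies it,
    much as vendor web apps do through their built-in system prompts.
    """
    system = {
        "role": "system",
        "content": f"You are {model_name}. If asked which model you are, answer '{model_name}'.",
    }
    return [system, *messages]


messages = with_identity([{"role": "user", "content": "Who are you?"}], "GPT-5")
```

The original message list is left untouched; the helper returns a new list with the system message first.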
What if calls to models such as Gemini-3-Pro often time out?
Increase your client timeout. Gemini-3-Pro is a large model with long reasoning times; on complex tasks a response can take more than 30 seconds, so a 30-second default timeout often expires before the answer arrives.
- If you must use Gemini-3-Pro, extend the timeout appropriately.
- If you need faster responses, use a lighter model such as Gemini 2.0, which suits shorter timeout settings.
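With the Python standard library, the timeout is the `timeout` argument of `urllib.request.urlopen`. A minimal sketch that picks a longer timeout for slow models; the model ID `gemini-3-pro` and the 180-second value are assumptions for illustration (check the console for exact model names), not documented limits.

```python
import json
import urllib.request

# Assumed values in seconds; "gemini-3-pro" is a hypothetical model ID.
TIMEOUTS_S = {"gemini-3-pro": 180.0}
DEFAULT_TIMEOUT_S = 30.0


def timeout_for(model: str) -> float:
    """Return a longer timeout for slow reasoning models, else the default."""
    return TIMEOUTS_S.get(model, DEFAULT_TIMEOUT_S)


def send_chat(req: urllib.request.Request, model: str) -> dict:
    """Send an already-built chat request using a per-model timeout."""
    # urlopen's timeout bounds each blocking socket operation (connect, read),
    # so a long wait for a non-streamed answer no longer aborts at 30 s.
    with urllib.request.urlopen(req, timeout=timeout_for(model)) as resp:
        return json.load(resp)
```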
Why did a single "hello" use so many tokens?
Some third-party tools (e.g. Cline, Claude Code) automatically attach context or system prompts; that hidden content counts toward token usage.
Even if you only type "hello", the backend request may include long conversation history or preset text. That extra content comes from the tool, not from Fast-Token.
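To see where the tokens went, read the `usage` object in the response. In the OpenAI Chat Completions format it contains `prompt_tokens`, `completion_tokens`, and `total_tokens`; a large `prompt_tokens` value for a one-word message points to context the tool added. A minimal sketch over an already-parsed response, assuming the relay passes this object through unchanged; the numbers in `sample` are made up.

```python
def describe_usage(response: dict) -> str:
    """Summarise token usage from an OpenAI-style chat completion response."""
    usage = response["usage"]
    return (
        f"prompt={usage['prompt_tokens']} "
        f"completion={usage['completion_tokens']} "
        f"total={usage['total_tokens']}"
    )


# Made-up numbers: a tiny reply but a large prompt means hidden context was sent.
sample = {"usage": {"prompt_tokens": 12000, "completion_tokens": 9, "total_tokens": 12009}}
print(describe_usage(sample))  # prompt=12000 completion=9 total=12009
```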
What are API concurrency limits?
There is currently no unified concurrency cap on the platform. If you hit concurrency issues, contact support.
Why does the same prompt produce different output each time?
LLMs use probabilistic sampling (temperature, top-p, etc.) and randomly choose among likely tokens.
- For more stable output, lower the temperature; a temperature of 0 approximates always picking the most likely token, though results may still vary slightly.
- Differences may also come from context, system prompts, or network conditions.
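The effect of the two sampling settings can be illustrated on a toy next-token distribution. This is a simplified sketch of temperature scaling and top-p (nucleus) filtering, not any provider's actual sampler; the logits are invented.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: list[float], p: float) -> list[int]:
    """Return indices of the smallest set of most likely tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept


logits = [2.0, 1.0, 0.5]
hot = softmax(logits, temperature=1.0)   # top token gets roughly 63%
cold = softmax(logits, temperature=0.2)  # top token gets over 99%
```

At the lower temperature nearly all probability sits on one token, so repeated calls usually pick the same continuation; top-p only limits which candidates are eligible, and randomness remains among them.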
Why do Claude answers sometimes stop early?
For Claude, Fast-Token supports two call styles:
- OpenAI Chat–compatible API
- Anthropic Claude native API
When calling Claude via the OpenAI Chat–compatible API, the default is max_tokens=4096. If you do not set a higher max_tokens, the model stops when that limit is reached. "Incomplete" answers are usually the default length cap, not a model fault.
How to generate longer text
On the OpenAI Chat–compatible API, set a larger max_tokens, for example:
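```python
from openai import OpenAI

# Claude via the OpenAI-compatible endpoint; replace YOUR_API_KEY with your key.
client = OpenAI(base_url="https://fast-token.com/v1", api_key="YOUR_API_KEY")

completion = client.chat.completions.create(
    model="claude-sonnet-4-6",
    max_tokens=6000,  # raise the 4096 default so long answers are not cut off
    messages=[
        {
            "role": "system",  # standing instructions go in the system message
            "content": "Always reply in Chinese.",
        },
        {
            "role": "user",
            "content": "What is the meaning of life? Answer in over 6000 words.",
        },
    ],
)
print(completion.choices[0].message.content)
```
max_tokens must not exceed the model's maximum output length. If the output is still truncated after raising the limit, share the model name and the full request parameters with support for further investigation.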
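To tell a length cut-off from a normal ending, check the stop reason in the response: in the OpenAI Chat Completions format `finish_reason` is `"length"`, and in the Anthropic Messages format `stop_reason` is `"max_tokens"`. A minimal sketch over parsed response dicts; the helper name `hit_token_limit` is hypothetical.

```python
def hit_token_limit(response: dict) -> bool:
    """Return True if the reply stopped because max_tokens was reached."""
    if "choices" in response:  # OpenAI Chat Completions format
        return response["choices"][0].get("finish_reason") == "length"
    return response.get("stop_reason") == "max_tokens"  # Anthropic Messages format
```

If this returns True, raise max_tokens (within the model's limit) rather than treating the answer as a model fault.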
2. API calls and data
Which API endpoints are available?
Fast-Token provides a unified gateway compatible with the major model API conventions:
- OpenAI-style endpoint: https://fast-token.com/v1 (GPT and compatible models)
- Claude relay endpoint: https://fast-token.com (Anthropic SDK–compatible)
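With the native Anthropic Messages API, `max_tokens` is required in every request, and the key and API version travel in the `x-api-key` and `anthropic-version` headers. A minimal standard-library sketch, assuming the relay serves the standard `/v1/messages` path under `https://fast-token.com` (the Anthropic SDK appends that path to its base URL) and accepts the same headers as Anthropic; the model ID is the one used elsewhere in this FAQ.

```python
import json
import urllib.request


def build_claude_request(api_key: str, prompt: str, max_tokens: int = 1024) -> urllib.request.Request:
    """Build (but do not send) a native Anthropic Messages API request."""
    body = {
        "model": "claude-sonnet-4-6",
        "max_tokens": max_tokens,  # required by the native API
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://fast-token.com/v1/messages",  # assumed path under the relay base URL
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


req = build_claude_request("YOUR_API_KEY", "Hello")
```

Pass `req` to `urllib.request.urlopen` to send it; with the Anthropic SDK, setting the base URL to `https://fast-token.com` has the same effect.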
What data is recorded during API use?
We record only what is needed: account info, call logs, models used, token usage, and billing.
Privacy
- We do not store user inputs or model outputs.
- Data is used only for billing and service improvement; it is not used for content analysis or shared with third parties.
- Fast-Token does not retain specific request payloads; if underlying cloud or model providers log access for security or compliance, that data is governed by their privacy policies.
3. Model knowledge and common phenomena
What is AI hallucination?
AI hallucination is when an LLM produces information that is false, unsupported, or fabricated.
Possible causes
- Training data bias or gaps.
- Overfitting.
- Randomness during generation.
Hallucination is common to all large language models, not a system failure.
4. Usage and troubleshooting
How do I monitor API usage and spend?
Use the Fast-Token console for call volume, token usage, and billing details.
You can break usage down by model and time range to optimize consumption and cost.
What if a call fails or returns an error?
API errors include an error code and a message.
Common causes:
- Invalid request format.
- Model unavailable or over quota.
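The HTTP status and the JSON body together usually identify the cause. A minimal sketch that extracts the code and message, assuming the OpenAI-style error envelope `{"error": {"message": ..., "type": ..., "code": ...}}`; Fast-Token's exact field names are an assumption, so adjust to what your responses contain.

```python
import json


def parse_error(status: int, body: str) -> str:
    """Return 'STATUS CODE: MESSAGE' from an error response body."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # Non-JSON body, e.g. an HTML page from a proxy or load balancer.
        return f"{status}: {body.strip() or 'no body'}"
    err = data.get("error") if isinstance(data, dict) else None
    if not isinstance(err, dict):
        return f"{status}: {body.strip()}"
    code = err.get("code") or err.get("type") or "unknown"
    return f"{status} {code}: {err.get('message', '')}"
```

With `urllib`, the status and body of a failed call are available on `urllib.error.HTTPError` as `e.code` and `e.read()`.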
How do I manage API keys?
Generate, revoke, or rotate API keys in the console.
Security tips
- Do not expose API keys in public environments.
- Use separate keys per project.
- Rotate keys regularly.
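A common way to keep keys out of source code is to read them from an environment variable. A minimal sketch; the variable name `FAST_TOKEN_API_KEY` is a hypothetical choice, not something the platform requires.

```python
import os


def load_api_key(var: str = "FAST_TOKEN_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it is missing."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var} environment variable; do not hard-code API keys.")
    return key
```

Giving each project its own variable also fits the separate-keys advice, and rotating a key then means updating one environment value instead of editing code.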
5. Getting started and billing
How does the relay billing work?
- Multiple billing modes: per request, per token, and others.
- Real-time display of API usage and charges.
Which programming languages are supported?
Our API is RESTful; any language that can send HTTP requests works, including Python, JavaScript, Java, Go, PHP, C#, and more.
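Because the API is HTTPS plus JSON, no SDK is strictly required. A minimal sketch using only the Python standard library against the OpenAI-style endpoint; `YOUR_API_KEY` is a placeholder and the helper names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request without any SDK."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        "https://fast-token.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


req = build_chat_request("YOUR_API_KEY", "gpt-4o-mini", "hello")
# reply = send(req)["choices"][0]["message"]["content"]
```

The same request shape (POST, bearer token, JSON body) carries over directly to HTTP clients in other languages.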
How do I migrate existing code?
Replace your original API base URL with our relay URL; keep other parameters the same. For example:
```
// Original
https://api.openai.com/v1/chat/completions
// Replace with
https://fast-token.com/v1/chat/completions
```
Most client libraries only need the base URL and API key updated to switch over.
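Since only the base URL changes, full endpoint URLs can be rewritten mechanically. A minimal sketch, assuming your code builds complete OpenAI URLs; the helper name `to_fast_token` is hypothetical.

```python
OPENAI_BASE = "https://api.openai.com/v1"
FAST_TOKEN_BASE = "https://fast-token.com/v1"


def to_fast_token(url: str) -> str:
    """Swap the OpenAI base URL for the relay's, keeping the path unchanged."""
    if not url.startswith(OPENAI_BASE):
        raise ValueError(f"not an OpenAI API URL: {url}")
    return FAST_TOKEN_BASE + url[len(OPENAI_BASE):]
```

With the official OpenAI Python SDK the same switch is a constructor argument, for example `OpenAI(base_url="https://fast-token.com/v1", api_key=...)`.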
What if API requests fail?
Common causes and fixes:
- Authentication error: verify the API key.
- Insufficient balance: top up your account.
- Parameter error: check request parameters in the docs.
- Model unavailable: try another model.
- Timeout: usually network issues or high load; retry later.
Contact online support if the issue persists.
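For timeouts, retrying with an increasing delay avoids hammering an overloaded upstream. A minimal sketch of exponential backoff; the attempt count, delays, and retrying only on `TimeoutError` by default are assumptions. Note that `urllib` may wrap connect timeouts in `urllib.error.URLError`, so add that to `retry_on` if needed; authentication, balance, and parameter errors should not be retried.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()` up to `attempts` times, sleeping base_delay * 2**n between tries."""
    for n in range(attempts - 1):
        try:
            return call()
        except retry_on:
            sleep(base_delay * 2 ** n)  # 1 s, 2 s, 4 s, ... with the defaults
    return call()  # final attempt: any error propagates to the caller
```

Wrap a request in a zero-argument function, for example `with_retries(lambda: send(req))`, where `send` stands for your own (hypothetical) request function.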
How do I view call logs and usage?
After logging in, open Usage logs to see your call history: time, model, tokens consumed, and cost.
How is data security ensured?
- We do not store your request or response bodies.
- All API traffic uses TLS.
- Strict access control and permissions.
- Regular security audits and vulnerability scans.
How do I get help?
- Read the developer documentation.
- Contact online support.
Are there sample code or SDKs?
We provide examples and SDKs for Python, Node.js, Java, and more; see Documentation at the top of the site.