As frontier AI models advance globally, access to cutting-edge AI services has become increasingly constrained by geography and payment rails. For instance, OpenAI began blocking mainland China IP addresses in June 2024, and providers like Anthropic similarly decline service to unsupported regions. Moreover, mainstream overseas model vendors typically require a linked Visa or Mastercard with strict billing address verification, effectively barring many developers.
Within these constraints, a multitude of "AI proxy services" (AI 中转站, literally "AI relay stations") have emerged. These services use overseas servers as relays and accept RMB payments to sidestep foreign credit card requirements, acting in effect as "purchasing agents" that deliver powerful AI capabilities to users in restricted regions. Initially a grey-area business, this model has since attracted significant market participation, including notable companies and individual operators.
Technical Deep Dive: How AI Proxy Services Work
Technically, an AI proxy service functions as a reverse proxy server positioned between the user and the large language model (LLM) provider. User requests are first routed to the proxy, which then forwards them to official APIs from providers like OpenAI or Anthropic, retrieves the results, and delivers them back to the user.
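The forwarding step described above can be sketched in a few lines. The heart of any such proxy is header rewriting: the user authenticates to the proxy with the proxy's own token, and the proxy strips it and substitutes the operator's upstream API key before forwarding. The endpoint URL and key formats below are illustrative assumptions, not the configuration of any specific service.

```python
# Minimal sketch of the relay step at the core of an AI proxy.
# UPSTREAM_BASE and UPSTREAM_KEY are placeholder assumptions.

UPSTREAM_BASE = "https://api.openai.com"   # assumed official endpoint
UPSTREAM_KEY = "sk-real-upstream-key"      # the proxy operator's own key

def relay_request(path: str, user_headers: dict, body: bytes):
    """Rewrite an incoming user request so it can be forwarded upstream.

    Strips the user's proxy credentials and Host header, substitutes the
    operator's upstream API key, and leaves the request body untouched.
    """
    upstream_url = UPSTREAM_BASE + path
    headers = {k: v for k, v in user_headers.items()
               if k.lower() not in ("authorization", "host")}
    headers["Authorization"] = f"Bearer {UPSTREAM_KEY}"
    # A real proxy would now POST this (e.g. via httpx) with streaming
    # enabled and pipe chunks back; here we just return what would be sent.
    return upstream_url, headers, body
```

In production this sits behind an HTTP server with response streaming, but the request rewriting shown here is the essential trick.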
Currently, the market for these proxy services can be broadly categorized into three types:
- Web Mirror Sites: Offering a direct web interface for immediate user access. These have low entry barriers but high opacity regarding request routing.
- API Aggregation and Distribution Platforms: Geared towards developers, these platforms normalize heterogeneous API interfaces from multiple models into a standard format, then resell access on a token-based billing model.
- Enterprise AI Gateways: Designed for large organizations, providing intelligent routing, full-link auditing, data anonymization, and permission control, exemplified by solutions like Portkey.
The underlying technical logic across these forms is shared. Many commercial platforms are built upon the open-source project One API, which boasts over 30,000 GitHub stars and serves as a de facto infrastructure for many proxy services. One API's core operational modules include:
- Protocol Standardization: The proxy unpacks user requests at the application layer, extracts the core elements, repacks them into the format the target model expects, and streams response chunks back in real time to preserve the "typewriter" effect.
- Token Billing Interception: During forwarding, the proxy intercepts returned data packets to tally actual token consumption, then charges users based on a customizable "model multiplier," allowing for differentiated pricing.
- Multi-Account Polling Pool: To circumvent official API rate limits, proxies maintain a large pool of underlying API keys and distribute traffic using a polling algorithm, seamlessly switching to the next available account when one is blocked or exhausted.
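The latter two modules can be sketched concretely. The snippet below is a simplified illustration of the general technique, not One API's actual implementation: a round-robin key pool that skips banned accounts, and token-metered billing with a per-model markup multiplier (all prices and multipliers are made-up values).

```python
import itertools

class KeyPool:
    """Round-robin pool of upstream API keys; skips keys marked dead.

    Illustrative sketch only; real pools also track rate limits,
    remaining quota, and per-key health checks.
    """
    def __init__(self, keys):
        self._cycle = itertools.cycle(keys)
        self._dead = set()
        self._size = len(keys)

    def next_key(self):
        # Try at most one full rotation before giving up.
        for _ in range(self._size):
            key = next(self._cycle)
            if key not in self._dead:
                return key
        raise RuntimeError("all upstream keys exhausted")

    def mark_dead(self, key):
        """Call when upstream rejects a key (banned or quota exhausted)."""
        self._dead.add(key)

def bill(prompt_tokens, completion_tokens, base_price_per_1k, model_multiplier):
    """Token-metered billing with a configurable per-model multiplier,
    enabling differentiated markup across models."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1000 * base_price_per_1k * model_multiplier
```

The multiplier is where pricing flexibility lives: the same token count can be billed at different effective rates per model, which is how platforms undercut official prices on some models while marking up others.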
As the technical barrier keeps falling, launching a commercial platform has become remarkably simple: often an overseas server and a single Docker command are enough to go live, which has fueled a surge in market entrants.
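To give a sense of how low the barrier is, a deployment along the lines of One API's README looks like the following (image name, ports, and paths may differ across versions; treat this as an illustrative sketch rather than authoritative instructions):

```shell
# Illustrative single-command deployment of a One API-style gateway.
# Maps the web console to port 3000 and persists state to a host directory.
docker run --name one-api -d --restart always \
  -p 3000:3000 \
  -v /opt/one-api/data:/data \
  justsong/one-api
```

After startup, the operator logs into the web console, pastes in upstream API keys, sets model multipliers, and begins issuing proxy tokens to paying users.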
Cost & Compliance Challenges: The Grey Areas of Upstream Resource Acquisition
The ability of proxy services to offer lower prices often stems from their access to low-cost computing resources upstream. This includes leveraging cloud provider new user free tiers, abusing educational email accounts for discounts, and bulk reselling enterprise account privileges on e-commerce platforms. More illicit methods involve bulk registration of fake accounts, fraudulent use of international credit cards, and even the theft of API keys.
Furthermore, with Anthropic's introduction of mandatory KYC (Know Your Customer) identity verification, a new supply chain for resources has emerged. Intermediaries pay individuals in regions like Nigeria, Kenya, and Cambodia a few dollars each to photograph themselves, collecting facial and identification data in bulk and reselling it to developers in China at many times the original cost. This commoditization of biometric data, mirroring earlier black markets for iris scans, has prompted researchers to warn of far-reaching security risks, such as the opening of fraudulent financial accounts.
Quality and Trust Issues: Model Deception and Billing Irregularities
The quality and transparency of AI proxy services are increasingly problematic. In March 2026, CISPA Helmholtz Center for Information Security published a paper titled "Real Money, Fake Models: Deceptive Model Claims in Shadow APIs," providing the first systematic security audit of these proxy systems.
The study tracked and tested 17 proxy services cited in 187 academic papers, revealing alarming findings:
- High Model Identity Verification Failure: 45.83% of nodes failed model identity verification, indicating that they were not running the models they claimed.
- Significant Performance Degradation: In medical Q&A tests, Gemini-2.5-flash achieved an 83.82% accuracy rate via the official API, which plummeted to approximately 37% through shadow APIs. In legal reasoning tests, all tested proxies lagged behind official APIs by over 40 percentage points.
Common "bait-and-switch" tactics include charging official prices while actually serving cheaper open-source models (e.g., passing off a stripped-down Llama as GPT-5) and quietly swapping expensive new models for cheaper, older ones while still billing at the premium rate. The study concluded that in the AI proxy market price bears no reliable relationship to quality: paying more does not protect against model substitution.
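One simple flavor of identity checking can be sketched as follows. This is a hedged illustration of the general idea, not the CISPA paper's methodology: send probe prompts whose expected answers differ between model families (knowledge cutoff, self-reported developer, and so on) and score how many responses contain the expected marker. The probe prompts and markers below are made-up examples.

```python
# Hypothetical fingerprint probes: prompt -> marker expected in the
# claimed model's answer. Real audits use far stronger signals
# (tokenizer quirks, logprobs, benchmark deltas).
FINGERPRINT_PROBES = {
    "What is your knowledge cutoff date?": "2024",
    "Which company developed you?": "openai",
}

def identity_score(responses: dict) -> float:
    """Fraction of probes whose response contains the expected marker.

    `responses` maps each probe prompt to the text the endpoint returned;
    a low score suggests the endpoint is not serving the claimed model.
    """
    hits = sum(1 for prompt, marker in FINGERPRINT_PROBES.items()
               if marker in responses.get(prompt, "").lower())
    return hits / len(FINGERPRINT_PROBES)
```

Marker-matching alone is easily gamed by a proxy that rewrites system prompts, which is why rigorous audits combine it with behavioral tests such as the medical and legal benchmarks described above.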
Beyond model deception, billing irregularities are also prevalent. A 2026 ACM Internet Measurement Conference paper, "Behavioral Consistency and Transparency Analysis on Large Language Model API Gateways," found that some commercial gateways overcharged by 62.8% compared to expected calculations, yet their reported usage data showed no anomalies, leaving users unaware of the discrepancies.
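A user-side sanity check for this kind of overbilling is to recount tokens independently and compare against the gateway's reported usage. The helper below is a hedged sketch of that cross-check, not the ACM paper's measurement setup; a real audit would recount with the model's actual tokenizer (e.g. tiktoken) rather than the crude word split shown here.

```python
def rough_token_count(text: str) -> int:
    """Crude stand-in for an exact tokenizer when only a ballpark
    recount is needed; real audits should use the model's tokenizer."""
    return max(1, len(text.split()))

def overcharge_ratio(reported_tokens: int, independent_tokens: int) -> float:
    """Relative overbilling: how far the gateway's reported token count
    exceeds an independent recount of the same request and response.
    0.0 means the counts agree; 0.628 would match the 62.8% finding."""
    return reported_tokens / independent_tokens - 1.0
```

Because the paper found that reported usage data "showed no anomalies," the recount must come from the raw text the user actually sent and received, never from the gateway's own usage endpoint.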
Furthermore, some gateways engage in covert "context truncation." To save costs, they silently discard earlier content once message history exceeds an implicit threshold. Tests showed that in a 25-round conversation, models on certain gateways lost track of information set in round 10 by round 24. This implies that applications relying on long-document analysis or multi-turn dialogues could be operating in a degraded state for extended periods.
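Truncation of this sort is straightforward to probe for: plant a unique fact in an early round, pad the conversation with filler, and ask for the fact in the final round. The sketch below simulates both the probe and a truncating gateway; it is an illustrative test harness under assumed message formats, not the testing protocol from the paper.

```python
def build_probe(rounds: int, plant_round: int, fact: str):
    """Build a multi-turn conversation that plants a unique fact in
    `plant_round` and asks for it back in the final round."""
    msgs = []
    for i in range(1, rounds):
        if i == plant_round:
            msgs.append({"role": "user",
                         "content": f"Remember this code: {fact}"})
        else:
            msgs.append({"role": "user", "content": f"Filler message {i}."})
        msgs.append({"role": "assistant", "content": "Noted."})
    msgs.append({"role": "user",
                 "content": "What code did I ask you to remember?"})
    return msgs

def truncating_gateway(messages, keep_last: int):
    """Simulate a gateway that silently drops all but the last N messages
    once the history exceeds its implicit threshold."""
    return messages[-keep_last:]
```

If the endpoint can no longer repeat the planted code in the final round, the history it actually forwarded upstream no longer contains the early turns, which is exactly the silent degradation the measurements describe.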
In summary, users accessing AI services through these proxy intermediaries may face inflated costs, substandard models, unreliable performance, and potential data security risks.