1000usdinchina.com Dev Retrospective (6) - LLM Trip-Planner on a Free Tier: Qwen + Workers AI

1000usdinchina.com lets you describe a trip in plain language and get a structured itinerary back. That AI planner runs on Cloudflare Workers AI with a Qwen model, and it costs roughly $0/month. This post covers how it works, why Qwen, and the free-quota trap that once let dev testing burn the entire daily allowance production depends on.

This is post 6 of the series.

What the AI planner does
Why Qwen on Workers AI
The free-quota trap
Two creation paths that must not diverge
The cost: roughly nothing
FAQ

What the AI planner does

The product has a structured form, but it also accepts free text: "10 days, land in Shanghai, love food and old towns, leave from Chengdu, mid budget." The LLM turns that into the same structured fields the form produces — cities, days, budget tier, interests — which then feed the exact same estimator.

flowchart LR
    U[Natural-language trip request] --> Q[Qwen on Workers AI]
    Q --> S[Structured seed: cities, days, tier, interests]
    S --> E[Same estimator as the form]
    E --> R[Costed itinerary]

The LLM is a front door, not the brain. It parses intent into structure; the deterministic estimator does the costing. That keeps the AI's job small, cheap, and verifiable.
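To make "parses intent into structure" concrete, here is a minimal sketch of checking the model's reply before it reaches the estimator. The `TripSeed` shape, its field names, and the `low`/`mid`/`high` tier values are assumptions based on the fields listed above, not the site's actual schema.

```typescript
// Hypothetical seed shape: field names and tier values are assumptions.
type BudgetTier = "low" | "mid" | "high";

interface TripSeed {
  cities: string[];
  days: number;
  tier: BudgetTier;
  interests: string[];
}

const TIERS: readonly BudgetTier[] = ["low", "mid", "high"];

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Parse the model's raw text into a TripSeed, or return null if it does not fit.
function parseSeed(raw: string): TripSeed | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the model replied with something that is not JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const { cities, days, tier, interests } = data as Record<string, unknown>;
  if (!isStringArray(cities) || cities.length === 0) return null;
  if (typeof days !== "number" || !Number.isInteger(days) || days < 1) return null;
  if (typeof tier !== "string" || !TIERS.includes(tier as BudgetTier)) return null;
  if (!isStringArray(interests)) return null;

  return { cities, days, tier: tier as BudgetTier, interests };
}
```

Anything that fails the check returns `null`, so the caller can decide whether to retry or send the user to the form. The estimator only ever sees a well-formed seed.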

Why Qwen on Workers AI

Three reasons:

It's built into the edge. Workers AI runs models next to the Worker — no separate API vendor, no second SDK, no extra latency hop. The planner is already on Cloudflare; the model is too.
Qwen handles Chinese well. The domain is Chinese cities, cuisines, and place names. A model strong in Chinese parses "古镇" (old town) or "成都出发" (departing from Chengdu) correctly.
The free tier is generous enough. For a planner that runs a short structured-extraction prompt per request, the included allowance covers real usage at $0.
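As a rough illustration of what "no second SDK" means in practice, the sketch below calls the model through the Worker's AI binding with a chat-style `messages` input. The `AiBinding` interface is a local stand-in for that binding so the example is self-contained. The system prompt, the function name, and the idea of passing the model ID as a parameter are assumptions; the post does not show the real prompt or name the exact Qwen model.

```typescript
// Local stand-in for the Workers AI binding (the `AI` binding on `env`).
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface AiBinding {
  run(model: string, input: { messages: ChatMessage[] }): Promise<{ response?: string }>;
}

// Hypothetical prompt: the real one is not shown in the post.
const SYSTEM_PROMPT =
  "Extract the trip request as JSON with keys cities (string[]), days (integer), " +
  'tier ("low" | "mid" | "high"), interests (string[]). Reply with JSON only.';

// Ask the model for a structured seed and return its raw text reply.
// `model` is whichever Qwen model ID the Workers AI catalog currently lists.
async function extractSeedText(ai: AiBinding, model: string, request: string): Promise<string> {
  const result = await ai.run(model, {
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: request },
    ],
  });
  if (typeof result.response !== "string" || result.response.trim() === "") {
    throw new Error("Model returned no text");
  }
  return result.response.trim();
}
```

In a Worker's `fetch` handler this would be called as something like `extractSeedText(env.AI, modelId, userText)`, with `modelId` read from configuration so the model can be swapped without a code change.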

The free-quota trap

Here's the expensive lesson. Cloudflare Workers AI's free tier is a daily, account-level budget — measured in "neurons" (roughly 10k/day). Account-level is the trap: dev and production share the same daily pool.

flowchart TD
    A[Daily free quota: ~10k neurons, account-wide] --> B[Dev: probing the real model repeatedly]
    A --> C[Prod: real users' AI plans]
    B -->|burns the shared pool| D[Quota exhausted]
    C -->|hits empty pool| E[AiError 4006: production planner down]

I was testing the planner against the real model in dev — firing prompt after prompt to tune it — and exhausted the day's neurons. Production then hit an empty pool and started returning AiError 4006. Real users' AI planning went down because of my dev probing.

Once you treat the quota as an account-wide daily budget, the fixes follow:

Don't hammer the real model in dev; cache responses and probe sparingly.
Treat the daily quota as a shared, finite resource across all environments.
Degrade gracefully: if the LLM is unavailable, fall back to the structured form path so the product still works.
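The first and third fixes can be sketched together: cache replies so a repeated dev prompt spends neurons only once, and turn any model failure into a handoff to the form. This is a minimal sketch under assumptions. `ModelCall`, `withCache`, `planOrFallback`, and the `SeedResult` shape are hypothetical names, and the cache is a plain in-memory `Map`, which suits a local tuning script but does not persist across restarts.

```typescript
type SeedResult =
  | { source: "ai"; seedText: string }
  | { source: "form-fallback"; reason: string };

// Any function that sends a prompt to the model and resolves with its text reply.
type ModelCall = (prompt: string) => Promise<string>;

// Wrap a model call with a cache keyed by prompt, so repeating the same
// probe during tuning reaches the real model only the first time.
function withCache(call: ModelCall, cache: Map<string, string> = new Map()): ModelCall {
  return async (prompt) => {
    const hit = cache.get(prompt);
    if (hit !== undefined) return hit;
    const reply = await call(prompt);
    cache.set(prompt, reply);
    return reply;
  };
}

// Try the model; on any failure (an exhausted quota included) report that
// the caller should show the structured form instead.
async function planOrFallback(call: ModelCall, prompt: string): Promise<SeedResult> {
  try {
    return { source: "ai", seedText: await call(prompt) };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { source: "form-fallback", reason };
  }
}
```

Catching every error rather than matching a specific code is a deliberate simplification here: whatever the cause, the user still gets a working path to an itinerary.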

Two creation paths that must not diverge

The product now has two ways to create a trip: the form and the AI chat. The discipline: any rule about routes, cities, or days must apply to both paths, and the LLM must emit the same structured seed fields the form produces. Otherwise the two paths fork — the AI plan obeys different rules than the form, and the estimator gets inconsistent input. One estimator, two front doors, identical structured contract between them.
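One way to enforce that contract is to route both front doors through a single rules function. The sketch below is illustrative only: the `FormFields` shape, the 30-day cap, and the "at least one day per city" rule are made-up examples, and `seedFromAi` assumes the model's reply has already been parsed and shape-checked.

```typescript
type BudgetTier = "low" | "mid" | "high";

interface TripSeed {
  cities: string[];
  days: number;
  tier: BudgetTier;
  interests: string[];
}

// Hypothetical raw form input: comma-separated text fields.
interface FormFields {
  cities: string;
  days: string;
  tier: BudgetTier;
  interests: string;
}

const MAX_DAYS = 30; // hypothetical limit

// The single place where route, city, and day rules live.
function applyRules(seed: TripSeed): TripSeed {
  const cities = seed.cities.map((c) => c.trim()).filter((c) => c.length > 0);
  if (cities.length === 0) throw new Error("At least one city is required");
  if (!Number.isInteger(seed.days) || seed.days < 1 || seed.days > MAX_DAYS) {
    throw new Error(`days must be an integer between 1 and ${MAX_DAYS}`);
  }
  if (seed.days < cities.length) throw new Error("Need at least one day per city");
  return { ...seed, cities };
}

function splitList(text: string): string[] {
  return text.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
}

// Front door 1: the structured form.
function seedFromForm(form: FormFields): TripSeed {
  return applyRules({
    cities: splitList(form.cities),
    days: Number(form.days),
    tier: form.tier,
    interests: splitList(form.interests),
  });
}

// Front door 2: the AI chat, given an already shape-checked seed.
function seedFromAi(candidate: TripSeed): TripSeed {
  return applyRules(candidate);
}
```

Because neither path can produce a seed without passing through `applyRules`, a new rule added there applies to the form and the chat at the same time.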

The cost: roughly nothing

Because the LLM does one small structured-extraction call per request, and the planner falls back to the form when quota is tight, the monthly bill for the AI feature is about $0. The takeaway for indie builders: a differentiating AI feature doesn't require a big model bill — keep the LLM's job narrow (parse, don't compute), run it on the edge, and respect the quota as a shared resource.
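A back-of-envelope check helps when deciding how much of the shared pool dev work can safely use. The function below is a sketch with made-up inputs: the 10,000 figure is the approximate daily allowance mentioned earlier, while the per-request cost and the dev reserve are placeholders to replace with numbers observed in the Cloudflare dashboard.

```typescript
// Production requests per day that fit in the free pool once a fixed
// number of neurons has been set aside for dev testing.
function prodRequestsPerDay(
  dailyNeurons: number,
  devReserveNeurons: number,
  neuronsPerRequest: number,
): number {
  if (neuronsPerRequest <= 0) throw new Error("neuronsPerRequest must be positive");
  if (devReserveNeurons < 0 || devReserveNeurons > dailyNeurons) {
    throw new Error("devReserveNeurons must be between 0 and dailyNeurons");
  }
  return Math.floor((dailyNeurons - devReserveNeurons) / neuronsPerRequest);
}

// Hypothetical inputs: 2,000 neurons reserved for dev, 25 neurons per request.
const capacity = prodRequestsPerDay(10_000, 2_000, 25);
console.log(`Production requests that fit per day: ${capacity}`);
```

If real traffic approaches that number, the fallback to the form stops being an edge case and becomes part of the normal daily experience, which is the point at which the paid tier is worth pricing out.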

Key takeaways

Use the LLM as a front door that parses intent into structure; let a deterministic engine do the actual work — cheaper and verifiable.
Cloudflare Workers AI's free tier is an account-wide daily budget; dev and prod share it.
Don't probe the real model hard in dev — you can starve production (AiError 4006).
Keep form and AI creation paths on one identical structured contract so they never diverge.
A meaningful AI feature can cost ~$0 if its job stays narrow and edge-native.

FAQ

Is Cloudflare Workers AI really free? It has a generous free tier measured in daily "neurons" at the account level. Light, structured-extraction workloads can run within it at no cost; heavy usage moves to paid.

Why use Qwen for a China travel app? Qwen handles Chinese place names, cuisines, and phrasing well, which matters when the domain is Chinese cities — and it's available directly on Workers AI at the edge.

What causes AiError 4006? Exhausting the account-level daily quota. Since dev and prod share the pool, aggressive dev testing against the real model can take production down. Cache and probe sparingly.

How do you keep an AI feature cheap? Give the LLM a narrow job (parse intent into structured fields), let a deterministic engine do the computation, run on the edge, and fall back to a non-AI path when quota is tight.

Next → Shipping a solo edge app with confidence: CI/CD, real-D1 tests, Lighthouse gates