How I Built a One-Person AI QA Agency with Skill Files and Local LLMs

There is a specific failure mode in AI-assisted QA work that most tooling discussions skip entirely, and it shows up earliest when you are working solo on a real engagement.

Every new chat session is stateless. You paste the ticket, describe the feature, explain your severity logic, set up the context, and by the time the AI is actually useful, you have rebuilt your methodology from scratch for the third time that week. This is not a workflow problem you fix with better prompts; it is an architecture problem, and the fix is a "Skill File."

A Skill File is a context document you load as a system prompt. It carries your test surface tiers, your three-path testing framework, your bug report format, your severity and priority logic, your Playwright conventions, and an explicit definition of what the AI can and cannot call. Load it once per session, and the AI operates inside your methodology from the first message instead of starting from a blank slate.

The local LLM layer solves a different problem: data privacy. On freelance or retainer engagements, tickets contain real product logic and sensitive client data. Sending that to cloud APIs on every session risks exposure. Running Ollama locally with the same Skill File as system context keeps the engagement data on your machine. For the output quality required on QA tasks, current 7B to 14B models are highly sufficient, making local LLMs a cost-effective infrastructure rather than a pay-per-session service.

The workflow uses a three-role setup: the engineer as the judgment layer, the cloud AI (loaded with the Skill File) for complex reasoning and active session output, and the local LLM for lightweight tasks and sensitive client data work. The Skill File remains the constant across all three.

One crucial shift to internalize is that AI dev teams already run their own automated QA layers—including linters, unit tests, and agent-generated Playwright scripts before a PR opens. By the time you review a ticket, the obvious paths have already been tested. Your human QA work starts where the agent's testing ends: testing what was built against what a real user would expect, not just what the technical specification dictated. Those two things are rarely the same.

The billing feature case in the post illustrates this perfectly. Every acceptance criterion passed, and the code was technically correct. Yet, there was no billing history or indication of recurring charges. A real user hitting that screen would have no idea what happened to their money. The agent built exactly what the ticket specified, but the ticket failed to specify the contextual orientation users need. That is where human intuition and systemic QA methodology shine.

[AgentUpdate Depth Analysis]

This practical case highlights a major paradigm shift in the AI Agent ecosystem: transitioning from ephemeral prompting to structured, reusable agent assets (Skill Files). In specialized domains like QA, stateless interactions bottleneck efficiency. By packaging testing methodologies, standards, and behavioral boundaries into a structured Skill File, developers essentially equip LLMs with specialized "firmware." Combined with local runtimes like Ollama, this hybrid cloud-local setup elegantly addresses the enterprise dual-challenge of data privacy and API marginal costs. As tools and protocols like MCP (Model Context Protocol) mature, this architecture—coupling local private data processing with cloud-based complex reasoning via shared context files—will define the future standard for production-grade AI Agent workflows.

How I Built a One-Person AI QA Agency with Skill Files and Local LLMs

Next Stories to Read

5 Anthropic Prompt Caching Patterns to Slash Your API Bill by 70%

Which AI to Choose in 2026? Claude, Perplexity, Gemini, and ChatGPT Compared

96 Hours of Autonomous Bounty Hunting: My AI Agent Earned $800 on GitHub

Related Tools & Resources

Skill Marketplaces

Anthropic Agent Skills

TokRepo

Skill Atlas