Kelly started as a weekend project — Austen Allred wanted an AI assistant to manage his inbox while his kids were snowed in. Within 48 hours, the vision had evolved into a fully autonomous software company builder.
Kelly runs on OpenClaw and can ideate, build, market, and ship iOS apps to the App Store end to end, without human involvement. She even hired her own first employee, Jake.
The stack: a hierarchy of AI agents with Kelly as orchestrator, project leads managing each build, and dozens of sub-agents doing parallel work at every phase.
Scans the App Store for apps with high demand but poor reviews to find gaps of unmet demand. Still the least developed of the three.
7-phase pipeline: Scaffold → Plan → Build → Test → QA → Ship → Resubmit. Plan takes longer than build. Up to 20 sub-agents work in parallel during the build phase.
App in → marketing out. Generates App Store screenshots, metadata, paid ad creatives, and social reels (UGC-style, ~$1/reel) for Instagram, YouTube, TikTok.
Scaffold: Git init, project structure, boilerplate setup
Plan: BMAD agents produce PRD → UX mockups → Architecture → Stories. Longest phase.
Build: 10–20 sub-agents in parallel, story dependency graph, Amelia as primary dev
Test: Lint, build checks, unit tests, accessibility, and performance checks, all automated
QA: Humans still required. Launch in Simulator and test all user flows manually
Ship: 20-step App Store checklist via ASC CLI + browser automation
Resubmit: Daily cron checks rejections, routes back to the correct phase, resubmits
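One way to picture the build phase's story dependency graph is as a topological sort that releases stories in waves, where each wave contains only stories whose dependencies are already finished. The sketch below is an illustration under assumptions, not Kelly's actual scheduler: the story names are hypothetical, `plan_waves` is an invented helper, and the cap of 20 is taken from the sub-agent count mentioned above.

```python
from graphlib import TopologicalSorter

MAX_PARALLEL = 20  # upper bound on concurrent sub-agents, per the text


def plan_waves(deps, max_parallel=MAX_PARALLEL):
    """Group stories into waves; every story in a wave has all its dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        # Split a large ready set so no wave exceeds the parallelism cap.
        for i in range(0, len(ready), max_parallel):
            waves.append(ready[i:i + max_parallel])
        sorter.done(*ready)
    return waves


# Hypothetical stories: each key maps to the set of stories it depends on.
stories = {
    "auth": set(),
    "data-model": set(),
    "workout-log": {"auth", "data-model"},
    "pr-tracker": {"workout-log"},
    "settings": {"auth"},
}

waves = plan_waves(stories)
for n, wave in enumerate(waves, start=1):
    print(f"wave {n}: {wave}")
```

Each wave could be handed to a pool of sub-agents at once. A real scheduler would more likely release a story as soon as its own dependencies finish rather than waiting for the whole wave, which `TopologicalSorter` also supports by calling `done()` per story.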
PR tracker and gym logger
Reading habit tracker
AI menu analyzer using iOS vision models, works offline
Summarize anything into one sentence
Rock identifier — Pokédex for rocks
Fasting timer and tracker
9 total approved in App Store
| Scenario | Cost / day | Notes |
|---|---|---|
| Full steam, no optimization | $1,000 | Building 3–4 apps/day + full marketing, all heavy models |
| Smart model routing | $50–100 | Right model at right time, offload deterministic tasks |
| With further optimizations | ~$10 | Reduced browser use, local model experiments |
| Current (with secret tricks) | ~$1 | "Hacks we won't talk about publicly" — won't last forever |
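To make the spread in the table concrete, the short sketch below computes how much cheaper each scenario is than the unoptimized baseline. The dollar figures are copied from the table; the approximate ("~") values are treated as exact for the arithmetic, and `reduction_factor` is a helper invented for this illustration.

```python
# Daily cost per scenario in USD, copied from the table above.
# Ranges are stored as (low, high); single figures repeat the value.
DAILY_COST_USD = {
    "Full steam, no optimization": (1000, 1000),
    "Smart model routing": (50, 100),
    "With further optimizations": (10, 10),
    "Current (with secret tricks)": (1, 1),
}

BASELINE = DAILY_COST_USD["Full steam, no optimization"][0]


def reduction_factor(scenario):
    """Return (min, max) reduction factor versus the unoptimized baseline."""
    low, high = DAILY_COST_USD[scenario]
    return BASELINE / high, BASELINE / low


for name in DAILY_COST_USD:
    lo, hi = reduction_factor(name)
    print(f"{name}: {lo:g}x to {hi:g}x cheaper than baseline")
```

Read this way, smart model routing alone accounts for a 10x to 20x reduction, and the remaining steps together take the total to roughly 1,000x.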
"Kelly is written in markdown. Pretty much. It's a bunch of instructions to agents." The orchestration is in prompts, not code.
Separate agents cross-check each other and find issues the same agent creating the work would miss. Project Lead layer keeps Kelly's context clean.
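The cross-checking idea can be sketched as a loop in which one agent writes and a different agent reviews, repeating until the reviewer has no issues. This is a minimal illustration, not Kelly's implementation: `review_loop`, `fake_author`, and `fake_reviewer` are hypothetical stand-ins for separately prompted agents, and the docstring check is an arbitrary example of a review rule.

```python
def review_loop(task, author, reviewer, max_rounds=3):
    """Alternate between an author and a separate reviewer until no issues remain."""
    draft = author(task, feedback=[])
    for round_number in range(1, max_rounds + 1):
        issues = reviewer(draft)
        if not issues:
            return draft, round_number
        draft = author(task, feedback=issues)
    # Round limit reached: the last draft is returned without a final review.
    return draft, max_rounds


# Stand-ins for two separately prompted agents (no model calls are made).
def fake_author(task, feedback):
    code = f"def {task}(): pass"
    if "missing docstring" in feedback:
        code = f'def {task}():\n    """TODO."""'
    return code


def fake_reviewer(draft):
    return [] if '"""' in draft else ["missing docstring"]


final, rounds = review_loop("log_workout", fake_author, fake_reviewer)
print(rounds)
print(final)
```

Only the final draft and the round count flow back to the caller, which is one way a project lead could shield the orchestrator from the back-and-forth between sub-agents.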
Sonnet 3.5 vs Opus 4.6 can be a 100x difference in cost. Use the smallest model that works; most tasks don't need the best model.
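"Use the smallest model that works" could be implemented as an escalation ladder: try the cheapest tier first and move up only when the output fails a check. The sketch below assumes hypothetical tier names and relative costs (only the 100x spread comes from the text), and `fake_run` stands in for a real model call.

```python
# Hypothetical tiers, cheapest first. Names and relative costs are
# placeholders; only the 100x spread comes from the text.
MODEL_TIERS = [("small", 1), ("medium", 10), ("large", 100)]


def cheapest_passing(task, run, accept, tiers=MODEL_TIERS):
    """Try tiers from cheapest to most expensive; return the first accepted result."""
    spent = 0
    for name, cost in tiers:
        spent += cost  # failed attempts still cost money
        output = run(name, task)
        if accept(output):
            return name, output, spent
    return None, None, spent


# Stand-in for a real model call: the small tier "fails" on hard tasks.
def fake_run(model, task):
    if model == "small" and "hard" in task:
        return ""
    return f"{model} answered: {task}"


def non_empty(output):
    return bool(output)


print(cheapest_passing("rename a variable", fake_run, non_empty))
print(cheapest_passing("hard refactor", fake_run, non_empty))
```

The ladder only pays off when the acceptance check is cheap and reliable. If most tasks of a given kind end up escalating, routing that kind directly to a larger tier avoids paying for the failed attempts.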
"If you know something is deterministic, do it the deterministic way." Offload lint, build checks, and CLI steps to shell scripts. Keep the LLM for ambiguous decisions.
A watchdog that re-reads a 100K-token context every 5 minutes burns a massive number of tokens. Scheduling is one of the biggest hidden costs.
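The arithmetic behind the watchdog example is simple enough to work through. The sketch below uses the 100K-token context and 5-minute interval from the text; the price per million tokens is a made-up placeholder, and the calculation ignores output tokens and any prompt caching a provider might offer.

```python
MINUTES_PER_DAY = 24 * 60


def daily_tokens(context_tokens, interval_minutes):
    """Input tokens consumed per day by a check that re-reads the full context."""
    checks_per_day = MINUTES_PER_DAY // interval_minutes
    return checks_per_day * context_tokens


def daily_cost_usd(context_tokens, interval_minutes, usd_per_million_tokens):
    """Daily cost at a given input price; the price is an assumption, not a quote."""
    tokens = daily_tokens(context_tokens, interval_minutes)
    return tokens / 1_000_000 * usd_per_million_tokens


# Figures from the text: 100K-token context, checked every 5 minutes.
print(daily_tokens(100_000, 5))  # 288 checks x 100,000 tokens each
# Hypothetical price of $3 per million input tokens.
print(daily_cost_usd(100_000, 5, 3.0))
# Same watchdog, checked hourly instead.
print(daily_tokens(100_000, 60))
```

At a 5-minute interval the watchdog reads 28.8 million tokens a day; checking hourly cuts that to 2.4 million. Shrinking the context the watchdog sees has the same proportional effect as lengthening the interval.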
Put critical instructions in AGENTS.md / TOOLS.md rather than relying on memory. Kelly's context window was 50% full on boot — by design.
Tried Anthropic for PM + Codex for backend + Gemini for design. "Complete nightmare." The agents didn't pass context to one another.
App Store accounts, 2FA, LLCs — Kelly can't register as a legal entity. Cameras pointed at phones to handle 2FA. Kelly hesitated to click the "I am not a robot" box.
Our corporate training cohorts come in with "here's our quarterly roadmap." We start Monday and we're done with our quarterly roadmap by Tuesday.
If you can make something work on Sonnet 3.5 versus Opus 4.6, it can be literally one one-hundredth the cost.
The agents are ready. The environment you need to actually trust them has been missing.
Kelly is written in Markdown. Pretty much. It's a bunch of instructions to agents.
We've got nine apps approved in the App Store... and $144 in revenue. Don't clap for that. We're getting there.
The AI is not an idiot — it can just go do stuff. That's new. You didn't have to ask it to dangerously skip permission to do so.