The Complete Guide to Codex Superpowers: 14 Skills That Make AI Follow Industrial-Grade Development Workflows

Codex Superpowers turns AI coding agents into disciplined dev teams with 14 composable skills.
Codex Superpowers is a methodology that decomposes professional software development into 14 composable skills for AI coding agents. Its seven-step workflow enforces planning before execution through Socratic questioning, isolated Git worktrees, sub-agent-driven development, TDD, and systematic code review. A real-world case study shows it building a complete WeChat Mini Program without manual coding.
What Is Codex Superpowers
Superpowers is a complete software development methodology designed specifically for coding agents. It's not a single skill but rather a decomposition of a top-tier programming team's capabilities into 14 composable Skills, covering the entire pipeline from brainstorming and plan writing to code review and test-driven development.
Coding agents represent one of the most significant paradigm shifts in AI-assisted programming since 2024. Unlike earlier code completion tools (such as GitHub Copilot's inline completion mode), coding agents possess the ability to autonomously plan, execute, and verify — functioning like a junior developer who can independently handle the complete workflow from requirements analysis to code commits. Products like OpenAI's Codex, Anthropic's Claude Code, and Devin all fall into this category. The key distinction: code completion is reactive, while agents are proactive.
Its core philosophy is: plan first, execute second. When you present a requirement to the AI, it won't jump straight into writing code. Instead, it asks questions step by step to clarify your needs, organizes its thinking, formulates a plan, and then executes incrementally. This "Socratic" design-refinement process can produce quite solid results even if you have zero coding knowledge.
The Socratic method originates from the ancient Greek philosopher Socrates' dialectical teaching approach: guide others to discover answers through continuous questioning rather than handing them conclusions. In a software engineering context, this means the AI won't generate code from vague requirements. Instead, it removes ambiguity and fills gaps through layered questioning. In traditional software development, this corresponds to the role of a Business Analyst. A frequently quoted, though rarely sourced, industry figure holds that over 60% of software project failures originate in the requirements phase rather than in coding. Superpowers systematically embeds this insight into the AI workflow.
The Seven-Step Workflow Explained
The official Superpowers GitHub documentation outlines a complete seven-step workflow. This is a mandatory process, not a suggestion — the agent checks for relevant skills before executing any task and strictly follows the workflow.
Step 1: Brainstorming
Before writing any code, the brainstorming phase kicks in. The AI refines your initial ideas through continuous questioning, explores various viable approaches, presents designs in stages for validation, and ultimately saves the design document. Throughout the process, it continuously confirms the details of your requirements to ensure the direction is correct.
Step 2: Create an Isolated Environment Using Git Worktree
Once the design is approved, the AI creates an isolated workspace on a new branch, runs the project setup, and verifies that the test baseline is clean. This step ensures the independence and traceability of the development environment.
Git worktree is a feature introduced in Git 2.5 that lets a single repository have multiple working directories checked out at the same time, each on a different branch. Unlike switching branches with git checkout, each worktree is a separate directory on disk, so uncommitted changes in one do not affect the files in another. This matters for AI coding agents: while the agent does exploratory development, the main branch's working directory stays untouched. If the generated code isn't satisfactory, you remove the worktree (and, if you like, delete its branch) and the main checkout is left clean. The same property makes worktrees useful for running parallel builds from one repository.
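As a minimal sketch of the worktree lifecycle described above, the helpers below build the two Git commands involved and run them with subprocess. The function names, the example path, and the branch name are assumptions for illustration; they are not taken from Superpowers itself.

```python
import subprocess
from pathlib import Path


def worktree_add_command(path: str, branch: str, base: str = "HEAD") -> list[str]:
    """Build the git command that creates a new branch checked out in its own directory."""
    return ["git", "worktree", "add", "-b", branch, path, base]


def worktree_remove_command(path: str) -> list[str]:
    """Build the git command that deletes a worktree directory and its bookkeeping."""
    return ["git", "worktree", "remove", "--force", path]


def run(cmd: list[str], repo: Path) -> None:
    """Run a git command inside the repository, raising if git reports an error."""
    subprocess.run(cmd, cwd=repo, check=True)
```

Calling run(worktree_add_command("../feature-listing", "feature/listing"), Path(".")) from inside a repository would create the isolated workspace, and run(worktree_remove_command("../feature-listing"), Path(".")) would discard it. Note that removing a worktree leaves its branch in place; deleting the branch is a separate git branch -D step.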
Step 3: Writing Plans
Based on the approved design, the AI breaks the work down into small, manageable tasks of roughly 2–5 minutes each (the figure given in the Superpowers README), with precise file paths, complete code snippets, and verification steps.
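One way to picture such a plan is as a list of small records, each carrying its file paths, code, and verification step. The sketch below is a hypothetical data model; the class and field names are assumptions and do not reflect the actual plan file format Superpowers writes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlanTask:
    """One small unit of work in a plan (field names are illustrative)."""
    title: str
    files: list[str]   # exact paths the task is allowed to touch
    snippet: str       # the code the task is expected to produce
    verify: str        # command that shows the task is done
    done: bool = False


@dataclass
class Plan:
    goal: str
    tasks: list[PlanTask] = field(default_factory=list)

    def next_task(self) -> Optional[PlanTask]:
        """Return the first unfinished task, or None when the plan is complete."""
        return next((task for task in self.tasks if not task.done), None)


plan = Plan(
    goal="Listing page with price filter",
    tasks=[
        PlanTask("Add mock data", ["data/listings.js"], "module.exports = []", "node data/listings.js"),
        PlanTask("Add price filter", ["utils/filter.js"], "function filterByPrice() {}", "npm test"),
    ],
)
```

Keeping the verification command next to each task makes it possible to check a task mechanically before moving on to the next one.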

Step 4: Sub-Agent-Driven Development
Based on the plan, a new sub-agent is dispatched for each task, followed by a two-stage review: first checking specification compliance, then reviewing code quality. This closely mirrors the Code Review process in real-world development.
The sub-agent pattern is a common architecture for decomposing complex tasks in current AI systems. The main agent handles global planning and task allocation, while sub-agents focus on executing specific atomic tasks. The design resembles microservices architecture: each sub-agent has its own context window and execution environment, which avoids the attention dilution caused by an overly long global context. OpenAI's Codex follows a comparable idea for cloud tasks, running each task in its own sandboxed environment, which provides isolation and allows tasks to run in parallel. The two-stage review (specification compliance, then code quality) simulates the dual gatekeeping of a Tech Lead and a Senior Developer in a mature development team.
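The dispatch-and-review loop can be sketched in a few lines. Everything here is a simplified assumption: run_task, the reviewer callables, and the retry limit are hypothetical names, and the fake_* functions are stubs standing in for real sub-agents.

```python
from typing import Callable

Reviewer = Callable[[str, str], bool]


def run_task(
    task: str,
    implement: Callable[[str], str],
    spec_review: Reviewer,
    quality_review: Reviewer,
    max_attempts: int = 3,
) -> str:
    """Dispatch a fresh sub-agent per attempt and accept only output that passes both reviews."""
    for _ in range(max_attempts):
        result = implement(task)              # the sub-agent sees only this task
        if not spec_review(task, result):     # stage 1: does it match the plan?
            continue
        if not quality_review(task, result):  # stage 2: is the code acceptable?
            continue
        return result
    raise RuntimeError(f"task {task!r} failed review {max_attempts} times")


attempts: list[str] = []


def fake_implement(task: str) -> str:
    """Stub sub-agent: forgets the tests on its first attempt."""
    attempts.append(task)
    return "code" if len(attempts) == 1 else "code+tests"


def fake_spec_review(task: str, result: str) -> bool:
    return "code" in result


def fake_quality_review(task: str, result: str) -> bool:
    return "tests" in result


result = run_task("add price filter", fake_implement, fake_spec_review, fake_quality_review)
```

In this run the first attempt passes the specification check but fails the quality check, so a second sub-agent is dispatched and its output is accepted.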
Step 5: Test-Driven Development (TDD)
The classic "Red-Green-Refactor" cycle is enforced:
- Red: Write a failing test
- Green: Write the minimum code to make the test pass
- Refactor: Optimize the code structure and commit
Under this TDD pattern, new behavior is added only in response to a failing test, so code written this way arrives with matching tests.
TDD was popularized in the early 2000s by Kent Beck, a pioneer of Extreme Programming (XP), through his book Test-Driven Development: By Example. Its "Red-Green-Refactor" cycle seems simple but embodies a clear engineering philosophy. The failing test in the red phase describes expected behavior in code, which makes it an executable requirements document. The "minimum code" emphasis in the green phase prevents over-engineering. The refactor phase allows safe optimization under test protection. In AI coding scenarios, TDD is especially important: it gives AI-generated code an objective correctness check and reduces reliance on subjective judgment. A widely cited 2008 study of teams at IBM and Microsoft (Nagappan et al.) found that pre-release defect density fell by 40%–90% relative to comparable projects that did not use TDD, at the cost of a 15%–35% increase in initial development time.
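A single pass through the cycle might look like the sketch below, using a price filter similar to the one in the case study later in this article. The function name, field names, and prices are illustrative assumptions.

```python
def filter_by_price(listings, low, high):
    """Green step: the minimum code that satisfies the test below."""
    return [item for item in listings if low <= item["price"] < high]


def test_filter_by_price_keeps_only_listings_in_range():
    # Red step: this test is written first and fails until filter_by_price exists.
    listings = [
        {"id": 1, "price": 4_200_000},
        {"id": 2, "price": 6_500_000},
        {"id": 3, "price": 9_000_000},
    ]
    result = filter_by_price(listings, 5_000_000, 8_000_000)
    assert [item["id"] for item in result] == [2]
```

The refactor step would then reshape the implementation, for example by extracting named price buckets, while rerunning the same test to confirm that behavior has not changed.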
Step 6: Request Code Review
A systematic review is conducted against the plan, and the review report groups issues by severity level. Critical issues block progress and trigger code regeneration, so quality problems are caught before the work moves on.
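The severity gate can be sketched as follows. The three severity levels and the blocking threshold are assumptions chosen for illustration; the actual labels in a Superpowers review report may differ.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    IMPORTANT = 2
    CRITICAL = 3


@dataclass
class Finding:
    severity: Severity
    message: str


def review_gate(findings, block_at=Severity.CRITICAL):
    """Return (blocked, report), with the report sorted most severe first."""
    report = sorted(findings, key=lambda f: f.severity, reverse=True)
    blocked = any(f.severity >= block_at for f in report)
    return blocked, report


findings = [
    Finding(Severity.MINOR, "inconsistent naming in filter helper"),
    Finding(Severity.CRITICAL, "bookmark page crashes on empty list"),
]
blocked, report = review_gate(findings)
```

Here the critical finding sets blocked to True, which in the workflow above would send the task back for regeneration instead of letting it proceed.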
Step 7: Complete the Development Branch
After verifying that all tests pass, the AI offers four options: keep the branch, create a PR, merge, or discard. Finally, it cleans up the worktree, completing a full development cycle.
Installation and Usage
Installing Superpowers in the Codex App is straightforward:
- Click the "Plugins" panel on the left
- Search for "Superpowers"
- Click install
After installation, you'll see it integrates 14 skills, all enabled by default.

To use it, activate it in the chat window with the slash command /superpowers. An easily overlooked tip: if you don't know how to use it, just ask it directly, and it will explain the core usage in detail. Before starting any task, it first checks whether a relevant Skill exists, matches the appropriate one, opens the corresponding SKILL.md file, and follows the process described there.
Common skill-matching scenarios:
- Fixing a bug → Automatically invokes Debug Skill
- Building a new feature → Invokes Brainstorming Skill
- Implementing a plan → Invokes Writing Plans Skill
- Code review → Invokes Review Skill
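The mapping above can be pictured as a lookup from request to skill. The keyword table below is only a stand-in for illustration: the skill identifiers and keywords are assumptions, and a real agent decides by reading skill descriptions, not by substring matching.

```python
from typing import Optional

# Hypothetical rules, checked in order; the first match wins.
SKILL_RULES = [
    (("bug", "error", "crash"), "debug"),
    (("review",), "review"),
    (("implement", "plan"), "writing-plans"),
    (("build", "feature", "new"), "brainstorming"),
]


def match_skill(request: str) -> Optional[str]:
    """Return the first skill whose keywords appear in the request, or None."""
    text = request.lower()
    for keywords, skill in SKILL_RULES:
        if any(word in text for word in keywords):
            return skill
    return None
```

For example, match_skill("Fix the login bug") selects the debug skill, while a request that matches no rule returns None and would proceed without a skill.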
Real-World Case Study: Building a Beike-Style WeChat Mini Program
To validate the practical effectiveness of Superpowers, the author used it to develop a second-hand property listing WeChat Mini Program similar to Beike (China's leading real estate platform), without writing a single line of code.
Requirements Confirmation Phase
The AI gradually confirmed requirements through extensive questioning: Mini Program type (display-only / operational version / commercial version), property types (second-hand / new / rental), how the appointment viewing feature should be implemented (form simulation / local cloud database / WeChat Cloud), and more.

It even proactively confirmed the data model and interaction flow: using mock files locally, Node.js environment, JS data source structure under the data directory, and the complete interaction path — users entering the homepage, searching, browsing listings, clicking details, bookmarking, and filling out forms.
Plan Generation and Step-by-Step Execution
After confirmation, the AI generated a detailed plan document containing the file structure, specific content for each task, and verification steps. The entire project was broken down into 6 tasks, each undergoing the dual review of specification compliance and code quality.

Final Results
The completed WeChat Mini Program included the following feature modules:
- Homepage: City location, map-based property search, subway-based property search, search functionality
- Second-hand Property Listings: Price filters (under 5M / 5–8M, etc.), layout filters (two-bedroom, etc.)
- Property Details: Layout, orientation, area, community location, property description, image gallery
- Bookmarks: Save/unsave functionality, unified management on the "My" page
- Complete Mini Program Structure: app.js, app.wxss, config files, page components — everything included
Honest Feedback on Token Consumption
It's important to note that Superpowers significantly increases token consumption. Some bloggers online claim an increase of 10%–15%, but in the author's experience the increase is at least 50%, and consumption often comes close to doubling.
Tokens are the basic billing unit for large language models. As a rough rule of thumb, 1 token corresponds to about 0.75 English words; the ratio for Chinese text depends heavily on the tokenizer, with about 0.5 characters per token as a conservative estimate. Taking GPT-4o as an example, input tokens cost approximately $2.50 per million and output tokens about $10 per million. Superpowers increases token consumption because each phase requires additional context passing (plan documents, design proposals, review reports, etc.), and coordination between sub-agents also incurs overhead.
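To make the pricing concrete, the sketch below applies the GPT-4o rates quoted above to two sessions. The token counts are invented round numbers chosen for the arithmetic, not measurements from the author's project.

```python
INPUT_PRICE_PER_MILLION = 2.50    # USD, GPT-4o input rate quoted above
OUTPUT_PRICE_PER_MILLION = 10.00  # USD, GPT-4o output rate quoted above


def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one session at the rates above."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    )


# Hypothetical session sizes: the workflow run uses twice the tokens.
baseline = session_cost(input_tokens=400_000, output_tokens=100_000)
with_workflow = session_cost(input_tokens=800_000, output_tokens=200_000)
```

With these assumed counts the baseline session costs about $2 and the doubled session about $4, which shows how directly the overhead translates into spend.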
However, the author believes this cost is worthwhile: a complete workflow means you can precisely modify any unsatisfactory stage along the way. Compared to the "write and fix as you go" approach, a structured development process reduces the cost of repeatedly starting from scratch, and is actually more token-efficient in the long run. From an ROI perspective, if one structured development session produces usable code while an unstructured approach requires 3–5 complete restarts, the actual total consumption ends up being lower. This mirrors the classic tradeoff in software engineering between "upfront design cost vs. rework cost."
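The break-even reasoning can be checked with a small calculation. The figures are assumptions for illustration: a 2x overhead for the structured run (the upper end of the author's estimate) and three restarts for the unstructured one (the lower end of the 3–5 range).

```python
def total_tokens(tokens_per_attempt: int, attempts: int, overhead: float = 1.0) -> int:
    """Total tokens spent across all attempts, including workflow overhead."""
    return round(tokens_per_attempt * overhead * attempts)


# Hypothetical baseline of 500,000 tokens for one unstructured attempt.
structured = total_tokens(500_000, attempts=1, overhead=2.0)
unstructured = total_tokens(500_000, attempts=3)
saving = unstructured - structured
```

Under these assumptions the structured run uses 1,000,000 tokens against 1,500,000 for three restarts. The general rule is that the structured approach wins whenever the number of restarts it avoids exceeds its overhead multiplier.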
Summary
The core value of Superpowers lies in codifying software engineering best practices into a standardized, AI-executable workflow. For developers, it's a coding partner that strictly follows established conventions. For non-developers, it dramatically lowers the barrier to building software tools from scratch.
After using it for a few projects, the "plan first, execute second" mindset naturally becomes second nature — and developing this structured thinking might be the greatest value Superpowers delivers.