Claude Code + Skills: Auto-Generating Test Cases in 10 Minutes

Introduction

In software testing, writing test cases has always been time-consuming and highly repetitive. Test engineers must read requirements documents line by line, understand the business logic, and then apply test design methods such as equivalence class partitioning, boundary value analysis, and decision tables to expand each requirement into executable test cases. A medium-sized requirements document typically yields hundreds of cases, and frequent requirement changes force large-scale updates to keep those cases in sync, which drives maintenance costs up sharply. For a nine-chapter requirements document, writing test cases by hand can take several days. With Claude Code and its Skills system, the same process can be compressed to under 10 minutes.

This article provides a detailed walkthrough of how to use Claude Code's Skills feature to achieve fully automated test case generation from requirements documents.

Overall Solution Architecture: A Three-Phase Automated Workflow

Claude Code and the Skills System

Claude Code is Anthropic's command-line AI coding tool. It lets developers work with the Claude model directly in the terminal for coding, debugging, document processing, and more. Skills are its main capability extension mechanism: each Skill is a structured instruction template written in Markdown (a SKILL.md file whose frontmatter gives the Skill a name and a description). The Skills in this solution spell out explicit input/output specifications, execution steps, and constraints. When the user issues a natural language instruction, Claude uses each Skill's description to decide which one applies, then executes it step by step according to the preset workflow. The core value of this mechanism is that it turns one-off prompt engineering into reusable, composable, standardized workflows, which is especially useful for complex tasks that require multi-step coordination.

Three-Phase Workflow Design

The core idea behind this solution is to break test case generation into three phases, each corresponding to an independent Skill:

Phase 1: Requirements document splitting and review
Phase 2: Test point extraction
Phase 3: Test case generation and export
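The three Skills can be laid out as separate folders. The sketch below is a minimal Python scaffold, assuming Claude Code's project-level convention of one `SKILL.md` file per Skill under `.claude/skills/<skill-name>/`, with `name` and `description` fields in YAML frontmatter. The skill names, descriptions, and the Input/Steps/Output/Constraints headings are hypothetical placeholders, not the author's actual Skills; the real instruction text would go under each heading.

```python
from pathlib import Path

# Hypothetical names and descriptions for the three phase Skills.
# Claude reads each description when deciding whether a Skill applies.
SKILLS = {
    "requirements-splitting": (
        "Split a requirements document into one folder per chapter as-is, "
        "convert tables and flowcharts to text, and flag unclear requirements."
    ),
    "test-point-extraction": (
        "Extract JSON test points from split requirement chapters, skip overview "
        "chapters, and review the points for completeness and redundancy."
    ),
    "test-case-generation": (
        "Expand reviewed test points into complete functional test cases and export them."
    ),
}


def scaffold_skills(project_root: Path) -> list[Path]:
    """Create .claude/skills/<name>/SKILL.md for each phase and return the paths."""
    created = []
    for name, description in SKILLS.items():
        skill_dir = project_root / ".claude" / "skills" / name
        skill_dir.mkdir(parents=True, exist_ok=True)
        skill_file = skill_dir / "SKILL.md"
        # YAML frontmatter first, then placeholder sections for the instructions.
        skill_file.write_text(
            "---\n"
            f"name: {name}\n"
            f"description: {description}\n"
            "---\n\n"
            f"# {name}\n\n"
            "## Input\n\n## Steps\n\n## Output\n\n## Constraints\n",
            encoding="utf-8",
        )
        created.append(skill_file)
    return created
```

Keeping each description specific matters, because it is what Claude reads when deciding whether a Skill matches the user's instruction.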

The user only needs to issue a single instruction — "Please invoke the skill to generate functional test cases from the requirements document" — and the AI will automatically parse the instruction and execute the preset Skill workflow sequentially.

[Figure: Overall workflow overview]

Phase 1: Intelligent Requirements Document Splitting

Splitting by Chapter As-Is

The core task of Phase 1 is to split the requirements document by chapter. There's an important principle here: split exactly as-is — don't add requirements, don't remove requirements, and don't perform any extra extraction.

Taking a requirements document with nine chapters as an example, the AI automatically splits it into nine independent folders, each preserving the original requirement content unchanged.
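The "split as-is" rule can also be checked mechanically. This is a minimal sketch, assuming the requirements document has already been converted to Markdown with each chapter starting at a level-1 `# ` heading; `split_by_chapter`, `write_chapters`, and the `chapter_NN/requirements.md` layout are hypothetical names, not part of Claude Code.

```python
import re
from pathlib import Path

# Assumption: chapters start with a level-1 Markdown heading ("# ...") at the
# start of a line; "## ..." subheadings do not match because of the space check.
CHAPTER_HEADING = re.compile(r"^# ", re.MULTILINE)


def split_by_chapter(text: str) -> list[str]:
    """Split text at chapter headings without adding or dropping any characters."""
    starts = [m.start() for m in CHAPTER_HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


def write_chapters(chapters: list[str], out_dir: Path) -> list[Path]:
    """Write each chapter, unchanged, into its own numbered folder.

    Numbering is positional, so a preamble (if any) becomes chapter_01.
    """
    written = []
    for i, chapter in enumerate(chapters, start=1):
        folder = out_dir / f"chapter_{i:02d}"
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / "requirements.md"
        target.write_text(chapter, encoding="utf-8")
        written.append(target)
    return written
```

Because the pieces are plain slices of the input, joining them reproduces the original document exactly, which is a cheap test that nothing was added or removed.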

[Figure: Requirements splitting and review]

Handling Tables and Images

The trickiest parts of a requirements document are its tables and flowcharts. This solution handles them in two ways:

Tables to text: Converting tabular requirements into structured text descriptions
Images to text: Converting flowcharts and other image-format requirements into text descriptions
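For tables that are already in Markdown, the conversion to text can be illustrated deterministically. The function below is a hypothetical helper, not the solution's implementation (in the article the AI performs this step); it turns each row of a pipe table into a "Header = value" sentence and does not handle escaped `|` characters or merged cells.

```python
def markdown_table_to_text(table: str, label: str = "Row") -> list[str]:
    """Turn a Markdown pipe table into one 'Header = value' description per row."""
    rows = []
    for line in table.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header/body separator row, e.g. |---|:---:|
        if all(c and set(c) <= set("-:") for c in cells):
            continue
        rows.append(cells)
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [
        f"{label} {i}: " + "; ".join(f"{h} = {v}" for h, v in zip(header, row)) + "."
        for i, row in enumerate(body, start=1)
    ]
```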

The reason for this is that the core capabilities of large language models are built on text sequence processing. Although multimodal models already have image understanding capabilities, pure text input still offers higher reliability and consistency when it comes to precisely extracting structured information. Tables in requirements documents typically contain field definitions, state transition rules, permission matrices, and other critical information — inputting them directly as Markdown tables or structured text descriptions avoids OCR recognition errors and table parsing misalignment issues. Converting flowcharts to text requires transforming the nodes, decision branches, and flow paths in the diagram into natural language descriptions, ensuring the AI can accurately capture every business branch and exception path. This is crucial for the completeness of test point extraction in subsequent phases.
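Flowchart-to-text conversion amounts to enumerating every path from the start node to a terminal node. A minimal sketch, assuming the flowchart has already been transcribed into an adjacency list of hypothetical `(condition, next node)` pairs; edges that loop back to a node already on the current path are skipped, so a node whose only exits are such loops produces no path.

```python
# Hypothetical transcription: node -> list of (condition label, next node).
# An empty condition means an unconditional arrow; no edges means a terminal node.
Flowchart = dict[str, list[tuple[str, str]]]


def describe_paths(chart: Flowchart, start: str) -> list[str]:
    """Describe every start-to-terminal path of a flowchart as one sentence each."""
    sentences: list[str] = []

    def walk(node: str, on_path: tuple[str, ...], steps: tuple[str, ...]) -> None:
        edges = chart.get(node, [])
        if not edges:  # terminal node: the path is complete
            sentences.append(f"Path {len(sentences) + 1}: " + " -> ".join(steps) + ".")
            return
        for condition, nxt in edges:
            if nxt in on_path:  # skip loops back to a node already visited on this path
                continue
            step = f"{nxt} (if {condition})" if condition else nxt
            walk(nxt, on_path + (nxt,), steps + (step,))

    walk(start, (start,), (start,))
    return sentences
```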

Requirements Review and Annotation

During the splitting process, the AI also performs a preliminary review of the requirements and automatically flags any descriptions that are unclear. Each split module is annotated with structured information, for example:

Original location (e.g., "Chapter 3, Section 3.1")
Module overview
Conditional exceptions and boundary cases
Acceptance criteria

[Figure: Structured content after splitting]

Phase 2: Intelligent Test Point Extraction

Automatic Filtering of Non-Requirement Chapters

When extracting test points, the AI automatically identifies and skips non-requirement chapters such as "Overview" or "Description" sections (e.g., Chapters 1 and 2 of a document are typically project overviews), focusing test point extraction only on actual functional requirements.
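The AI makes this decision by reading chapter content. As a rough, deterministic approximation, a title-based filter could look like the sketch below; the set of non-requirement titles and the numbering pattern are assumptions to adapt per document. Matching the whole title (after stripping numbering) rather than any substring avoids dropping a functional chapter such as "Product description page".

```python
import re

# Assumed titles of non-requirement chapters (compared case-insensitively).
NON_REQUIREMENT_TITLES = {
    "overview", "project overview", "introduction", "description",
    "glossary", "revision history",
}
# Leading numbering such as "3 ", "2. ", "4.1 " or "Chapter 2 ".
NUMBERING = re.compile(r"^(chapter\s+)?\d+(\.\d+)*[.)]?\s*", re.IGNORECASE)


def is_requirement_chapter(title: str) -> bool:
    """Return False when the title, minus its numbering, is a known non-requirement title."""
    core = NUMBERING.sub("", title.strip()).strip().lower()
    return core not in NON_REQUIREMENT_TITLES
```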

Structured Test Point Format

Each test point is stored in a unified JSON format. JSON (JavaScript Object Notation) is a lightweight data interchange format widely used in test engineering. Compared with traditional Excel spreadsheets, JSON has a clear hierarchical structure, is easy to parse programmatically, and works well with version control, since changes show up as readable Git diffs. In this solution, JSON makes it easy for the AI to pass test points between phases, and the same data can be converted into the import formats or API payloads expected by test management platforms such as TestRail, Zentao, and JIRA, bridging AI generation and platform import.

Each test point contains the following key information:

Parent module: Clearly identifies which functional module it belongs to
Parent chapter: Corresponds to the specific chapter in the requirements document
Test point title: e.g., "Verify guest can normally access the mall homepage"
Test steps: Specific operational steps
Expected results: Expected system responses
Priority: Test priority classification
Test method: Test design method used (equivalence class, boundary value, etc.)
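One way to model these fields is a Python dataclass serialized with the standard `json` module. The field names (`id`, `module`, `chapter`, and so on) are a hypothetical mapping of the list above, not a schema published with the solution. Indented output with sorted keys keeps Git diffs line-oriented and stable.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TestPoint:
    """One test point; field names are a hypothetical mapping of the list above."""
    id: str
    module: str                 # parent module
    chapter: str                # parent chapter in the requirements document
    title: str
    steps: list[str] = field(default_factory=list)
    expected: list[str] = field(default_factory=list)
    priority: str = "P2"
    method: str = "scenario"    # e.g. equivalence class, boundary value


def dump_points(points: list[TestPoint]) -> str:
    # Indentation and a stable key order keep Git diffs small and readable.
    return json.dumps([asdict(p) for p in points], ensure_ascii=False, indent=2, sort_keys=True)


def load_points(text: str) -> list[TestPoint]:
    return [TestPoint(**item) for item in json.loads(text)]
```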

[Figure: Structured test point format]

About Test Design Methods

The equivalence class partitioning and boundary value analysis mentioned in this solution are the most classic black-box test design methods in software testing. Equivalence class partitioning divides input data into several equivalence classes, where data within each class has equivalent effectiveness in revealing program errors — selecting just one representative value from each class effectively reduces the number of test cases. Boundary value analysis focuses on boundary points of input ranges, as extensive practice has shown that program errors tend to cluster near boundaries. Other methods include decision table testing (suitable for multi-condition combination scenarios), orthogonal array testing (suitable for parameter combination explosion scenarios), and scenario-based testing (suitable for business process testing). The AI's ability to automatically identify requirement characteristics and select appropriate design methods during test case generation is a significant advantage over template-based tools.
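Both techniques can be shown on a hypothetical purchase-quantity field whose valid range is 1 to 99. The sketch below picks one representative per equivalence class (below, inside, and above the range) plus the standard boundary values just below, on, and just above each boundary, then merges them into a de-duplicated input set. Names such as `IntRange` and `select_inputs` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    """Inclusive valid range [low, high] for an integer input."""
    low: int
    high: int


def equivalence_representatives(r: IntRange) -> dict[str, int]:
    """One value per class; any out-of-range value would do, the nearest is used."""
    return {
        "invalid_low": r.low - 1,
        "valid": (r.low + r.high) // 2,
        "invalid_high": r.high + 1,
    }


def boundary_values(r: IntRange) -> list[int]:
    """Just below, on, and just above each boundary."""
    return [r.low - 1, r.low, r.low + 1, r.high - 1, r.high, r.high + 1]


def select_inputs(r: IntRange) -> list[int]:
    """Merge both techniques into one sorted, de-duplicated input set."""
    return sorted(set(boundary_values(r)) | set(equivalence_representatives(r).values()))
```

For the 1 to 99 range this yields seven inputs, `[0, 1, 2, 50, 98, 99, 100]`, instead of testing all 99 valid values.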

Automatic Review Mechanism

All extracted test points go through an automatic review round after generation to check completeness and reasonableness. In effect, this builds a Quality Gate into the AI workflow. A Quality Gate is a software engineering concept: a quality checkpoint at a critical point in a process, where only artifacts that pass inspection can proceed to the next phase. The term is most familiar from CI/CD (Continuous Integration/Continuous Delivery) pipelines, where, for example, code must pass unit tests and static analysis before it can be merged. In this solution, the automatic review checks several dimensions: completeness (whether every requirement item is covered), consistency (whether test steps match their expected results), and redundancy (whether duplicate or highly similar test points exist). This gives the automated workflow a degree of quality self-control and reduces both omissions and redundancy.
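In the article, the review round is performed by the AI itself. As a deterministic sketch of the same three gate checks, assuming test points are plain dicts with hypothetical `id`, `requirement`, `title`, `steps`, and `expected` keys: completeness compares covered requirement IDs against the full list, consistency is approximated by requiring both steps and expected results to be present, and redundancy flags title pairs whose `difflib.SequenceMatcher` similarity ratio reaches a threshold (0.9 here, an arbitrary choice). Comparing every pair is quadratic, which is acceptable for a few hundred test points.

```python
from difflib import SequenceMatcher


def review_test_points(
    requirement_ids: list[str], points: list[dict], threshold: float = 0.9
) -> list[str]:
    """Return gate issues; an empty list means the points may move to Phase 3."""
    issues = []
    # Completeness: every requirement needs at least one test point.
    covered = {p["requirement"] for p in points}
    issues += [f"completeness: requirement {r} has no test point"
               for r in requirement_ids if r not in covered]
    # Consistency (crude proxy): steps and expected results must both be present.
    issues += [f"consistency: {p['id']} is missing steps or expected results"
               for p in points if not p["steps"] or not p["expected"]]
    # Redundancy: flag pairs of near-identical titles.
    for i, a in enumerate(points):
        for b in points[i + 1:]:
            ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
            if ratio >= threshold:
                issues.append(f"redundancy: {a['id']} and {b['id']} look alike ({ratio:.2f})")
    return issues
```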

Phase 3: Test Case Generation and Export

Case Generation Logic

Taking "Guest Browsing Flow" as an example, a single simple description in the requirements document may be split into multiple test cases:

Verify guest can normally access the mall homepage
Verify category browsing functionality
Verify search functionality
Verify product detail viewing
Verify shopping cart operation prompts login
Verify favorites function prompts login
Verify placing an order prompts login
Verify user center prompts login

A single requirement sentence is thus expanded into eight specific test cases covering both normal flows and exception scenarios. This is where combining AI with test design methodology pays off: beyond the literal meaning of the requirement, the AI applies scenario-based testing to derive a permission check for each functional entry point a guest can reach, so that every entry point in the flow is covered.
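The expansion rule used here (one positive case per entry point open to guests, one login-prompt case per entry point that requires authentication) can be written down directly. The entry-point list and its login flags are hypothetical, reconstructed from the eight cases above rather than taken from the requirements document.

```python
# Hypothetical entry points for the guest browsing flow: (action, login required?)
GUEST_ENTRY_POINTS = [
    ("access the mall homepage", False),
    ("browse products by category", False),
    ("search for products", False),
    ("view product details", False),
    ("add a product to the shopping cart", True),
    ("add a product to favorites", True),
    ("place an order", True),
    ("open the user center", True),
]


def expand_guest_cases(entry_points: list[tuple[str, bool]]) -> list[str]:
    """Derive one case per entry point: a positive case or a login-prompt case."""
    cases = []
    for action, needs_login in entry_points:
        if needs_login:
            # Guests are not allowed here: expect a login prompt instead.
            cases.append(f"Verify a guest who tries to {action} is prompted to log in")
        else:
            cases.append(f"Verify a guest can {action}")
    return cases
```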

Case Output Format

The final generated test cases contain complete fields:

Field	Description
Title	Test case name
Test Type	Functional test / API test, etc.
Priority	P0/P1/P2/P3
Preconditions	Conditions that must be met before execution
Test Data	Required test data
Test Steps	Detailed operational steps
Expected Results	Expected system behavior
Manual Review Column	Reserved space for manual review
Associated Test Points	Corresponding test point IDs
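For export, many test management tools accept some form of CSV import, although each maps columns differently. The sketch below writes cases to CSV with Python's standard `csv` module; the column names are adapted from the table above, and list-valued steps and expected results are rendered as numbered lines within one cell. Any tool-specific column mapping is an assumption left to configure on the import side.

```python
import csv
import io

# Column names adapted from the field table; "Manual Review" stays empty for reviewers.
COLUMNS = [
    "Title", "Test Type", "Priority", "Preconditions", "Test Data",
    "Test Steps", "Expected Results", "Manual Review", "Associated Test Points",
]


def _numbered(value) -> str:
    """Render list fields as numbered lines inside a single CSV cell."""
    if isinstance(value, list):
        return "\n".join(f"{i}. {item}" for i, item in enumerate(value, start=1))
    return str(value)


def cases_to_csv(cases: list[dict]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for case in cases:
        row = {col: case.get(col, "") for col in COLUMNS}
        for col in ("Test Steps", "Expected Results"):
            row[col] = _numbered(row[col])
        # Associated test point IDs become a comma-separated list.
        if isinstance(row["Associated Test Points"], list):
            row["Associated Test Points"] = ",".join(row["Associated Test Points"])
        writer.writerow(row)
    return buf.getvalue()
```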

The Priority field uses the common P0-P3 four-level scheme (widely used, though not a formal standard): P0 (blocker) cases cover core functionality whose failure makes the product unusable, P1 (critical) cases cover major functionality, P2 (normal) cases cover minor functionality, and P3 (minor) cases cover UI and user-experience details. This classification lets a testing team run the highest-priority cases first when time is limited, which is the essence of a risk-based testing strategy.
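The risk-based idea can be sketched as selecting cases in priority order until a time budget runs out. The per-case time estimate is a hypothetical field that the table above does not include, and stopping at the first case that no longer fits is only one of several reasonable policies.

```python
# Lower rank = higher priority.
PRIORITY_RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}


def plan_run(cases: list[tuple[str, str, int]], budget_minutes: int) -> list[str]:
    """Select (title, priority, estimated minutes) cases in priority order.

    Stops at the first case that no longer fits, so a lower-priority case is never
    run while a higher-priority one is left out. Ties keep their input order.
    """
    selected: list[str] = []
    used = 0
    for title, priority, minutes in sorted(cases, key=lambda c: PRIORITY_RANK[c[1]]):
        if used + minutes > budget_minutes:
            break
        selected.append(title)
        used += minutes
    return selected
```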

Efficiency Comparison and Practical Value

Time Comparison

Manual writing: A nine-chapter requirements document takes at least 2-3 days
AI auto-generation: The entire process takes less than 10 minutes

Quality Assessment

From a practical standpoint, AI-generated test cases offer the following advantages:

Broad coverage: Every functional description is expanded into specific cases, which makes missed requirement points much less likely
Consistent formatting: All cases follow the same structural standards, making them easy to manage and maintain
Traceability: Every case can be traced back to its corresponding requirement chapter and test point. This three-level traceability chain of requirement → test point → test case also embodies the core concept of the Requirements Traceability Matrix (RTM), which is especially important in audit and compliance scenarios
Actionable: The output consists of actually executable test cases, not vague testing ideas
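The requirement → test point → test case chain can be represented as a nested mapping, which also makes coverage gaps easy to list. A minimal sketch with hypothetical IDs: `points` maps each test point to its requirement chapter, and `cases` maps each test case to the test point IDs it covers. An unknown test point ID raises `KeyError`, which is itself a traceability error worth surfacing.

```python
def build_rtm(
    requirements: list[str], points: dict[str, str], cases: dict[str, list[str]]
) -> dict[str, dict[str, list[str]]]:
    """Map requirement -> test point -> test cases."""
    rtm: dict[str, dict[str, list[str]]] = {req: {} for req in requirements}
    for pid, req in points.items():
        rtm.setdefault(req, {})[pid] = []
    for cid, pids in cases.items():
        for pid in pids:
            rtm[points[pid]][pid].append(cid)  # KeyError for an unknown test point
    return rtm


def coverage_gaps(rtm: dict[str, dict[str, list[str]]]) -> list[str]:
    """List requirements without test points and test points without cases."""
    gaps = []
    for req, pts in rtm.items():
        if not pts:
            gaps.append(f"{req}: no test points")
        for pid, cids in pts.items():
            if not cids:
                gaps.append(f"{req} / {pid}: no test cases")
    return gaps
```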

Caveats

Although the quality of AI-generated cases is relatively high, the solution still retains a manual review column, indicating that fully replacing human judgment is not yet realistic. AI still has limitations in handling implicit requirements (rules not explicitly stated in the requirements document but assumed by default in the business), cross-module interaction logic, and compliance requirements specific to certain business domains. It's recommended to treat AI-generated output as a first draft, with test engineers performing final confirmation and supplementation, focusing on the reasonableness of business logic and the completeness of boundary scenarios.

Conclusion

The Claude Code + Skills combination provides a practical, deployable solution for automated test case generation. By breaking the complex task into three phases — "Requirements Splitting → Test Point Extraction → Case Generation" — with each phase handled by an independent Skill, the approach ensures both process controllability and end-to-end automation. This phased, composable design philosophy also reflects the classic software engineering principle of "Separation of Concerns" — each Skill focuses on a single clear responsibility, reducing the complexity of individual steps while allowing any phase to be independently optimized or replaced without affecting the overall workflow.
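The composability claim can be illustrated with a small pipeline in which each phase is a swappable function and an optional gate runs after each phase. In the real solution each phase is a Skill run by Claude Code; `run_pipeline`, `GateFailure`, and the lambdas used with them are hypothetical stand-ins that only show how replacing one phase leaves the others untouched.

```python
from typing import Any, Callable, Optional

Phase = Callable[[Any], Any]
Gate = Callable[[str, Any], list[str]]


class GateFailure(Exception):
    """Raised when a phase's output does not pass its quality gate."""


def run_pipeline(data: Any, phases: list[tuple[str, Phase]], gate: Optional[Gate] = None) -> Any:
    """Run phases in order, feeding each output into the next phase."""
    for name, phase in phases:
        data = phase(data)
        if gate is not None:
            issues = gate(name, data)
            if issues:  # stop before a bad artifact reaches the next phase
                raise GateFailure(f"{name}: " + "; ".join(issues))
    return data
```

A usage example with toy phases: `run_pipeline("Login\n\nCart", [("split", lambda d: d.split("\n\n")), ("extract", lambda chs: [f"TP: {c}" for c in chs]), ("generate", lambda tps: [f"Case for {t}" for t in tps])])` returns one case per chapter, and swapping the `extract` entry for a different function changes only that step.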

For testing teams, this represents not just an efficiency gain but a change in how they work: a shift from "writing cases" to "reviewing cases." Freed from repetitive case writing, test engineers can focus on higher-value activities such as test strategy formulation, exploratory testing, and quality analysis.
