Can AI Agents Replace Security Engineers? Insights from Coding Agents Reveal the Industry Truth

Introduction: AI Can Find Vulnerabilities — Are Security Engineers Doomed?

"AI can already find vulnerabilities — are all security engineers going to lose their jobs?" This is a question that cybersecurity professionals and students have been asking with increasing frequency. As large language models advance at breakneck speed — from DeepSeek to Doubao, from ChatGPT to various AI coding tools — artificial intelligence seems to be infiltrating every corner of cybersecurity.

Wu Ya (also known as JK Tezhujiang), a security instructor with years of offensive and defensive security experience and three years of focused research on AI security and AI-assisted penetration testing, systematically dissected this question during a livestream. His core argument: To answer whether AI can replace security engineers, you first need to understand what stage AI is actually at in cybersecurity applications — is it theoretical, in early adoption, or already deployed at scale?

From Chat Tools to AI Agents: A Qualitative Leap in Capability

The Ceiling of Chat Tools: All Talk, No Action

Most people already use AI chat tools like Doubao, DeepSeek, and Kimi in their daily lives: for research, writing code, generating images, creating videos, and even medical questions or emotional support. In programming scenarios, if you ask Doubao to "write a network port scanner in Python," it can quickly generate runnable code. Compared to the old days of hunting for snippets on GitHub, then copying, pasting, and debugging them, this is far faster.

But problems emerge in more complex scenarios. When you need to develop a complete system (say, a recruitment platform built on Spring Boot + MyBatis), Doubao can only tell you how to write the code. You still need to open your IDE, create files, paste code, run and debug, go back to ask about errors, paste fixes again... Constantly switching between two tools drastically reduces efficiency.

This is the biggest limitation of traditional AI chat tools: they can "talk" but can't "act." Creating files, running commands, connecting to databases, and opening browsers are all beyond their reach.

Agents: AI Grows Hands and Feet

The real transformation comes from the emergence of Agents. The Agent concept originates from "autonomous agent" theory in AI research, traceable back to distributed artificial intelligence in the 1980s. What brought Agents from academic concept to engineering reality was the breakthrough in LLM capabilities around 2023. OpenAI's Function Calling mechanism, LangChain's tool-chain orchestration framework, and the ReAct (Reasoning and Acting) paradigm gave LLMs a repeating "think, act, observe" loop. Current mainstream AI Agent architectures typically include an LLM as the "brain," a toolset as the "hands and feet," and a memory module as the "experience bank," and the three work together to carry out tasks autonomously.
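The "think, act, observe" loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the tool names, the Memory class, and the scripted fake_llm function are all hypothetical stand-ins, and a real agent would call an actual model where fake_llm sits.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


# Hypothetical tools ("hands and feet"); names and outputs are illustrative.
def list_files(arg: str) -> str:
    return "app.py, README.md"


def read_file(arg: str) -> str:
    return f"contents of {arg}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "list_files": list_files,
    "read_file": read_file,
}


@dataclass
class Memory:
    """The "experience bank": (thought, action, observation) for each step."""
    steps: List[Tuple[str, str, str]] = field(default_factory=list)


def fake_llm(goal: str, memory: Memory) -> Tuple[str, str, str]:
    """Stand-in for the "brain". Returns (thought, tool_name, tool_arg).

    A real agent would send the goal and memory to an LLM and parse the reply.
    This stub follows a fixed script so the example runs offline.
    """
    if not memory.steps:
        return ("I need to see which files exist.", "list_files", ".")
    if len(memory.steps) == 1:
        return ("README.md should describe the project.", "read_file", "README.md")
    # Finish by returning the most recent observation as the answer.
    return ("I have enough information.", "finish", memory.steps[-1][2])


def run_agent(goal: str, max_steps: int = 5) -> str:
    memory = Memory()
    for _ in range(max_steps):
        thought, tool, arg = fake_llm(goal, memory)      # think
        if tool == "finish":
            return arg
        observation = TOOLS[tool](arg)                   # act
        memory.steps.append((thought, f"{tool}({arg})", observation))  # observe
    return "stopped: step limit reached"


print(run_agent("Summarize the project"))  # prints "contents of README.md"
```

The step limit matters in practice: without it, a model that never decides to finish would keep invoking tools indefinitely.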

AI coding agents like ByteDance's Trae, GitHub Copilot, Cursor, Claude Code, and Tencent's CodeBuddy are fundamentally different from chat tools.

During the livestream, Wu Ya demonstrated Trae's workflow live: you only need to provide a simple requirements description, and it automatically analyzes the task, breaks down the steps, creates the project directory structure, writes frontend and backend code, and configures the database — the entire process is completely autonomous, with the user only needing to "accept" the final result. Even if you've never learned any programming language, you can develop a complete project through an AI Agent.

The Six Core Capabilities of AI Agents:

Autonomous Perception: Sensing current environment information such as system version, disk files, and project runtime status
Planning: Automatically breaking down task steps based on user requirements
Decision-Making: Determining the next action based on perceived conditions
Execution: Running system commands, creating files, launching projects, and other actual operations
Memory: Maintaining context memory throughout the entire task lifecycle, unlike chat tools where each session is independent
Tool Invocation: Calling external tools to complete specific tasks

It's the combination of these capabilities that takes AI from "can only talk" to "can get things done."

Crawfish and Hermes: AI Agents Reach New Heights

Why Did They Explode on GitHub?

Two projects that recently went viral on GitHub, Crawfish (小龙虾) and Hermes (爱马仕), gained over 200,000 Stars in a short period. Star count is one of the key metrics for an open-source project's popularity, and 200,000 Stars means a project has attracted enormous attention in the global developer community: for comparison, the Linux kernel has about 180,000 Stars on GitHub, and React has about 230,000. Star count doesn't equate to technical maturity or production readiness, however; many viral projects may still be in early experimental stages. This "attention economy" effect mainly reflects how much the developer community expects from the AI Agent direction.

Compared to ordinary coding agents, they have several key differences:

Fully autonomous task decomposition: A complete implementation of core agent capabilities
OS-level permissions: Can directly operate your operating system (hence the recommendation to install in a virtual machine to avoid security risks)
24/7 persistent operation: Unlike Doubao, which relies on web sessions, they run continuously on your computer and support scheduled tasks (e.g., summarizing trending news every morning at 10 AM, generating data reports every evening at 9 PM)
Remote control: Once connected through the APIs of WeChat, DingTalk, Feishu, or similar apps, they accept commands remotely
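The scheduled-task behavior mentioned in the list above comes down to working out when each daily job should fire next. The sketch below is illustrative only: the job names and the JOBS table are hypothetical, and it says nothing about how Crawfish or Hermes actually implement scheduling.

```python
from datetime import datetime, time, timedelta
from typing import List, Tuple


def next_run(now: datetime, at: time) -> datetime:
    """Return the next moment a daily job configured for `at` should fire."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate


# Hypothetical job table mirroring the two examples in the text.
JOBS = {
    "summarize_trending_news": time(10, 0),  # every morning at 10 AM
    "generate_data_report": time(21, 0),     # every evening at 9 PM
}


def upcoming(now: datetime) -> List[Tuple[datetime, str]]:
    """List (fire_time, job_name) pairs, soonest first."""
    return sorted((next_run(now, at), name) for name, at in JOBS.items())


for when, name in upcoming(datetime(2025, 1, 1, 12, 30)):
    print(when.isoformat(), name)
```

A persistent agent would sleep until the first entry, run it, and recompute the list, which is why it has to stay running on the machine rather than living inside a browser session.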

Hermes is also described as self-evolving: the longer it runs, the better it performs. However, it currently offers only a text-based terminal interface with no graphical UI, which makes it relatively difficult for ordinary users to operate.

A Thought-Provoking Use Case

Imagine you're sunbathing on a beach in Sanya, and you send a command via WeChat to Crawfish running on your computer — it automatically completes the task on your machine. This is no longer science fiction but an achievable reality — though product maturity still needs improvement.

AI's Real Impact on the Cybersecurity Industry

Wu Ya particularly emphasized a critical insight: A large language model is at its core a probability generator, not an omniscient oracle. AI hallucination (confidently generating nonsense) is unavoidable.

The root cause of AI hallucination lies in the generation mechanism of large language models — they're essentially performing "next token probability prediction" rather than retrieving facts from a deterministic knowledge base. When the model hasn't seen an accurate answer to a particular question in its training data, it will "fabricate" a plausible-sounding but actually incorrect response based on statistical patterns. In cybersecurity scenarios, such hallucinations are especially dangerous: AI might fabricate a non-existent CVE number, provide an incorrect vulnerability exploitation path, or falsely flag secure code as vulnerable. The industry currently mitigates hallucination through RAG (Retrieval-Augmented Generation), knowledge graph constraints, and RLHF (Reinforcement Learning from Human Feedback), but cannot completely eliminate it.
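A toy version of "next token probability prediction" shows why hallucination falls out of the mechanism. The sketch below assumes made-up scores over three placeholder tokens; real models score tens of thousands of tokens with learned weights, but the sampling step has the same shape.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(logits, rng):
    """Pick one token at random, weighted by its probability."""
    probs = softmax(logits)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Invented scores for the next token after some prompt (illustrative only).
confident = {"<id-A>": 9.0, "<id-B>": 1.0, "<id-C>": 0.5}  # model "knows"
unsure = {"<id-A>": 1.0, "<id-B>": 1.0, "<id-C>": 1.0}     # model is guessing

rng = random.Random(0)
print(softmax(confident))
print(softmax(unsure))
print(sample_next_token(unsure, rng))  # still emits a token, stated as fluently as ever
```

In the unsure case every candidate has probability one third, yet the sampler still returns one of them. Nothing in this step distinguishes a recalled fact from a guess, which is why a fabricated CVE number can read just as confidently as a real one.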

This means AI can be a powerful assistive tool for security engineers, but cannot fully replace human professional judgment.

Current State of AI in Security

From the livestream content, it's clear that AI in cybersecurity is currently in transition from early adoption toward deeper, more routine use:

What it can already do: Help write security tools and automation scripts, audit code, solve CTF challenges, and handle information gathering and preliminary analysis for penetration testing
What it still can't do well: Complex logical reasoning, multi-step attack chain construction, vulnerability discovery requiring creative thinking, and deep understanding of business logic
What it shouldn't do: Make security decisions on its own, with no human verification
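One small, concrete form of human verification is refusing to trust any CVE identifier an AI cites until it has been matched against a source you control. The sketch below is an assumption-laden illustration: trusted_ids stands in for a local snapshot of a vulnerability feed (hypothetical here), and CVE-2099-00001 is a deliberately fake placeholder ID.

```python
import re

# CVE IDs have the form CVE-<4-digit year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b")


def check_cve_claims(ai_text, trusted_ids):
    """Split the CVE IDs cited in AI output into (verified, unverified)."""
    cited = sorted(set(CVE_PATTERN.findall(ai_text)))
    verified = [cve for cve in cited if cve in trusted_ids]
    unverified = [cve for cve in cited if cve not in trusted_ids]
    return verified, unverified


# Hypothetical trusted snapshot containing one real, well-known ID (Log4Shell).
trusted_ids = {"CVE-2021-44228"}
ai_text = "This service is affected by CVE-2021-44228 and CVE-2099-00001."

verified, unverified = check_cve_claims(ai_text, trusted_ids)
print("verified:", verified)      # ['CVE-2021-44228']
print("needs review:", unverified)  # ['CVE-2099-00001']
```

A match only confirms that the identifier exists. Whether it actually applies to the target still takes an engineer's judgment.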

To understand why AI struggles to fully replace security engineers, you need to appreciate the inherent complexity of modern cybersecurity offense and defense. A complete penetration test typically includes information gathering, vulnerability scanning, exploitation, privilege escalation, lateral movement, data exfiltration, and evidence cleanup — each stage requiring dynamic strategy adjustment based on real-time feedback. More critically, many high-value vulnerabilities (such as business logic flaws and race conditions) require deep understanding of the target system's business processes — this kind of "context awareness" remains an AI weakness. Additionally, real-world adversarial environments involve extensive WAF bypassing, traffic detection evasion, and other adversarial operations that require creative thinking rather than pattern matching.
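To make the race-condition point concrete, the sketch below simulates a hypothetical one-time coupon endpoint with a check-then-act flaw. The CouponService class and the credit amount are invented for illustration, and the concurrent case is simulated by interleaving the steps by hand rather than with real threads, so the outcome is deterministic.

```python
class CouponService:
    """Hypothetical one-time coupon with separate check and apply steps."""

    def __init__(self):
        self.redeemed = False
        self.credits_issued = 0

    def check(self):
        # Step 1: is the coupon still unused?
        return not self.redeemed

    def apply(self):
        # Step 2: issue the credit, then mark the coupon as used.
        self.credits_issued += 10
        self.redeemed = True


def redeem_sequential(svc):
    """Two requests handled one after the other: the second is rejected."""
    for _ in range(2):
        if svc.check():
            svc.apply()


def redeem_interleaved(svc):
    """Two requests whose checks both run before either apply."""
    ok_a = svc.check()
    ok_b = svc.check()
    if ok_a:
        svc.apply()
    if ok_b:
        svc.apply()


safe, raced = CouponService(), CouponService()
redeem_sequential(safe)
redeem_interleaved(raced)
print(safe.credits_issued)   # 10
print(raced.credits_issued)  # 20
```

Each line of the service looks correct in isolation. The flaw only appears once you know the business rule (one coupon, one credit) and consider the timing of concurrent requests, which is the kind of context the paragraph above says AI still handles poorly.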

Practical Advice for Security Professionals

Wu Ya's stance is clear: Security skills still have to be learned. You need to master security fundamentals AND learn to use AI to amplify your capabilities. The notion that "just learning AI vulnerability hunting is enough" is dangerous, because without solid security foundations you can't even judge whether AI's output is correct.

Conclusion: AI Is a Weapon, Not a Replacement

From chat tools to coding Agents, from ordinary agents to Crawfish and Hermes, AI's capability boundaries are expanding rapidly. But at least at the current stage, AI is more like a new weapon in the security engineer's arsenal, not a terminator that replaces security engineers.

What you should really worry about isn't AI replacing you — it's that security engineers who know how to use AI will replace security engineers who don't. Diligently mastering fundamental skills while embracing AI tools is the right posture for security professionals in this era.

Finally, no matter how powerful the technology you master, always remember: Use technology for legitimate purposes — the red lines of cybersecurity law must never be crossed.