AI Reverse Engineering in Practice: Automating Crawler Encryption Cracking with MCP Workflows

Introduction: AI Is Reshaping How Reverse Engineering Works

Traditional crawler reverse engineering (extracting code, simulating environments, and deobfuscating) often takes days or even a week and demands deep technical expertise. Combining AI with MCP (Model Context Protocol) workflows can cut many of these tasks to a few hours.

This article walks through building an MCP-based AI reverse engineering workflow that semi-automates protocol-crawler deobfuscation, pure-algorithm reconstruction, and environment simulation.

AI Crawler Reverse Engineering MCP Workflow

Core Toolchain Overview

What Is MCP?

MCP (Model Context Protocol) is a protocol for connecting a model to external context and tools: we describe the task to the AI, and the AI invokes the appropriate tools to complete it. Simply put, MCP is a standardized interface for "directing AI to do work."

MCP was originally proposed and open-sourced by Anthropic in late 2024, aiming to standardize how large language models connect to external tools and data sources. Before MCP, every AI application needed custom integration code for each tool, which fragmented the ecosystem. MCP adopts a client-server architecture and exchanges JSON-RPC messages, so AI models can discover and invoke external tools through standardized interfaces. In reverse engineering scenarios, MCP's value lies in orchestrating discrete capabilities (browser debugging, code analysis, file operations) into a unified workflow, so the AI can switch between tools much as a human engineer would.
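
To make the discover-and-invoke exchange concrete, the following is a minimal sketch of the JSON-RPC 2.0 requests an MCP client might send. The method names tools/list and tools/call come from the MCP specification; the tool name navigate_page and its url argument are hypothetical placeholders, not part of any specific server.

```javascript
// Build JSON-RPC 2.0 request objects of the kind an MCP client sends.
// Method names follow the MCP specification; the tool name and its
// arguments are hypothetical placeholders.
let nextId = 1;

function makeRequest(method, params) {
  return { jsonrpc: "2.0", id: nextId++, method, params };
}

// Ask the server which tools it exposes.
const listTools = makeRequest("tools/list", {});

// Invoke one tool by name with structured arguments.
const callTool = makeRequest("tools/call", {
  name: "navigate_page", // hypothetical tool name
  arguments: { url: "https://example.com" },
});

console.log(JSON.stringify(listTools));
console.log(JSON.stringify(callTool, null, 2));
```

In practice the MCP client inside the editor builds these messages; the sketch only shows what "standardized interface" means on the wire.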

Essential Tool Checklist

The complete workflow requires these core components:

Chrome MCP — For browser debugging, enabling encryption localization, instrumentation analysis, call stack tracing, etc.
Remus MCP — For JS analysis, deobfuscation, and related processing
Qwen/DeepSeek API — Provides large model reasoning capabilities (Qwen 3.5 Plus or Qwen Coder Plus recommended)
AST Deobfuscation Skill Pack — Integrates AST syntax tree processing for code restoration

AST (Abstract Syntax Tree) deobfuscation is a technique that parses obfuscated JavaScript code into a tree structure, then restores code readability by traversing and transforming nodes. Common obfuscation techniques include: control flow flattening (breaking sequential logic into switch-case state machines), string encryption (replacing plaintext strings with decryption function calls), dead code injection, and variable name obfuscation. AST deobfuscation tools (such as Babel plugins) identify these obfuscation patterns through pattern matching, perform constant folding, dead code elimination, control flow recovery, and other transformations, and ultimately output readable code close to the original logic. Several of these passes are the same ones an optimizing compiler runs, applied here to recover readability rather than performance.
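
As an illustration of one such transformation, here is a minimal constant-folding sketch. It assumes ESTree-style node shapes (Literal, BinaryExpression, Identifier) and works on a hand-built tree instead of parser output, so it needs no dependencies. A real deobfuscator would run the same idea as a Babel visitor over a parsed file.

```javascript
// Constant folding on a tiny ESTree-shaped AST (hand-built, no parser).
const OPS = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "^": (a, b) => a ^ b,
};

// Recursively replace operator nodes whose operands are both literals.
function fold(node) {
  if (node.type !== "BinaryExpression") return node;
  const left = fold(node.left);
  const right = fold(node.right);
  const op = OPS[node.operator];
  if (op && left.type === "Literal" && right.type === "Literal") {
    return { type: "Literal", value: op(left.value, right.value) };
  }
  return { ...node, left, right };
}

// Small helpers for building nodes by hand.
const lit = (value) => ({ type: "Literal", value });
const bin = (operator, left, right) => ({
  type: "BinaryExpression",
  operator,
  left,
  right,
});

// "he" + "llo" is fully constant; 2 * 3 + x is only partly constant.
const strExpr = bin("+", lit("he"), lit("llo"));
const mixed = bin("+", bin("*", lit(2), lit(3)), { type: "Identifier", name: "x" });

console.log(fold(strExpr));
console.log(JSON.stringify(fold(mixed)));
```

The first expression collapses to a single string literal, while the second keeps its outer addition because x is unknown at analysis time.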

Environment Requirements

Node.js 20.0+
Python 3.7+
Editor: Qwen Code (command-line mode) or VS Code + Roo Code plugin (visual mode)

Detailed Setup Steps

Step 1: Install MCP Tools

Clone the two core repositories from GitHub into the same working directory. Then, inside the Remus MCP directory, run:

npm install
npm run start

The first command installs dependencies; the second starts the MCP service. Watch out for port conflicts: if another local service already occupies the same port, free it first.

Step 2: Configure Chrome Debugging Environment

Chrome MCP doesn't require additional installation, but you need to create a startup script (.bat file) that:

Kills previous Chrome processes (to prevent port occupation)
Relaunches Chrome in remote debugging mode

Key configuration items:

Chrome installation path (change to your actual path)
UserData configuration location (recommended: place it under the AI project directory)

Chrome's remote debugging protocol (Chrome DevTools Protocol, or CDP) is the technical foundation of this step. CDP allows external programs to control the Chrome browser via WebSocket connections, performing page navigation, DOM manipulation, network interception, JavaScript execution, and more. Chrome MCP communicates with the browser through CDP, enabling AI to perform breakpoint debugging, view call stacks, and monitor network requests just like a human developer using DevTools.
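
The two actions of the startup script can be sketched as data in Node.js. The --remote-debugging-port and --user-data-dir switches are standard Chrome command-line flags and taskkill is the Windows process-kill utility; the port number, the Chrome path, and the project directory are assumptions to adjust for your machine.

```javascript
const path = require("node:path");

// All three values are assumptions; change them to match your setup.
const DEBUG_PORT = 9222;
const CHROME_EXE = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe";
const PROJECT_DIR = "C:\\ai-reverse"; // hypothetical AI project directory

// Action 1: kill leftover Chrome processes so the debug port is free.
const killCommand = ["taskkill", "/F", "/IM", "chrome.exe"];

// Action 2: relaunch Chrome with remote debugging and a dedicated profile.
function chromeLaunchArgs(port, userDataDir) {
  return [
    `--remote-debugging-port=${port}`,
    `--user-data-dir=${userDataDir}`,
  ];
}

const userDataDir = path.win32.join(PROJECT_DIR, "chrome-user-data");
const launchArgs = chromeLaunchArgs(DEBUG_PORT, userDataDir);

// Print the two command lines a .bat file would contain.
console.log(killCommand.join(" "));
console.log(`"${CHROME_EXE}" ${launchArgs.join(" ")}`);
```

Once Chrome is running with these flags, requesting http://127.0.0.1:9222/json/version returns metadata that includes the WebSocket debugger URL a CDP client connects to.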

Step 3: Configure Model API

Obtaining an API key from Alibaba's Bailian platform is recommended:

Log in to the Bailian platform and select the Beijing region
First-time users get approximately 1 million free tokens
Create an API Key and save it

For users in China, Qwen and DeepSeek are recommended; for international users, Claude or Codex are good alternatives. Note that reverse engineering work consumes far more input tokens than output tokens, since large JS files need to be fed to the model.

In AI-assisted reverse engineering scenarios, token consumption shows a clear input-heavy pattern. A typical obfuscated JS file may contain hundreds of thousands of characters (approximately 100,000-300,000 tokens), while AI analysis output is usually only a few thousand tokens. Taking the Qwen model as an example, input pricing is approximately ¥0.004 per thousand tokens and output pricing approximately ¥0.012 per thousand tokens. When processing Xiaohongshu's X-S, large JS files must be fed to the model repeatedly for analysis, deobfuscation, and logical reasoning, so cumulative input can reach several million tokens. At the input price above, one million tokens costs about ¥4, so a total in the ¥20-30 range corresponds to roughly 5 to 7 million input tokens. By comparison, outsourcing the reverse engineering of an X-S signature traditionally costs ¥300-500, with much longer delivery timelines.
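
The arithmetic can be checked with a short sketch that uses the approximate Qwen prices quoted above. The workload figures (a 200,000-token file, 30 analysis rounds, 5,000 output tokens per round) are assumptions chosen only for illustration.

```javascript
// Approximate prices from the text, in yuan per thousand tokens.
const INPUT_YUAN_PER_1K = 0.004;
const OUTPUT_YUAN_PER_1K = 0.012;

function estimateCost(inputTokens, outputTokens) {
  return (
    (inputTokens / 1000) * INPUT_YUAN_PER_1K +
    (outputTokens / 1000) * OUTPUT_YUAN_PER_1K
  );
}

// Assumed workload: the same large file is re-sent on every round.
const rounds = 30;
const inputTokens = 200_000 * rounds; // 6,000,000 input tokens
const outputTokens = 5_000 * rounds; // 150,000 output tokens

const total = estimateCost(inputTokens, outputTokens);
console.log(`input share:  ${((inputTokens / 1000) * INPUT_YUAN_PER_1K).toFixed(2)}`);
console.log(`total (yuan): ${total.toFixed(2)}`); // 25.80
```

Under these assumptions the input side accounts for about ¥24 of the ¥25.80 total, which is why trimming what gets fed to the model matters more than shortening its answers.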

Step 4: Write MCP Configuration File

The configuration file (settings.json) lives in the hidden .qwen folder under the user's home directory and contains three core sections:

Working path configuration — Where AI-generated code and analysis results are stored
Chrome MCP connection — Uses npx to maintain automatic updates
Remus MCP connection — Uses node to point to local files
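
A sketch of the two MCP connection entries follows, built as a JavaScript object and printed as JSON. It assumes the client uses the widespread mcpServers layout with command and args fields. The server names, the package name, and the file path are placeholders, and the working-path entry is omitted because the source does not name its key.

```javascript
// Placeholder values throughout: replace the package name and path with
// the ones from the repositories you actually cloned.
const config = {
  mcpServers: {
    "chrome-mcp": {
      command: "npx", // resolves the package at launch time
      args: ["-y", "CHROME_MCP_PACKAGE@latest"], // placeholder package name
    },
    "remus-mcp": {
      command: "node", // runs the locally cloned server directly
      args: ["/path/to/remus-mcp/index.js"], // placeholder entry file
    },
  },
};

console.log(JSON.stringify(config, null, 2));
```

The npx entry tracks the latest published release, while the node entry stays pinned to whatever is in the local checkout.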

After saving the configuration, restart the terminal, run qwen to open the interactive interface, and verify that all MCP services have started successfully.

Practical Demonstrations

Case 1: Government Website Data Decryption

For a government website returning encrypted data, the task was initiated with this prompt:

Create a project folder in the current working path. The target website returns dynamically encrypted data. Debug the browser to decrypt the data and implement pure algorithm reconstruction locally using Node.js.

AI workflow:

Automatically creates the project directory
Invokes Chrome MCP to visit the target website
Analyzes network requests and locates encryption logic
Identifies the AES encryption algorithm, extracts IV and Key
Generates a local decryption script
Successfully collects and decrypts data

The entire process took approximately 10-15 minutes, with AI automatically identifying the AES algorithm and implementing pure algorithm decryption. "Pure algorithm reconstruction" here means reproducing the encryption/decryption process locally through mathematics and cryptographic algorithms without relying on a browser environment. AES (Advanced Encryption Standard) is currently the most widely used symmetric encryption algorithm, with its security depending on key secrecy rather than the algorithm itself. In web scenarios, since frontend code is visible to users, AES keys and initialization vectors (IV) are often hardcoded in JS or generated through predictable methods, making reverse extraction possible.
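
A local decryption script of this kind can be sketched with Node.js's built-in crypto module. The source does not state the mode, key size, or payload encoding, so AES-128-CBC, Base64 ciphertext, and a JSON payload are assumptions here. The key and IV are made-up stand-ins for values recovered from the site's JS.

```javascript
const crypto = require("node:crypto");

// Hypothetical key and IV, standing in for values extracted from the page.
const KEY = Buffer.from("0123456789abcdef", "utf8"); // 16 bytes: AES-128
const IV = Buffer.from("fedcba9876543210", "utf8"); // 16 bytes

// Decrypt a Base64 response body and parse the JSON inside it.
function decrypt(base64Ciphertext) {
  const decipher = crypto.createDecipheriv("aes-128-cbc", KEY, IV);
  const plain = Buffer.concat([
    decipher.update(Buffer.from(base64Ciphertext, "base64")),
    decipher.final(), // also removes the PKCS#7 padding
  ]);
  return JSON.parse(plain.toString("utf8"));
}

// Helper for the self-check only: produce a ciphertext the same way.
function encrypt(obj) {
  const cipher = crypto.createCipheriv("aes-128-cbc", KEY, IV);
  return Buffer.concat([
    cipher.update(JSON.stringify(obj), "utf8"),
    cipher.final(),
  ]).toString("base64");
}

const sample = encrypt({ items: [1, 2, 3] });
console.log(sample);
console.log(decrypt(sample));
```

No browser is involved at any point, which is what makes this a pure algorithm reconstruction: once the key, IV, and mode are known, the response body can be decrypted anywhere Node.js runs.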

Case 2: Xiaohongshu X-S Signature Analysis

Xiaohongshu's X-S parameter involves JSVMP protection and code obfuscation, traditionally requiring 1-2 days of manual analysis.

JSVMP (JavaScript Virtual Machine Protection) is an advanced code protection technique. Its core idea is to compile the original JavaScript into custom bytecode (opcodes plus operands) and execute that bytecode at runtime through a self-implemented virtual machine interpreter. Even if an attacker obtains the complete JS file, they see only the VM's dispatch loop and arrays of unreadable bytecode, not the original business logic. Traditional JSVMP cracking requires reverse-analyzing the VM's instruction set, operand stack, and register mapping, which is extremely labor-intensive.
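
The structure described here can be seen in a toy stack-based VM. This is only an illustration of the bytecode-plus-dispatch-loop idea: the four opcodes are invented for the example and bear no relation to any real JSVMP instruction set.

```javascript
// Invented opcode table; real protections use far larger, shuffled sets.
const OP = { PUSH: 0, ADD: 1, XOR: 2, HALT: 3 };

// The dispatch loop: fetch an opcode, execute it, repeat.
function run(bytecode) {
  const stack = [];
  let pc = 0;
  while (true) {
    const op = bytecode[pc++];
    switch (op) {
      case OP.PUSH:
        stack.push(bytecode[pc++]); // the operand follows the opcode
        break;
      case OP.ADD: {
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a + b);
        break;
      }
      case OP.XOR: {
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a ^ b);
        break;
      }
      case OP.HALT:
        return stack.pop();
      default:
        throw new Error(`unknown opcode ${op} at offset ${pc - 1}`);
    }
  }
}

// What a reader of the shipped file sees: just numbers.
// It encodes (7 + 5) ^ 3.
const program = [0, 7, 0, 5, 1, 0, 3, 2, 3];
console.log(run(program)); // 15
```

Recovering the expression (7 + 5) ^ 3 from the number array requires first working out what each opcode does, which is the labor-intensive step.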

AI workflow approach:

First use an online AI chat assistant to draft targeted prompts
Have Chrome MCP visit Xiaohongshu and locate the encryption entry point
Save the obfuscated JS file locally
Invoke AST deobfuscation skills for code restoration
Analyze the restored code logic
Implement environment simulation or pure algorithm emulation

Environment simulation is one of the core techniques in protocol crawler reverse engineering. When encrypted JS code runs in a browser, it accesses numerous browser environment APIs (such as window, document, navigator, and canvas), and the return values of these APIs participate in signature calculation. When the same code is extracted and run on its own in Node.js, it throws errors or produces incorrect results because these browser objects are missing. Environment simulation means building mock versions of these browser objects and APIs in Node.js so that the encrypted code can execute correctly outside the browser. High-quality environment simulation requires precisely mimicking every environment characteristic the target website detects, including UA, screen resolution, WebGL fingerprints, and more.
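
The failure mode and the fix can be shown with Node.js's built-in vm module. The sign function below is a made-up stand-in for extracted signing code, and the mock values are hypothetical. A real target reads many more properties and checks them more strictly.

```javascript
const vm = require("node:vm");

// Made-up stand-in for signing code extracted from a page.
const extractedCode = `
  function sign(path) {
    return path + "|" + navigator.userAgent.length + "|" + window.screen.width;
  }
`;

// Run the extracted code inside a sandbox and report what happens.
function tryRun(sandbox) {
  try {
    vm.createContext(sandbox);
    vm.runInContext(extractedCode, sandbox);
    return vm.runInContext('sign("/api/feed")', sandbox);
  } catch (err) {
    return `error: ${err.message}`;
  }
}

// Hypothetical values standing in for a real browser fingerprint.
const MOCK_UA = "Mozilla/5.0 (hypothetical UA)";
const mockEnv = {
  window: { screen: { width: 1920, height: 1080 } },
  navigator: { userAgent: MOCK_UA },
};

const bare = tryRun({}); // no browser objects: the call fails
const patched = tryRun(mockEnv); // mocks supplied: the call succeeds

console.log(bare);
console.log(patched);
```

The empty sandbox reports that navigator is not defined, while the patched one returns a signature string. Each property the target code reads has to be supplied in the same way, with values that match what a real browser would report.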

According to testing, completing Xiaohongshu X-S environment simulation takes approximately 40 minutes and consumes about ¥20-30 in tokens (using the Qwen model).

Key Insights and Considerations

Limitations of AI-Assisted Reverse Engineering

Reverse engineering fundamentals still required — If you don't understand reverse engineering, you won't even know what to ask AI to do. AI may "go off track," requiring manual intervention to correct direction.
Prompts are the core skill — Learning to decompose tasks and guide AI step-by-step is the key competency. For complex tasks, proceed incrementally rather than dumping everything at once.
Risk control remains unsolved — AI can currently handle encryption signatures, decryption simulation, environment patching, and unpacking/decompilation, but account-level risk control remains a bottleneck.

Account risk control refers to the technical system through which platforms identify abnormal access via multi-dimensional behavioral analysis. Even with perfectly reconstructed encryption signatures, platforms can still identify crawler traffic through device fingerprint correlation, abnormal access frequency, behavioral sequence analysis (such as lacking normal browse-click-scroll behavior chains), IP reputation scoring, and more. Risk control systems are typically based on machine learning models that make real-time decisions across hundreds of feature dimensions, making pure protocol-level simulation insufficient to fully bypass them. This is why even though AI can solve technical encryption problems, large-scale data collection still faces challenges.

Cost and Efficiency Comparison

Target	Traditional Approach (time, cost)	AI-Assisted (time, cost)
Xiaohongshu X-S	1-2 days, ¥300-500	40 minutes, ¥20-30 in tokens
Rui Shu encryption	3-7 days	3-4 hours
Simple AES decryption	Several hours	10-15 minutes

Verified Feasible Targets

According to testing, the AI workflow can handle all of the following: Chinese schemes such as JD H5ST, Pinduoduo Anti-Content, and Tencent Tianyu CAPTCHA; international solutions such as Akamai and Sifton; and mini-program, app, and iOS reverse engineering work.

Rui Shu (River Security) is a leading Chinese dynamic security protection vendor whose Bot protection products are widely deployed across government, financial, and telecom industry websites. Rui Shu's core protection mechanisms include: dynamic tokens (generating different encrypted JS for each visit), Cookie encryption verification, mouse trajectory and behavior detection, and multi-layer code obfuscation with self-verification. Since its JS code changes dynamically with each load, traditional static analysis methods are virtually ineffective, requiring real-time parsing of dynamically generated encryption logic. Rui Shu is widely recognized in the industry as one of the most difficult web protection solutions to crack in China. The AI workflow compresses its cracking time from the traditional 3-7 days to 3-4 hours, demonstrating enormous efficiency gains.

Akamai is one of the world's largest CDN and web security providers, whose Bot Manager product identifies automated access through browser fingerprint collection, sensor data analysis, JavaScript challenges, and other multi-layer protections. JD H5ST is JD.com's proprietary frontend signature scheme that employs a multi-version iteration strategy, with each version having different algorithm structures, increasing the ongoing maintenance cost of reverse engineering.

Future Outlook

Notably, MCP itself may soon be superseded by more advanced solutions. Newer tools such as OpenClio no longer rely on MCP's architecture and can collect data from multiple platforms at the protocol level with one click. Tooling is iterating quickly: MCP workflows perform well today, but more efficient alternatives are already on the way.

This rapid iteration reflects a universal trend in the AI toolchain space: evolution from manually orchestrated tool protocols (like MCP) toward higher-level autonomous Agent architectures. Future AI reverse engineering tools may no longer require humans to define specific tool invocation flows; instead, AI will autonomously plan, execute, and verify the entire reverse engineering process, with humans only needing to provide the final objective.

For practitioners, the core advice is: embrace AI as an efficiency tool, but don't abandon your understanding of underlying reverse engineering principles. AI is an accelerator, not a replacement—at least not yet.

Key Takeaways

A complete AI reverse engineering workflow can be built using Chrome MCP and Remus MCP combined with large model APIs
AI reverse engineering can compress multi-day encryption cracking work to hours; in the Xiaohongshu X-S case the token cost was about ¥20-30
The workflow has been tested against mainstream encryption schemes including JD H5ST, Xiaohongshu X-S, and Pinduoduo Anti-Content
AI currently cannot solve account-level risk control, and users still need basic reverse engineering knowledge to steer it
MCP itself may be superseded by newer solutions such as OpenClio as tooling iterates rapidly