DeepSeek V4 Pro in Action: AI-Assisted Reverse Engineering of Youdao Translate's Encryption Parameters

Introduction: AI Is Lowering the Barrier to Reverse Engineering

Reverse engineering has always been a highly technical field. It demands solid JavaScript debugging skills, experience in identifying cryptographic algorithms, and the patience for meticulous code tracing. In software, reverse engineering means deducing a program's internal logic by analyzing compiled programs or network communication protocols without access to the source code. In web security and web scraping, it typically targets the encryption and signature logic in front-end JavaScript. Because modern web applications widely use obfuscation techniques such as variable name replacement, control flow flattening, and string encryption to protect core logic, reverse analysis has become considerably harder in recent years.

However, as large language models rapidly improve, this field is changing quickly.

Recently, a Bilibili content creator demonstrated an impressive case: using the DeepSeek V4 Pro model to reverse-engineer the sign signature parameter in Youdao Translate's API. From locating the encryption logic to fully reproducing the Python request code, the entire process was efficient and smooth. Does this mean AI is making reverse engineering something "anyone can do"?

DeepSeek V4 Pro is a large language model released by DeepSeek, positioned at the premium tier of its fourth-generation product line. DeepSeek is known for its open-source strategy and cost-effectiveness, and its models perform well on code comprehension and mathematical reasoning tasks. The V4 Pro version is described as improving on its predecessors in long-context processing, code analysis, and tool use. It can reason over code snippets tens of thousands of tokens long, which makes it well suited to analyzing obfuscated JavaScript.

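Bilibili source: AI颠覆逆向？Deepseek v4pro AI逆向分析有道翻译？岂不是有手就行！ (Is AI upending reverse engineering? DeepSeek V4 Pro reverse-engineers Youdao Translate. So anyone can do it?)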

Reverse Engineering Target: Youdao Translate's Sign Signature Mechanism

API Analysis and Parameter Identification

In Youdao Translate's web translation API, request parameters contain multiple fields. Most are fixed values, but two key dynamic parameters stand out:

T: Timestamp parameter
sign: Signature parameter used for API authentication

An API signature is a common security mechanism in web services. The idea is to combine the request parameters with a secret key according to fixed rules, then hash the result to produce a signature that is hard to forge without knowing the key and the rules. When the server receives a request, it recomputes the signature using the same rules and compares the two values to verify the request's legitimacy and integrity. This mechanism helps prevent parameter tampering and unauthorized API calls. Common constructions include HMAC-SHA256 and plain hashes such as MD5 over the concatenated parameters, and mixing in a salt value makes the signature harder to guess or brute-force.
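The sign-then-verify cycle described above can be sketched in a few lines of Python. This is a generic illustration, not Youdao's actual scheme: the secret, the parameter names q and t, and the canonical ordering are all assumptions made for the example.

```python
import hashlib
import hmac

SECRET_KEY = "demo-secret"  # hypothetical shared secret, not a real key


def make_signature(params: dict[str, str], secret: str = SECRET_KEY) -> str:
    """Join the parameters in sorted order, append the secret, and hash."""
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params))
    return hashlib.md5(f"{canonical}&key={secret}".encode("utf-8")).hexdigest()


def verify_signature(params: dict[str, str], received: str, secret: str = SECRET_KEY) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(make_signature(params, secret), received)


request_params = {"q": "hello world", "t": "1700000000000"}
sign = make_signature(request_params)

print(verify_signature(request_params, sign))   # True

tampered = {**request_params, "q": "goodbye"}
print(verify_signature(tampered, sign))         # False
```

Sorting the keys before hashing means both sides produce the same string regardless of the order in which parameters arrive, which is why most signing schemes define a canonical order.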

In the Network panel of Chrome DevTools, filtering by the API path locates the translation request, which shows both the translation result and all request parameters. A global search for the sign keyword, followed by setting breakpoints on the matches one by one, eventually pins down the exact place where the signature is generated: a specific function in the app.js file.
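The same keyword search can be approximated offline once the script has been saved locally. The sketch below is only a rough stand-in for the DevTools global search; the function name find_keyword_sites and the minified sample string are made up for illustration and are not taken from app.js.

```python
import re


def find_keyword_sites(js_source: str, keyword: str = "sign", context: int = 40) -> list[str]:
    """Return short snippets around each place `keyword` is used as a key or assigned."""
    # Matches `sign:` / `sign =` / `"sign":` but not longer identifiers such as
    # `signal` or `design`, and not comparisons such as `sign ==`.
    pattern = re.compile(rf"""(?<![\w$])["']?{re.escape(keyword)}["']?\s*[:=](?!=)""")
    snippets = []
    for match in pattern.finditer(js_source):
        start = max(0, match.start() - context)
        end = min(len(js_source), match.end() + context)
        snippets.append(js_source[start:end])
    return snippets


# Hypothetical minified fragment, not real Youdao code.
sample = "function k(e,t){return{T:t,sign:h(e,t)}}var signal=0,design=1;o.sign=k(e,t);"

for snippet in find_keyword_sites(sample):
    print(snippet)
```

Each snippet is a candidate location for a breakpoint. Minified bundles are often a single very long line, which is why the helper cuts a character window around each match instead of printing whole lines.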

Youdao Translate API parameter analysis

Pain Points of Traditional Reverse Engineering

Without AI assistance, developers need to:

Manually read obfuscated JavaScript code
Identify the encryption algorithm type (MD5, SHA256, etc.)
Untangle parameter concatenation logic and fixed salt values
Manually write corresponding Python reproduction code

For less experienced developers, this process can take hours or even longer. Especially when code has undergone multiple layers of obfuscation—with variable names replaced by meaningless characters and function call chains scattered and reorganized—developers must painstakingly extract core logic from vast amounts of irrelevant code.

DeepSeek V4 Pro's Reverse Analysis Process

Step 1: Feeding in the Encryption Code Snippet

After locating where the sign parameter is generated, the relevant function from app.js is pasted directly into DeepSeek V4 Pro, with a request to analyze the signature generation logic and reproduce it.

The AI first automatically opens a browser, navigates to the Youdao Translate page, inputs the test text "hello world" for translation, and captures the corresponding network request packet. This step demonstrates DeepSeek V4 Pro's tool use capability—the model can not only analyze static code but also manipulate a browser to obtain real-time data to verify its analysis results.

DeepSeek V4 Pro analyzing encryption code

Step 2: AI Automatically Identifies the Encryption Algorithm

After analyzing the code, DeepSeek V4 Pro accurately identified the following key information:

The algorithm is MD5 (strictly a hash function rather than encryption)
A fixed salt value exists (a fixed parameter similar to "webMAN")
Signature generation involves two MD5 operations
Intermediate steps include a modulo operation on string length

MD5 (Message-Digest Algorithm 5) is a hash function designed by Ronald Rivest in 1991. It maps input data of arbitrary length to a fixed 128-bit (16-byte) hash value, typically represented as a 32-character hexadecimal string. MD5 is known to be vulnerable to collisions (Professor Wang Xiaoyun's team first demonstrated MD5 collision attacks in 2004) and is no longer recommended for security-sensitive scenarios. It is still widely used by web applications for API signatures, data verification, and other lower-security purposes because it is fast and simple to implement.
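These size properties are easy to confirm with Python's standard hashlib module. The input string here is just the test phrase used later in the article.

```python
import hashlib

digest = hashlib.md5("hello world".encode("utf-8"))

print(digest.digest_size)        # 16 bytes, i.e. 128 bits
print(digest.hexdigest())        # 5eb63bbbe01eeed093cb22bb8f5acdc3
print(len(digest.hexdigest()))   # 32 hexadecimal characters
```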

The specific logic is: first compute the MD5 hash of the timestamp, then concatenate the result with the fixed parameters, the translation text, and so on, and finally hash the concatenated string with MD5 again to obtain the final sign value. This salted double-MD5 approach is not strong cryptography, but it is enough to deter simple parameter forgery.
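The two-pass structure can be sketched as follows. This is an illustration of the shape of the logic only: the salt value, the concatenation order, and the millisecond timestamp format are assumptions, and the length-modulo step is left out because its details are not given here. The real values have to be read from the function in app.js.

```python
import hashlib
import time

SALT = "demo-salt"  # placeholder; the real fixed salt must be read from app.js


def md5_hex(text: str) -> str:
    """MD5 of a UTF-8 string as a 32-character hex digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_sign(text: str, timestamp: str, salt: str = SALT) -> str:
    """First hash the timestamp, then hash it together with the salt and text."""
    inner = md5_hex(timestamp)              # first MD5: timestamp only
    return md5_hex(salt + text + inner)     # second MD5: assumed concatenation order


timestamp = str(int(time.time() * 1000))    # assumed millisecond timestamp, sent as T
params = {"T": timestamp, "sign": make_sign("hello world", timestamp)}
print(params)
```

Because the timestamp feeds into the hash, the same T value must be sent alongside sign so the server can recompute it.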

AI identifies double MD5 encryption logic

Step 3: Generating Runnable Python Code

The AI not only analyzed the logic but also directly output complete Python implementation code. After copying the code to create a local Python file, running it successfully generates the correct signature value.

Next, the cURL request is copied from the browser, converted to Python code, and combined with the signature generation logic to produce a complete translation request script. cURL is a command-line tool for sending HTTP requests. Chrome DevTools can copy a captured network request as a cURL command that includes the complete request headers, cookies, and request body. Developers can then use online tools such as curlconverter, or Python libraries, to convert the cURL command into Python code based on the requests library. This is a common rapid prototyping method in web scraping, and it keeps request headers and other details consistent with what the browser sent.
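A converted request typically ends up as a URL, a headers dictionary, and a form body in which the dynamic fields are filled in at run time. The sketch below uses the standard library's urllib in place of requests so that it runs without third-party packages, and it only builds the request without sending it. The URL, the header values, and the field name q are placeholders, not Youdao's real endpoint or parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values standing in for what "Copy as cURL" would give you.
URL = "https://example.com/translate"
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://example.com/",
    "Content-Type": "application/x-www-form-urlencoded",
}


def build_request(text: str, timestamp: str, sign: str) -> Request:
    """Rebuild the copied request with freshly computed dynamic fields."""
    form = {"q": text, "T": timestamp, "sign": sign}   # "q" is an assumed field name
    body = urlencode(form).encode("utf-8")
    return Request(URL, data=body, headers=HEADERS, method="POST")


req = build_request("watermelon", "1700000000000", "0" * 32)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it would be urllib.request.urlopen(req); left out here on purpose.
```

Keeping the static headers in one dictionary and passing only the dynamic fields as arguments makes it easy to swap in a new keyword and a newly computed signature for each call.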

Generating Python request code

Verification Results

In the final test, the translation keyword was changed to "watermelon" (西瓜) and the script was run again. It returned the correct translation result (including descriptions like "a herbaceous plant"), showing that the reproduced signature logic was accepted by the server.

Technical Analysis: Advantages and Limitations of AI-Assisted Reverse Engineering

Advantages

Fast algorithm identification: AI can quickly identify common cryptographic algorithms (MD5, AES, RSA, etc.), eliminating manual judgment time. Large language models have been exposed to massive amounts of algorithm implementation code during pre-training, so they can spot an algorithm's characteristics in obfuscated code through pattern matching on specific constants, computational steps, or function call patterns.
Strong code reproduction capability: For obfuscated but logically clear code, AI can directly output equivalent Python implementations.
End-to-end solution: From analysis to code generation in one step, significantly lowering the technical barrier.
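The constant-based pattern matching mentioned under fast algorithm identification can also be done mechanically. As a minimal sketch, the helper below checks whether a script contains MD5's four initial state words from RFC 1321, either in hex or as the signed 32-bit decimals that JavaScript implementations often use. The function names and the sample string are made up for this example.

```python
import re

# MD5's four initial state words (RFC 1321).
MD5_INIT_WORDS = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]


def literal_forms(word: int) -> set[str]:
    """Spellings of a 32-bit word: hex, unsigned decimal, signed decimal."""
    signed = word - (1 << 32) if word >= (1 << 31) else word
    return {f"0x{word:08x}", str(word), str(signed)}


def looks_like_md5(js_source: str) -> bool:
    """True if every MD5 initial state word appears as a numeric literal."""
    found = re.findall(r"-?(?:0[xX][0-9a-fA-F]+|\d+)", js_source)
    literals = {m.lower() for m in found}
    return all(literal_forms(w) & literals for w in MD5_INIT_WORDS)


# Hypothetical obfuscated fragment using signed decimal constants.
obfuscated = "var a=1732584193,b=-271733879,c=-1732584194,d=271733878;"
print(looks_like_md5(obfuscated))          # True
print(looks_like_md5("var a=1,b=2,c=3;"))  # False
```

A hit is a hint, not proof: SHA-1 starts from the same four words plus a fifth one, so the surrounding round logic still has to be checked.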

Limitations and Considerations

Initial code location still requires manual effort: The video explicitly mentions that the encryption code location was "previously manually identified", so in this demonstration the AI did not complete this step on its own. Developers therefore still need to master Chrome DevTools usage, breakpoint debugging, call stack tracing, and other fundamental skills.
Limited in complex obfuscation scenarios: For multi-layer nested obfuscation, control flow flattening, and other advanced protection techniques, AI's analytical capability may be significantly reduced. Control flow flattening is an advanced obfuscation technique that breaks apart a program's originally clear if-else, for-loop, and other control structures and wraps the pieces in a large switch-case statement inside a while loop, with a state variable controlling the execution order. This makes the code's logical flow extremely difficult to trace, and even experienced reverse engineers need considerable time to restore the original logic. Mainstream JavaScript obfuscation tools such as Obfuscator.io and Jscrambler support this technique. When code has undergone such advanced obfuscation, AI may be unable to restore the full logic within its limited context window.
Legal and ethical boundaries: Reverse engineering others' APIs may involve legal risks, and a clear distinction must be made between technical learning and practical application. Under China's Computer Software Protection Regulations and Anti-Unfair Competition Law, unauthorized reverse engineering of others' software for commercial purposes may constitute infringement.
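To make the control flow flattening mentioned in the limitations above concrete, here is a toy example written in Python for readability (real obfuscators apply the same transformation to JavaScript). Both functions compute the same value; the function names and the state numbers are arbitrary choices for this illustration.

```python
def checksum_plain(text: str) -> int:
    """Readable version: a loop with a branch inside."""
    total = 0
    for ch in text:
        if ch.isalpha():
            total += ord(ch)
        else:
            total -= 1
    return total


def checksum_flattened(text: str) -> int:
    """Same logic after flattening: one loop, one dispatcher, a state variable."""
    total, i, state = 0, 0, 3
    while state != 0:
        if state == 3:                      # loop condition
            state = 7 if i < len(text) else 0
        elif state == 7:                    # branch test
            state = 2 if text[i].isalpha() else 5
        elif state == 2:                    # "if" body
            total += ord(text[i])
            state = 9
        elif state == 5:                    # "else" body
            total -= 1
            state = 9
        elif state == 9:                    # loop increment
            i += 1
            state = 3
    return total


print(checksum_plain("a1b") == checksum_flattened("a1b"))  # True
```

In the flattened version the order of the branches in the source no longer says anything about the order of execution; the reader has to follow the state variable from block to block, and that is the work that becomes expensive at scale.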

Conclusion and Outlook

DeepSeek V4 Pro demonstrated powerful code comprehension and reproduction capabilities in this case. The entire workflow can be summarized as: manual location + AI analysis + AI code generation + manual verification, forming an efficient human-AI collaboration model.

This doesn't mean reverse engineering is something "anyone can do"—preliminary API analysis and breakpoint location still require a certain technical foundation. However, AI has indeed compressed the most time-consuming "code reading and algorithm reproduction" phase from hours to minutes, representing a tremendous efficiency improvement for security research and technical learning.

As AI model capabilities continue to evolve, human-AI collaboration is likely to become mainstream in fields like reverse engineering and security auditing. Future security tools can be expected to integrate LLM capabilities deeply, moving toward full-pipeline automation from traffic capture and code location to logic restoration. What developers need to do is learn how to better "ask questions" and "guide" AI, rather than doing everything manually. At the same time, defenders will also use AI to generate more complex obfuscation strategies, and the arms race between offense and defense will escalate.

Key Takeaways

DeepSeek V4 Pro can accurately identify the MD5 signature algorithm in Youdao Translate's API and generate complete Python reproduction code
The entire reverse engineering workflow adopts a human location + AI analysis collaboration model, compressing code reproduction time from hours to minutes
The sign parameter generation logic involves two MD5 operations, fixed salt concatenation, and timestamp processing
AI-assisted reverse engineering still requires manual completion of foundational work such as API analysis and encryption code location
This technical demonstration is for learning purposes only; actual reverse engineering of others' APIs requires attention to legal and ethical boundaries