AI-Assisted JS Reverse Engineering in Practice: Cracking Youdao Translate's Sign Algorithm

Introduction

In web scraping and API reverse engineering, cracking signature parameters (Sign) is often the most time-consuming step. Traditional methods require developers to manually analyze obfuscated JavaScript code and progressively reconstruct the signing logic. With the code analysis capabilities of large language models (LLMs), this process can be significantly simplified. This article demonstrates how to use the DeepSeek V4 Pro model to quickly analyze and reconstruct the Sign generation logic of Youdao Translate's API.

Source (bilibili video): AI实战分析有道翻译sign生成逻辑 (AI in Practice: Analyzing the Sign Generation Logic of Youdao Translate)

API Packet Capture and Sign Parameter Identification

Background on API Signature Mechanisms

API signatures (Sign) are a common security verification mechanism in web services. Their core purpose is to prevent request tampering and unauthorized access. With each request, the client concatenates the request parameters according to agreed-upon rules and hashes the result (often together with a secret or salt) to produce a signature value. When the server receives the request, it recalculates the signature using the same rules; if the result matches the signature sent by the client, the request is considered legitimate. This mechanism is widely used in payment interfaces, translation APIs, social platforms, and other scenarios, and it also serves as an anti-scraping measure. Understanding how signature mechanisms work is a prerequisite for API reverse analysis.
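The recompute-and-compare idea can be sketched in a few lines. The secret, the parameter names, the sorted-key concatenation rule, and the choice of MD5 here are illustrative assumptions, not Youdao's actual scheme.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real service defines its own.
SECRET = "demo-secret"

def make_sign(params: dict, secret: str = SECRET) -> str:
    """Join parameters in a fixed (sorted) order, append the secret, and hash."""
    canonical = "&".join(f"{key}={params[key]}" for key in sorted(params))
    return hashlib.md5(f"{canonical}&key={secret}".encode("utf-8")).hexdigest()

def server_verify(params: dict, received_sign: str) -> bool:
    """Recompute the signature server-side and compare in constant time."""
    return hmac.compare_digest(make_sign(params), received_sign)

request = {"q": "hello world", "t": "1700000000000"}
sign = make_sign(request)
tampered = dict(request, q="goodbye")

print(server_verify(request, sign))    # True: parameters unchanged
print(server_verify(tampered, sign))   # False: a parameter was altered
```

Because both sides must apply exactly the same rules, reproducing the client's concatenation order and constants is the whole task in signature reverse engineering.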

Capturing the Translation API Traffic

First, we capture the traffic using the browser's developer tools. Right-click the page and select "Inspect" to open DevTools, navigate to the Network panel, and enter a translation keyword to trigger a request. Filtering the request list by path helps isolate the translation API's request.

Among the request parameters, most are fixed, but there are two key dynamic parameters:

T: Timestamp, changes with each request
Sign: Signature parameter, the core of the entire API verification

Locating the Sign Generation Code

A global search for Sign-related keywords returns multiple matches. Setting a breakpoint on each match and debugging narrows them down to a specific function in the app.js file, where the sign value is computed through an identifier named WL. Pausing at that breakpoint lets you observe the signature calculation step by step.

It's worth noting that modern web applications typically minify or obfuscate their frontend JavaScript code. Techniques include variable name replacement (e.g., replacing meaningful variable names with meaningless ones like a, b, WL), control flow flattening, string encryption, and dead code injection. These methods greatly increase the difficulty of manual reading and comprehension while keeping functionality unchanged. Minifiers such as UglifyJS and Terser mainly shorten names and strip whitespace, while dedicated tools such as JavaScript Obfuscator apply the heavier transformations. This is why traditional reverse analysis is so time-consuming: reverse engineers need to progressively reconstruct obfuscated logic through breakpoint debugging, AST (Abstract Syntax Tree) analysis, and other methods. This is where AI can substantially improve efficiency.

AI-Assisted Reverse Analysis: DeepSeek Reconstructs the Encryption Logic

DeepSeek V4 Pro's Code Analysis Capabilities

DeepSeek V4 Pro is a large language model developed by DeepSeek with strong code comprehension, generation, and analysis capabilities. The model handles code-related tasks well: it understands the syntax and semantics of multiple programming languages, identifies common cryptographic algorithm patterns, and converts implementation logic from one language to another. In the setup used here, internet access and tool calling allow it to visit web pages and execute code for verification, forming an analysis-verification loop that would otherwise require manual effort at every step. This combination of capabilities is what makes AI-assisted reverse engineering practical.

Feeding the Code to DeepSeek for Analysis

After locating the encryption function, the relevant code snippet is provided directly to the DeepSeek V4 Pro model with the instruction:

"Help me examine the Sign parameter generation logic in this API. I've already located it at this position in the app.js file—please help me reconstruct it."

The AI model automatically performs the following steps:

Opens a browser: Automatically visits the Youdao Translate page
Triggers a translation request: Enters "hello world" for translation
Captures data packets: Obtains the complete request information for the translation API
Analyzes the encryption logic: Based on the provided code location, progressively parses the signature algorithm

AI's Conclusion: Dual MD5 Signature Scheme

After analysis and verification, DeepSeek confirms that the Sign generation algorithm is based on MD5 hashing.

MD5 Algorithm Overview

MD5 (Message-Digest Algorithm 5) is a widely used hash function that maps input data of arbitrary length to a fixed 128-bit (16-byte) hash value, typically represented as a 32-character hexadecimal string. MD5 is one-way (the original text cannot feasibly be derived from the hash value) and exhibits the avalanche effect (tiny changes in input cause dramatic changes in output). MD5 has known collision vulnerabilities (different inputs that produce the same hash value can be constructed), so it is no longer suitable for security-critical uses such as digital signatures, and its speed makes it a poor choice for password storage. It is nevertheless still widely used in API signatures and data integrity checks because it is fast and simple to implement. In Python, MD5 can be computed using hashlib.md5().
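A short illustration of the fixed-length output and the avalanche effect using Python's standard library. The helper name md5_hex is introduced here for illustration only.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of UTF-8 encoded text as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

original = md5_hex("hello world")
changed = md5_hex("hello worle")   # only the last character differs

print(original)        # 5eb63bbbe01eeed093cb22bb8f5acdc3
print(len(original))   # 32 hex characters = 128 bits

# Count how many hex positions differ between the two digests.
differing = sum(a != b for a, b in zip(original, changed))
print(f"{differing} of 32 hex characters differ")
```

Note that hashlib.md5() operates on bytes, so the text encoding (UTF-8 here) must match whatever the JavaScript side uses, otherwise non-ASCII input produces a different digest.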

Specific Signature Logic

Obtain a fixed salt value (Salt) parameter
Obtain a fixed MAP parameter
Perform length truncation on the input text
Get the current timestamp
Concatenate the above parameters according to specific rules
Compute the first MD5 hash of the concatenated result
Concatenate the first hash with other parameters again
Compute the second MD5 hash to obtain the final Sign value

The entire process is a dual MD5 signature scheme, which the AI fully reconstructs into Python code. Dual MD5 (hashing an MD5 result again) is a common pattern. It does not significantly improve security from a cryptographic perspective, but it adds a step to reverse analysis and means the final value cannot be looked up directly in a table of single MD5 hashes.
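The shape of such a scheme can be sketched as follows. This is not the reconstructed Youdao code: the salt, the MAP value, the truncation length, and the concatenation order are all placeholders, and the real values have to be read from the JavaScript at the breakpoint.

```python
import hashlib
import time
from typing import Optional, Tuple

# Placeholder constants for illustration only (assumptions, not real values).
SALT = "example-salt"        # hypothetical fixed salt
MAP_VALUE = "example-map"    # hypothetical fixed MAP parameter
MAX_TEXT_LEN = 20            # hypothetical truncation length

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def make_sign(text: str, timestamp_ms: Optional[int] = None) -> Tuple[str, int]:
    """Return (sign, timestamp) using a two-pass MD5 over placeholder inputs."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)   # assumes a millisecond timestamp
    truncated = text[:MAX_TEXT_LEN]               # length truncation of the input
    # First pass: hash the concatenated parameters (order is an assumption).
    first = md5_hex(f"{SALT}{truncated}{timestamp_ms}{MAP_VALUE}")
    # Second pass: combine the first digest with another parameter and hash again.
    sign = md5_hex(f"{first}{SALT}")
    return sign, timestamp_ms

sign, t = make_sign("hello world")
print(t, sign)
```

Returning the timestamp alongside the signature matters in practice: the request must send the same timestamp that went into the hash, or the server's recomputation will not match.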

Python Implementation and Verification

Running the Reconstructed Code

The Python reconstruction code generated by AI can be run directly. Create a Python file, paste the code, and run it to verify whether the Sign generation logic is correct.
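One way to check a reconstruction is to replay captured requests: feed the captured text and timestamp into the function and compare the result with the captured Sign. The sketch below shows only the harness. The candidate_sign function is a stand-in for the AI-generated code, and the demo captures are fabricated by calling it, so the example is self-consistent rather than real traffic.

```python
import hashlib
from typing import Callable, Iterable, List, NamedTuple

class Capture(NamedTuple):
    text: str           # translated text from the captured request
    timestamp_ms: int   # timestamp parameter from the captured request
    sign: str           # Sign value copied from the captured request

def find_mismatches(sign_fn: Callable[[str, int], str],
                    captures: Iterable[Capture]) -> List[Capture]:
    """Return every capture whose recomputed sign differs from the captured one."""
    return [c for c in captures if sign_fn(c.text, c.timestamp_ms) != c.sign]

# Stand-in for the reconstructed function; replace with the real one.
def candidate_sign(text: str, timestamp_ms: int) -> str:
    return hashlib.md5(f"{text}{timestamp_ms}".encode("utf-8")).hexdigest()

# Demo data generated locally; in practice, paste values from DevTools.
captures = [
    Capture("hello world", 1700000000000, candidate_sign("hello world", 1700000000000)),
    Capture("西瓜", 1700000001234, candidate_sign("西瓜", 1700000001234)),
]

mismatches = find_mismatches(candidate_sign, captures)
print("all captures reproduced" if not mismatches else mismatches)
```

Checking several captures with different inputs, including non-ASCII text, guards against a reconstruction that happens to match a single sample.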

Complete Request Implementation

After verifying the Sign generation logic is correct, the next step is to integrate it into a complete HTTP request:

Right-click in the browser to copy the translation request's cURL command
Convert the cURL to Python requests code
Provide the converted code to AI and have it integrate the dynamic Sign generation logic
AI outputs the final complete request code

cURL is a command-line tool for sending HTTP requests. Browser developer tools allow you to copy captured network requests directly as cURL commands, which contain the complete URL, headers, cookies, and body. Converting cURL to Python requests code is a common operation in web scraping development and can be done automatically through online tools (such as curlconverter.com) or AI models. The converted code retains all parameters from the original request; developers only need to replace dynamic parameters (such as Sign and timestamp) with programmatically computed values.
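The substitution step might look like the sketch below. It uses the standard library's urllib instead of requests so that it runs without extra packages, and it only builds the request without sending it. The URL, the form field names, and placeholder_sign are hypothetical; the real ones come from the copied cURL command and the reconstructed signing code.

```python
import hashlib
import time
import urllib.parse
import urllib.request

API_URL = "https://example.invalid/translate"   # placeholder, not the real endpoint

def placeholder_sign(text: str, timestamp_ms: int) -> str:
    """Stand-in for the reconstructed Sign function."""
    return hashlib.md5(f"{text}{timestamp_ms}".encode("utf-8")).hexdigest()

def build_request(text: str) -> urllib.request.Request:
    """Build a POST request whose dynamic fields are computed at call time."""
    timestamp_ms = int(time.time() * 1000)
    form = {
        "q": text,                                       # hypothetical field name
        "t": str(timestamp_ms),                          # dynamic: same value used in the sign
        "sign": placeholder_sign(text, timestamp_ms),    # dynamic: recomputed per request
    }
    headers = {
        # Static headers would be copied from the cURL command.
        "User-Agent": "Mozilla/5.0",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")

req = build_request("西瓜")
fields = urllib.parse.parse_qs(req.data.decode("ascii"))
print(req.get_method(), fields["q"], fields["t"])
# Sending would be: urllib.request.urlopen(req, timeout=10)
```

The key design point is that the timestamp is generated once and used both in the form and in the signature, so the two values the server sees stay consistent.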

Running the final code returns the correct translation response. Changing the translation keyword to "西瓜" (watermelon) and running again also produces the correct translation result (watermelon, a herbaceous plant, etc.), indicating that the reconstruction works end to end.

Methodology Summary: AI-Assisted JS Reverse Engineering Workflow

Five-Step Efficient Reverse Engineering Process

Through this practical exercise, we can summarize an efficient AI-assisted JS reverse engineering workflow:

Manual Localization: Use traditional methods (searching, breakpoints) to find the approximate location of the encryption function
AI Analysis: Provide the located code snippet to the AI model for deep analysis
Automated Verification: AI automatically opens a browser, triggers requests, and verifies analysis conclusions
Code Reconstruction: AI outputs directly runnable Python reconstruction code
Integration Testing: Integrate the reconstructed logic into a complete request and verify end-to-end functionality

Key Insights

The core advantage of this approach is a division of labor: humans handle localization, AI handles analysis. Developers still need basic reverse analysis skills to find key code locations, but the most time-consuming work of code comprehension and logic reconstruction can be delegated to AI. For identifying and reconstructing common cryptographic algorithms like MD5, AES, RSA, and HMAC, AI models tend to perform well. A likely reason is that large language models have been exposed to large amounts of algorithm implementation code during training, which lets them identify underlying algorithm patterns in obfuscated code, for example by recognizing specific constants (such as MD5's initial state values) or characteristic bitwise operation sequences.
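Constant recognition can also be done mechanically. The sketch below scans JavaScript source for the four MD5 initial state words in the spellings commonly seen in minified code (hexadecimal, unsigned decimal, and signed 32-bit decimal). The function name and the sample snippet are made up for illustration.

```python
import re

# MD5 initial state words A, B, C, D as defined by the algorithm.
MD5_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)

def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value >= (1 << 31) else value

def md5_constant_hits(js_source: str) -> int:
    """Count how many of the four MD5 initial words appear in the source."""
    lowered = js_source.lower()
    hits = 0
    for word in MD5_INIT:
        # Hex, unsigned decimal, and magnitude of the signed decimal form.
        spellings = {hex(word), str(word), str(abs(to_signed32(word)))}
        if any(
            re.search(rf"(?<![\w$]){re.escape(s)}(?![\w$])", lowered)
            for s in spellings
        ):
            hits += 1
    return hits

sample = "var a=1732584193,b=-271733879,c=-1732584194,d=271733878;"
print(md5_constant_hits(sample))          # 4
print(md5_constant_hits("var x = 42;"))   # 0
```

A full match is a hint rather than proof: SHA-1 starts from the same four words plus a fifth, so the surrounding round logic still needs to be checked.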

Scope of Application and Limitations

It should be noted that AI-assisted reverse engineering is not omnipotent. For highly customized encryption algorithms, encryption logic involving WebAssembly, or signature schemes requiring dynamic environments (such as browser fingerprinting), AI's analytical capabilities may be limited. Additionally, some websites employ anti-debugging techniques (such as infinite debugger loops, code self-detection, etc.), requiring developers to bypass these protections before proceeding with further analysis.

Please note that this article is intended solely for technical learning purposes. In practical applications, please comply with relevant website terms of service and applicable laws and regulations, and use web scraping technology responsibly. According to cybersecurity laws and related judicial interpretations, unauthorized large-scale data scraping may involve legal risks, and developers should conduct technical research within legal and compliant boundaries.