The Fatal Flaws of Using Doubao/DeepSeek to Write Paper Drafts — And the Right Approach

AI should assist with structuring papers, not generate content — references must come from real databases.
General-purpose AI tools like Doubao and DeepSeek are not connected to CNKI, meaning directly generating papers leads to fabricated references and high AI detection rates. The correct approach is to first obtain real literature from CNKI, then have AI assist with organizing an outline framework, with humans ultimately writing the content and reading through to internalize it — positioning AI as a "thinking assistant" rather than a "content producer."
The Core Problem: AI's Fatal Weakness in Academic Writing
When it comes to writing a thesis, many students' first instinct is to open Doubao or DeepSeek and have AI generate a complete draft. It seems efficient, but it actually creates enormous hidden risks.

According to an analysis by a Bilibili creator, the core reason is that these general-purpose AI tools are not connected to the CNKI (China National Knowledge Infrastructure) database. As a result, the references they generate are mostly "creative improvisation": they look properly formatted but may not actually exist. You think the AI is doing academic work, but it's actually doing creative writing.
The Technical Mechanism Behind AI Hallucination and Fabricated References
The fundamental reason Doubao, DeepSeek, and other large language models (LLMs) fabricate references lies in how they work: they generate text from probabilities rather than retrieve it from a database. After training on massive text corpora, LLMs learn the "pattern of what academic citations should look like," including author name formats, journal naming conventions, year distributions, and more. When you ask them for references, they are "continuing" a text sequence that looks plausible, not pulling records from a real database. This phenomenon is known as "hallucination" in the AI field: the model outputs factually incorrect information with high confidence. Unless a model is connected in real time to academic databases like CNKI or Web of Science through RAG (Retrieval-Augmented Generation), every specific citation it produces has to be checked against a real database before it can be trusted.
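The difference between generating and retrieving can be shown with a deliberately tiny sketch. It is not how any real model works internally; the name lists, the placeholder journal titles, and the `DATABASE` dictionary are all hypothetical stand-ins. The point is only that a pattern-based generator always returns something citation-shaped, while a lookup can come back empty.

```python
import random

# Hypothetical fragments standing in for patterns absorbed from training text.
SURNAMES = ["Zhang", "Li", "Wang"]
TOPICS = ["placeholder topic A", "placeholder topic B"]
JOURNALS = ["Placeholder Journal One", "Placeholder Journal Two"]

def generate_citation(rng):
    """Pattern-based generation: always yields a plausible-looking string."""
    author = f"{rng.choice(SURNAMES)} {rng.choice('ABC')}"
    title = f"A study of {rng.choice(TOPICS)}"
    journal = rng.choice(JOURNALS)
    year = rng.randint(2019, 2024)
    return f"{author}. {title}[J]. {journal}, {year}."

# Hypothetical stand-in for a real bibliographic database.
DATABASE = {
    "rec-001": "Placeholder Author. Placeholder title[J]. Placeholder Journal One, 2022.",
}

def retrieve_citation(record_id, database):
    """Retrieval: returns a stored record or None. It cannot invent one."""
    return database.get(record_id)

print(generate_citation(random.Random(0)))         # looks real, backed by nothing
print(retrieve_citation("no-such-id", DATABASE))   # None: the honest answer
```

The generator never fails, which is exactly the problem: a well-formatted output says nothing about whether the paper exists.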
Common Mistakes: The Copy-Paste Integration Approach
Mistake #1: Having AI Generate the Entire Paper from Scratch
Directly asking Doubao or DeepSeek to write a complete paper produces content with two fatal problems:
- Fabricated references: AI will invent seemingly real paper titles, authors, and journal names — if your advisor or plagiarism detection system verifies them, the consequences are severe
- Obvious AI fingerprints: The writing patterns of generated content are highly uniform, making detection by AI identification tools extremely likely
Mistake #2: Finding Literature First, Then Dumping Everything into AI
Some students take a slightly more advanced approach — they find a bunch of literature on CNKI first, then dump it all into AI for integration. The problem is that the AI-integrated text still carries strong machine-generated characteristics. When they check the AI detection rate, they're left speechless — after three days and nights of manual editing, the AI rate hasn't budged.
What's worse, detection tools like Weipu and Gezida are constantly upgrading their algorithms, making manual hard-editing less and less effective — it's inefficient and easily breaks the logical flow.
How AI Detection Tools Work
Detection tools such as Weipu and Gezida, and international ones like GPTZero and Turnitin AI Detection, are commonly described as relying on two main signals: text perplexity and burstiness. Perplexity measures how predictable the text is. AI-generated text tends to select high-probability words, which results in low perplexity and writing that is too "smooth and uniform." Burstiness measures the variation in sentence length and complexity. Human writing typically alternates between long and short sentences with noticeable stylistic fluctuation, while AI output tends to be steady and even. These tools are also reported to analyze vocabulary diversity, sentence structure repetition, transition word frequency, and other features. As detection algorithms continue to be retrained and updated, simple synonym substitution or sentence restructuring is increasingly unable to fool the system, which is exactly why "manual hard-editing" is becoming less and less efficient.
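To make the two metrics concrete, here is a minimal sketch, not a reconstruction of any vendor's detector. Perplexity is computed from its standard definition, the exponential of the average negative log-probability per token. The token probabilities are supplied by hand here, whereas a real tool would obtain them from a language model. Burstiness has no single agreed formula, so the coefficient of variation of sentence lengths is used as an assumed proxy.

```python
import math
import re
import statistics

def perplexity(token_probs):
    """exp of the average negative log-probability of the tokens."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

def burstiness(text):
    """Assumed proxy: std dev of sentence lengths divided by their mean."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)

# Predictable text (high token probabilities) gives low perplexity.
print(perplexity([0.9, 0.9, 0.9, 0.9]) < perplexity([0.1, 0.1, 0.1, 0.1]))  # True

# Evenly sized sentences score 0; mixed lengths score higher.
print(burstiness("One two three. Four five six."))  # 0.0
print(burstiness("Short. This one is a much longer sentence indeed.") > 0)  # True
```

Under these definitions, text that is both highly predictable and evenly paced scores low on both measures, which matches the "smooth and uniform" profile described above.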
The Right Approach: An Efficient CNKI + DeepSeek Workflow
Rather than struggling to reduce the AI detection rate by hand, it's better to solve the problem at its source. Here's a low-friction workflow that can produce a properly formatted draft framework, with a fairly stable AI detection rate, in about 30 minutes.
Step 1: Obtain Real Literature from CNKI
- Enter your paper's keywords in CNKI
- Filter for literature from the past 5 years (ensuring timeliness)
- Click the grid view icon to quickly browse abstracts and keywords
- Select a dozen or so papers that are genuinely useful
Key reminder: Remember to directly export citation formats for your references (GB/T 7714, etc.) — this saves a lot of formatting work later.
Practical Tips for CNKI Literature Search
CNKI is China's largest academic literature database. It covers more than 8,000 academic journals as well as master's and doctoral theses, conference papers, and other resources. When searching, the "past 5 years" filter isn't just about timeliness: many universities' thesis writing guidelines explicitly require that references from the past 5 years account for no less than 50% of total citations. The "grid view icon" refers to the grid display mode on CNKI's search results page, which lets you quickly browse each paper's abstract, keywords, and citation count without opening the papers one by one, greatly improving screening efficiency. GB/T 7714 is the Chinese national standard "Information and Documentation: Rules for Bibliographic References." CNKI supports one-click export in this format, so you can paste entries directly into your reference list and avoid the formatting errors common in manual entry.
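The 50% rule mentioned above is easy to check before you start writing. The sketch below assumes you have already read the publication years off your exported reference list; the sample years are made up, and treating the cutoff year itself as "recent" is an assumption, since universities may count the window differently.

```python
def recent_share(years, current_year, window=5):
    """Fraction of references published within the last `window` years."""
    if not years:
        return 0.0
    cutoff = current_year - window  # assumption: the cutoff year counts as recent
    recent = sum(1 for y in years if y >= cutoff)
    return recent / len(years)

def meets_requirement(years, current_year, minimum=0.5):
    """True if the recent share reaches the required minimum."""
    return recent_share(years, current_year) >= minimum

# Hypothetical publication years from an exported reference list.
sample_years = [2024, 2023, 2022, 2021, 2015, 2010]
print(f"{recent_share(sample_years, current_year=2025):.0%}")  # prints 67%
print(meets_requirement(sample_years, current_year=2025))      # prints True
```

If the share comes out below the threshold, it is cheaper to go back to CNKI and swap in newer papers now than after the outline has been built around the old ones.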
Step 2: Have AI Create an Outline, Not the Full Text
Organize your selected literature and research direction, then hand it to DeepSeek to create an outline first.
The key here is: don't ask it to generate the full text right away, or you'll most likely end up with another batch of unmistakably "AI-flavored" text. Let AI help you organize the logical framework, clarifying what each chapter covers and which literature supports it. Once the outline is ready, you'll find your thinking becomes much clearer.
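One way to keep the request scoped to an outline is to assemble the prompt from your own material and state the limits explicitly. The helper below is a hypothetical sketch: it only builds the text you would paste into DeepSeek, it calls no API, and the function name and wording of the instructions are illustrative rather than a recommended template.

```python
def build_outline_prompt(topic, references):
    """Assemble an outline-only request from a topic and real references."""
    lines = [
        f"Research direction: {topic}",
        "Selected references (all exported from CNKI):",
    ]
    # Number the references so the outline can point back to them.
    lines += [f"[{i}] {ref}" for i, ref in enumerate(references, start=1)]
    lines += [
        "Task: propose a chapter-level outline only.",
        "For each chapter, list the reference numbers that support it.",
        "Do not write body text and do not add references that are not listed above.",
    ]
    return "\n".join(lines)

# Placeholder entries; in practice these are the GB/T 7714 strings you exported.
refs = ["Placeholder reference A", "Placeholder reference B"]
print(build_outline_prompt("Placeholder research direction", refs))
```

Numbering the references and forbidding new ones makes it easy to spot when the returned outline cites something you never supplied.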
Step 3: Apply Your University's Format Template
Table of contents, headers and footers, reference formatting — for these mechanical typesetting tasks, use tools instead of staying up all night adjusting them yourself. Apply your university's formatting requirements directly and let the system handle the formatting issues.
Step 4: Read Through and Internalize
The final and most critical step: read through the content yourself, making sure you understand what every paragraph is saying. After all, when your advisor casually asks during the defense "what does this paragraph mean," you can't look like you're seeing your own paper for the first time.
Summary: AI Is a Tool, Not a Replacement
The core logic of correctly using AI for academic writing is:
- References must be real: Obtained from legitimate databases like CNKI, not fabricated by AI
- AI handles the framework: Let it help you clarify your thinking and structure, not generate final text
- Humans handle the content: Based on real literature, organize arguments in your own words
- Leave formatting to tools: Mechanical typesetting work doesn't need to be done manually
The essence of this workflow is demoting AI from "content producer" to "thinking assistant" — leveraging AI's efficiency advantages while avoiding the risks of high AI detection rates and fabricated references. Rather than spending three days reducing your AI rate, spend 30 minutes getting your workflow right.
RAG Technology and the Future of Academic AI Tools
It's worth noting that a new category of AI tools is emerging in the academic writing space, connecting large language models to real academic databases through RAG (Retrieval-Augmented Generation). Tools like Consensus, Semantic Scholar, and Elicit can retrieve real papers based on user queries, then have AI summarize and synthesize the findings. The core advantage of this architecture is that generated content is verifiable: every argument can be traced back to a specific source. Similar products in China are also beginning to integrate with CNKI, Wanfang, and other databases. Academic AI tools will most likely move toward "retrieval first, generation second," rather than the "generate from memory" approach of today's general-purpose models. Even so, AI output still requires human review, as models may misinterpret literature or take quotes out of context.
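The "retrieval first, generation second" idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the architecture of any product named above: the records are placeholders, retrieval is plain keyword overlap rather than a real search index, and the generation step is left out entirely. What it shows is that the prompt is built only from records that were actually found, and that nothing is produced when retrieval comes back empty.

```python
def retrieve(query, records, top_k=2):
    """Rank records by how many query words appear in title plus abstract."""
    query_words = set(query.lower().split())
    scored = []
    for rec in records:
        text_words = set((rec["title"] + " " + rec["abstract"]).lower().split())
        score = len(query_words & text_words)
        if score > 0:
            scored.append((score, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored[:top_k]]

def build_grounded_prompt(query, records):
    """Build a prompt restricted to retrieved sources, or None if none match."""
    hits = retrieve(query, records)
    if not hits:
        return None  # nothing retrieved, so nothing should be generated
    sources = "\n".join(f"[{r['id']}] {r['title']}: {r['abstract']}" for r in hits)
    return f"Answer using only these sources and cite their ids.\n{sources}\nQuestion: {query}"

# Placeholder records standing in for a real database export.
RECORDS = [
    {"id": "R1", "title": "Placeholder study on rural logistics",
     "abstract": "Examines delivery costs in rural areas"},
    {"id": "R2", "title": "Placeholder study on urban housing",
     "abstract": "Examines rent levels in cities"},
]

print(build_grounded_prompt("rural delivery costs", RECORDS))
print(build_grounded_prompt("quantum entanglement", RECORDS))  # None
```

Because each source carries an id, a reader can check every claim in the final answer against the record it cites, which is the verifiability property described above.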
Key Takeaways
- Doubao and DeepSeek are not connected to the CNKI database — their generated references are mostly fabricated, making them unsuitable for directly writing paper drafts
- Having AI generate full text or integrate literature results in extremely high AI detection rates that are difficult to reduce through manual editing
- The correct approach is to first obtain real literature from CNKI, then have AI assist with generating an outline framework rather than the full text
- References should be exported directly from CNKI in standard citation format to avoid later formatting rework
- The final content must be read through and internalized to ensure you can explain every section during your defense