Nature Skills: A Complete Guide to Automating Academic Paper Workflows with Claude Code

Project Overview: Giving AI a Professional Operations Manual

A PhD student from Shanghai Jiao Tong University has open-sourced Nature Skills, a project that uses Claude Code's Skills mechanism to automate the academic paper workflow from writing to publication. The project includes 7 Skills, produces 15 output file types, and is released under the MIT license.

The Essence of Skills: From "Relying on Luck" to "Relying on Rules"

What is a Skill? Simply put, it's like giving Claude a professional operations manual.

Claude Code is a command-line development tool from Anthropic that allows developers to interact with Claude using natural language to complete programming tasks. Its Skills mechanism is a structured prompt engineering approach—developers can place Markdown-formatted rule files in their project directory, and Claude automatically loads these files as behavioral constraints when executing tasks. This essentially turns System Prompt engineering into something modular and systematic, transforming AI behavior from one-off conversational tuning into standardized configurations that are version-controllable, reusable, and shareable. Skills files typically contain YAML front matter (for trigger condition logic) and Markdown body content (for defining workflows and output specifications).
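
To make the two-part layout concrete, here is a minimal sketch of splitting a Skill file into its front matter and its Markdown body. The sample file content, the `name` and `description` values, and the helper `split_skill_file` are illustrative assumptions rather than code from the Nature Skills repository, and the parser only understands flat `key: value` lines, not full YAML.

```python
SAMPLE_SKILL = """\
---
name: example-polishing
description: Polish academic manuscripts. Do NOT use for grant proposals.
---
# Workflow
1. Identify the paper type.
2. Polish section by section.
"""


def split_skill_file(text: str) -> tuple[dict[str, str], str]:
    """Split a Skill file into (front matter dict, Markdown body).

    Only flat `key: value` lines are supported; this is not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter present
    try:
        end = lines.index("---", 1)  # closing delimiter of the front matter
    except ValueError:
        return {}, text  # opening delimiter without a close
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


meta, body = split_skill_file(SAMPLE_SKILL)
print(meta["name"])
print(body.splitlines()[0])
```

The front matter is what decides whether the Skill is picked up at all, while the body carries the actual rules, which is why the two parts are worth keeping separate.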

Without a Skill, asking Claude to polish a paper means it follows its own interpretation—the style doesn't match journal requirements, sentences are too long, word choices aren't academic enough, and results vary every time. With a Skill, Claude first identifies the paper type, then polishes according to Nature journal standards: no more than 30 words per sentence, word choices based on evidence strength, and consistent adherence to the same standards every time for stable, reproducible results.
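
A rule like "no more than 30 words per sentence" is useful precisely because it can be checked mechanically. The sketch below shows one way to do that; the function name `long_sentences` and the naive sentence splitter are assumptions for illustration, not the project's implementation.

```python
import re

MAX_WORDS = 30  # the per-sentence limit quoted above


def long_sentences(text: str, limit: int = MAX_WORDS) -> list[tuple[int, str]]:
    """Return (word_count, sentence) for every sentence longer than `limit` words."""
    # Naive split after ., ! or ? followed by whitespace; abbreviations will confuse it.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())
        if count > limit:
            flagged.append((count, sentence))
    return flagged


draft = "Short sentence here. " + " ".join(["word"] * 31) + "."
for count, sentence in long_sentences(draft):
    print(f"{count} words: {sentence[:40]}...")
```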

The core difference: Without a Skill, Claude relies on luck; with a Skill, Claude relies on rules.

Breaking Down the 7 Skills One by One

1. Nature Polishing (Academic Writing Polish)

It's not just about fixing language—it first identifies the paper type and uses different narrative logic accordingly. More critically, it helps you discover logical gaps—flagging them directly rather than polishing over them.

2. Nature Citation (Automated Literature Citation)

The most technically sophisticated of the 7 Skills. It calls the Crossref API and PubMed, segments the paper, automatically matches literature within the CNS scope for each section, and assigns a support level rating to each reference.

Crossref is the world's largest academic literature metadata registration agency, managing over 150 million DOI (Digital Object Identifier) records. Its open API allows developers to search for structured information like titles, abstracts, and citation relationships using keywords, authors, journals, and other criteria—all free of charge. PubMed is a biomedical literature database maintained by the U.S. National Library of Medicine, containing over 36 million literature records, with its E-utilities API also providing free programmatic search access. The Nature Citation Skill combines calls to both APIs to automate the process from paper text to matched literature, eliminating the inefficiency of traditional manual searches.
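
As a rough idea of what querying these two services looks like, the sketch below builds one search URL for each. It is not the Skill's own code: the helper names, the choice of the `query.bibliographic` and `query.container-title` parameters for Crossref, and the example search term are assumptions. `fetch_json` needs network access and is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"
PUBMED_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def crossref_url(query: str, journal: str, rows: int = 5) -> str:
    """Build a Crossref works search that also matches on the journal name."""
    params = {
        "query.bibliographic": query,
        "query.container-title": journal,
        "rows": rows,
    }
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def pubmed_url(term: str, retmax: int = 5) -> str:
    """Build a PubMed ESearch request that returns matching IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{PUBMED_ESEARCH}?{urlencode(params)}"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Perform the HTTP GET and decode the JSON response (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(crossref_url("protein structure prediction", "Nature"))
print(pubmed_url("protein structure prediction"))
```

Running one query per paper segment and merging the results by DOI would be one plausible way to combine the two sources.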

CNS here is the academic shorthand for Cell, Nature, and Science—the three top comprehensive journals—and broadly includes their sub-journal series (such as Nature Methods, Nature Communications, Cell Reports, etc.). These journals represent the highest-level research achievements across disciplines, and their citations carry extremely high weight in academic evaluation systems. Limiting literature matching to CNS-level publications ensures citation quality and aligns with the convention of prioritizing top-journal references when submitting high-level papers.

Output comes in 5 formats, among them ENW, which can be imported directly into Zotero, and TSV, which can be filtered in Excel. ENW (EndNote Export Format) is a citation exchange format supported by most major reference management tools. Zotero is a free, open-source reference manager widely used in academia; it captures literature in one click through browser plugins, generates reference lists automatically, and integrates with Word and LaTeX. TSV (Tab-Separated Values) files open directly in Excel or Google Sheets, which makes it convenient for researchers to screen and annotate candidate references by hand. The multi-format output reflects engineering thinking: the same data adapts to different downstream tool chains.
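
The "same data, several formats" idea can be sketched in a few lines. The records below are placeholders, not real publications. The field names, the `support` rating values, and both helper functions are assumptions for illustration. The `%0`, `%A`, `%T`, `%J`, `%D`, and `%R` tags follow the EndNote tagged format.

```python
import csv
import io

# Placeholder records for illustration only; these are not real publications.
RECORDS = [
    {"authors": ["Author, A.", "Author, B."], "title": "Placeholder title one",
     "journal": "Example Journal", "year": "2024",
     "doi": "10.0000/placeholder.1", "support": "strong"},
    {"authors": ["Author, C."], "title": "Placeholder title two",
     "journal": "Example Journal", "year": "2023",
     "doi": "10.0000/placeholder.2", "support": "weak"},
]

TSV_FIELDS = ["title", "journal", "year", "doi", "support"]


def to_enw(records: list[dict]) -> str:
    """Render records in EndNote tagged format, one %-tag per line."""
    blocks = []
    for rec in records:
        lines = ["%0 Journal Article"]
        lines += [f"%A {name}" for name in rec["authors"]]
        lines += [f"%T {rec['title']}", f"%J {rec['journal']}",
                  f"%D {rec['year']}", f"%R {rec['doi']}"]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def to_tsv(records: list[dict]) -> str:
    """Render the same records as tab-separated rows for spreadsheet filtering."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TSV_FIELDS, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_enw(RECORDS))
print(to_tsv(RECORDS))
```

Keeping a single in-memory record list and rendering it once per format means a correction to one reference shows up in every output file.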

3. Nature Fig (Scientific Figure Creation)

What makes it special is its "figure contract mechanism": before drawing any figure, you must first define what scientific question the figure answers, preventing the bad habit of drawing figures first and fitting conclusions afterward.
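
One way to picture such a contract is as a small record that must be complete before any plotting code runs. The class `FigureContract`, its three fields, and `plot_with_contract` are hypothetical names chosen for this sketch; the actual Skill expresses the contract as written rules, not as Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FigureContract:
    question: str     # the scientific question the figure answers
    claim: str        # the conclusion the reader should draw
    data_source: str  # where the plotted values come from

    def missing_fields(self) -> list[str]:
        """Names of fields that were left empty."""
        return [name for name, value in vars(self).items() if not value.strip()]


def plot_with_contract(contract: FigureContract) -> str:
    """Refuse to draw until the contract is complete."""
    missing = contract.missing_fields()
    if missing:
        raise ValueError(f"figure contract incomplete: {', '.join(missing)}")
    # Real plotting code would go here; this sketch only returns a caption stub.
    return f"Figure answering: {contract.question}"


contract = FigureContract(
    question="Does method A outperform method B on long sequences?",
    claim="Method A is more accurate above 500 residues.",
    data_source="results/benchmark.csv",
)
print(plot_with_contract(contract))
```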

4. Nature Reader (Bilingual Paper Reading)

Converts papers into bilingual side-by-side Markdown files, with figures placed next to the body text that references them rather than stacked at the end. It also includes a traceability ID system (S001, S002), which is particularly useful for knowledge graph construction.
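
The traceability IDs are simple to generate. The sketch below numbers segments as S001, S002, and so on, and lays source and translation side by side. Rendering the result as a Markdown table, and the function names, are assumptions for illustration; the Skill's real layout may differ.

```python
def assign_ids(segments: list[str], prefix: str = "S") -> list[tuple[str, str]]:
    """Pair each segment with a zero-padded ID such as S001."""
    return [(f"{prefix}{i:03d}", seg) for i, seg in enumerate(segments, start=1)]


def bilingual_markdown(pairs: list[tuple[str, str]]) -> str:
    """Render (source, translation) pairs as a Markdown table keyed by ID."""
    ids = assign_ids([source for source, _ in pairs])
    rows = ["| ID | Source | Translation |", "| --- | --- | --- |"]
    for (segment_id, source), (_, translation) in zip(ids, pairs):
        rows.append(f"| {segment_id} | {source} | {translation} |")
    return "\n".join(rows)


pairs = [
    ("Protein folding is hard.", "蛋白质折叠很难。"),
    ("Deep learning helps.", "深度学习有所帮助。"),
]
print(bilingual_markdown(pairs))
```

Because every segment keeps a stable ID, a later tool can point back to the exact sentence a fact came from.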

5. Nature Paper2PPT (Paper to Chinese PPT)

Seven scientific questions drive the presentation logic. The deck defaults to 12-16 slides, keeps Chinese content with English terminology preserved, and opens directly in PowerPoint or WPS.

6. Nature Data (Data Availability Statement)

Helps you draft data availability statements compliant with Nature's policies, accepts Chinese input, and produces English output. There's one important red line: it explicitly prohibits fabricating DOIs.

7. Nature Response (Reviewer Response Letter)

Transforms reviewer comments into structured point-by-point response drafts. It's the only Skill among the 7 that has both example files and test scoring criteria, indicating the author takes quality control seriously.

Practical Demo: Running Through a Simulated Paper in 5 Steps

The author used a simulated draft titled "Comparative Study of Deep Learning Methods for Protein Structure Prediction" and ran through 5 steps in the recommended order:

Step 1 - Polishing: Beyond language standardization, it discovered logical issues in the paper—the Abstract mentioned MSA depth analysis and hybrid strategies, but the Results section had no corresponding subsections. Instead of covering up the problem, it left annotations telling you what must be added before submission.

Step 2 - Literature Citation: Produced 80 candidate references, each with a support level rating, output in 5 formats simultaneously. This is something pure conversation simply cannot achieve.

Step 3 - Figures: The output included all 4 formats: SVG, PDF, PNG, and Python source code. Preserving the source code means that when the data changes, you modify one line and re-run. The generated figures did, however, have text overlap defects.

Step 4 - Data Statement: Comes with a Chinese checklist that can be directly pasted into the manuscript.

Step 5 - PPT: 13 Chinese slides with speaker notes, plus a quality inspection report telling you what needs attention.

The value of this output isn't writing your paper for you—it's producing a set of engineered files that can be continuously iterated upon.

9 Skill Writing Patterns: Learn Them and Write Your Own

By dissecting the source code, 9 core writing patterns were identified:

Patterns 1-3: Structural Level

The Description field determines trigger timing: Write keywords in both Chinese and English, and also specify when NOT to use it. Without clarity, the Skill won't activate when needed—making it useless.
Keep the main file lightweight, load details on demand: Put core rules in the main file and scenario-specific details in sub-files. Large language models like Claude have a fixed context window, the maximum number of tokens they can process in a single conversation. An oversized Skill file consumes that space, leaves less room for user input and model reasoning, and can lower output quality or cause information loss. Loading a sub-file only when it is needed keeps information density high within a limited context, the same idea as lazy loading in software engineering.
Workflows must be numbered: Explicitly write out what to do first, what to do second. Without ordering, Claude does whatever comes to mind, producing unstable results.
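
The "lightweight main file, load details on demand" pattern can be pictured as a routing table. The sketch below is only an analogy in Python: the file names, the keywords, and both functions are hypothetical, and in practice the routing is written as instructions inside the Skill rather than as code.

```python
from pathlib import Path

# Hypothetical routing table: keyword in the request -> sub-file to load.
ROUTES = {
    "methods": "references/methods.md",
    "abstract": "references/abstract.md",
}


def files_to_load(request: str, main_file: str = "SKILL.md") -> list[str]:
    """Always load the main file; add a sub-file only when its keyword appears."""
    selected = [main_file]
    lowered = request.lower()
    selected += [path for keyword, path in ROUTES.items() if keyword in lowered]
    return selected


def build_context(request: str, root: Path) -> str:
    """Concatenate the selected files that actually exist under `root`."""
    parts = []
    for name in files_to_load(request):
        path = root / name
        if path.exists():
            parts.append(path.read_text(encoding="utf-8"))
    return "\n\n".join(parts)


print(files_to_load("Polish the Methods section"))
print(files_to_load("Fix the title"))
```

A request that touches only the title pays for the main file alone, which is the whole point of the pattern.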

Patterns 4-6: Specification Level

Clearly define default behaviors: What to do when there are no special instructions must be explicitly specified; otherwise, the model improvises at edge cases.
Specify output format templates: Design outputs like API interfaces—fixed fields, fixed order. Without specifications, the format differs every time and can't be used directly.
Rules must have sources: From official documentation, papers, or authoritative courses. Rules without sources rely on intuition with no quality guarantee.

Patterns 7-9: Engineering Level

Both scale approaches work: Simple rules can fit in a single file; complex rules use a main file plus multiple sub-files. Nature Paper2PPT uses a 495-line single file, Nature Response uses a main file plus 9 sub-files—both work well.
Include examples and test cases: Write a few demonstrations of what good output looks like, plus a few test cases to verify effectiveness.
Dedicated sections for Chinese-language users: Accept Chinese input, handle terminology conversion precisely, and clearly specify the final output language. 4 out of 7 Skills specifically include this section.

Skill Writing Template: Minimum Necessary Structure

Core components of the template distilled from the 9 patterns:

YAML Gate: Required
Description: Write trigger conditions
Default Stance: Core principles and prohibited behaviors
Workflow: Must be numbered, specifying execution order
Output Format: Fixed output format
Relative Files: Routing table—which sub-file to load under what conditions
Source Hierarchy: Rule sources

One-sentence summary: A good Skill = Trigger conditions + Ordered workflow + Fixed output format + Clear red lines + Sourced rules + Verifiable results
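
A template like this can double as a checklist. The sketch below lints a Skill file for the minimum structure. The function `check_skill`, the sample text, and the decision to treat the routing table as optional (single-file Skills have no sub-files) are assumptions for illustration.

```python
import re

REQUIRED_SECTIONS = ["Default Stance", "Workflow", "Output Format", "Source Hierarchy"]

GOOD = """\
---
description: Draft reviewer responses. Not for cover letters.
---
## Default Stance
Never invent experiments.
## Workflow
1. Parse comments.
2. Draft replies.
## Output Format
Comment / Response / Change
## Source Hierarchy
Journal guidelines first.
"""


def check_skill(text: str) -> list[str]:
    """Return a list of problems; an empty list means the minimum structure is present."""
    problems = []
    if not text.lstrip().startswith("---"):
        problems.append("missing YAML front matter")
    if not re.search(r"^description:\s*\S", text, flags=re.MULTILINE):
        problems.append("missing description")
    headings = {m.group(1).strip()
                for m in re.finditer(r"^#+\s*(.+)$", text, flags=re.MULTILINE)}
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")
    # Crude check: any line starting with "1." style numbering counts as a numbered step.
    if "Workflow" in headings and not re.search(r"^\d+\.\s", text, flags=re.MULTILINE):
        problems.append("workflow steps are not numbered")
    return problems


print(check_skill(GOOD))
print(check_skill("## Workflow\nDo things."))
```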

Extension Direction: Knowledge Graph Automated Evaluation Pipeline

Common knowledge graph experiment workflows (triple extraction, F1 scoring, QA pair generation, RAG retrieval, graph retrieval, etc.) can all be individually encapsulated as Skills following the Nature Skills writing conventions, forming a complete automated evaluation pipeline.

A Knowledge Graph stores knowledge as a graph of nodes (entities) and edges (relationships). A Triple is the smallest unit of a knowledge graph, in the format (subject, predicate, object), for example (AlphaFold2, predicts, protein structure). The F1 score, the harmonic mean of precision and recall, is the standard metric for evaluating triple extraction quality. RAG (Retrieval-Augmented Generation) is a widely used architecture in current AI applications that reduces LLM hallucinations by retrieving relevant knowledge before generating an answer. Encapsulating these steps as Skills enables end-to-end automation from paper reading to knowledge ingestion.
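
For triple extraction, F1 is the harmonic mean of precision and recall, computed by comparing predicted triples against a gold set. The sketch below assumes exact-match scoring, where a triple counts only if subject, predicate, and object all agree. The toy triples and the function name are illustrative.

```python
Triple = tuple[str, str, str]


def triple_f1(predicted: set[Triple], gold: set[Triple]) -> dict[str, float]:
    """Exact-match precision, recall, and F1 between two sets of triples."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data for illustration only.
gold = {
    ("AlphaFold2", "predicts", "protein structure"),
    ("AlphaFold2", "uses", "multiple sequence alignments"),
}
predicted = {
    ("AlphaFold2", "predicts", "protein structure"),
    ("AlphaFold2", "predicts", "gene expression"),
}
print(triple_f1(predicted, gold))
```

Here one of two predictions is correct and one of two gold triples is recovered, so precision, recall, and F1 all come out at 0.5.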

Natural integration points with existing Nature Skills:

Nature Reader's traceability ID system (S001, S002) naturally serves as a prototype for knowledge graph node IDs—triples can be extracted while reading papers
Citation relationships produced by Nature Citation directly become edges in the graph
Nature Figure can visualize knowledge graphs according to journal specifications
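
As a sketch of the first two integration points, the code below turns (segment ID, reference ID) pairs into graph edges and then into triples. The reference IDs `R1` and `R2`, the `cites` predicate, and both function names are hypothetical; neither Skill is known to emit data in exactly this shape.

```python
from collections import defaultdict


def build_graph(citations: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Adjacency map: segment ID -> set of reference IDs it cites."""
    graph: dict[str, set[str]] = defaultdict(set)
    for segment_id, reference_id in citations:
        graph[segment_id].add(reference_id)
    return dict(graph)


def as_triples(graph: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Flatten the adjacency map into sorted (subject, predicate, object) triples."""
    return sorted(
        (source, "cites", target)
        for source, targets in graph.items()
        for target in targets
    )


# Hypothetical links between reader segments and matched references.
citations = [("S001", "R1"), ("S001", "R2"), ("S002", "R1")]
for triple in as_triples(build_graph(citations)):
    print(triple)
```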

This approach isn't limited to academic scenarios—anywhere there are repetitive professional tasks, the Skill mechanism can achieve standardized, reproducible automated execution.

Key Takeaways

The Nature Skills project contains 7 Skills covering the entire academic paper workflow, from polishing, citation, and figure creation to reviewer responses, and is open-sourced under the MIT license
The core value of Skills is transforming AI from "relying on luck" to "relying on rules," achieving stable and reproducible output through predefined professional operations manuals
Source code analysis yielded 9 Skill writing patterns, with the core formula: Trigger conditions + Ordered workflow + Fixed output format + Clear red lines + Sourced rules + Verifiable results
Nature Reader's traceability ID system and Nature Citation's citation relationships naturally map to knowledge graph node and edge construction
This methodology can be extended to any scenario with repetitive professional tasks, not limited to academic writing