Codex in Action: One Prompt, 47 Minutes, a Complete Algorithm Research Paper

Experiment Overview: A Complete Algorithm Paper Without Writing a Single Line of Code

A researcher recently shared a remarkable experiment: by crafting a single detailed prompt, OpenAI's Codex autonomously produced a complete algorithm research paper draft — including working code, figures, and a full LaTeX manuscript — in just 47 minutes and 51 seconds.

OpenAI Codex is a code generation system built on the GPT large language model architecture, originally released in 2021 as the engine powering GitHub Copilot. In 2025, OpenAI launched an entirely new agentic version of Codex that goes far beyond code completion — it can autonomously execute multi-step tasks within a cloud sandbox environment. The new Codex reads entire repository contexts, plans task sequences, executes code, runs tests, generates files, and submits results as Pull Requests. This leap from "completion tool" to "autonomous agent" is what enables it to handle the complex end-to-end task described here.

The paper's topic was highly complex: "Joint Robust Location-Inventory-Routing Optimization for Fresh Products Considering Freshness Decay Suppression and Time-Varying Refrigeration Emission Reduction Under Dual Carbon Tax and Cap-and-Trade Policies," solved using a Particle Swarm Optimization (PSO) algorithm. Throughout the entire process, the author wrote zero lines of code — everything was generated automatically by Codex.

Technical Background of the Core Problem

The Location-Inventory-Routing Problem (LIRP) is a classic NP-hard problem in supply chain optimization that integrates three traditionally independent sub-problems — facility location, inventory management, and vehicle routing — into a single joint optimization model. This integration avoids suboptimal solutions from sequential decision-making but causes problem complexity to grow exponentially. In cold-chain logistics for fresh products, the model must additionally account for time-dependent freshness decay, carbon emissions from refrigerated transport, and policy constraints like carbon taxes and cap-and-trade systems, further increasing the problem's dimensionality and constraint complexity.

Carbon Tax and Cap-and-Trade are the two primary economic policy instruments for combating climate change globally. A carbon tax imposes a fixed fee per unit of carbon emission, providing price certainty but uncertain total emissions. Cap-and-trade sets an overall emission cap and allows companies to trade allowances on the market, providing quantity certainty but volatile prices. In supply chain optimization research, dual-policy scenarios mean enterprises must both pay carbon taxes and comply with hard emission caps, requiring models to simultaneously handle linear cost terms (tax) and hard constraints (cap), significantly increasing model complexity.
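The split between a linear cost term and a hard constraint can be made concrete with a small sketch. The names `CarbonPolicy` and `evaluate_carbon` are hypothetical and the numbers are illustrative; nothing here is taken from the author's generated code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarbonPolicy:
    """Hypothetical container for the two policy parameters."""
    tax_per_tonne: float  # carbon tax rate, currency per tonne of CO2
    cap_tonnes: float     # hard emission cap for the planning horizon


def evaluate_carbon(emissions_tonnes: float, policy: CarbonPolicy) -> tuple[float, bool]:
    """Return (tax cost, cap satisfied) for a candidate solution.

    The tax enters the objective as a linear cost term, while the cap is
    checked as a feasibility condition rather than priced.
    """
    tax_cost = policy.tax_per_tonne * emissions_tonnes
    within_cap = emissions_tonnes <= policy.cap_tonnes
    return tax_cost, within_cap


policy = CarbonPolicy(tax_per_tonne=50.0, cap_tonnes=100.0)
print(evaluate_carbon(80.0, policy))   # feasible: tax is paid, cap is respected
print(evaluate_carbon(120.0, policy))  # infeasible: the cap is violated
```

In a metaheuristic such as PSO, the boolean would typically trigger either a repair step or a penalty added to the fitness value.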

Robust Optimization is a mathematical programming approach for handling uncertainty. Unlike stochastic programming, it doesn't require precise probability distributions for uncertain parameters. Instead, it assumes each parameter varies within a predefined uncertainty set and seeks a solution that stays feasible for every realization in that set while performing as well as possible in the worst case. In supply chain management, demand fluctuations, transport time variability, and uncertain freshness decay rates are all well-suited to robust optimization frameworks.
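A toy example may help show what "worst case over an uncertainty set" means in practice. The sketch below assumes a single-product ordering decision with demand known only to lie in an interval; the cost parameters and function names are illustrative assumptions, not part of the paper's model.

```python
def cost(order_qty, demand, holding=1.0, shortage=4.0):
    """Holding cost on leftover stock plus shortage cost on unmet demand."""
    return holding * max(order_qty - demand, 0.0) + shortage * max(demand - order_qty, 0.0)


def worst_case_cost(order_qty, d_low, d_high):
    """Worst cost over the interval [d_low, d_high].

    The cost is convex in demand, so its maximum over an interval is
    attained at one of the two endpoints.
    """
    return max(cost(order_qty, d_low), cost(order_qty, d_high))


def robust_order(d_low, d_high, candidates):
    """Pick the candidate quantity with the smallest worst-case cost."""
    return min(candidates, key=lambda q: worst_case_cost(q, d_low, d_high))


best = robust_order(80, 120, range(80, 121))
print(best, worst_case_cost(best, 80, 120))
```

No probability distribution over demand is needed: only the interval bounds enter the decision, which is the defining trait of the robust approach.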

Prompt Engineering: The Details Make All the Difference

Project Architecture Design

The author shared an extremely detailed prompt specification covering these core dimensions:

Overall research objective: Clear paper direction, with writing style aligned to the journal Systems Engineering — Theory & Practice
Project directory structure: Pre-designed complete folder and file hierarchy
Code implementation requirements: Based on Python 3.10+, with dependencies such as Gurobi pre-installed
Model implementation details: Network structure, decision variables, cost functions, carbon emission constraints, freshness decay mechanisms
Algorithm design specifications: HRDC-PSO algorithm encoding/decoding, repair mechanisms, local search, adaptive strategies
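The encoding/decoding and repair items deserve a brief illustration, because PSO moves particles in continuous space while routing needs a discrete visiting order. One common bridge is random-key decoding followed by a capacity-based split. The source does not say which encoding HRDC-PSO uses, so the following is only a sketch of the general idea, with hypothetical function names and data, and it assumes no single customer's demand exceeds the vehicle capacity.

```python
def decode_order(keys):
    """Random-key decoding: visit customers in ascending order of their keys."""
    return sorted(range(len(keys)), key=keys.__getitem__)


def split_by_capacity(order, demand, capacity):
    """Repair step: cut the visiting order into trips that respect capacity."""
    routes, current, load = [], [], 0.0
    for customer in order:
        # Start a new trip when adding this customer would overload the vehicle.
        if current and load + demand[customer] > capacity:
            routes.append(current)
            current, load = [], 0.0
        current.append(customer)
        load += demand[customer]
    if current:
        routes.append(current)
    return routes


keys = [0.42, 0.07, 0.91, 0.33]  # one continuous particle position
demand = [4, 3, 5, 2]            # demand per customer index
order = decode_order(keys)
print(order, split_by_capacity(order, demand, capacity=6))
```

Because every continuous position decodes to some valid permutation, the swarm update rule itself never has to know about routing constraints.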

Prompt Engineering has evolved from simple instruction writing into a systematic discipline of AI interaction design. Advanced prompt engineering involves task decomposition (breaking complex goals into executable subtasks), context management (providing the most relevant background information within token limits), output format control (constraining generation results through structured templates), and Chain-of-Thought guidance. The author's prompt was essentially a comprehensive system design document containing architecture specifications, interface definitions, quality standards, and acceptance criteria. This engineering-oriented approach to prompt design is becoming a core skill of the AI era.

Paper Specification Requirements

On the writing side, the prompt was equally meticulous:

Chinese-language paper, approximately 10,000 characters, 31 pages
No fewer than 50 references, prioritizing authoritative journals in operations research and management
Figures following Nature journal standards, using Nature scale styling
10 figures and 17 tables in the main text, plus supplementary figures
Complete LaTeX output format

LaTeX is a document preparation system developed by Leslie Lamport on top of Donald Knuth's TeX typesetting engine. It's the standard typesetting tool for academic papers in mathematics, physics, computer science, and engineering. Unlike WYSIWYG editors like Word, LaTeX uses a markup language approach — authors write source code to define document structure and content, which a compiler transforms into final PDF output. Its advantages include beautiful mathematical formula typesetting, automated reference management, and strong cross-referencing consistency. Major journals typically provide official LaTeX templates. AI-generated LaTeX source code means output can be directly compiled into journal-format-compliant manuscripts.

Figure: Codex project structure and generation requirements

Results: A Surprisingly Complete Output

Code Layer

Codex created a complete project file structure in one pass, including:

Main controller code (Python entry point)
Data generation module (automatic simulation data creation)
Core algorithm implementation (each algorithm module as a separate file)
Numerical simulation code
Experimental design code
Visualization and plotting code
Parameter settings and configuration files

The author specifically noted that the most impressive aspect was that the entire algorithm codebase actually ran successfully — from base model encoding, data processing, and repair mechanisms to solution output — forming a complete working pipeline.

Particle Swarm Optimization (PSO) is a swarm intelligence metaheuristic inspired by bird flocking behavior, proposed by Kennedy and Eberhart in 1995. Each "particle" represents a candidate solution in the search space and updates its velocity and position by tracking both its own best position so far and the swarm's global best position. PSO is widely applied to continuous optimization problems because it is simple to implement, has few parameters, and converges quickly. The HRDC-PSO in this paper is an enhanced variant that adds hierarchical repair mechanisms, dynamic constraint handling, and adaptive strategies to standard PSO, and it targets combinatorial optimization problems with multiple constraints. That Codex produced a running implementation of this variant suggests it handles metaheuristic design patterns well, although running successfully does not by itself establish that the implementation is correct.
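For readers unfamiliar with the velocity and position update, here is a minimal sketch of standard global-best PSO with an inertia weight, minimizing a simple test function. It is not the HRDC-PSO variant and not the author's code; the parameter values (`w`, `c1`, `c2`, swarm size) are common textbook-style defaults chosen for illustration.

```python
import random


def pso(objective, dim, bounds, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box using global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the coordinate back into the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=3, bounds=(-5.0, 5.0))
print(best, best_val)
```

A constrained variant would typically insert a decoding and repair step between the position update and the call to `objective`.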

Figure: Generated complete file structure

Paper Layer

The generated 31-page manuscript contained a complete academic paper structure:

Abstract and keywords
Introduction and literature review (with comparative literature tables)
Problem description and notation
Model formulation (objective function, constraints, freshness decay mechanism, carbon policy mechanism)
Algorithm design (with algorithm flowcharts)
Numerical experiments and results analysis
Conclusions

Figure and Table Quality

The generated figures and tables were generally usable, with some already at publication quality. Issues mainly centered on:

Occasional misalignment of figure elements
Minor layout glitches when combining ABCD subplot panels
Complex visualizations like heatmaps requiring fine-tuning

The author's assessment: "With proper polishing, these are already good enough for publication."

Figure: Codex's ability to search for references

Comparison and Reflections: Where Are the Boundaries of AI-Assisted Research?

Comparison with Previous Experiment

Dimension	Previous Experiment	This Experiment
Runtime	15-20 minutes	47 minutes 51 seconds
Complexity	Basic demonstration	Multi-constraint joint optimization
Output Quality	Rough	Significantly deeper theoretical content

Feasibility for Actual Publication

The author also described a real submission: using a similar method (segmented prompts plus iterative refinement), they submitted a paper to Computers & Industrial Engineering (a Chinese Academy of Sciences Zone 1 TOP journal), and that paper is currently under external review. The first paper took about two weeks; with the workflow established, the author estimates a turnaround of one paper per week.

Computers & Industrial Engineering is a top journal in industrial engineering and operations management published by Elsevier. It is ranked Zone 1 TOP in the Chinese Academy of Sciences (CAS) journal ranking, with an impact factor between 6 and 8 in recent years, and primarily publishes research in operations optimization, production scheduling, supply chain management, and intelligent manufacturing. Reaching external review means the manuscript passed the editor's initial screening, which suggests that AI-assisted papers can already meet a considerable threshold of academic rigor and novelty; the review outcome itself is still pending.

Current Limitations

Limited depth in single-pass generation: While theoretical depth has improved, segmented prompting with iterative refinement still produces better results
Figures require manual adjustment: AI-generated figures often have minor positioning and detail issues, making them unsuitable for direct submission
Domain expertise is still essential: Designing the prompt itself requires deep understanding of the research area — this is far from "zero barrier to entry"

Figure: Overall quality of AI-generated paper

Conclusion: AI-Assisted Research Has Entered the Practical Stage

This experiment clearly demonstrates that current AI coding tools like Codex have reached a remarkably practical level for algorithm-focused paper writing. The key question isn't whether AI can fully replace researchers, but how to collaborate efficiently:

Prompt engineering is a core competitive advantage: Detailed, structured requirement descriptions directly determine output quality — writing good prompts is itself a professional skill
Human-AI collaboration is more efficient: AI handles code implementation and draft writing; humans handle direction-setting and quality calibration
Research efficiency gains are order-of-magnitude: From the traditional timeline of several months down to one or two weeks

For researchers, learning to collaborate with AI and mastering prompt engineering may hold more strategic value than simply improving programming skills. It's worth noting that this trend doesn't eliminate scientific creativity — true innovation still comes from problem insight, methodological breakthroughs, and critical interpretation of results. What AI tools change is the execution efficiency of converting creative ideas into publishable outputs. Future research competitiveness will increasingly depend on whether researchers can find optimal task division in human-AI collaboration — letting AI handle repetitive coding, typesetting, and formatting work while concentrating human cognitive resources on the tasks that genuinely require creativity and judgment.