Building a Cold Chain Logistics Optimization Research Project with Codex: A Complete Workflow from Scratch to PDF Paper

Using OpenAI Codex to build a complete research project from empty folder to PDF paper without writing code.
This article details how OpenAI Codex was used to independently complete a cold chain logistics optimization research project based on simulated annealing—from an empty folder to a compiled PDF paper. It covers the phased prompt design methodology, environment configuration, execution observations, and key strategies for leveraging AI in code-based research workflows.
Can AI Code Generation Tools Handle a Complete Research Project?
When people think of AI programming assistants, they typically envision auxiliary tasks like code completion and bug fixing. But what if we raise the bar to building a complete code-based research project from scratch—encompassing mathematical modeling, algorithm implementation, experimental design, scientific visualization, and even LaTeX paper compilation—how far can AI actually go?
A Bilibili content creator conducted a comprehensive experiment using OpenAI Codex: having Codex independently complete a cold chain logistics transportation optimization project based on simulated annealing, going from an empty folder to a final PDF paper without writing a single line of code manually. While positioned as a "demonstration-level" project, the workflow and methodology it showcases offer tremendous reference value for researchers looking to boost their efficiency with AI.
OpenAI Codex is an AI agent system designed for software engineering tasks. Unlike simple code completion tools (such as GitHub Copilot's inline suggestions), it's an intelligent agent capable of autonomously executing multi-step programming tasks within a sandbox environment. Codex can read and write files, execute terminal commands, install dependencies, run tests, and self-correct based on execution results. Built on GPT-series large language models, its architecture adds tool use and environment interaction capabilities, enabling it to handle complex tasks requiring multi-turn reasoning and actual code execution verification. Users describe task requirements in natural language, and Codex automatically plans execution steps and completes them progressively—a paradigm known as "agentic coding."
Project Design: A Carefully Planned Prompt Blueprint
Research Topic and Technology Stack
The chosen research topic for this experiment was cold chain logistics transportation optimization, with the core algorithm being the classic Simulated Annealing (SA) algorithm, implemented in Python. The final deliverables included not only runnable code but also a LaTeX paper and PDF manuscript conforming to Nature Scale style and figure specifications.

Simulated Annealing (SA) is a stochastic optimization algorithm originating from statistical mechanics, inspired by the annealing process in metallurgy—heating metal to high temperatures then cooling it slowly, giving atoms sufficient time to find the lowest-energy crystal lattice arrangement. At the algorithmic level, SA introduces a temperature parameter to control the probability of accepting inferior solutions during the search process. Higher temperatures mean greater probability of accepting worse solutions, helping the algorithm escape local optima. As the temperature gradually decreases according to a preset cooling schedule (such as exponential decay or linear decay), the algorithm progressively converges toward the global optimum or near-optimal solution. SA's core advantage lies in its theoretical guarantee of global convergence (under specific cooling conditions), combined with simple implementation and low requirements on problem structure, making it a classic baseline algorithm for combinatorial optimization problems.
This topic carries considerable engineering complexity: cold chain logistics involves multi-dimensional constraints including temperature requirements, time windows, and vehicle capacity. The simulated annealing algorithm requires designing reasonable initial solution generation strategies, neighborhood operators, and cooling schedules. The cold chain logistics optimization problem is essentially a variant of the Vehicle Routing Problem (VRP) with multiple constraints. Beyond traditional VRP constraints of vehicle capacity and travel distance, it must also consider temperature decay models (temperature changes during transportation), hard time window constraints (customers requiring delivery within specific time periods), refrigeration energy costs (related to door-opening frequency and external temperature differential), and product freshness degradation functions. These constraints make the solution space highly non-convex and discontinuous, and exact solutions are typically NP-hard computationally, making heuristic and metaheuristic algorithms the mainstream choice in practical engineering. While the creator admitted that "parameters weren't specified in detail," the overall framework covers the core elements of a standard code-based research project.
Regarding Nature Scale style paper specifications, Nature-series journals have strict visual requirements for formatting and figures: figures must use sans-serif fonts (such as Helvetica or Arial), font size no smaller than 5pt, line widths between 0.25-1pt, and color schemes must be colorblind-friendly. Figure resolution requires at least 300 DPI (for bitmaps) or vector formats (PDF/EPS). In LaTeX implementation, this typically means using specific document classes and configuring appropriate page margins and citation formats. Being able to automatically generate a paper framework conforming to these specifications means researchers can save considerable time on formatting adjustments and focus on the content itself.
Phased Task Planning
The project was decomposed into multiple clear phases, forming a complete execution roadmap:
- Project Initialization: Create a standardized directory structure (README, environment configuration, data storage, output directories, etc.)
- Data and Business Logic Generation: Construct simulated data for the cold chain logistics scenario
- Mathematical Model and Constraint Review: Define objective functions and constraint conditions
- Core Algorithm Implementation: Initial solution generation, neighborhood operator design, baseline scheme construction
- Experimental Design and Execution: Parameter tuning, comparative experiments
- Scientific Visualization: Generate publication-quality figures
- LaTeX Paper Compilation: Automatically generate the paper and compile to PDF
- Final Delivery and Verification

The creator mentioned that theoretically the project could be split into "1 to 99 phases," with each phase's issues individually inspectable, iterable, and modifiable. This phased strategy is a key methodology for using AI programming tools—the finer the granularity, the stronger the controllability, and the higher the output quality. This principle aligns with the "divide and conquer" philosophy in software engineering: decomposing complex problems into independently verifiable sub-problems not only reduces the cognitive load of each individual task but also makes error localization and fixing more efficient.
Standardized Output Structure Design
The project required Codex to generate a standardized directory structure, including:
README.md: Project documentationrequirements.txt: Python environment dependenciesconfig/: Configuration file directorydata/: Data storage directoryoutput/: Experiment output directorypaper/: LaTeX paper source file directory- One-click run script: Execute all code in a single command
This structured output requirement essentially uses prompt engineering to replace traditional project management. When prompts are sufficiently detailed and standardized, Codex can function like an experienced research assistant, completing all work according to the established framework.
Codex in Practice: Environment Configuration and Execution Details
Key Configuration Points
In actual operation, the creator made several critical configuration choices:
- Model Selection: Used the ultra-high mode (XGBT 5.5)—while token consumption is faster, reasoning capability is noticeably stronger
- Permission Settings: Set to "full access" across the board, requiring no manual approval, giving Codex complete file read/write and command execution permissions
- Pre-installed Environment: LaTeX and Python installed in advance, ensuring Codex can directly invoke compilation tools
- Working Directory: A completely empty folder as the project root directory

Here's an important practical lesson: giving AI tools sufficient permissions and pre-installed environments is a prerequisite for ensuring end-to-end automated execution. If every step requires manual confirmation or dependency installation, the entire automation workflow will be frequently interrupted. This resonates with the DevOps concept of "Infrastructure as Code"—pre-configuring the runtime environment so automated workflows can execute the complete pipeline without obstacles.
Execution Process Observations
After launching, Codex first performed task planning—creating the project structure, generating data models, running experiments, creating LaTeX files, and executing checks. From the screenshots, directories like data/, output/, figures/, and paper/ were progressively established.

Since the prompt content was quite lengthy (covering the complete description of the entire cold chain optimization project), Codex required considerable processing time to complete thinking, optimization, and iteration. The creator also admitted being unable to accurately estimate runtime, reflecting a practical characteristic of current AI programming tools when handling complex tasks—the more complex the task, the more unpredictable the wait time. This unpredictability stems from the autoregressive generation mechanism of large language models: the model needs to generate each token sequentially, and complex tasks often require longer chains of thought (Chain of Thought), plus interactive loops of code execution and error correction, making total elapsed time difficult to estimate in advance.
Methodology Summary: Key Strategies for Efficiently Completing Code-Based Research with AI
Prompt Design Is the Core Competency
The creator repeatedly emphasized one point: prompt quality directly determines output quality. To have AI generate a truly in-depth paper, prompts need to be "extremely, extremely detailed." While this experiment used relatively rough prompts due to its demonstration nature, Codex was still able to complete the basic project scaffolding according to the framework.
Prompt Engineering has evolved from simple instruction writing into a systematic methodology. In research scenarios, high-quality prompts typically need to include the following layers: task definition layer (specifying final deliverables and evaluation criteria), domain knowledge layer (providing necessary technical terminology and constraints), execution specification layer (designating code style, file organization, naming conventions), and quality control layer (defining verification standards and error handling strategies). Research shows that structured prompts (using XML tags, Markdown hierarchies) achieve more stable output quality than free-text prompts. Additionally, prompting strategies like "Chain of Thought" and "Tree of Thoughts" can guide models toward deeper reasoning, which is particularly important in tasks requiring multi-step logical deduction such as mathematical modeling and algorithm design.
The implication is: rather than spending extensive time writing code manually, invest your energy in carefully designing prompts. A mature prompt template can be reused across different research topics, forming a replicable "AI research workflow."
Phased Iteration Outperforms One-Shot Generation
While this demonstration submitted all instructions at once, the creator recommends adopting a phased iteration strategy in actual research: checking and debugging after each phase is completed, confirming correctness before proceeding to the next phase. Although this approach takes longer, it significantly improves the reliability and accuracy of the final output.
The effectiveness of this iterative strategy has a cognitive science basis: large language models exhibit "attention dilution" when processing long contexts—when input information is excessive, the model's attention to each instruction decreases. Phased submission not only keeps each task's context more focused but also allows researchers to inject corrective information at intermediate stages, forming a "human-AI collaborative feedback loop." This closely aligns with the Sprint iteration concept in agile development—short cycles, fast feedback, continuous improvement.
Applicable Scenarios and Limitations
This Codex-driven research workflow is particularly suited for the following scenarios:
- Algorithm comparison experiments: Rapidly implementing multiple optimization algorithms and conducting benchmark tests
- Prototype validation: Quickly verifying the feasibility of research ideas before committing significant effort
- Paper writing assistance: Automatically generating journal-compliant figures and LaTeX document frameworks
However, it's important to clearly recognize its limitations: AI-generated code and papers still require human review, especially regarding the correctness of mathematical models, the reasonableness of experimental results, and the rigor of paper logic—these aspects still depend on researchers' professional judgment. Specifically, current large language models still suffer from "hallucination" problems in mathematical reasoning—potentially generating derivations that appear reasonable but are actually incorrect; in experimental design, they may overlook statistical significance testing or introduce systematic bias; in paper writing, they may cite non-existent references ("hallucinated citations"). Therefore, the researcher's role transforms from "code writer" to "quality auditor" and "direction controller."
Conclusion: The Value and Boundaries of Codex Research Project Construction
From an empty folder to a research project containing complete code, experimental results, and a PDF paper, Codex demonstrates the enormous potential of AI programming tools in research scenarios. While it cannot yet fully replace researchers' professional judgment, as a powerful research accelerator, it can already dramatically reduce the time costs of algorithm implementation and paper formatting.
The most essential conclusion from this experiment is: the key to mastering AI research tools lies not in programming ability, but in whether you can design sufficiently good prompts to precisely describe your research requirements. With this methodology mastered, whether it's cold chain logistics optimization or code-based research projects in other domains, you can leverage Codex for efficient construction from scratch. This marks a paradigm shift in research work: from "manual coding-driven" to "natural language-driven," where researchers' core value will increasingly concentrate on problem definition, methodological innovation, and result interpretation, rather than specific code implementation details.
Related articles

Gemini 3.5 Live Translate Launch: A Deep Dive into the Speech-to-Speech Translation Model Supporting 70+ Languages
Google launches Gemini 3.5 Live Translate, a speech-to-speech translation model supporting 70+ languages. Learn about its end-to-end architecture, Grab partnership, and developer access via Live API.

Gemma 4 12B: Google's Open-Weight Model Runs Locally on Your Laptop
Google releases Gemma 4 12B, an open-weight model that runs locally on laptops. Learn about its performance, local deployment value, and the open-source LLM competitive landscape.

Non-Technical Founders Built a $50K/Month SaaS Product Using AI Tools
Two non-coders built Shipper to $50K MRR in 6 months using AI tools. Learn their reverse-engineering, zero-free-tier, and viral growth playbook for indie developers.