Andrew Ng's New Course Explained: A Practical Guide to Using OpenAI's O1 Reasoning Model

Course Background: From Chain of Thought to Test-Time Scaling

Andrew Ng and Colin Jarvis, Head of AI Solutions at OpenAI, have jointly launched a new short course — Reasoning with O1 — that systematically explains how to effectively use OpenAI's O1 reasoning model. Continuing Deeplearning.AI's tradition of accessible education, this course covers everything from foundational concepts to hands-on applications, helping developers master this milestone model.

Looking back, in Deeplearning.AI's very first short course, OpenAI's Iza Fulford taught how to optimize GPT-3.5 performance through prompting techniques. The most critical technique was Chain of Thought (CoT) — instructing the model to "think step by step" rather than jumping directly to an answer.

Chain of Thought prompting example

This technique originated from the seminal 2022 paper Chain of Thought Prompting Elicits Reasoning in Large Language Models by Jason Wei et al. from Google Brain. The core idea: by providing an example in the prompt that breaks a complex problem into simple steps, the model mimics this step-by-step reasoning approach when answering, significantly improving problem-solving accuracy.

Chain of Thought prompting became one of the most influential technical discoveries in the LLM field in 2022 because it solved a long-standing problem: large language models performed poorly on math and logic problems requiring multi-step reasoning — models tended to "jump" directly to answers rather than going through intermediate reasoning steps. Wei et al.'s breakthrough finding was that simply showing intermediate reasoning steps in few-shot examples caused models to automatically mimic this step-by-step reasoning pattern. For example, when solving "Roger has 5 tennis balls, buys 2 cans of 3 balls each — how many does he have now?", showing the reasoning process "5 + 2×3 = 11" in the example caused the model to unfold similar step-by-step reasoning for new problems. This technique boosted PaLM 540B's accuracy on the GSM8K math benchmark from 17.9% to 58.1% — a remarkable improvement.

O1's Core Breakthrough: Autonomous Reasoning and Test-Time Scaling

OpenAI's O1 model elevates Chain of Thought to an entirely new level. Unlike previous models that required users to manually guide "step-by-step thinking" in prompts, O1 uses Reinforcement Learning (RL) fine-tuning to enable the model to autonomously incorporate Chain of Thought into its response process.

O1 autonomously incorporating Chain of Thought

The reinforcement learning fine-tuning used by O1 is fundamentally different from traditional supervised fine-tuning. Supervised fine-tuning trains models to imitate human-annotated "correct answers," while reinforcement learning fine-tuning lets models learn optimal strategies through trial and error. Specifically, OpenAI likely employed a training method similar to RLHF (Reinforcement Learning from Human Feedback) but more focused on the reasoning process: the model generates multiple reasoning chains, the system provides reward signals based on final answer correctness, and the model gradually learns which reasoning strategies are more effective. The key advantage of this approach is that the model learns not just "what the correct answer is" but "how to find the correct answer" — the thinking process itself. This is why O1 can autonomously produce Chain of Thought reasoning without users explicitly guiding it in prompts.

In the course, Colin Jarvis particularly emphasizes that while O1's current performance is impressive, its truly long-term significant breakthrough lies in Test-Time / Inference-Time Scaling. OpenAI discovered that O1's performance can be continuously improved along two dimensions:

Train-Time Compute: More reinforcement learning training yields better foundational capabilities
Inference-Time Compute: The more time the model spends "thinking" when answering questions, the better the performance

This means we've gained an entirely new dimension for LLM performance scaling. Previously, we primarily improved performance by increasing model parameters and training data (i.e., train-time scaling). Now, even without changing the model itself, simply increasing computation during inference yields significant performance improvements. This paradigm shift has profound implications for the entire AI industry.

To understand the far-reaching significance of this breakthrough, we need to revisit the limitations of traditional Scaling Laws. The classic Scaling Law proposed by Kaplan et al. at OpenAI in 2020 primarily focused on the training phase: model parameters, training data volume, and training compute share a predictable power-law relationship. While effective, this path faces real challenges including data exhaustion (high-quality internet text is nearly "used up"), soaring energy consumption, and hardware bottlenecks. Test-time scaling provides a complementary path: improving output quality by having the model do more "internal thinking" during inference — such as generating longer reasoning chains, exploring multiple reasoning paths, performing self-verification, and backtracking. This is analogous to the human intuition that "thinking longer about a hard problem" leads to better answers. From an economic perspective, inference-time compute is consumed on-demand — extra computational resources are only invested when high-quality output is needed — while train-time compute is a one-time fixed investment, making resource allocation more flexible and efficient.

Core Course Content: From Prompting to Multi-Model Collaboration

O1 model use cases

The course covers six core modules in a progressive structure:

O1 Model Overview and Use Cases

O1 isn't suitable for every scenario. The course first helps learners determine which tasks are appropriate for O1, when to choose smaller and faster models, or when to combine different models. This pragmatic model-selection mindset is crucial — the most powerful model isn't always the best choice.

The New Paradigm of O1 Prompt Engineering

The best way to prompt O1 is significantly different from earlier models. This point deserves special attention. The various prompting techniques we've grown accustomed to (such as manually adding "please think step by step") may no longer be necessary or could even be counterproductive with O1, since the model already has built-in autonomous reasoning capabilities. The course explains in detail how to adjust prompting strategies for O1.

Planning Capabilities for Complex Multi-Step Tasks

The course demonstrates through a supply chain logistics optimization case study how to use O1 as an "Orchestrator" for task planning while using GPT-4o as a "Worker" to execute specific tasks. This O1 planning + lightweight model execution architectural pattern achieves an excellent balance between cost and effectiveness.

This orchestrator-worker architecture pattern is essentially the "separation of concerns" principle from software engineering applied to AI systems, also known in the industry as "Agentic Architecture." In this architecture, O1 as the orchestrator handles understanding global objectives, decomposing tasks, creating execution plans, and handling exceptions — all high cognitive-load tasks requiring deep reasoning. Lightweight models like GPT-4o-mini serve as workers, completing specific, relatively standardized subtasks such as text generation, data extraction, and format conversion. The economic benefits of this division are significant: O1's API call costs are far higher than lightweight models (typically 10-50x more expensive). If all tasks were processed by O1, costs would be prohibitively high. By having O1 handle only the planning steps that most require reasoning capabilities, overall costs can be reduced by an order of magnitude while maintaining output quality close to an all-O1 approach.

Code Generation and Image Reasoning

O1 performs exceptionally well on programming tasks, and the course showcases its code generation capabilities.

Image understanding capabilities

Image understanding has long been a pain point for AI in production environments, but O1 demonstrates unprecedented performance levels on visual reasoning tasks. The course guides learners through experiencing this breakthrough visual reasoning capability.

Meta-Prompting: Using AI to Optimize AI

The final module teaches how to use O1 to generate and optimize prompts themselves — the so-called "Metaprompting" technique. This is an advanced approach of using AI to optimize AI, leveraging O1's reasoning capabilities to improve prompt engineering.

Meta-prompting represents an important step in the evolution of prompt engineering from craft to automation. Traditional prompt engineering is highly dependent on human experts' experience and intuition — engineers need to repeatedly experiment with different wordings, structures, and examples to find the most effective prompts. Meta-prompting leverages O1's powerful reasoning capabilities to automate this process: developers describe the target task and desired output, O1 analyzes the task characteristics and generates optimized prompts, and can even iterate on improvements. The theoretical foundation of this approach is that O1 has an implicit "understanding" of how language models work — it knows what instruction formulations most effectively guide models toward desired behaviors. In practice, meta-prompting has been proven to generate more effective prompts than those handwritten by human experts on complex tasks, especially in scenarios requiring precise control over output format and reasoning depth.

Key Takeaways and Practical Recommendations

This course conveys several signals crucial for AI developers:

Test-time scaling opens a new path for performance improvement. Over the past few years, the dominant theme in AI has been "bigger models, more data." O1's emergence demonstrates that computational investment during the inference phase is equally an effective lever for improving performance. This provides a new optimization direction for compute-constrained scenarios.

Model selection matters more than model capability. The course repeatedly emphasizes that O1 isn't a silver bullet — you need to choose the appropriate model based on task characteristics. In production environments, a hybrid architecture where O1 handles planning and smaller models handle execution is likely the most cost-effective approach.

Prompt engineering is evolving, not dying. Although O1 has built-in autonomous reasoning capabilities, prompt engineering hasn't become unimportant — it needs to adapt to new paradigms. Understanding how the model works is key to writing the most effective prompts.

For developers and AI practitioners looking to deeply understand the O1 model, this free course provides a systematic starting point. The course includes slides and code, suitable for learners from beginner to advanced levels.