Two Years of AI Growth: From Passively Following Instructions to Proactively Understanding Intent

From "Tell Me What to Do" to "I Know What to Do"

Recently, a brief reflection on Twitter struck a chord with professionals across the AI industry:

"I've grown up a lot in the past two years. Now you don't even have to tell me what to do!"

Though short, this statement captures the most fundamental change in AI, and in large language models (LLMs) in particular, over the past two years: the shift from passively executing instructions to inferring intent and completing tasks autonomously.

AI Two Years Ago: An Instruction-Following Order Taker

Looking back at the early days of ChatGPT, users had to put in considerable extra effort:

Carefully crafted prompts: Every detail had to be explicitly spelled out in the prompt. Even a small omission could lead to results that missed the mark entirely.
Repeated iterations and corrections: AI frequently gave irrelevant answers, requiring multiple rounds of dialogue to steer it back on track.
Manual task decomposition: Complex work had to be broken down into small steps and fed to the model one at a time.

Back then, AI resembled a highly capable intern who lacked common sense: it did exactly what you told it, but it never thought one step ahead on its own.

This situation gave rise to an entirely new field of technical practice: prompt engineering. During the GPT-3 and early ChatGPT era, it was practically a standalone skill. Practitioners needed to master prompting strategies such as few-shot prompting (providing a few worked examples in the prompt to guide the model), Chain-of-Thought (asking the model to "think step by step" to improve reasoning quality), and role-playing. The quality gap between a good prompt and a bad one could be enormous, which even spawned the job title of "Prompt Engineer" and dedicated prompt marketplaces. As model capabilities advanced, however, the models began to internalize these techniques, and users less often needed to construct elaborate prompt structures by hand. This is the most tangible manifestation of AI's "growth."
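To make the two strategies above concrete, here is a minimal sketch of how a few-shot prompt with an optional Chain-of-Thought cue might have been assembled by hand. The function name `build_few_shot_prompt`, the `Input:`/`Output:` layout, and the sample reviews are illustrative assumptions, not a format required by any particular model.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    task: str,
    examples: List[Tuple[str, str]],
    question: str,
    chain_of_thought: bool = False,
) -> str:
    """Assemble a plain-text prompt: task description, worked examples, new question."""
    lines = [task, ""]
    # Few-shot part: each worked example shows the model the expected format.
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    lines.append(f"Input: {question}")
    if chain_of_thought:
        # Chain-of-Thought cue: nudge the model to reason before answering.
        lines.append("Output: Let's think step by step.")
    else:
        lines.append("Output:")
    return "\n".join(lines)


# Hypothetical sentiment-classification task used only for illustration.
prompt = build_few_shot_prompt(
    task="Classify the sentiment of each review as positive or negative.",
    examples=[
        ("The battery lasts all day.", "positive"),
        ("The screen cracked within a week.", "negative"),
    ],
    question="Setup took five minutes and everything just worked.",
    chain_of_thought=True,
)
print(prompt)
```

The resulting string would be sent to a model as-is. With newer models, the bare question alone is often enough, which is the shift the paragraph above describes.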

Today's AI: Intent Understanding and Autonomous Action

From Prompt Engineering to Natural Conversation

We are witnessing a qualitative leap in AI capabilities. The new generation of large language models, represented by Claude, GPT-4o, and Gemini, demonstrates remarkable "proactiveness":

Intent inference: When a user states a vague request, AI can understand the real purpose behind it.
Context awareness: No need to repeat background information — the model automatically connects previous conversations and known information.
Proactive completion: When AI detects details the user has overlooked, it proactively fills in the gaps rather than mechanically executing.

Several technical advances underpin AI's leap from "executing instructions" to "understanding intent." First is the maturation of RLHF (Reinforcement Learning from Human Feedback) and RLAIF (Reinforcement Learning from AI Feedback), training methods that steer models toward the responses people actually prefer rather than toward literal readings of the prompt. Second, alignment techniques such as Constitutional AI help models serve user needs while staying within safety boundaries. Third, large-scale instruction fine-tuning datasets and more refined training methods have markedly improved instruction following, so models can often extract the core intent from vague, incomplete, or even contradictory instructions. Finally, reasoning-focused models such as OpenAI's o-series and Anthropic's Claude 3.5/4 series work through a request before answering, which gives them something like the ability to "think about what the user truly wants."

At the same time, AI's ability to "remember" conversation content has changed qualitatively, largely because of the expansion of the context window. Early GPT-3.5 had a context window of only 4K tokens (roughly 3,000 English words), so a slightly longer conversation would cause the model to "forget" earlier content. By 2025, Claude's context window had grown to 200K tokens, and Gemini supported million-token inputs. Beyond raw window size, RAG (Retrieval-Augmented Generation) lets a model retrieve relevant information from an external knowledge base before generating a response, while long-term memory mechanisms (such as Mem0 and MemGPT) let AI retain user preferences and interaction history across sessions. Together, these technologies form the infrastructure for "context awareness," making "no need to repeat background information" a reality.
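The two ideas in this paragraph, a limited context window and retrieval before generation, can be sketched in a few lines. This is a toy illustration under loud assumptions: `count_tokens` treats each whitespace-separated word as one token (real tokenizers differ), `retrieve` ranks documents by shared words rather than by embeddings, and the knowledge base and chat history are made up.

```python
import re
from typing import List


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def words(text: str) -> set:
    """Lowercased alphanumeric words, with punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Toy retrieval: rank documents by how many distinct words they share with the query."""
    query_words = words(query)
    scored = [(len(query_words & words(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def fit_history(messages: List[str], budget: int) -> List[str]:
    """Keep the most recent messages whose combined token count fits the budget."""
    kept: List[str] = []
    used = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break  # Older messages fall out of the window and are "forgotten".
        kept.append(message)
        used += cost
    return list(reversed(kept))


# Hypothetical data for illustration only.
knowledge_base = [
    "The refund window is 30 days from delivery.",
    "Shipping to Canada takes 5 to 7 business days.",
    "Gift cards cannot be refunded.",
]
history = [
    "Hi, I ordered a lamp last week.",
    "It arrived yesterday.",
    "What is the refund window?",
]

context = retrieve(history[-1], knowledge_base)
recent = fit_history(history, budget=10)
print("Retrieved:", context)
print("Kept in window:", recent)
```

With a budget of 10 toy tokens, the oldest message no longer fits, which mirrors how a 4K-token window dropped early turns. A larger budget keeps the whole conversation, and retrieval supplies facts that were never in the conversation at all.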

The Rise of the AI Agent Paradigm

The deeper transformation lies in the gradual maturation of AI Agents. Today's AI no longer just "answers questions" — it can take on far more complex roles:

Autonomous planning: After receiving a goal, it independently formulates an execution plan and progresses step by step.
Tool calling: It proactively searches for information, executes code, and calls APIs as needed.
Self-correction: When it detects incorrect intermediate results, it automatically backtracks and revises its approach.

From a technical architecture perspective, AI Agents typically consist of four core modules: Perception, Planning, Memory, and Action. Planning relies on reasoning frameworks such as ReAct (Reasoning + Acting) and Tree of Thoughts, which let models decompose complex goals into executable subtasks. Tool use, also called function calling, gives models access to external resources such as search engines, databases, and code execution environments through standardized interfaces. OpenAI's Assistants API, Anthropic's tool use API, and open-source frameworks like LangChain are all driving the standardization of the Agent ecosystem. The combined evolution of these technologies has transformed AI from a mere text generator into an autonomous system capable of perceiving its environment, formulating strategies, and taking action.
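The plan, act, observe cycle described above can be sketched as a small loop. This is a simplified, ReAct-style illustration, not the API of any real framework: `scripted_policy` is a hard-coded stand-in for the language model, and `TOOLS`, `calculator`, and `run_agent` are hypothetical names. The policy deliberately makes a mistake on its first step so the loop can show self-correction.

```python
import operator
from typing import Callable, Dict, List, Optional, Tuple

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def calculator(expression: str) -> str:
    """Evaluate 'a op b' for a small set of operators, without using eval."""
    left, symbol, right = expression.split()
    if symbol not in OPERATORS:
        raise ValueError(f"unsupported operator {symbol!r}")
    return str(OPERATORS[symbol](float(left), float(right)))


# Hypothetical tool registry: tool name -> function from string to string.
TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}


def scripted_policy(transcript: List[str]) -> Tuple[str, str, str]:
    """Stand-in for the model: returns (kind, tool_name, payload)."""
    if not transcript:
        return ("act", "calculator", "12 x 7")  # Deliberate mistake: 'x' is not an operator.
    last = transcript[-1]
    if last.startswith("Observation: error"):
        return ("act", "calculator", "12 * 7")  # Self-correction after seeing the error.
    return ("finish", "", last[len("Observation: "):])


def run_agent(
    policy: Callable[[List[str]], Tuple[str, str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Alternate between choosing an action and observing its result."""
    transcript: List[str] = []
    for _ in range(max_steps):
        kind, tool_name, payload = policy(transcript)
        if kind == "finish":
            return payload, transcript
        transcript.append(f"Action: {tool_name}[{payload}]")
        try:
            observation = tools[tool_name](payload)
        except (KeyError, ValueError, ZeroDivisionError) as error:
            # Errors are fed back as observations so the policy can revise its plan.
            observation = f"error: {error}"
        transcript.append(f"Observation: {observation}")
    return None, transcript  # Step budget exhausted without a final answer.


answer, transcript = run_agent(scripted_policy, TOOLS)
for line in transcript:
    print(line)
print("Answer:", answer)
```

In a real agent, the policy would be a model call that reads the transcript and emits the next action, and the step limit guards against loops that never converge.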

This is the technical essence behind "you don't have to tell me what to do" — AI is evolving from a mere tool into a true collaborator.

What Does Increased AI Autonomy Mean?

For Everyday Users: A Dramatic Drop in the Barrier to Entry

The barrier to using AI is falling rapidly. In the past, you needed to learn various prompt techniques to use AI effectively. Now, you simply express your needs as naturally as you would when talking to a colleague. This means AI tools will reach a much broader audience.

For Developers: From Writing Code to Reviewing Code

AI coding assistants (such as Cursor and Claude Code) have evolved from simple "code completion" to "understanding project intent and developing autonomously." The developer's role is shifting from personally writing every line of code to reviewing and ensuring the quality of AI-generated code.

The technical evolution in this field has gone through three distinct phases. Phase one was code completion tools represented by GitHub Copilot, essentially auto-completion based on the surrounding code, like an extraordinarily powerful autocomplete. Phase two was IDE-integrated solutions represented by Cursor, where models could understand the structure of an entire codebase and perform cross-file modifications and refactoring. Phase three is autonomous coding Agents represented by Claude Code, Devin, and OpenAI Codex, which can understand requirements described in natural language, create files, write tests, debug errors, and even manage Git version control on their own. Results on SWE-bench, a benchmark built from real GitHub issues, reflect this leap: the share of issues that leading models resolve autonomously rose from less than 5% in 2023 to over 50% in 2025. Developers' core competitive advantage is shifting from "coding speed" to "architectural design ability" and "requirements judgment."

For the Entire Industry: A Fundamental Shift in Human-AI Collaboration

When AI can autonomously understand and execute tasks, the model of human-AI collaboration will undergo a fundamental transformation. The core value of humans will increasingly be reflected in goal setting, value judgment, and creative direction, rather than in the execution layer itself.

Final Thoughts

In just two years, AI has grown from a tool that required precise instructions to function into an intelligent partner capable of understanding intent and taking proactive action. This "growth" has come faster than any individual human learns, and it foreshadows even more profound changes ahead.

When AI says "you don't have to tell me what to do," it marks both a milestone in technological progress and a starting point for rethinking the relationship between humans and machines.