Andrew Ng's New Course: A Complete Guide to Building Database Agents

Course Overview

Andrew Ng has partnered with Microsoft to launch a brand-new course — "Building Your Own Database Agent" — which systematically explains how to use Large Language Models (LLMs) to interact with structured data such as tabular data and SQL databases. The core objective of this course is to enable users to automatically perform database queries and data analysis through AI Agents without needing to master query languages like SQL.

LLMs were initially best suited for processing unstructured text data, but structured data (such as tables in relational databases) accounts for the vast majority of enterprise data assets. The core challenges of connecting LLMs with structured data include: the model needs to understand the database schema (table structures, field relationships), generate syntactically correct and semantically accurate SQL queries, process query results, and present them in a human-understandable manner. Research in this field evolved from the Text-to-SQL task — early approaches relied on rule matching and small sequence-to-sequence models, while the emergence of large models like GPT-4 has made zero-shot or few-shot SQL generation possible, with accuracy exceeding 80% on benchmarks like Spider.

Course Introduction

The course instructor is Adrian Gonzalez-Sanchez, a Microsoft Data & AI specialist who is also a university professor and author of the O'Reilly-published book "Azure OpenAI." The course is built on Azure OpenAI Service and the LangChain framework, covering a complete learning path from beginner to advanced levels.

Why Do We Need Database Agents?

Pain Points of Traditional Data Querying

In traditional data analysis workflows, when business users need to retrieve information from a database, they typically go through the following steps: formulate a question → find a data analyst → the analyst translates the natural language question into a SQL query → execute the query → return results. This process is not only time-consuming but also involves communication costs and misunderstandings.

According to Gartner, fewer than 30% of employees in enterprises can effectively leverage data for decision-making, with the primary bottleneck being technical barriers and tool complexity. Traditional solutions include self-service analytics features in BI tools (such as Tableau and Power BI), but these tools still require users to learn specific operational logic and drag-and-drop interfaces. For complex multi-table join queries or ad-hoc analysis needs, non-technical users remain helpless.

The Revolutionary Breakthrough of Agents

The emergence of database Agents has fundamentally changed this paradigm. Users simply input questions in natural language, and the Agent automatically generates a series of operations (function calls), retrieves relevant data, and returns answers. As emphasized in the course:

Using an LLM as an abstraction layer on top of databases is the new frontier for many companies to achieve data democratization.

This means that regardless of the underlying database provider, data model, or query language, the LLM can serve as a unified interaction interface, enabling everyone to easily access and analyze data. LLM-powered database Agents represent the next generation of data access solutions — users interact entirely in natural language while the system automatically handles the entire process from intent understanding to query execution, truly realizing the vision of "everyone is a data analyst."

Core Course Content

Six Learning Modules

The course covers the following key technical topics:

Deploying LLMs to Build AI Agents — Learn how to deploy large language models using Azure OpenAI Service as the core reasoning engine for Agents. Azure OpenAI Service is Microsoft's enterprise-grade offering that provides OpenAI models (including GPT-4, GPT-4o, etc.) through the Azure cloud platform. Compared to using the OpenAI API directly, Azure OpenAI offers enterprise-level security guarantees (data is not used for model training), regional compliance deployment, virtual network integration, content filtering systems, and SLA service level agreements. These features are particularly important for database Agent development in enterprise scenarios — since databases often contain sensitive business data that requires AI inference in a secure and controlled environment.
RAG Implementation for Tabular Data — Applying Retrieval-Augmented Generation (RAG) technology to structured tabular data, rather than being limited to text documents. RAG was originally designed to retrieve relevant snippets from text documents to enhance LLM responses; applying it to tabular data is a relatively novel direction. The core idea is to convert tabular data into a format that can be retrieved via vector search (e.g., serializing each row of data into text descriptions), or to retrieve relevant table structure information and example queries to help the LLM more accurately understand the data context. This approach solves the problem of LLM's limited context window — when a database contains hundreds of tables and thousands of fields, it's impossible to input all schema information into the model at once. RAG can dynamically retrieve the most relevant table structures and data samples.
Developing Database Agents — Building intelligent agents that can autonomously interact with SQL databases, including the complete workflow of understanding user intent, planning query strategies, executing SQL, and interpreting results.
Function Calling Systems — Implementing Agent integration with external tools and APIs through Function Calling. Function Calling is a key capability introduced by OpenAI in 2023 that allows LLMs to decide to call predefined external functions during reasoning. The workflow is: developers describe available functions' names, parameters, and purposes to the model; the model determines when to call a function during conversation and outputs a structured function call request (including function name and parameter values); the application layer executes the actual function call and returns results to the model; the model continues reasoning or generates a final answer based on the returned results. In database Agent scenarios, these functions can be operations like executing SQL queries, retrieving table structure information, or data visualization, giving the Agent the "hands and feet" to interact with external systems.
Azure OpenAI Assistants API Integration — Leveraging Microsoft's Assistants API to simplify the Agent development process. The Assistants API is a stateful API released by OpenAI that manages conversation history, file uploads, and tool call states on the server side. Developers don't need to maintain complex session management logic themselves, significantly reducing the engineering complexity of Agent development.
LangChain Agent Framework — Mastering the core usage of the most popular Agent development framework. LangChain is an open-source framework created by Harrison Chase in 2022 and has become one of the de facto standards for LLM application development. For Agent development, LangChain provides core components such as AgentExecutor, Tools abstraction, Memory modules, and Chains. Its SQL Agent module (such as SQLDatabaseToolkit) encapsulates common operations like database connections, schema inspection, query execution, and result parsing — developers can build an Agent with database interaction capabilities in just a few lines of code. LangChain's ReAct (Reasoning + Acting) pattern enables Agents to alternate between reasoning and action, accomplishing complex multi-step query tasks.

Technology Stack

The course primarily uses the following technology stack:

Azure OpenAI Service: Provides API services for GPT-series models with enterprise-grade security and compliance
LangChain: Agent framework responsible for orchestrating tool calls and reasoning chains, offering rich pre-built components
SQL Database: Structured database serving as the data source

Although the course uses database Agents as the main case study, the components and design patterns learned are equally applicable to building other types of Agent systems.

Industry Significance of Agent Technology

A New Path to Data Democratization

A concept repeatedly emphasized in the course is "Data Democratization." In traditional enterprises, data is often locked behind technical barriers — only engineers or analysts who know SQL can effectively utilize it. Database Agents break down this barrier, enabling non-technical personnel in marketing, operations, and management to directly converse with data.

From an industry practice perspective, data democratization has been one of the core topics in enterprise digital transformation in recent years. Early data warehouses and BI platforms (such as Tableau, Power BI, and Looker) lowered the barrier to data consumption through visual interfaces, but users still needed to understand data models and metric definitions. LLM-powered Agents further eliminate this cognitive burden — users only need to describe the business problem itself, and the system automatically handles all technical details from data location to query optimization.

Agents as a Growth Category in AI

Andrew Ng explicitly states in the course introduction that Agents are one of the fastest-growing categories in generative AI. The evolution from simple chatbots to Agents with autonomous planning, tool usage, and multi-step reasoning capabilities is accelerating. Database Agents are a quintessential application scenario of this trend.

The fundamental difference between Agents and traditional chatbots lies in "autonomy" and "action capability." Chatbots can only generate text responses based on existing knowledge, while Agents can perceive their environment, formulate plans, invoke tools to execute operations, observe results, and iteratively optimize. This "think-act-observe" loop (the ReAct paradigm) enables Agents to handle complex tasks requiring multi-step reasoning and external interaction, such as cross-table correlation analysis and root cause investigation of data anomalies.

Learning Recommendations and Practical Directions

For developers looking to take this course, here are some suggestions:

Prerequisites: Basic Python knowledge, understanding of basic SQL syntax, and initial familiarity with LLMs
Practical Direction: Try connecting the course's Agent to your company's database to build an internal data query assistant
Extended Thinking: Apply Function Calling and Agent orchestration concepts to other scenarios, such as customer service systems and automated report generation
Security Considerations: When deploying database Agents in production environments, pay special attention to SQL injection prevention, query permission control, and sensitive data masking. It's recommended to use read-only database connections and set query timeout limits

This course is not just a technical tutorial — it represents an important paradigm shift in AI applications from "conversation" to "action." Mastering Agent construction methods will become one of the core competencies for developers in the AI era.