Andrew Ng's New Course: A Practical Guide to Data Governance for Enterprise AI Agents

Course Overview

Andrew Ng has partnered with Databricks to launch a new course, Governing AI Agents, which covers how to integrate Data Governance into the complete lifecycle of AI Agents. The course is taught by Amber Roberts, a Technical Marketing Manager at Databricks, and provides a full lab environment and code resources. Learners can follow along at no cost using the free edition of Databricks.

The core question the course addresses: when enterprise AI Agents need access to large volumes of sensitive data, how do you keep that data secure, permissions controlled, and Agent behavior observable?

Why Do AI Agents Need Data Governance?

Understanding the Evolution of Data Governance

Data Governance is an organization-level system of policies, processes, and standards that ensures data quality, security, compliance, and availability throughout the data lifecycle. In traditional enterprise IT architectures, data governance focuses primarily on database access permissions, data classification, and privacy compliance (such as the EU's GDPR and California's CCPA). AI Agents introduce new challenges: they don't just passively read data, they actively make decisions, invoke tools, and chain together multiple data sources. Their behavioral paths are far more complex than those of traditional applications and call for more fine-grained, dynamic governance mechanisms.

One concept needs clarification here: an AI Agent is an AI system that can perceive its environment, make autonomous decisions, and execute actions, which distinguishes it from traditional single-turn Q&A applications built on large language models. A typical AI Agent has capabilities such as Tool Use, Multi-step Reasoning, and Memory Management. In enterprise scenarios, an Agent may simultaneously access CRM systems, data warehouses, external APIs, and other data sources. The more autonomous the Agent, the greater the potential data security risk, because its behavioral paths cannot be fully anticipated at design time.

A Typical Risk Scenario

The course uses a customer analytics Agent as an example to illustrate the necessity of governance. Suppose you've built an Agent specifically for customer analysis that needs access to customer demographic data, transaction records, website behavioral data, and survey responses.

If you grant this Agent broad access to all data, the risk is obvious: the Agent could leak customers' credit card information, home addresses, or personal spending behavior, none of which should be visible to every employee in the company.

The Ideal State After Governance

When you build an Agent with a data governance mindset, you can achieve the following controls:

Precise access control: Explicitly specify which tables and columns the Agent can access
Data masking: Encrypt customer IDs and mask credit card information
Quality checkpoints: Implement data quality validation on Agent inputs and outputs
Output evaluation: Add evaluation mechanisms (evals) to measure output quality
End-to-end observability: Record every processing step of the Agent for continuous monitoring and troubleshooting

Data Masking deserves a closer look. It is a critical technique in data security and comes in two forms. Static masking permanently replaces sensitive information at the storage layer, while dynamic masking transforms query results in real time and leaves the original data unchanged. Common masking methods include partial redaction (e.g., displaying credit card numbers as ****-****-****-1234), hashing (converting customer IDs into hash values that cannot practically be reversed), data generalization (converting exact ages into age ranges), and pseudonymization (replacing real values with fictitious but consistently formatted data). In AI Agent scenarios, dynamic masking is particularly important because Agents may need different levels of data precision in different contexts, and dynamic masking can adjust data exposure in real time based on the caller's permission level.
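
The masking methods described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function names, the "agent" and "privileged" permission levels, the salt value, and the sample row are all assumptions made for the example. It shows partial redaction, hashing, and generalization applied at read time, which is the essence of dynamic masking.

```python
import hashlib


def mask_card(number: str) -> str:
    """Partial redaction: keep only the last four digits."""
    digits = [c for c in number if c.isdigit()]
    return "****-****-****-" + "".join(digits[-4:])


def hash_id(customer_id: str, salt: str = "demo-salt") -> str:
    """One-way hash of an identifier (salted SHA-256, truncated for display).

    The salt here is a hypothetical placeholder; a real deployment would
    manage it as a secret.
    """
    digest = hashlib.sha256((salt + customer_id).encode("utf-8")).hexdigest()
    return digest[:16]


def generalize_age(age: int) -> str:
    """Generalization: map an exact age to a ten-year range."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def mask_row(row: dict, caller_level: str) -> dict:
    """Dynamic masking: return a transformed copy; the stored row is untouched."""
    if caller_level == "privileged":
        return dict(row)
    return {
        "customer_id": hash_id(row["customer_id"]),
        "age": generalize_age(row["age"]),
        "card_number": mask_card(row["card_number"]),
    }


# Fictitious sample record used only for this sketch.
row = {"customer_id": "C-1001", "age": 37, "card_number": "4111-1111-1111-1234"}
masked = mask_row(row, caller_level="agent")
print(masked["card_number"])  # ****-****-****-1234
print(masked["age"])          # 30-39
```

Because mask_row builds a new dictionary instead of editing the input, the same stored record can be served at different precision to different callers.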

Core Course Content: Complete Governance Practice from Build to Deployment

Step 1: Designing SQL Views Based on the Principle of Least Privilege

The course first teaches the principle of least privilege, one of the foundational principles of information security. It was articulated in the mid-1970s by Jerome Saltzer and Michael Schroeder in their work on protecting information in computer systems. Its core idea: any user, program, or system process should be granted only the minimum set of permissions needed to complete its legitimate tasks.

The specific approach is to use SQL Views to restrict the Agent's data access scope. These views are essentially predefined SQL queries that appear similar to tables but contain only the minimum data the Agent needs to complete its tasks. A view acts as a "data window," exposing only specific columns and rows from the underlying table.

This is a highly practical design pattern: rather than letting the Agent query raw data tables directly, you add a view layer as a "data filter" that cuts off unauthorized access at the source. For AI Agents, isolating data behind views has an additional security benefit. Even if the Agent falls victim to a prompt injection attack, it cannot reach fields outside the boundaries defined by the view, provided its credentials are granted access only to the view and not to the underlying tables. The constraint is enforced at the database layer rather than the application layer.
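
A view of this kind can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not the course's lab code: the table name customers, the view name agent_customer_view, the columns, and the sample rows are all hypothetical. SQLite has no GRANT statement, so the sketch only shows the shape of the view; in a governed database the Agent's credentials would additionally be granted SELECT on the view and nothing else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id  TEXT PRIMARY KEY,
    region       TEXT,
    age          INTEGER,
    card_number  TEXT,
    home_address TEXT,
    opted_in     INTEGER
);
INSERT INTO customers VALUES
    ('C-1001', 'EMEA', 37, '4111-1111-1111-1234', '1 Example St', 1),
    ('C-1002', 'APAC', 52, '5500-0000-0000-0004', '2 Sample Ave', 0),
    ('C-1003', 'EMEA', 29, '4000-0000-0000-0002', '3 Demo Rd',    1);

-- The view exposes only non-sensitive columns, and only opted-in rows.
CREATE VIEW agent_customer_view AS
SELECT customer_id, region, age
FROM customers
WHERE opted_in = 1;
""")

cur = conn.execute("SELECT * FROM agent_customer_view ORDER BY customer_id")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['customer_id', 'region', 'age']
print(rows)     # [('C-1001', 'EMEA', 37), ('C-1003', 'EMEA', 29)]

# A query for a column the view does not expose fails in the database itself.
try:
    conn.execute("SELECT card_number FROM agent_customer_view")
except sqlite3.OperationalError as exc:
    print("blocked:", exc)
```

The view restricts both columns (no card number or address) and rows (opted-in customers only), matching the "data window" idea described above.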

Step 2: Unity Catalog Permission Configuration and Tool Registration

To enable the Agent to safely access these views, a proper permission system must be configured. The course teaches you how to:

Build data access tools (Tools) for the Agent
Register these tools as functions in Unity Catalog

Unity Catalog is a unified data governance solution launched by Databricks in 2022 and officially open-sourced in 2024. It adopts a three-level namespace architecture (Catalog → Schema → Table/Function/Model) and provides unified permission management for data assets, AI models, feature tables, and functions. Its core capabilities include: fine-grained access control (supporting row-level and column-level permissions), Data Lineage tracking, automated audit logs, and cross-workspace asset sharing.

In AI Agent scenarios, Unity Catalog's unique value lies in its ability to register the tools used by Agents as function objects in the catalog, thereby incorporating tool access permissions into a unified governance system rather than scattering them across various code repositories where they're difficult to manage. This provides a unified governance layer for enterprise AI Agent permission management, ensuring that only authorized Agents or users can access specific tools and data.
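
To make the idea of governed tool registration concrete, here is a toy in-memory model in Python. It is explicitly not the Unity Catalog API: the ToolRegistry class, its methods, the principal names, and the function name main.analytics.count_customers_by_region are all hypothetical. The sketch only illustrates the pattern of registering a tool under a three-level name and checking an execute grant before every call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class ToolRegistry:
    """Toy stand-in for a governed catalog; NOT the Unity Catalog API."""

    functions: Dict[str, Callable[..., object]] = field(default_factory=dict)
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, full_name: str, fn: Callable[..., object]) -> None:
        # Enforce the catalog.schema.function naming shape.
        if len(full_name.split(".")) != 3:
            raise ValueError("expected a name like catalog.schema.function")
        self.functions[full_name] = fn
        self.grants.setdefault(full_name, set())

    def grant_execute(self, full_name: str, principal: str) -> None:
        if full_name not in self.functions:
            raise KeyError(f"unknown function: {full_name}")
        self.grants[full_name].add(principal)

    def call(self, principal: str, full_name: str, **kwargs: object) -> object:
        # The permission check lives in the registry, not in the Agent's code.
        if principal not in self.grants.get(full_name, set()):
            raise PermissionError(f"{principal} lacks EXECUTE on {full_name}")
        return self.functions[full_name](**kwargs)


def count_customers_by_region(region: str) -> int:
    """Hypothetical tool backed by fixed sample data."""
    return {"EMEA": 2, "APAC": 1}.get(region, 0)


TOOL = "main.analytics.count_customers_by_region"
registry = ToolRegistry()
registry.register(TOOL, count_customers_by_region)
registry.grant_execute(TOOL, "customer_agent")

print(registry.call("customer_agent", TOOL, region="EMEA"))  # 2
try:
    registry.call("other_agent", TOOL, region="EMEA")
except PermissionError as exc:
    print(exc)  # other_agent lacks EXECUTE on main.analytics.count_customers_by_region
```

Keeping grants next to the tool definitions, rather than in each Agent's code, is what lets permissions be reviewed and audited in one place.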

Step 3: Agent Logic Implementation and MLflow Tracing & Evaluation

With tools and permissions in place, the course uses the OpenAI SDK to implement the Agent's core logic. It also enables MLflow Tracing to record the Agent's reasoning process end to end.

MLflow is a machine learning lifecycle management platform open-sourced by Databricks in 2018, and it has since become one of the most widely used ML experiment management tools in the industry. MLflow's Tracing feature targets large language models and AI Agents: it records each step's inputs and outputs, tool call parameters, latency, and token consumption, and assembles them into a complete trace of the call chain. The idea is similar to distributed tracing in microservice architectures (as in Jaeger or Zipkin), adapted to the characteristics of LLM applications.
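
What a tracer records can be illustrated with a small standard-library sketch. It does not use MLflow; the traced decorator, the TRACE list, the span fields, and the two sample functions are all hypothetical names chosen for the example. Each decorated call appends one span with its inputs, output, status, and latency, which is the kind of information the paragraph above describes.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Collected spans, appended in the order in which calls finish.
TRACE: List[Dict[str, Any]] = []


def traced(span_type: str) -> Callable:
    """Decorator that records inputs, output, status, and latency of each call."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            span: Dict[str, Any] = {
                "name": fn.__name__,
                "type": span_type,
                "inputs": {"args": args, "kwargs": kwargs},
            }
            try:
                result = fn(*args, **kwargs)
                span["outputs"] = result
                span["status"] = "OK"
                return result
            except Exception as exc:
                span["status"] = "ERROR"
                span["error"] = repr(exc)
                raise
            finally:
                span["latency_ms"] = (time.perf_counter() - start) * 1000
                TRACE.append(span)

        return wrapper

    return decorator


@traced("TOOL")
def lookup_region_count(region: str) -> int:
    """Hypothetical tool backed by fixed sample data."""
    return {"EMEA": 2, "APAC": 1}.get(region, 0)


@traced("AGENT")
def answer(question: str) -> str:
    """Stand-in for an Agent step that calls one tool."""
    count = lookup_region_count("EMEA")
    return f"EMEA has {count} opted-in customers."


answer("How many opted-in customers are in EMEA?")
for span in TRACE:
    print(span["type"], span["name"], span["status"])
```

The tool span is recorded before the Agent span because the inner call finishes first. A production tracer would also link child spans to their parents and capture token usage.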

Evaluation is equally critical: the course teaches you how to evaluate the Agent systematically so that its output quality meets expectations. MLflow provides an evaluation framework for LLM applications (mlflow.evaluate) that supports custom metrics such as answer accuracy, hallucination detection, and harmful content detection, so teams can verify Agent output quality before deployment rather than relying solely on manual spot-checks.
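
The shape of such an evaluation can be sketched without any framework. The harness below is a standard-library illustration, not mlflow.evaluate: the evaluate function, the two metrics, the toy_agent, and the test cases are all assumptions made for the example. It scores each output for exact-match accuracy and for whether a card-number-like string leaked into the answer.

```python
import re
from typing import Callable, Dict, List

# Matches 16 digits in groups of four, optionally separated by '-' or ' '.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")

Metric = Callable[[str, str], float]


def exact_match(output: str, expected: str) -> float:
    """1.0 when the output equals the expected answer, ignoring case and padding."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def no_card_leak(output: str, expected: str) -> float:
    """1.0 when the output contains nothing that looks like a card number."""
    return 0.0 if CARD_PATTERN.search(output) else 1.0


def evaluate(
    agent: Callable[[str], str],
    cases: List[Dict[str, str]],
    metrics: Dict[str, Metric],
) -> Dict[str, float]:
    """Run the agent on every case and return the mean score per metric."""
    totals = {name: 0.0 for name in metrics}
    for case in cases:
        output = agent(case["question"])
        for name, metric in metrics.items():
            totals[name] += metric(output, case["expected"])
    return {name: total / len(cases) for name, total in totals.items()}


def toy_agent(question: str) -> str:
    """Hypothetical Agent with one deliberately unsafe answer."""
    answers = {
        "How many customers are in EMEA?": "2",
        "What card does C-1001 use?": "4111-1111-1111-1234",
    }
    return answers.get(question, "unknown")


cases = [
    {"question": "How many customers are in EMEA?", "expected": "2"},
    {"question": "What card does C-1001 use?", "expected": "I can't share card numbers."},
]
scores = evaluate(toy_agent, cases, {"exact_match": exact_match, "no_card_leak": no_card_leak})
print(scores)  # {'exact_match': 0.5, 'no_card_leak': 0.5}
```

Here the second case fails both metrics, which is the kind of regression an automated check can catch before deployment.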

Step 4: Deployment and Continuous Monitoring

Finally, the course guides you through Agent deployment, putting all the governance practices into production and establishing continuous monitoring mechanisms.

The Four Pillars of AI Agent Data Governance

The course proposes a Four Pillars Framework for AI Agent data governance:

Pillar	Core Focus
Lifecycle Management	End-to-end management from development to retirement
Risk Management	Identifying and controlling risks such as data leakage and unauthorized access
Security	Security measures including data encryption, masking, and access control
Observability	Logging, behavior monitoring, and troubleshooting capabilities

These four pillars form a complete governance system covering the core concerns enterprises face when deploying AI Agents. The framework is consistent with governance principles in traditional software engineering but extends them to account for the autonomy and uncertainty of AI Agents. For example, observability in traditional applications focuses primarily on performance metrics and error logs, while in Agent scenarios it must also cover the reasonableness of reasoning paths, the compliance of tool invocations, and the safety of output content.

Course Technology Stack Overview

The technology stack covered in this course includes:

Databricks: Serves as the overall platform and runtime environment. Databricks was founded in 2013 by the creators of Apache Spark and has evolved into a unified Data Intelligence Platform integrating data engineering, data science, and AI. Its Lakehouse architecture combines the flexibility of data lakes with the governance capabilities of data warehouses.
Unity Catalog: Open-source data catalog responsible for permission and tool management
OpenAI SDK: Implements Agent logic. Choosing the OpenAI SDK as the Agent implementation layer reflects the current industry trend of "decoupling the model layer from the governance layer": the Agent's intelligence comes from the LLM, while security and governance are handled uniformly by the platform layer (such as Unity Catalog and MLflow).
MLflow: Tracing and evaluation framework
SQL Views: Data access control layer

Summary and Learning Recommendations

As enterprise AI Agents rapidly proliferate, data governance is shifting from "nice-to-have" to "must-have." The value of this course is that it goes beyond governance principles and provides a practical, implementable workflow that runs from view design, permission configuration, and tool registration through to deployment and monitoring, forming a complete closed loop.

For technical teams currently deploying or planning to deploy AI Agents in enterprise settings, the governance framework and best practices offered by this course are well worth studying in depth. The course is available for free on the DeepLearning.AI platform, and all labs can be completed using the free edition of Databricks.

Key Takeaways

Andrew Ng partners with Databricks to launch an AI Agent data governance course, providing a complete practical guide from build to deployment
The course proposes four pillars of AI Agent governance: lifecycle management, risk management, security, and observability
The course implements the principle of least privilege through SQL views and uses Unity Catalog for unified tool and permission management
It uses the OpenAI SDK to build Agent logic, combined with MLflow for end-to-end tracing and evaluation
The course is freely available, with all labs completable at zero cost using the free edition of Databricks