Course DP-750T00-A: Implement data engineering solutions using Azure Databricks

Duration: 4 Days

Master end-to-end data engineering with Azure Databricks and Unity Catalog. This course moves from foundational setup to production deployment, covering environment configuration and enterprise-grade governance. Learn to build robust ingestion pipelines, implement security with Unity Catalog, and deploy optimized workloads. By the end, you will have the practical skills to implement, secure, and maintain scalable lakehouse solutions that meet rigorous enterprise requirements.

The target audience is data engineers who have fundamental knowledge of data analytics concepts, a basic understanding of cloud storage, and familiarity with data organization principles. They should be comfortable working with SQL and have experience using Python, including notebooks, for data engineering tasks. Learners are expected to have a good understanding of Azure Databricks workspaces and Unity Catalog, along with familiarity with data access patterns and core data engineering and data warehouse concepts. In addition, they should have foundational knowledge of Azure security, including Microsoft Entra ID, and be familiar with Git version control fundamentals.

Explore Azure Databricks

Azure Databricks is a cloud service that provides a scalable platform for data analytics using Apache Spark.

  • Get started with Azure Databricks
  • Identify Azure Databricks workloads
  • Understand key concepts
  • Data governance using Unity Catalog and Microsoft Purview
  • Exercise - Explore Azure Databricks

Understand Azure Databricks architecture

Azure Databricks architecture separates control and compute planes while organizing resources through a hierarchical structure. This module explores how the account hierarchy works, the differences between serverless and classic compute planes, and the storage options available for organizing and governing your data, including default storage, external storage, and Unity Catalog managed storage.

  • Understand Azure Databricks architecture
  • Understand Unity Catalog managed storage
  • Understand external storage
  • Understand default storage (serverless compute)

Understand Azure Databricks Integrations

Azure Databricks integrates with multiple Microsoft services to provide end-to-end data engineering, analytics, and AI capabilities. This module explores how Azure Databricks works with Microsoft Fabric, Power BI, Visual Studio Code, Power Platform, Copilot Studio, Microsoft Purview, and Microsoft Foundry to enable comprehensive solutions that combine data lakehouse capabilities with business intelligence, application development, and conversational AI.

  • Understand integration with Microsoft Fabric
  • Understand integration with Power BI
  • Understand integration with VS Code
  • Understand integration with Power Platform
  • Understand integration with Copilot Studio
  • Understand integration with Microsoft Purview
  • Understand integration with Microsoft Foundry

Select and Configure Compute in Azure Databricks

Azure Databricks provides multiple compute options optimized for different workloads. This module explores how to choose the right compute type, configure performance settings, manage access permissions, and install libraries. You'll learn when to use serverless versus classic compute, how to optimize clusters for cost and performance, and best practices for securing compute resources.

  • Choose an appropriate compute type
  • Configure compute performance
  • Configure compute features
  • Install libraries for compute
  • Configure compute access
  • Exercise - Select and Configure Compute in Azure Databricks
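
For illustration, a minimal sketch of the kind of cluster specification this module works with, expressed as a Python dictionary in the shape the Clusters API and asset bundles accept; the runtime version, VM size, and tag values are placeholders rather than recommendations.

  # Hypothetical job-cluster specification; field names follow the Clusters API,
  # values are example placeholders.
  cluster_spec = {
      "cluster_name": "etl-job-cluster",
      "spark_version": "15.4.x-scala2.12",          # an LTS Databricks Runtime (example)
      "node_type_id": "Standard_DS3_v2",            # Azure VM size (example)
      "autoscale": {"min_workers": 2, "max_workers": 8},
      "autotermination_minutes": 30,                # stop idle classic compute automatically
      "data_security_mode": "USER_ISOLATION",       # shared, Unity Catalog-enabled access mode
      "spark_conf": {"spark.databricks.delta.optimizeWrite.enabled": "true"},
      "custom_tags": {"cost_center": "data-engineering"},
  }

The same settings can be reproduced in the compute UI, passed to the Clusters API, or declared in a Databricks Asset Bundle.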

Create and organize objects in Unity Catalog

Unity Catalog's three-level namespace of catalogs, schemas, and objects provides a flexible foundation for organizing data assets while maintaining centralized governance. This module explores how to create catalogs for environment isolation, organize schemas within those catalogs, and create tables, views, and volumes for structured and unstructured data. You'll learn to implement foreign catalogs for external database access, apply effective naming conventions, and configure AI/BI Genie instructions to enhance data discoverability.

  • Apply naming conventions
  • Create catalog
  • Create schema
  • Create tables and views
  • Create volumes
  • Implement DDL operations
  • Implement foreign catalog
  • Configure AI/BI Genie instructions
  • Exercise - Create and Organize Objects in Unity Catalog
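
As a concrete illustration, a minimal sketch of the DDL this module covers, run from a notebook where the spark session is available; the dev_sales names and columns are hypothetical.

  # Hypothetical three-level namespace: catalog -> schema -> objects.
  spark.sql("CREATE CATALOG IF NOT EXISTS dev_sales")
  spark.sql("CREATE SCHEMA IF NOT EXISTS dev_sales.bronze")

  # Managed Delta table plus a view over it.
  spark.sql("""
      CREATE TABLE IF NOT EXISTS dev_sales.bronze.orders (
          order_id BIGINT,
          customer_id BIGINT,
          region STRING,
          amount DECIMAL(10,2),
          order_ts TIMESTAMP
      )
  """)
  spark.sql("""
      CREATE OR REPLACE VIEW dev_sales.bronze.recent_orders AS
      SELECT * FROM dev_sales.bronze.orders
      WHERE order_ts >= current_timestamp() - INTERVAL 30 DAYS
  """)

  # Managed volume for unstructured or landing files.
  spark.sql("CREATE VOLUME IF NOT EXISTS dev_sales.bronze.landing")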

Secure Unity Catalog objects

Unity Catalog provides centralized governance and security for data assets in Azure Databricks. This module explores how to secure Unity Catalog objects through access control strategies, fine-grained permissions, credential management, and authentication mechanisms. You'll learn how to implement table and schema-level security, enforce row and column filtering, securely access secrets from Azure Key Vault, and authenticate data access using service principals and managed identities.

  • Understand query lifecycle
  • Implement access control strategies
  • Understand fine-grained access control
  • Implement row filtering and column masking
  • Access Azure Key Vault secrets
  • Authenticate data access with service principals
  • Authenticate resource access with managed identities
  • Exercise - Secure Unity Catalog Objects
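
To make the fine-grained controls concrete, a minimal sketch assuming the hypothetical dev_sales.bronze.orders table from the previous module; the group names and rules are illustrative only.

  # Grant catalog and schema access to an account group.
  spark.sql("GRANT USE CATALOG ON CATALOG dev_sales TO `data-engineers`")
  spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA dev_sales.bronze TO `data-engineers`")

  # Row filter: only admins see rows outside the EMEA region.
  spark.sql("""
      CREATE OR REPLACE FUNCTION dev_sales.bronze.emea_only(region STRING)
      RETURN is_account_group_member('admins') OR region = 'EMEA'
  """)
  spark.sql("ALTER TABLE dev_sales.bronze.orders SET ROW FILTER dev_sales.bronze.emea_only ON (region)")

  # Column mask: hide customer_id from everyone outside a privileged group.
  spark.sql("""
      CREATE OR REPLACE FUNCTION dev_sales.bronze.mask_customer(customer_id BIGINT)
      RETURN CASE WHEN is_account_group_member('pii-readers') THEN customer_id ELSE NULL END
  """)
  spark.sql("ALTER TABLE dev_sales.bronze.orders ALTER COLUMN customer_id SET MASK dev_sales.bronze.mask_customer")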

Govern Unity Catalog objects

This module covers essential governance practices in Unity Catalog, enabling you to secure, monitor, and manage your data estate effectively. You will learn how to implement fine-grained access control, track data lineage, configure audit logs, and share data securely.

  • Create and preserve table definitions
  • Configure ABAC with tags and policies
  • Apply data retention policies
  • Set up and manage data lineage
  • Configure audit logging
  • Design secure Delta Sharing strategy
  • Exercise - Govern Unity Catalog Objects
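
A brief sketch of two of these governance tasks, tagging and Delta Sharing, using the same hypothetical table; the share and recipient names are placeholders.

  # Tags that classification searches or ABAC policies can key on.
  spark.sql("ALTER TABLE dev_sales.bronze.orders SET TAGS ('classification' = 'confidential')")
  spark.sql("ALTER TABLE dev_sales.bronze.orders ALTER COLUMN customer_id SET TAGS ('pii' = 'true')")

  # Delta Sharing: expose one table to an external recipient.
  spark.sql("CREATE SHARE IF NOT EXISTS sales_share")
  spark.sql("ALTER SHARE sales_share ADD TABLE dev_sales.bronze.orders")
  spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_org")
  spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_org")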

Design and implement data modeling with Azure Databricks

Effective data modeling forms the foundation of a performant and maintainable data platform. This module explores how to design ingestion logic, select appropriate tools and table formats, implement partitioning schemes, manage slowly changing dimensions, choose appropriate data granularity, and optimize table performance through clustering strategies in Azure Databricks with Unity Catalog.

  • Design ingestion logic and data source configuration
  • Choose a data ingestion tool
  • Choose a data table format
  • Design and implement a data partitioning scheme
  • Choose a slowly changing dimension (SCD) type
  • Implement a slowly changing dimension (SCD) type 2
  • Design and implement a temporal (history) table to record changes over time
  • Choose granularity on a column or table based on requirements
  • Choose managed vs unmanaged tables
  • Design and implement a clustering strategy
  • Exercise - Design and Implement Data Modeling with Azure Databricks
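
As an illustration of the SCD Type 2 and clustering units, a minimal two-step sketch against hypothetical dimension and staging tables; the column names and change-detection rule are placeholders.

  # Step 1: close out current dimension rows whose tracked attribute changed.
  spark.sql("""
      MERGE INTO dev_sales.silver.dim_customer AS tgt
      USING dev_sales.bronze.customer_updates AS src
        ON tgt.customer_id = src.customer_id AND tgt.is_current = true
      WHEN MATCHED AND tgt.email <> src.email THEN
        UPDATE SET tgt.is_current = false, tgt.end_date = current_date()
  """)

  # Step 2: insert a new current row for changed and brand-new customers.
  spark.sql("""
      INSERT INTO dev_sales.silver.dim_customer (customer_id, email, start_date, end_date, is_current)
      SELECT src.customer_id, src.email, current_date(), NULL, true
      FROM dev_sales.bronze.customer_updates AS src
      LEFT JOIN dev_sales.silver.dim_customer AS tgt
        ON tgt.customer_id = src.customer_id AND tgt.is_current = true
      WHERE tgt.customer_id IS NULL
  """)

  # Clustering strategy: liquid clustering keeps the frequent lookup key co-located.
  spark.sql("ALTER TABLE dev_sales.silver.dim_customer CLUSTER BY (customer_id)")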

Ingest data into Unity Catalog

Data ingestion is a fundamental capability for any data platform. This module explores the comprehensive set of techniques available in Azure Databricks for loading data into Unity Catalog tables. You'll learn how to use managed connectors with Lakeflow Connect, write custom ingestion code in notebooks, apply SQL commands for batch file loading, process change data capture feeds, configure streaming ingestion from message buses, set up Auto Loader for automatic file detection, and orchestrate ingestion workflows with Lakeflow Spark Declarative Pipelines.

  • Ingest data with Lakeflow Connect
  • Ingest data with notebooks
  • Ingest data with SQL methods
  • Ingest data with CDC feed
  • Ingest data with Spark Structured Streaming
  • Ingest data with Auto Loader
  • Ingest data with Lakeflow Spark Declarative Pipelines
  • Exercise - Ingest Data into Unity Catalog
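
For example, a minimal Auto Loader sketch that streams new JSON files from a volume into a bronze table; the paths, file format, and table name are hypothetical.

  # Incrementally detect and load new files, tracking schema and progress in the volume.
  (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/Volumes/dev_sales/bronze/landing/_schema")
      .load("/Volumes/dev_sales/bronze/landing/orders/")
      .writeStream
      .option("checkpointLocation", "/Volumes/dev_sales/bronze/landing/_checkpoints/orders")
      .trigger(availableNow=True)     # process the backlog as an incremental batch, then stop
      .toTable("dev_sales.bronze.orders_raw"))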

Cleanse, transform, and load data into Unity Catalog

Data engineering requires transforming raw data into clean, well-structured formats ready for analysis. This module explores techniques for profiling data quality, selecting appropriate column types, resolving duplicates and null values, applying filtering and aggregation transformations, combining datasets with joins and set operators, reshaping data through pivoting and denormalization, and loading transformed data using append, overwrite, and merge strategies.

  • Profile data
  • Choose column data types
  • Resolve duplicates and nulls
  • Transform data with filters and aggregations
  • Transform data with joins and set operators
  • Transform data with denormalization and pivots
  • Load data with merge, insert, and append
  • Exercise - Cleanse, Transform, and Load Data into Unity Catalog
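
A minimal PySpark sketch of this cleanse-transform-load flow, assuming the hypothetical bronze and silver tables used above; the cleansing rules are illustrative, not prescriptive.

  from pyspark.sql import functions as F

  raw = spark.table("dev_sales.bronze.orders_raw")

  cleaned = (raw
      .dropDuplicates(["order_id"])                      # resolve duplicate events
      .na.fill({"amount": 0.0})                          # default missing amounts
      .filter(F.col("order_ts").isNotNull())             # drop rows with no timestamp
      .withColumn("order_date", F.to_date("order_ts")))  # derive the reporting grain

  # Aggregate to the grain the silver table needs.
  daily = cleaned.groupBy("order_date", "region").agg(F.sum("amount").alias("daily_amount"))

  # Load with MERGE so reruns stay idempotent.
  daily.createOrReplaceTempView("daily_updates")
  spark.sql("""
      MERGE INTO dev_sales.silver.daily_sales AS tgt
      USING daily_updates AS src
        ON tgt.order_date = src.order_date AND tgt.region = src.region
      WHEN MATCHED THEN UPDATE SET tgt.daily_amount = src.daily_amount
      WHEN NOT MATCHED THEN INSERT *
  """)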

Implement and manage data quality constraints with Azure Databricks

This module explores strategies for maintaining high data quality in Azure Databricks. You will learn how to implement validation checks, enforce schemas, manage schema drift, and use pipeline expectations to ensure data integrity throughout your data pipelines.

  • Implement validation checks
  • Implement data type checks
  • Detect and manage schema drift
  • Manage data quality with pipeline expectations
  • Exercise - Implement and Manage Data Quality Constraints with Azure Databricks
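
As a small illustration of validation checks, a sketch of Delta table constraints on the hypothetical silver table; pipeline expectations follow the same idea and appear in the pipeline sketch under the next module.

  # Reject writes that violate the rules instead of silently storing bad rows.
  spark.sql("ALTER TABLE dev_sales.silver.daily_sales ALTER COLUMN order_date SET NOT NULL")
  spark.sql("""
      ALTER TABLE dev_sales.silver.daily_sales
      ADD CONSTRAINT non_negative_amount CHECK (daily_amount >= 0)
  """)

  # Schema drift: Delta enforcement rejects unexpected columns by default; evolving the
  # schema instead requires an explicit opt-in such as .option("mergeSchema", "true") on the write.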

Design and implement data pipelines with Azure Databricks

Learn to design and implement robust data pipelines in Azure Databricks using notebooks and Lakeflow Spark Declarative Pipelines, covering orchestration, error handling, and task logic.

  • Design order of operations for a pipeline
  • Choose notebook vs Lakeflow Pipelines
  • Design Lakeflow job logic
  • Design error handling in pipelines and jobs
  • Create pipeline with notebook
  • Create pipeline with Lakeflow Spark Declarative Pipelines
  • Exercise - Design and Implement Data Pipelines with Azure Databricks
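
For example, a minimal declarative pipeline source file using the dlt module that Lakeflow Spark Declarative Pipelines exposes in Python; it runs only inside a pipeline, and the paths and table names are hypothetical.

  import dlt
  from pyspark.sql import functions as F

  @dlt.table(comment="Raw orders landed by Auto Loader")
  def orders_bronze():
      return (spark.readStream.format("cloudFiles")
              .option("cloudFiles.format", "json")
              .load("/Volumes/dev_sales/bronze/landing/orders/"))

  @dlt.table(comment="Cleaned orders with a quality gate")
  @dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
  def orders_silver():
      return dlt.read_stream("orders_bronze").withColumn("order_date", F.to_date("order_ts"))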

Implement Lakeflow Jobs with Azure Databricks

This module guides you through the process of implementing Lakeflow Jobs in Azure Databricks. You will learn how to create jobs, configure triggers and schedules, set up alerts, and manage automatic restarts to ensure reliable data pipeline execution.

  • Create job setup and configuration
  • Configure job triggers
  • Schedule a job
  • Configure job alerts
  • Configure automatic restarts
  • Exercise - Implement Lakeflow Jobs with Azure Databricks
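
As an illustration, a minimal sketch of a job definition expressed as a Python dictionary in the shape the Jobs API and asset bundles use; the notebook paths, cron expression, and email address are placeholders.

  job_spec = {
      "name": "nightly-sales-load",
      "tasks": [
          {
              "task_key": "ingest",
              "notebook_task": {"notebook_path": "/Workspace/etl/ingest_orders"},
              "max_retries": 2,                            # automatic restart of a failed task
          },
          {
              "task_key": "transform",
              "depends_on": [{"task_key": "ingest"}],
              "notebook_task": {"notebook_path": "/Workspace/etl/build_daily_sales"},
          },
      ],
      "schedule": {
          "quartz_cron_expression": "0 0 2 * * ?",         # 02:00 every day
          "timezone_id": "UTC",
      },
      "email_notifications": {"on_failure": ["data-eng@contoso.com"]},
  }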

Implement development lifecycle processes in Azure Databricks

Azure Databricks integrates with established development practices through Git folders for version control and Databricks Asset Bundles for infrastructure-as-code deployments. This module explores Git version control best practices, branching and pull request workflows, comprehensive testing strategies, and CLI-based bundle deployment across environments.

  • Apply Git version control best practices
  • Manage branching and pull requests
  • Implement testing strategy
  • Configure and package DABs
  • Deploy bundle with Databricks CLI
  • Exercise - Implement Development Lifecycle Processes in Azure Databricks
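
As a small example of the testing unit, a sketch of a pytest-style unit test for a notebook transformation; the function under test and its columns are hypothetical, while bundle deployment itself is driven by CLI commands such as databricks bundle validate and databricks bundle deploy.

  from pyspark.sql import SparkSession, functions as F

  def add_order_date(df):
      """Transformation under test: derive order_date from the event timestamp."""
      return df.withColumn("order_date", F.to_date("order_ts"))

  def test_add_order_date():
      spark = SparkSession.builder.master("local[1]").getOrCreate()
      source = (spark.createDataFrame([("2024-01-15 10:30:00",)], ["order_ts"])
                .withColumn("order_ts", F.to_timestamp("order_ts")))
      result = add_order_date(source).collect()[0]
      assert str(result["order_date"]) == "2024-01-15"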

Monitor, troubleshoot and optimize workloads in Azure Databricks

Monitoring and optimization are essential for running reliable, cost-effective data workloads in Azure Databricks. This module explores cluster consumption metrics, Lakeflow Jobs troubleshooting, Spark job diagnostics, performance optimization for caching, skew, spill, and shuffle issues, and log streaming to Azure Log Analytics.

  • Monitor and manage cluster consumption
  • Troubleshoot and repair Lakeflow Jobs
  • Troubleshoot Spark jobs and notebooks
  • Investigate caching, skewing, spilling, shuffle
  • Implement log streaming with Azure Log Analytics
  • Exercise - Monitor, Troubleshoot and Optimize Workloads in Azure Databricks
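
To ground the optimization units, a short sketch of typical tuning steps for a skewed, shuffle-heavy join, plus a query against the billing system table; the table names are placeholders and system tables must be enabled in the account.

  from pyspark.sql import functions as F

  # Adaptive query execution can coalesce shuffle partitions and split skewed ones at runtime.
  spark.conf.set("spark.sql.adaptive.enabled", "true")
  spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

  orders = spark.table("dev_sales.silver.daily_sales")
  regions = spark.table("dev_sales.silver.dim_region")    # small dimension table

  # Broadcasting the small side avoids shuffling the large fact table.
  joined = orders.join(F.broadcast(regions), "region")

  # Cache only when the same intermediate result is reused within a job; check spill and
  # shuffle metrics for the stage in the Spark UI.
  joined.cache()
  joined.count()

  # Cluster consumption from Unity Catalog system tables.
  usage = spark.sql("""
      SELECT usage_date, SUM(usage_quantity) AS dbus
      FROM system.billing.usage
      GROUP BY usage_date ORDER BY usage_date
  """)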

This class has hands-on labs provided by Go Deploy.