Course DP-750T00-A: Implement data engineering solutions using Azure Databricks

Duration: 4 Days

Master end-to-end data engineering with Azure Databricks and Unity Catalog. This course moves from foundational setup to production deployment, covering environment configuration and enterprise-grade governance. Learn to build robust ingestion pipelines, implement security with Unity Catalog, and deploy optimized workloads. By the end, you will have the practical skills to implement, secure, and maintain scalable lakehouse solutions that meet rigorous enterprise requirements.

The target audience is data engineers who have fundamental knowledge of data analytics concepts, a basic understanding of cloud storage, and familiarity with data organization principles. They should be comfortable working with SQL and have experience using Python, including notebooks, for data engineering tasks. Learners are expected to have a good understanding of Azure Databricks workspaces and Unity Catalog, along with familiarity with data access patterns and core data engineering and data warehouse concepts. In addition, they should have foundational knowledge of Azure security, including Microsoft Entra ID, and be familiar with Git version control fundamentals.

Explore Azure Databricks

Azure Databricks is a cloud service that provides a scalable platform for data analytics using Apache Spark.

  • Get started with Azure Databricks
  • Identify Azure Databricks workloads
  • Understand key concepts
  • Data governance using Unity Catalog and Microsoft Purview
  • Exercise - Explore Azure Databricks

Understand Azure Databricks architecture

Azure Databricks architecture separates control and compute planes while organizing resources through a hierarchical structure. This module explores how the account hierarchy works, the differences between serverless and classic compute planes, and the storage options available for organizing and governing your data, including default storage, external storage, and Unity Catalog managed storage.

  • Understand Azure Databricks architecture
  • Understand Unity Catalog managed storage
  • Understand external storage
  • Understand default storage (serverless compute)

Understand Azure Databricks Integrations

Azure Databricks integrates with multiple Microsoft services to provide end-to-end data engineering, analytics, and AI capabilities. This module explores how Azure Databricks works with Microsoft Fabric, Power BI, Visual Studio Code, Power Platform, Copilot Studio, Microsoft Purview, and Microsoft Foundry to enable comprehensive solutions that combine data lakehouse capabilities with business intelligence, application development, and conversational AI.

  • Understand integration with Microsoft Fabric
  • Understand integration with Power BI
  • Understand integration with VS Code
  • Understand integration with Power Platform
  • Understand integration with Copilot Studio
  • Understand integration with Microsoft Purview
  • Understand integration with Microsoft Foundry

Select and Configure Compute in Azure Databricks

Azure Databricks provides multiple compute options optimized for different workloads. This module explores how to choose the right compute type, configure performance settings, manage access permissions, and install libraries. You'll learn when to use serverless versus classic compute, how to optimize clusters for cost and performance, and best practices for securing compute resources.

  • Choose an appropriate compute type
  • Configure compute performance
  • Configure compute features
  • Install libraries for compute
  • Configure compute access
  • Exercise - Select and Configure Compute in Azure Databricks
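
For illustration, a minimal sketch of the kind of cluster specification this module works with, expressed as a Python dictionary in the shape the Clusters API and asset bundles accept; the runtime version, VM size, and tag values are placeholders rather than recommendations.

  # Hypothetical job-cluster specification; field names follow the Clusters API,
  # values are example placeholders.
  cluster_spec = {
      "cluster_name": "etl-job-cluster",
      "spark_version": "15.4.x-scala2.12",          # an LTS Databricks Runtime (example)
      "node_type_id": "Standard_DS3_v2",            # Azure VM size (example)
      "autoscale": {"min_workers": 2, "max_workers": 8},
      "autotermination_minutes": 30,                # stop idle classic compute automatically
      "data_security_mode": "USER_ISOLATION",       # shared, Unity Catalog-enabled access mode
      "spark_conf": {"spark.databricks.delta.optimizeWrite.enabled": "true"},
      "custom_tags": {"cost_center": "data-engineering"},
  }

The same settings can be reproduced in the compute UI, passed to the Clusters API, or declared in a Databricks Asset Bundle.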

Create and organize objects in Unity Catalog

Unity Catalog's three-level namespace of catalogs, schemas, and objects provides a flexible foundation for organizing data assets while maintaining centralized governance. This module explores how to create catalogs for environment isolation, organize schemas within those catalogs, and create tables, views, and volumes for structured and unstructured data. You'll learn to implement foreign catalogs for external database access, apply effective naming conventions, and configure AI/BI Genie instructions to enhance data discoverability.

  • Apply naming conventions
  • Create catalog
  • Create schema
  • Create tables and views
  • Create volumes
  • Implement DDL operations
  • Implement foreign catalog
  • Configure AI/BI Genie instructions
  • Exercise - Create and Organize Objects in Unity Catalog
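
As a concrete illustration, a minimal sketch of the DDL this module covers, run from a notebook where the spark session is available; the dev_sales names and columns are hypothetical.

  # Hypothetical three-level namespace: catalog -> schema -> objects.
  spark.sql("CREATE CATALOG IF NOT EXISTS dev_sales")
  spark.sql("CREATE SCHEMA IF NOT EXISTS dev_sales.bronze")

  # Managed Delta table plus a view over it.
  spark.sql("""
      CREATE TABLE IF NOT EXISTS dev_sales.bronze.orders (
          order_id BIGINT,
          customer_id BIGINT,
          region STRING,
          amount DECIMAL(10,2),
          order_ts TIMESTAMP
      )
  """)
  spark.sql("""
      CREATE OR REPLACE VIEW dev_sales.bronze.recent_orders AS
      SELECT * FROM dev_sales.bronze.orders
      WHERE order_ts >= current_timestamp() - INTERVAL 30 DAYS
  """)

  # Managed volume for unstructured or landing files.
  spark.sql("CREATE VOLUME IF NOT EXISTS dev_sales.bronze.landing")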

Secure Unity Catalog objects

Unity Catalog provides centralized governance and security for data assets in Azure Databricks. This module explores how to secure Unity Catalog objects through access control strategies, fine-grained permissions, credential management, and authentication mechanisms. You'll learn how to implement table and schema-level security, enforce row and column filtering, securely access secrets from Azure Key Vault, and authenticate data access using service principals and managed identities.

  • Understand query lifecycle
  • Implement access control strategies
  • Understand fine-grained access control
  • Implement row filtering and column masking
  • Access Azure Key Vault secrets
  • Authenticate data access with service principals
  • Authenticate resource access with managed identities
  • Exercise - Secure Unity Catalog Objects
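
To make the fine-grained controls concrete, a minimal sketch assuming the hypothetical dev_sales.bronze.orders table from the previous module; the group names and rules are illustrative only.

  # Grant catalog and schema access to an account group.
  spark.sql("GRANT USE CATALOG ON CATALOG dev_sales TO `data-engineers`")
  spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA dev_sales.bronze TO `data-engineers`")

  # Row filter: only admins see rows outside the EMEA region.
  spark.sql("""
      CREATE OR REPLACE FUNCTION dev_sales.bronze.emea_only(region STRING)
      RETURN is_account_group_member('admins') OR region = 'EMEA'
  """)
  spark.sql("ALTER TABLE dev_sales.bronze.orders SET ROW FILTER dev_sales.bronze.emea_only ON (region)")

  # Column mask: hide customer_id from everyone outside a privileged group.
  spark.sql("""
      CREATE OR REPLACE FUNCTION dev_sales.bronze.mask_customer(customer_id BIGINT)
      RETURN CASE WHEN is_account_group_member('pii-readers') THEN customer_id ELSE NULL END
  """)
  spark.sql("ALTER TABLE dev_sales.bronze.orders ALTER COLUMN customer_id SET MASK dev_sales.bronze.mask_customer")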

Govern Unity Catalog objects

This module covers essential governance practices in Unity Catalog, enabling you to secure, monitor, and manage your data estate effectively. You will learn how to implement fine-grained access control, track data lineage, configure audit logs, and share data securely.

  • Create and preserve table definitions
  • Configure ABAC with tags and policies
  • Apply data retention policies
  • Set up and manage data lineage
  • Configure audit logging
  • Design secure Delta Sharing strategy
  • Exercise - Govern Unity Catalog Objects
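
A brief sketch of two of these governance tasks, tagging and Delta Sharing, using the same hypothetical table; the share and recipient names are placeholders.

  # Tags that classification searches or ABAC policies can key on.
  spark.sql("ALTER TABLE dev_sales.bronze.orders SET TAGS ('classification' = 'confidential')")
  spark.sql("ALTER TABLE dev_sales.bronze.orders ALTER COLUMN customer_id SET TAGS ('pii' = 'true')")

  # Delta Sharing: expose one table to an external recipient.
  spark.sql("CREATE SHARE IF NOT EXISTS sales_share")
  spark.sql("ALTER SHARE sales_share ADD TABLE dev_sales.bronze.orders")
  spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_org")
  spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_org")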

Design and implement data modeling with Azure Databricks

Effective data modeling forms the foundation of a performant and maintainable data platform. This module explores how to design ingestion logic, select appropriate tools and table formats, implement partitioning schemes, manage slowly changing dimensions, choose appropriate data granularity, and optimize table performance through clustering strategies in Azure Databricks with Unity Catalog.

  • Design ingestion logic and data source configuration
  • Choose a data ingestion tool
  • Choose a data table format
  • Design and implement a data partitioning scheme
  • Choose a slowly changing dimension (SCD) type
  • Implement a slowly changing dimension (SCD) type 2
  • Design and implement a temporal (history) table to record changes over time
  • Choose granularity on a column or table based on requirements
  • Choose managed vs unmanaged tables
  • Design and implement a clustering strategy
  • Exercise - Design and Implement Data Modeling with Azure Databricks
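
As an illustration of the SCD Type 2 and clustering units, a minimal two-step sketch against hypothetical dimension and staging tables; the column names and change-detection rule are placeholders.

  # Step 1: close out current dimension rows whose tracked attribute changed.
  spark.sql("""
      MERGE INTO dev_sales.silver.dim_customer AS tgt
      USING dev_sales.bronze.customer_updates AS src
        ON tgt.customer_id = src.customer_id AND tgt.is_current = true
      WHEN MATCHED AND tgt.email <> src.email THEN
        UPDATE SET tgt.is_current = false, tgt.end_date = current_date()
  """)

  # Step 2: insert a new current row for changed and brand-new customers.
  spark.sql("""
      INSERT INTO dev_sales.silver.dim_customer (customer_id, email, start_date, end_date, is_current)
      SELECT src.customer_id, src.email, current_date(), NULL, true
      FROM dev_sales.bronze.customer_updates AS src
      LEFT JOIN dev_sales.silver.dim_customer AS tgt
        ON tgt.customer_id = src.customer_id AND tgt.is_current = true
      WHERE tgt.customer_id IS NULL
  """)

  # Clustering strategy: liquid clustering keeps the frequent lookup key co-located.
  spark.sql("ALTER TABLE dev_sales.silver.dim_customer CLUSTER BY (customer_id)")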

Ingest data into Unity Catalog

Data ingestion is a fundamental capability for any data platform. This module explores the comprehensive set of techniques available in Azure Databricks for loading data into Unity Catalog tables. You'll learn how to use managed connectors with Lakeflow Connect, write custom ingestion code in notebooks, apply SQL commands for batch file loading, process change data capture feeds, configure streaming ingestion from message buses, set up Auto Loader for automatic file detection, and orchestrate ingestion workflows with Lakeflow Spark Declarative Pipelines.

  • Ingest data with Lakeflow Connect
  • Ingest data with notebooks
  • Ingest data with SQL methods
  • Ingest data with CDC feed
  • Ingest data with Spark Structured Streaming
  • Ingest data with Auto Loader
  • Ingest data with Lakeflow Spark Declarative Pipelines
  • Exercise - Ingest Data into Unity Catalog
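
For example, a minimal Auto Loader sketch that streams new JSON files from a volume into a bronze table; the paths, file format, and table name are hypothetical.

  # Incrementally detect and load new files, tracking schema and progress in the volume.
  (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/Volumes/dev_sales/bronze/landing/_schema")
      .load("/Volumes/dev_sales/bronze/landing/orders/")
      .writeStream
      .option("checkpointLocation", "/Volumes/dev_sales/bronze/landing/_checkpoints/orders")
      .trigger(availableNow=True)     # process the backlog as an incremental batch, then stop
      .toTable("dev_sales.bronze.orders_raw"))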

Cleanse, transform, and load data into Unity Catalog

Data engineering requires transforming raw data into clean, well-structured formats ready for analysis. This module explores techniques for profiling data quality, selecting appropriate column types, resolving duplicates and null values, applying filtering and aggregation transformations, combining datasets with joins and set operators, reshaping data through pivoting and denormalization, and loading transformed data using append, overwrite, and merge strategies.

  • Profile data
  • Choose column data types
  • Resolve duplicates and nulls
  • Transform data with filters and aggregations
  • Transform data with joins and set operators
  • Transform data with denormalization and pivots
  • Load data with merge, insert, and append
  • Exercise - Cleanse, Transform, and Load Data into Unity Catalog
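
A minimal PySpark sketch of this cleanse-transform-load flow, assuming the hypothetical bronze and silver tables used above; the cleansing rules are illustrative, not prescriptive.

  from pyspark.sql import functions as F

  raw = spark.table("dev_sales.bronze.orders_raw")

  cleaned = (raw
      .dropDuplicates(["order_id"])                      # resolve duplicate events
      .na.fill({"amount": 0.0})                          # default missing amounts
      .filter(F.col("order_ts").isNotNull())             # drop rows with no timestamp
      .withColumn("order_date", F.to_date("order_ts")))  # derive the reporting grain

  # Aggregate to the grain the silver table needs.
  daily = cleaned.groupBy("order_date", "region").agg(F.sum("amount").alias("daily_amount"))

  # Load with MERGE so reruns stay idempotent.
  daily.createOrReplaceTempView("daily_updates")
  spark.sql("""
      MERGE INTO dev_sales.silver.daily_sales AS tgt
      USING daily_updates AS src
        ON tgt.order_date = src.order_date AND tgt.region = src.region
      WHEN MATCHED THEN UPDATE SET tgt.daily_amount = src.daily_amount
      WHEN NOT MATCHED THEN INSERT *
  """)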

Implement and manage data quality constraints with Azure Databricks

This module explores strategies for maintaining high data quality in Azure Databricks. You will learn how to implement validation checks, enforce schemas, manage schema drift, and use pipeline expectations to ensure data integrity throughout your data pipelines.

  • Implement validation checks
  • Implement data type checks
  • Detect and manage schema drift
  • Manage data quality with pipeline expectations
  • Exercise - Implement and Manage Data Quality Constraints with Azure Databricks
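
As a small illustration of validation checks, a sketch of Delta table constraints on the hypothetical silver table; pipeline expectations follow the same idea and appear in the pipeline sketch under the next module.

  # Reject writes that violate the rules instead of silently storing bad rows.
  spark.sql("ALTER TABLE dev_sales.silver.daily_sales ALTER COLUMN order_date SET NOT NULL")
  spark.sql("""
      ALTER TABLE dev_sales.silver.daily_sales
      ADD CONSTRAINT non_negative_amount CHECK (daily_amount >= 0)
  """)

  # Schema drift: Delta enforcement rejects unexpected columns by default; evolving the
  # schema instead requires an explicit opt-in such as .option("mergeSchema", "true") on the write.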

Design and implement data pipelines with Azure Databricks

Learn to design and implement robust data pipelines in Azure Databricks using notebooks and Lakeflow Spark Declarative Pipelines, covering orchestration, error handling, and task logic.

  • Design order of operations for a pipeline
  • Choose notebook vs Lakeflow Pipelines
  • Design Lakeflow job logic
  • Design error handling in pipelines and jobs
  • Create pipeline with notebook
  • Create pipeline with Lakeflow Spark Declarative Pipelines
  • Exercise - Design and Implement Data Pipelines with Azure Databricks
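
For example, a minimal declarative pipeline source file using the dlt module that Lakeflow Spark Declarative Pipelines exposes in Python; it runs only inside a pipeline, and the paths and table names are hypothetical.

  import dlt
  from pyspark.sql import functions as F

  @dlt.table(comment="Raw orders landed by Auto Loader")
  def orders_bronze():
      return (spark.readStream.format("cloudFiles")
              .option("cloudFiles.format", "json")
              .load("/Volumes/dev_sales/bronze/landing/orders/"))

  @dlt.table(comment="Cleaned orders with a quality gate")
  @dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
  def orders_silver():
      return dlt.read_stream("orders_bronze").withColumn("order_date", F.to_date("order_ts"))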

Implement Lakeflow Jobs with Azure Databricks

This module guides you through the process of implementing Lakeflow Jobs in Azure Databricks. You will learn how to create jobs, configure triggers and schedules, set up alerts, and manage automatic restarts to ensure reliable data pipeline execution.

  • Create job setup and configuration
  • Configure job triggers
  • Schedule a job
  • Configure job alerts
  • Configure automatic restarts
  • Exercise - Implement Lakeflow Jobs with Azure Databricks
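
As an illustration, a minimal sketch of a job definition expressed as a Python dictionary in the shape the Jobs API and asset bundles use; the notebook paths, cron expression, and email address are placeholders.

  job_spec = {
      "name": "nightly-sales-load",
      "tasks": [
          {
              "task_key": "ingest",
              "notebook_task": {"notebook_path": "/Workspace/etl/ingest_orders"},
              "max_retries": 2,                            # automatic restart of a failed task
          },
          {
              "task_key": "transform",
              "depends_on": [{"task_key": "ingest"}],
              "notebook_task": {"notebook_path": "/Workspace/etl/build_daily_sales"},
          },
      ],
      "schedule": {
          "quartz_cron_expression": "0 0 2 * * ?",         # 02:00 every day
          "timezone_id": "UTC",
      },
      "email_notifications": {"on_failure": ["data-eng@contoso.com"]},
  }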

Implement development lifecycle processes in Azure Databricks

Azure Databricks integrates with established development practices through Git folders for version control and Databricks Asset Bundles for infrastructure-as-code deployments. This module explores Git version control best practices, branching and pull request workflows, comprehensive testing strategies, and CLI-based bundle deployment across environments.

  • Apply Git version control best practices
  • Manage branching and pull requests
  • Implement testing strategy
  • Configure and package DABs
  • Deploy bundle with Databricks CLI
  • Exercise - Implement Development Lifecycle Processes in Azure Databricks
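
As a small example of the testing unit, a sketch of a pytest-style unit test for a notebook transformation; the function under test and its columns are hypothetical, while bundle deployment itself is driven by CLI commands such as databricks bundle validate and databricks bundle deploy.

  from pyspark.sql import SparkSession, functions as F

  def add_order_date(df):
      """Transformation under test: derive order_date from the event timestamp."""
      return df.withColumn("order_date", F.to_date("order_ts"))

  def test_add_order_date():
      spark = SparkSession.builder.master("local[1]").getOrCreate()
      source = (spark.createDataFrame([("2024-01-15 10:30:00",)], ["order_ts"])
                .withColumn("order_ts", F.to_timestamp("order_ts")))
      result = add_order_date(source).collect()[0]
      assert str(result["order_date"]) == "2024-01-15"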

Monitor, troubleshoot and optimize workloads in Azure Databricks

Monitoring and optimization are essential for running reliable, cost-effective data workloads in Azure Databricks. This module explores cluster consumption metrics, Lakeflow Jobs troubleshooting, Spark job diagnostics, performance optimization for caching, skew, spill, and shuffle issues, and log streaming to Azure Log Analytics.

  • Monitor and manage cluster consumption
  • Troubleshoot and repair Lakeflow Jobs
  • Troubleshoot Spark jobs and notebooks
  • Investigate caching, skewing, spilling, shuffle
  • Implement log streaming with Azure Log Analytics
  • Exercise - Monitor, Troubleshoot and Optimize Workloads in Azure Databricks
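
To ground the optimization units, a short sketch of typical tuning steps for a skewed, shuffle-heavy join, plus a query against the billing system table; the table names are placeholders and system tables must be enabled in the account.

  from pyspark.sql import functions as F

  # Adaptive query execution can coalesce shuffle partitions and split skewed ones at runtime.
  spark.conf.set("spark.sql.adaptive.enabled", "true")
  spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

  orders = spark.table("dev_sales.silver.daily_sales")
  regions = spark.table("dev_sales.silver.dim_region")    # small dimension table

  # Broadcasting the small side avoids shuffling the large fact table.
  joined = orders.join(F.broadcast(regions), "region")

  # Cache only when the same intermediate result is reused within a job; check spill and
  # shuffle metrics for the stage in the Spark UI.
  joined.cache()
  joined.count()

  # Cluster consumption from Unity Catalog system tables.
  usage = spark.sql("""
      SELECT usage_date, SUM(usage_quantity) AS dbus
      FROM system.billing.usage
      GROUP BY usage_date ORDER BY usage_date
  """)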

This class has hands-on labs provided by Go Deploy.