Who this track is for

The Data Engineer track is for engineers who want to build the infrastructure that powers data teams. You’ll learn the modern data stack end to end — from ingestion and transformation to streaming and governance. Best fit if you:
  • Come from a software engineering or analytics background
  • Want to build scalable pipelines, warehouses, and real-time systems
  • Are interested in the infrastructure that makes ML and analytics possible

Curriculum

Level 1 — Modern Data Stack Foundations (free)

  • Data Warehousing: Dimensional modelling, star schema, Snowflake/BigQuery
  • dbt Fundamentals: Models, tests, documentation, lineage
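Dimensional modelling boils down to joining a wide fact table to small dimension tables. A minimal, runnable sketch of a star-schema query, using SQLite in place of Snowflake/BigQuery (the `fact_orders` and `dim_customer` tables are hypothetical examples, not part of the course material):

```python
import sqlite3

# Hypothetical star schema: one fact table, one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_orders  (order_id INTEGER PRIMARY KEY,
                               customer_id INTEGER REFERENCES dim_customer,
                               amount REAL);
    INSERT INTO dim_customer VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO fact_orders  VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 50.0);
""")

# A typical analytical query: join the fact to the dimension, then aggregate.
rows = conn.execute("""
    SELECT d.region, SUM(f.amount) AS revenue
    FROM fact_orders f
    JOIN dim_customer d USING (customer_id)
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()
print(rows)  # [('APAC', 50.0), ('EMEA', 200.0)]
```

The same shape scales up: in a warehouse, `fact_orders` would have billions of rows and the dimensions would carry the descriptive attributes you group and filter by.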

Level 2 — Pipeline Engineering

  • Orchestration: Airflow DAGs, task dependencies, backfill, SLAs
  • Ingestion Patterns: CDC, batch vs streaming, Airbyte, Fivetran patterns
  • Advanced dbt: Incremental models, snapshots, macros, packages
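At its core, orchestration means running tasks in dependency order. A minimal sketch of that idea using Python's standard-library `graphlib` (the task names are illustrative, not an Airflow API; Airflow adds scheduling, retries, backfill, and SLAs on top of exactly this kind of graph):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on,
# mirroring how an Airflow DAG wires tasks together.
deps = {
    "load_raw": {"extract"},
    "dbt_run": {"load_raw"},
    "quality_check": {"load_raw"},
    "publish": {"dbt_run", "quality_check"},
}

# static_order() yields a valid execution order: every task appears
# after all of its upstream dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Note that `dbt_run` and `quality_check` both depend only on `load_raw`, so an orchestrator is free to run them in parallel.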

Level 3 — Distributed Computing

  • Spark Fundamentals: RDDs, DataFrames, partitioning, optimisation
  • Spark in Production: Cluster management, cost optimisation, debugging
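Partitioning is the idea underneath Spark's distributed RDDs and DataFrames: rows with the same key are routed to the same partition, so per-key operations can run without moving data again. A toy sketch of hash partitioning in plain Python (illustrative only, not Spark's API):

```python
def hash_partition(records, num_partitions, key=lambda r: r[0]):
    """Route each record to a partition by hashing its key."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        # Same key -> same partition, which is what lets grouping and
        # joining by key proceed partition-by-partition.
        idx = hash(key(record)) % num_partitions
        partitions[idx].append(record)
    return partitions

orders = [("alice", 120.0), ("bob", 80.0), ("alice", 50.0)]
parts = hash_partition(orders, num_partitions=4)
```

Both `alice` records always land in the same partition, whichever one the hash picks. Skewed keys (one customer with most of the rows) overload a single partition, which is where Spark optimisation work usually starts.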

Level 4 — Streaming & Lakehouses

  • Kafka & Streaming: Producers, consumers, Kafka Streams, exactly-once
  • Lakehouse Architecture: Delta Lake, Iceberg, data contracts, governance
  • Data Quality: Great Expectations, anomaly detection, alerting
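Data quality checks are essentially predicates evaluated over rows. A hand-rolled sketch of the kind of expectations a tool like Great Expectations automates (function names and the row data are illustrative, not the library's API):

```python
# Two hypothetical checks in the style of declarative data-quality suites.
def expect_not_null(rows, column):
    failures = [r for r in rows if r.get(column) is None]
    return {"check": f"{column} not null", "passed": not failures}

def expect_between(rows, column, low, high):
    failures = [r for r in rows
                if r.get(column) is not None and not (low <= r[column] <= high)]
    return {"check": f"{column} in [{low}, {high}]", "passed": not failures}

rows = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": None},   # missing value -> null check fails
    {"order_id": 3, "amount": -5.0},   # out of range  -> range check fails
]

results = [
    expect_not_null(rows, "amount"),
    expect_between(rows, "amount", 0, 10000),
]
failed = [r["check"] for r in results if not r["passed"]]
```

In a real pipeline, a failed check would trigger an alert or block the downstream load rather than just collect names in a list.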

Level 5 — Capstone

Design and build a complete data platform: ingestion, transformation (dbt), orchestration (Airflow), and a downstream analytics or ML use case. Reviewed by your cohort on architecture and data quality.