Who this track is for
The Data Engineer track is for engineers who want to build the infrastructure that powers data teams. You’ll learn the modern data stack end to end, from ingestion and transformation to streaming and governance.
Best fit if you:
- Come from a software engineering or analytics background
- Want to build scalable pipelines, warehouses, and real-time systems
- Are interested in the infrastructure that makes ML and analytics possible
Curriculum
Level 1 — Modern Data Stack Foundations (free)
| Module | Key Topics |
|---|---|
| Data Warehousing | Dimensional modelling, star schema, Snowflake/BigQuery |
| dbt Fundamentals | Models, tests, documentation, lineage |
Level 2 — Pipeline Engineering
| Module | Key Topics |
|---|---|
| Orchestration | Airflow DAGs, task dependencies, backfill, SLAs |
| Ingestion Patterns | CDC, batch vs streaming, Airbyte and Fivetran patterns |
| Advanced dbt | Incremental models, snapshots, macros, packages |
Level 3 — Distributed Computing
| Module | Key Topics |
|---|---|
| Spark Fundamentals | RDDs, DataFrames, partitioning, optimisation |
| Spark in Production | Cluster management, cost optimisation, debugging |
Level 4 — Streaming & Lakehouses
| Module | Key Topics |
|---|---|
| Kafka & Streaming | Producers, consumers, Kafka Streams, exactly-once semantics |
| Lakehouse Architecture | Delta Lake, Iceberg, data contracts, governance |
| Data Quality | Great Expectations, anomaly detection, alerting |