← index
CASE STUDY · data engineering · 2025

DWH / SSOT

Medallion architecture data warehouse for a life-insurance group's microinsurance products. Bronze-silver-gold ETL with watermark-based incremental loading, idempotent MERGE patterns, and a 97-column fact table spanning 5 insurance products.

uilt the data warehouse from scratch over nine months. Five microinsurance products, full medallion architecture: bronze (raw ingestion), silver (cleaned and typed), gold (business-ready aggregates).

The core pattern is watermark-based incremental loading. A control table tracks the last loaded timestamp per table. Each run picks up new rows only. MERGEs handle duplicates. If the ETL runs twice, you get the same result. Idempotency first, performance second.

The fact table ended up at 97 columns. That is the wrong shape, and it happened because every new product needed just a few more fields instead of proper dimension tables. Lesson learned the expensive way.

Insurance data is its own category of messy. Policy numbers change. Products get renamed mid-year. The same customer shows up under three slightly different names. A significant chunk of the work ended up being validation logic that has nothing to do with data engineering and everything to do with understanding the business.

highlights

  • +Medallion architecture (bronze/silver/gold) across 5 microinsurance products
  • +Watermark-based incremental loading with idempotent MERGE patterns, runs nightly without intervention
  • +Control table system tracks ETL state per table, enables reliable restarts after failure
  • +Data feeds regulatory reporting and executive dashboards used in real business decisions
  • +Built entirely in SQL Server + SSIS, zero external orchestration dependencies

what was hard

The 97-column fact table is the honest answer. Started at 40, grew because we kept adding fields instead of creating proper dimensions. Also: insurance data is deceptively dirty. You build the pipeline, then spend more time on validation than the pipeline itself because the source system has no enforced constraints.