Sports organizations generate data from dozens of sources — ticketing systems, CRM platforms, email tools, merchandise systems, and more. Each one produces raw, messy, inconsistently formatted records that can't be queried together without significant transformation work.
The typical result is a reporting environment built on ad hoc queries, fragile spreadsheet joins, and analysts spending 80% of their time cleaning data before they can ask a single business question. The data exists. The infrastructure to trust it doesn't.
GamePulse was built to demonstrate exactly what that infrastructure looks like — a three-layer medallion warehouse where every transformation is documented, every business key is managed, and the Gold layer delivers analytical datasets that power dashboards and models without any further cleaning required.
The goal wasn't just to build a warehouse. It was to build one that reflects how a real analytics team would structure a repeatable, scalable data foundation for a sports org — modular, extensible, and ready for Power BI or Tableau on day one.
Five stages — each one building on the last, each one independent enough to test and validate before the next begins.
Three tools with clearly separated responsibilities — and a full diagram of the medallion schema showing every table across all three layers.
Each layer has a strict contract — what it accepts, what it does, and what it produces. No layer does another layer's job.
| Layer | Purpose | Key Tables | Automation |
|---|---|---|---|
| Bronze | Raw ingestion — data lands exactly as received from source systems. No transformation, no business logic. Source fidelity preserved for re-processing and auditability. | bronze.sm_customerbronze.trs_ticket_ordersbronze.email_campaigns |
Python loader script (pyodbc + sqlalchemy) |
| Silver | Cleansed and conformed — trimmed, capitalized, standardized naming and data types. Surrogate keys generated. Each table maps one-to-one with its Bronze source but is trusted for analytical use. | silver.sm_customersilver.trs_ticket_orderssilver.campaign_events |
T-SQL stored procusp_LoadSilver |
| Gold | Aggregated and analytical — fact tables and dimension views shaped specifically for BI consumption. Pre-joined, pre-aggregated, and free of transformation debt. Power BI or Tableau connects here directly. | gold.fct_ticketinggold.dim_customersgold.vw_email_performance |
T-SQL stored procusp_LoadGold |
Every Gold output is pre-built for a specific analytical question. No transformation required at query time.
vw_customer_retention.fct_ticketing. Directly feeds Power BI KPI dashboards.vw_email_performance. Highlights underperforming sends and engagement trends.dim_customers with surrogate keys, segment labels, and LTV signals — the shared key that connects all fact tables.The design choices that shaped the architecture — and the reasoning behind each one.
| Decision | Rationale | Tradeoff |
|---|---|---|
| Bronze preserves raw data exactly | The Bronze layer stores data exactly as received — no trimming, no type casting, no business logic. This is a deliberate principle: if a downstream transformation turns out to be wrong, you need the ability to re-process from raw without going back to the source system. Bronze is the source of truth, not a staging area. | Duplicate storage — keeping both raw and cleansed versions of the same data costs storage. Accepted as the right tradeoff because re-ingestion from external sources is more expensive than disk space in any real org. |
| T-SQL stored procedures over ad hoc scripts | Each transformation layer is automated through a named stored procedure rather than a one-off query file. This makes the pipeline repeatable without manual intervention, auditable through execution history, and testable in isolation. It mirrors how production ETL automation actually works in SQL Server environments. | Production-grade pattern — stored procedures are the SQL Server equivalent of dbt models or Databricks notebooks. The same pattern scales directly to enterprise environments without rearchitecting. |
| Surrogate keys in Silver, not Bronze | Business keys (surrogate keys) are generated in Silver because that's where identity is first resolved. Bronze carries whatever natural keys came from the source system, which may be inconsistent or missing. Silver is where the warehouse takes ownership of a unique, stable identifier for each business entity. | Clean identity lineage — every Gold fact table joins to the same surrogate key from Silver, making cross-table analysis consistent and preventing the key collisions that break reports in un-keyed warehouses. |
| Gold as views AND fact tables | The Gold layer uses both materialized fact tables (for aggregated metrics that benefit from pre-computation) and SQL views (for analytical summaries that should reflect Silver updates in real-time). The choice between them depends on query frequency and update cadence — documented in the schema DDL. | Two patterns to maintain — mixing tables and views in Gold adds a small maintenance surface. The alternative (all views) would be slower to query; all tables would require manual refresh on every Silver load. |