Data Engineering 🏗 Architect ◆ Independent
GamePulse —
Sports Data Warehouse
& Analytics Stack
A full-stack data engineering environment that models how modern sports organizations build enterprise-grade data infrastructure — end-to-end ETL pipeline using the Bronze → Silver → Gold medallion pattern in SQL Server, with T-SQL stored procedures automating each transformation layer and Python driving realistic data generation.
SQL ServerT-SQLPythonMedallion ArchitectureStored ProceduresETL PipelineDimensional ModelingSurrogate Keys
The Problem

Raw data everywhere.
Trusted data nowhere.

Sports organizations generate data from dozens of sources — ticketing systems, CRM platforms, email tools, merchandise systems, and more. Each one produces raw, messy, inconsistently formatted records that can't be queried together without significant transformation work.

The typical result is a reporting environment built on ad hoc queries, fragile spreadsheet joins, and analysts spending 80% of their time cleaning data before they can ask a single business question. The data exists. The infrastructure to trust it doesn't.

GamePulse was built to demonstrate exactly what that infrastructure looks like — a three-layer medallion warehouse where every transformation is documented, every business key is managed, and the Gold layer delivers analytical datasets that power dashboards and models without any further cleaning required.

The goal wasn't just to build a warehouse. It was to build one that reflects how a real analytics team would structure a repeatable, scalable data foundation for a sports org — modular, extensible, and ready for Power BI or Tableau on day one.

Gap 01 · No Trusted Layer
Raw source data from ticketing, CRM, and email tools can't be queried reliably without standardization. The Silver layer solves this with consistent naming, data types, and keys.
Gap 02 · No Analytical Model
Even clean data isn't analytics-ready until it's shaped into facts and dimensions. The Gold layer delivers the aggregated, business-keyed outputs that reporting tools actually consume.
Gap 03 · Manual ETL
Without stored procedures, every transformation run requires manual scripting. T-SQL stored procedures automate the Silver and Gold layer builds, making the pipeline repeatable and auditable.
Gap 04 · No Data Lineage
Surrogate key management and automated load timestamps create an auditable lineage from raw ingestion through to the final analytical layer — so you always know where a number came from.
My Role

Designed and built
every layer.

🏗
What I Built
  • Designed the full three-schema medallion architecture in SQL Server — Bronze, Silver, and Gold schemas each with a defined purpose, grain, and transformation contract
  • Built all Bronze layer ingestion tables — raw data landing zones accepting CSV, API, and Excel inputs with minimal transformation and full source fidelity
  • Built all Silver layer transformation logic — data cleansing, standardization, consistent naming conventions, and surrogate key generation via T-SQL stored procedures
  • Built the Gold layer analytical model — fact tables and dimension views designed for direct consumption by Power BI, Tableau, or ad hoc SQL analytics
  • Wrote all T-SQL stored procedures for Silver and Gold automation — each layer runs as a controlled, repeatable procedure with consistent load timestamps and error-safe design
  • Generated all source data with Python — faker-based realistic datasets covering customer profiles, ticket orders, and campaign events with intentional imperfections and variability
🔬
Design Philosophy
  • Industry-standard medallion pattern — Bronze, Silver, and Gold mirror the architecture used in modern enterprise data platforms including Databricks, Azure Synapse, and Snowflake
  • Schema-driven, not script-driven — each transformation layer is defined by its schema contract, not by one-off query logic that breaks on re-run
  • Extensible by design — the warehouse can absorb additional sources (team stats, financial data, marketing performance) by adding Bronze tables and corresponding Silver/Gold procedures without touching existing logic
  • Reporting-ready Gold layer — Gold outputs are structured specifically for BI tool consumption: pre-aggregated, properly keyed, and free of transformation debt
  • Available on GitHub — full DDL, stored procedures, and example analysis queries published for review
The Approach

Ingestion to insight,
one layer at a time

Five stages — each one building on the last, each one independent enough to test and validate before the next begins.

01
Schema Design
Designed the three-schema SQL Server architecture — Bronze, Silver, Gold — with DDL defining tables, keys, and load metadata fields.
02
Data Generation
Built Python scripts using faker and pandas to generate realistic, imperfect source datasets — customers, ticket orders, campaign events with natural variability.
03
Bronze Ingestion
Loaded raw source data into Bronze tables as-is. No transformation. Full source fidelity preserved for auditability and re-processing.
04
Silver Transform
T-SQL stored procedures run Bronze → Silver: trim, capitalize, standardize naming, assign surrogate keys, apply consistent data types.
05
Gold Modeling
Gold stored procedures build fact tables and dimension views from Silver — pre-aggregated, BI-ready outputs for reporting and analytical consumption.
The Stack

Tools & architecture

Three tools with clearly separated responsibilities — and a full diagram of the medallion schema showing every table across all three layers.

🕶
SQL Server (SSMS)
Core data warehouse host — all three schemas, DDL, and stored procedure execution environment
📑
T-SQL Stored Procs
Silver and Gold layer automation — repeatable, auditable transformation pipelines with load timestamps
Python
Python
Faker-based realistic dataset generation — customers, ticket orders, campaign events with natural variability
GamePulse Medallion Architecture
SQL Server · Bronze → Silver → Gold · T-SQL Stored Procedures
Full DDL on GitHub
PYTHON BRONZE SILVER GOLD Data Generator faker + pandas sm_customer.csv trs_ticket_orders.csv Loader Script pyodbc + sqlalchemy CSV → Bronze tables Stored Procedures usp_LoadSilver usp_LoadGold bronze.sm_customer RAW · NO TRANSFORM bronze.trs_ticket_orders RAW · NO TRANSFORM bronze.email_campaigns RAW · NO TRANSFORM bronze.campaign_events RAW · NO TRANSFORM usp_LoadSilver executes ↓ silver.sm_customer Trimmed · Capitalized · Surrogate key · Standardized types silver.trs_ticket_orders Parsed dates · Normalized amounts · FK to customer silver.email_campaigns Standardized send dates · Normalized metrics silver.campaign_events Joined to customer · Consistent event types usp_LoadGold executes ↓ gold.fct_ticketing Aggregated ticket revenue · segment · channel · dates PRIMARY ANALYTICAL FACT gold.dim_customers Customer dimension · surrogate key · segment · LTV signals CUSTOMER DIMENSION gold.vw_email_performance Campaign engagement metrics · open/click/convert rates ANALYTICAL VIEW gold.vw_customer_retention Renewal & churn signals · purchase frequency · cohorts ANALYTICAL VIEW Power BI · Tableau Gold layer consumed directly — no further cleaning needed KPI dashboards · Executive reports · Ad hoc analysis Automates Silver & Gold builds —— Direct load - - Stored procedure call Gold → BI tool consumption All schemas: SQL Server (SSMS) · Full DDL on GitHub
Architecture Breakdown

Three layers.
One clean data lifecycle.

Each layer has a strict contract — what it accepts, what it does, and what it produces. No layer does another layer's job.

LayerPurposeKey TablesAutomation
Bronze Raw ingestion — data lands exactly as received from source systems. No transformation, no business logic. Source fidelity preserved for re-processing and auditability. bronze.sm_customer
bronze.trs_ticket_orders
bronze.email_campaigns
Python loader script
(pyodbc + sqlalchemy)
Silver Cleansed and conformed — trimmed, capitalized, standardized naming and data types. Surrogate keys generated. Each table maps one-to-one with its Bronze source but is trusted for analytical use. silver.sm_customer
silver.trs_ticket_orders
silver.campaign_events
T-SQL stored proc
usp_LoadSilver
Gold Aggregated and analytical — fact tables and dimension views shaped specifically for BI consumption. Pre-joined, pre-aggregated, and free of transformation debt. Power BI or Tableau connects here directly. gold.fct_ticketing
gold.dim_customers
gold.vw_email_performance
T-SQL stored proc
usp_LoadGold
Gold Layer Outputs

What the Gold layer enables

Every Gold output is pre-built for a specific analytical question. No transformation required at query time.

🎫
Customer Segmentation & Retention
Cohort analysis by acquisition channel, purchase frequency, and LTV tier. Renewal rate trends and churn risk signals from vw_customer_retention.
🎫
Ticketing & Revenue Optimization
Aggregated revenue by segment, channel, date, and event type from fct_ticketing. Directly feeds Power BI KPI dashboards.
📧
Email Campaign Performance
Open rate, click rate, and conversion analysis by campaign and segment from vw_email_performance. Highlights underperforming sends and engagement trends.
👥
Customer Dimension
Business-keyed customer dimension from dim_customers with surrogate keys, segment labels, and LTV signals — the shared key that connects all fact tables.
Scale

The stat sheet

3 layers
Medallion Architecture
Bronze, Silver, Gold — each with a defined purpose, schema contract, and independent test path.
2 procs
T-SQL Automation
usp_LoadSilver and usp_LoadGold automate the full transformation pipeline — repeatable, auditable, timestamp-tracked.
4+ tables
Gold Analytical Layer
Fact tables and dimension views purpose-built for BI consumption — no further cleaning or joining required at query time.
🔗
Open on GitHub
Full DDL, stored procedures, and example analysis queries available for review at the Clutchlytics/gamepulse repository.
Architect's Lens

Key decisions & tradeoffs

The design choices that shaped the architecture — and the reasoning behind each one.

DecisionRationaleTradeoff
Bronze preserves raw data exactly The Bronze layer stores data exactly as received — no trimming, no type casting, no business logic. This is a deliberate principle: if a downstream transformation turns out to be wrong, you need the ability to re-process from raw without going back to the source system. Bronze is the source of truth, not a staging area. Duplicate storage — keeping both raw and cleansed versions of the same data costs storage. Accepted as the right tradeoff because re-ingestion from external sources is more expensive than disk space in any real org.
T-SQL stored procedures over ad hoc scripts Each transformation layer is automated through a named stored procedure rather than a one-off query file. This makes the pipeline repeatable without manual intervention, auditable through execution history, and testable in isolation. It mirrors how production ETL automation actually works in SQL Server environments. Production-grade pattern — stored procedures are the SQL Server equivalent of dbt models or Databricks notebooks. The same pattern scales directly to enterprise environments without rearchitecting.
Surrogate keys in Silver, not Bronze Business keys (surrogate keys) are generated in Silver because that's where identity is first resolved. Bronze carries whatever natural keys came from the source system, which may be inconsistent or missing. Silver is where the warehouse takes ownership of a unique, stable identifier for each business entity. Clean identity lineage — every Gold fact table joins to the same surrogate key from Silver, making cross-table analysis consistent and preventing the key collisions that break reports in un-keyed warehouses.
Gold as views AND fact tables The Gold layer uses both materialized fact tables (for aggregated metrics that benefit from pre-computation) and SQL views (for analytical summaries that should reflect Silver updates in real-time). The choice between them depends on query frequency and update cadence — documented in the schema DDL. Two patterns to maintain — mixing tables and views in Gold adds a small maintenance surface. The alternative (all views) would be slower to query; all tables would require manual refresh on every Silver load.
Explore the Full Repository
Schema DDL · Stored procedures · Example analysis queries · Clutchlytics/gamepulse
🔗 View on GitHub