GamePulse Data Warehouse & Analytics Stack - Data-Driven Marketing & Data Professional

Data Engineering 🏗 Architect ◆ Independent

GamePulse —
Sports Data Warehouse
& Analytics Stack

A full-stack data engineering environment that models how modern sports organizations build enterprise-grade data infrastructure — end-to-end ETL pipeline using the Bronze → Silver → Gold medallion pattern in SQL Server, with T-SQL stored procedures automating each transformation layer and Python driving realistic data generation.

SQL ServerT-SQLPythonMedallion ArchitectureStored ProceduresETL PipelineDimensional ModelingSurrogate Keys

🔗 View on GitHub ← All Projects

The Problem

Raw data everywhere.
Trusted data nowhere.

Sports organizations generate data from dozens of sources — ticketing systems, CRM platforms, email tools, merchandise systems, and more. Each one produces raw, messy, inconsistently formatted records that can't be queried together without significant transformation work.

The typical result is a reporting environment built on ad hoc queries, fragile spreadsheet joins, and analysts spending 80% of their time cleaning data before they can ask a single business question. The data exists. The infrastructure to trust it doesn't.

GamePulse was built to demonstrate exactly what that infrastructure looks like — a three-layer medallion warehouse where every transformation is documented, every business key is managed, and the Gold layer delivers analytical datasets that power dashboards and models without any further cleaning required.

The goal wasn't just to build a warehouse. It was to build one that reflects how a real analytics team would structure a repeatable, scalable data foundation for a sports org — modular, extensible, and ready for Power BI or Tableau on day one.

Gap 01 · No Trusted Layer

Raw source data from ticketing, CRM, and email tools can't be queried reliably without standardization. The Silver layer solves this with consistent naming, data types, and keys.

Gap 02 · No Analytical Model

Even clean data isn't analytics-ready until it's shaped into facts and dimensions. The Gold layer delivers the aggregated, business-keyed outputs that reporting tools actually consume.

Gap 03 · Manual ETL

Without stored procedures, every transformation run requires manual scripting. T-SQL stored procedures automate the Silver and Gold layer builds, making the pipeline repeatable and auditable.

Gap 04 · No Data Lineage

Surrogate key management and automated load timestamps create an auditable lineage from raw ingestion through to the final analytical layer — so you always know where a number came from.

My Role

Designed and built
every layer.

🏗

What I Built

Designed the full three-schema medallion architecture in SQL Server — Bronze, Silver, and Gold schemas each with a defined purpose, grain, and transformation contract
Built all Bronze layer ingestion tables — raw data landing zones accepting CSV, API, and Excel inputs with minimal transformation and full source fidelity
Built all Silver layer transformation logic — data cleansing, standardization, consistent naming conventions, and surrogate key generation via T-SQL stored procedures
Built the Gold layer analytical model — fact tables and dimension views designed for direct consumption by Power BI, Tableau, or ad hoc SQL analytics
Wrote all T-SQL stored procedures for Silver and Gold automation — each layer runs as a controlled, repeatable procedure with consistent load timestamps and error-safe design
Generated all source data with Python — faker-based realistic datasets covering customer profiles, ticket orders, and campaign events with intentional imperfections and variability

🔬

Design Philosophy

Industry-standard medallion pattern — Bronze, Silver, and Gold mirror the architecture used in modern enterprise data platforms including Databricks, Azure Synapse, and Snowflake
Schema-driven, not script-driven — each transformation layer is defined by its schema contract, not by one-off query logic that breaks on re-run
Extensible by design — the warehouse can absorb additional sources (team stats, financial data, marketing performance) by adding Bronze tables and corresponding Silver/Gold procedures without touching existing logic
Reporting-ready Gold layer — Gold outputs are structured specifically for BI tool consumption: pre-aggregated, properly keyed, and free of transformation debt
Available on GitHub — full DDL, stored procedures, and example analysis queries published for review

The Approach

Ingestion to insight,
one layer at a time

Five stages — each one building on the last, each one independent enough to test and validate before the next begins.

Schema Design

Designed the three-schema SQL Server architecture — Bronze, Silver, Gold — with DDL defining tables, keys, and load metadata fields.

Data Generation

Built Python scripts using faker and pandas to generate realistic, imperfect source datasets — customers, ticket orders, campaign events with natural variability.

Bronze Ingestion

Loaded raw source data into Bronze tables as-is. No transformation. Full source fidelity preserved for auditability and re-processing.

Silver Transform

T-SQL stored procedures run Bronze → Silver: trim, capitalize, standardize naming, assign surrogate keys, apply consistent data types.

Gold Modeling

Gold stored procedures build fact tables and dimension views from Silver — pre-aggregated, BI-ready outputs for reporting and analytical consumption.

The Stack

Tools & architecture

Three tools with clearly separated responsibilities — and a full diagram of the medallion schema showing every table across all three layers.

🕶

SQL Server (SSMS)

Core data warehouse host — all three schemas, DDL, and stored procedure execution environment

📑

T-SQL Stored Procs

Silver and Gold layer automation — repeatable, auditable transformation pipelines with load timestamps

Python

Faker-based realistic dataset generation — customers, ticket orders, campaign events with natural variability

GamePulse Medallion Architecture

SQL Server · Bronze → Silver → Gold · T-SQL Stored Procedures

Full DDL on GitHub

Architecture Breakdown

Three layers.
One clean data lifecycle.

Each layer has a strict contract — what it accepts, what it does, and what it produces. No layer does another layer's job.

Layer	Purpose	Key Tables	Automation
Bronze	Raw ingestion — data lands exactly as received from source systems. No transformation, no business logic. Source fidelity preserved for re-processing and auditability.	`bronze.sm_customer` `bronze.trs_ticket_orders` `bronze.email_campaigns`	Python loader script (pyodbc + sqlalchemy)
Silver	Cleansed and conformed — trimmed, capitalized, standardized naming and data types. Surrogate keys generated. Each table maps one-to-one with its Bronze source but is trusted for analytical use.	`silver.sm_customer` `silver.trs_ticket_orders` `silver.campaign_events`	T-SQL stored proc `usp_LoadSilver`
Gold	Aggregated and analytical — fact tables and dimension views shaped specifically for BI consumption. Pre-joined, pre-aggregated, and free of transformation debt. Power BI or Tableau connects here directly.	`gold.fct_ticketing` `gold.dim_customers` `gold.vw_email_performance`	T-SQL stored proc `usp_LoadGold`

Gold Layer Outputs

What the Gold layer enables

Every Gold output is pre-built for a specific analytical question. No transformation required at query time.

🎫

Customer Segmentation & Retention

Cohort analysis by acquisition channel, purchase frequency, and LTV tier. Renewal rate trends and churn risk signals from vw_customer_retention.

🎫

Ticketing & Revenue Optimization

Aggregated revenue by segment, channel, date, and event type from fct_ticketing. Directly feeds Power BI KPI dashboards.

📧

Email Campaign Performance

Open rate, click rate, and conversion analysis by campaign and segment from vw_email_performance. Highlights underperforming sends and engagement trends.

👥

Customer Dimension

Business-keyed customer dimension from dim_customers with surrogate keys, segment labels, and LTV signals — the shared key that connects all fact tables.

Scale

The stat sheet

3 layers

Medallion Architecture

Bronze, Silver, Gold — each with a defined purpose, schema contract, and independent test path.

2 procs

T-SQL Automation

usp_LoadSilver and usp_LoadGold automate the full transformation pipeline — repeatable, auditable, timestamp-tracked.

4+ tables

Gold Analytical Layer

Fact tables and dimension views purpose-built for BI consumption — no further cleaning or joining required at query time.

🔗

Open on GitHub

Full DDL, stored procedures, and example analysis queries available for review at the Clutchlytics/gamepulse repository.

Architect's Lens

Key decisions & tradeoffs

The design choices that shaped the architecture — and the reasoning behind each one.

Decision	Rationale	Tradeoff
Bronze preserves raw data exactly	The Bronze layer stores data exactly as received — no trimming, no type casting, no business logic. This is a deliberate principle: if a downstream transformation turns out to be wrong, you need the ability to re-process from raw without going back to the source system. Bronze is the source of truth, not a staging area.	Duplicate storage — keeping both raw and cleansed versions of the same data costs storage. Accepted as the right tradeoff because re-ingestion from external sources is more expensive than disk space in any real org.
T-SQL stored procedures over ad hoc scripts	Each transformation layer is automated through a named stored procedure rather than a one-off query file. This makes the pipeline repeatable without manual intervention, auditable through execution history, and testable in isolation. It mirrors how production ETL automation actually works in SQL Server environments.	Production-grade pattern — stored procedures are the SQL Server equivalent of dbt models or Databricks notebooks. The same pattern scales directly to enterprise environments without rearchitecting.
Surrogate keys in Silver, not Bronze	Business keys (surrogate keys) are generated in Silver because that's where identity is first resolved. Bronze carries whatever natural keys came from the source system, which may be inconsistent or missing. Silver is where the warehouse takes ownership of a unique, stable identifier for each business entity.	Clean identity lineage — every Gold fact table joins to the same surrogate key from Silver, making cross-table analysis consistent and preventing the key collisions that break reports in un-keyed warehouses.
Gold as views AND fact tables	The Gold layer uses both materialized fact tables (for aggregated metrics that benefit from pre-computation) and SQL views (for analytical summaries that should reflect Silver updates in real-time). The choice between them depends on query frequency and update cadence — documented in the schema DDL.	Two patterns to maintain — mixing tables and views in Gold adds a small maintenance surface. The alternative (all views) would be slower to query; all tables would require manual refresh on every Silver load.

Explore the Full Repository

Schema DDL · Stored procedures · Example analysis queries · Clutchlytics/gamepulse

🔗 View on GitHub

Raw data everywhere.Trusted data nowhere.

Designed and builtevery layer.

Ingestion to insight,one layer at a time