Data Engineering ๐Ÿ— Architect + Product Owner + Builder โ—† Independent Project โš  Simulated Data
BackNineIQ โ€”
Golf Club Visit Analytics
Private golf clubs had no way to connect a member's golf visit to what they spent elsewhere on property that same day โ€” three systems, zero shared keys, and no infrastructure to answer the most operationally important question.
Databricks Python SQL Identity Resolution Streamlit Medallion Architecture Product Design Unity Catalog
The Problem

Three systems.
Zero shared keys.

Private clubs have a data fragmentation problem that is specific, common, and fixable. Membership data lives in a CRM. Golf activity is captured in a tee sheet platform. On-site spending sits in a POS system. None of these share a reliable common key.

Names are formatted inconsistently across systems, emails are sometimes missing from the tee sheet, and the POS has no concept of a member ID. The result is that operators, general managers, golf operations leads, and membership teams cannot connect golf play to spending behavior at the member level.

They run three separate reports and try to reconcile them manually. Key decisions about staffing, merchandising, and member engagement get made without the data that would make them better. The problem isn't the data โ€” it's that no one has built the plumbing to connect it.

BackNineIQ is that plumbing. Built with synthetic but controlled data to simulate a realistic club environment, it answers one question: when a member comes to play golf, what else do they do and spend?

Gap 01 ยท Data Silos
CRM, tee sheet, and POS operate independently with no shared member identity key โ€” no native way to join records across systems.
Gap 02 ยท Dirty Identity Fields
Names formatted differently across systems, missing emails on tee sheet rows, and the POS carries no concept of a member ID at all.
Gap 03 ยท No Analytical Layer
Even with clean sources, there was no reusable fact table linking golf visits to same-day spend โ€” every query would require rebuilding that logic from scratch.
Gap 04 ยท No Stakeholder Surface
No reporting layer designed around the specific operational questions GMs, F&B directors, and membership staff actually ask โ€” not a data problem, a product problem.
My Role

Built end to end

๐Ÿ—
What I Owned
  • Defined the product question and framed the analytics problem the pipeline needed to answer before writing any code
  • Designed the full medallion architecture โ€” Bronze ingestion, Silver identity resolution, Gold fact/dimension modeling, Platinum reporting layer
  • Built the identity resolution logic โ€” deterministic matching across CRM, tee sheet, and POS using email, phone, and normalized name matching
  • Wrote all Python ETL pipelines and orchestrated them in Databricks with Unity Catalog for layer management
  • Designed and built the Streamlit app โ€” three pages mapped to three distinct stakeholder personas with intentional UX decisions
  • Documented all tradeoffs and architectural decisions โ€” built it to be explainable, not just functional
๐Ÿ“‹
Scope & Constraints
  • All data is simulated โ€” a controlled synthetic environment designed to behave like a real club, not a production client engagement
  • Single-developer build โ€” no engineering team, no QA, no dedicated infrastructure budget
  • Streamlit chosen deliberately over Power BI โ€” faster iteration on persona-specific views, Databricks-native deployment, full layout control
  • Deterministic matching only โ€” no probabilistic identity methods in v1; accepted lower initial match rate in exchange for zero false joins
  • v1 scope โ€” membership-type comparison page, spend-link audit page, and fuzzy matching are backlogged for v2
The Approach

From raw exports
to live app

Five stages. Each one cleanly separated so the pipeline is explainable, testable, and extensible.

01
Source Sim
Generated realistic messy source data โ€” CRM export, tee sheet JSON, POS CSV โ€” with controlled overlap and intentional dirty fields.
02
Bronze Ingest
Loaded raw files into Databricks Unity Catalog. Preserved source truth exactly โ€” no business logic, no transformations at this layer.
03
Silver + Identity
Standardized all three sources. Built deterministic match logic to resolve a single backnine_global_id per resolved member.
04
Gold + Platinum
Built reusable fact/dimension models in Gold. Then Platinum app-ready summaries โ€” keeping presentation logic out of the analytical layer.
05
App Deployment
Built a three-page Streamlit app on Databricks Apps, querying Gold and Platinum directly via SQL Connector with service principal auth.
The Stack

Tools & architecture

Every tool chosen for a specific reason โ€” and a Lucid-style medallion flow diagram built from the architecture document.

Databricks
Pipeline orchestration, Unity Catalog, Apps deployment
Python
ETL pipelines, identity resolution, data simulation
SQL
Schema design, Silver/Gold transformation queries
Streamlit
Three-page stakeholder app on Databricks Apps
๐Ÿ—‚๏ธ
Unity Catalog
Medallion layer governance and table management
Medallion Pipeline Architecture
Source โ†’ Bronze โ†’ Silver โ†’ Gold โ†’ Platinum โ†’ App
BackNineIQ v1
SOURCES BRONZE SILVER GOLD PLATINUM APP CRM Excel Export Members Tee Sheet JSON API Bookings POS CSV Export Transactions crm_members _raw RAW ยท NO LOGIC tee_sheet _bookings_raw RAW ยท NO LOGIC pos_transactions _raw RAW ยท NO LOGIC silver.members_clean Standardized ยท Normalized tee_sheet_clean valid_play_row_flag silver.pos_clean Parsed Dates ยท Amounts IDENTITY RESOLUTION identity_candidates identity_matches backnine_identity_map โ†’ backnine_global_id dim_member _profile DIMENSION fct_member _visit_activity PRIMARY FACT golf visit + same-day spend bridge_visit _transactions BRIDGE rpt_member_visit _summary KPI ยท TREND READY rpt_member _engagement_summary RANK ยท SEGMENT READY STREAMLIT APP Page 1 Executive Overview โ†’ GM persona platinum.rpt_visit_summary Page 2 Member Engagement โ†’ Membership Mgr gold.dim + fct_visit rpt_engagement_summary Page 3 Visit Detail Explorer โ†’ F&B Director gold.fct_member_visit _activity (filtered) โ€”โ€” Standard flow - - - Identity / passthrough All tables: workspace.[layer].[table_name]
The Output

What was delivered

A live Databricks Streamlit app with three stakeholder views โ€” each page answering exactly one operational question.

BackNineIQ Executive Overview โ€” Page 1
Page 1: Executive Overview โ€” the General Manager view. KPI strip, visit and spend trend charts, membership-type comparison, and a summary table. Powered by platinum.rpt_member_visit_summary.
Streamlit App 3 Pages Live on Databricks Simulated Data
Results

The stat sheet

4+
Data Sources Unified
CRM, tee sheet, and POS joined under a single backnine_global_id โ€” plus the identity bridge layer.
Pt.
Full Medallion Pipeline
Bronze โ†’ Silver โ†’ Gold โ†’ Platinum. Every layer documented, schema-defined, and purpose-separated.
3ร—
Stakeholder Personas
GM, F&B Director, Membership Manager โ€” each page answers exactly one core operational question.
100%
Deterministic Identity
Zero probabilistic matching. Every resolved join has a confirmed shared key with documented fallback logic.
TPM Lens

Key decisions & tradeoffs

The choices that shaped the architecture โ€” and what was knowingly accepted or given up.

Decision Rationale Tradeoff
Deterministic identity only A confirmed match is always more valuable than a probable one. Stakeholder trust in the data depends on confidence that joined records are real โ€” not inferred. Probabilistic methods would require a confidence threshold that no one in a v1 product has calibrated yet. Lower initial match rate โ€” some members without consistent cross-system fields are excluded from analysis until source data quality improves at origin.
Identity resolved in Silver, not Gold The backnine_global_id is solved once and reused everywhere. New source systems can integrate into the identity model without changing any Gold design. Clean separation of concerns โ€” Silver owns identity, Gold owns business behavior. No logic leaks between layers.
Streamlit over Power BI Databricks-native deployment, Python-first development, full layout control per persona, and no licensing constraints. Each page could be tuned independently without fighting a shared report canvas. Lower executive familiarity โ€” Power BI is the known tool in most club contexts. Streamlit requires a URL deployment step that embedded Power BI avoids.
Platinum layer for app-ready outputs The core business question requires a pre-joined, pre-aggregated surface. Pushing that logic into the Streamlit app would couple presentation to transformation โ€” making both harder to maintain and test. Extra pipeline stage and Platinum refresh latency, accepted in exchange for faster app queries and cleaner Streamlit code.
Three separate app pages vs. one unified dashboard Each stakeholder asks fundamentally different questions. A unified dashboard would require cognitive filtering that reduces adoption. One question per page is a product decision, not an engineering shortcut. More maintenance surface โ€” three views to update when schema changes. Accepted as the correct product tradeoff for usability and stakeholder adoption.
Gallery

Screenshot gallery

Every app view, pipeline artifact, and data layer โ€” click any screenshot to expand.

ร—