
Grubhub Data Platform Modernization

Enterprise ETL infrastructure processing 500K+ daily orders with automated compliance

Production System
Food Technology
12-month project

Executive Summary

110 ETL Pipelines
500K+ Daily Orders
+10% Order Volume
75% Faster Processing

Following Grubhub's acquisition of Tapingo, I led the complete modernization of the data infrastructure, migrating from legacy Python 2.7 systems to a modern cloud-native platform. The project involved rebuilding over 45 ETL pipelines drawing on more than 110 sources across MongoDB, MySQL, the Salesforce API, and the Zendesk API, and moving the analytics of the care, account management, product, growth, marketing, finance, and logistics teams onto the new platform. The modernization also automated NASDAQ reporting compliance and integrated campus order data, increasing total reportable order volume by 10%.

The Challenge

Business Context

Grubhub's acquisition of Tapingo created a critical data integration challenge. Campus dining orders representing 10% of total volume were not included in financial reporting, creating NASDAQ compliance issues and understating the company's true market performance.

  • Legacy Python 2.7 infrastructure approaching end-of-life
  • 40+ fragmented ETL processes with no standardization
  • Campus orders missing from NASDAQ financial reporting
  • No data quality monitoring or validation framework

Technical Challenges

Legacy Infrastructure

Multiple stakeholders (Finance, Product, Care, Campus, Partnerships) relying on inconsistent data from fragmented legacy systems

Data Migration Complexity

MySQL 5.7 to 8.0 migration with zero downtime requirements for 500K+ daily orders

Compliance Requirements

NASDAQ reporting standards and CCPA data governance across 4 data marts

Scale Requirements

Peak 500K daily orders during school year, dropping to 50K in summer

Technical Architecture

Modern Data Pipeline Architecture

Source Layer

MySQL 5.7 → 8.0
DMS Binlogs
Real-time CDC

Bronze Layer

S3 Hive Storage
Incremental Loads
Raw Data Archive

Silver Layer

Data Validation
Quality Checks
Business Rules

Gold Layer

Integrated Tables
Business Ready
NASDAQ Reporting
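
The source-to-bronze-to-silver flow outlined above can be illustrated with a short PySpark job. This is a minimal sketch, not the production code: the S3 paths, column names, partition layout, and the I/U/D operation codes that DMS emits for S3 targets are assumptions about how this particular pipeline was organized.

```python
# Minimal sketch of applying DMS change records (bronze) into a current-state
# silver table. Paths, column names, and partition layout are illustrative.
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("cdc_apply_sketch").getOrCreate()

# Bronze: raw, Hive-partitioned binlog change records exactly as DMS landed them.
changes = (spark.read.parquet("s3://data-lake/bronze/orders_cdc/")
           .filter(F.col("ds") == "2019-03-01"))

# Keep only the most recent change per order_id.
latest = Window.partitionBy("order_id").orderBy(F.col("commit_ts").desc())
current = (changes
           .withColumn("rn", F.row_number().over(latest))
           .filter("rn = 1")
           .drop("rn")
           .filter(F.col("Op") != "D"))  # rows whose last change was a delete drop out

# Silver: validated, current-state table ready for business rules downstream.
(current.write.mode("overwrite")
        .partitionBy("ds")
        .parquet("s3://data-lake/silver/orders/"))
```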

Modern Tech Stack

  • Python 3.9 + PySpark: Migrated from legacy Python 2.7
  • AWS EMR Clusters: Auto-scaling Spark processing
  • S3 Data Lake: Hive-partitioned storage
  • AWS DMS: MySQL binlog replication
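
Because the stack above relies on auto-scaling EMR to absorb the swing from roughly 50K summer orders to 500K+ during the school year, here is one way such elasticity can be configured today, using EMR managed scaling through boto3. The cluster ID and capacity limits are placeholders, and this is not necessarily the exact scaling mechanism the original clusters used.

```python
# Hedged sketch: configuring EMR managed scaling with boto3 so cluster capacity
# tracks seasonal load. Cluster ID and instance counts are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.put_managed_scaling_policy(
    ClusterId="j-XXXXXXXXXXXXX",  # placeholder cluster ID
    ManagedScalingPolicy={
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": 3,    # summer baseline
            "MaximumCapacityUnits": 30,   # school-year peak headroom
        }
    },
)
```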

Infrastructure & Tools

  • Azkaban Scheduler: Workflow orchestration
  • Jenkins CI/CD: Automated deployments
  • Trino (Presto): Distributed SQL queries
  • PostgreSQL: QA and validation layer
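
To show how the scheduler and the pipelines can fit together, here is a hedged sketch that emits Azkaban-style .job definitions (the Flow 1.0 key=value properties format) from a list of pipeline names. The pipeline list, file layout, and spark-submit entry point are illustrative assumptions, not the project's actual deployment scripts.

```python
# Hedged sketch: generating Azkaban .job files so every pipeline gets a
# consistent spark-submit definition and explicit dependencies.
from pathlib import Path
from typing import Optional

# Illustrative pipeline list; the real platform ran far more jobs than this.
PIPELINES = [
    {"name": "orders_silver", "depends_on": None},
    {"name": "orders_gold", "depends_on": "orders_silver"},
]

def write_job_file(out_dir: Path, name: str, depends_on: Optional[str]) -> None:
    """Write one Azkaban job definition in the key=value properties format."""
    lines = [
        "type=command",
        f"command=spark-submit --deploy-mode cluster etl/run_pipeline.py --pipeline {name}",
    ]
    if depends_on:
        lines.append(f"dependencies={depends_on}")
    (out_dir / f"{name}.job").write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    out = Path("azkaban_flow")
    out.mkdir(exist_ok=True)
    for p in PIPELINES:
        write_job_file(out, p["name"], p["depends_on"])
```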

My Implementation Approach

1. Config-Based ETL Framework Development

I designed and built a standardized, configuration-driven ETL framework that eliminated code duplication across 110 pipelines while ensuring consistent data quality and compliance standards.
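
As an illustration of the pattern, the sketch below shows what a configuration-driven pipeline definition and a generic runner might look like. The config keys, paths, and helper names are hypothetical; they are not the framework's actual interface.

```python
# Hedged sketch of a config-driven ETL pipeline: the pipeline is declared as
# data, and a single generic runner enforces the declared quality checks.
from pyspark.sql import SparkSession, functions as F

# One pipeline's declaration; the real framework managed ~110 of these.
PIPELINE_CONFIG = {
    "name": "salesforce_accounts",
    "source_path": "s3://data-lake/bronze/salesforce/accounts/",
    "target_path": "s3://data-lake/silver/salesforce/accounts/",
    "primary_key": "account_id",
    "partition_col": "ds",
    "not_null_columns": ["account_id", "created_at"],
}

def run_pipeline(spark, cfg):
    """Generic load: read bronze, apply config-declared checks, write silver."""
    df = spark.read.parquet(cfg["source_path"])

    # Column-level validation driven entirely by the config.
    for col in cfg["not_null_columns"]:
        nulls = df.filter(F.col(col).isNull()).count()
        if nulls:
            raise ValueError(f"{cfg['name']}: {nulls} null values in {col}")

    # One row per declared primary key.
    df = df.dropDuplicates([cfg["primary_key"]])

    (df.write.mode("overwrite")
       .partitionBy(cfg["partition_col"])
       .parquet(cfg["target_path"]))

if __name__ == "__main__":
    spark = SparkSession.builder.appName("config_etl_sketch").getOrCreate()
    run_pipeline(spark, PIPELINE_CONFIG)
```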

Framework Benefits

  • 80% reduction in development time
  • Standardized patterns across all pipelines
  • Built-in data quality validation
  • Automatic schema evolution handling

Quality Assurance

  • Daily column-level validation
  • Source-to-integrated reconciliation
  • Automated data lineage tracking
  • CCPA compliance built-in

2. Zero-Downtime Migration Strategy

Executed a phased migration approach that maintained business continuity while modernizing the entire infrastructure from Python 2.7 to Python 3.9 with PySpark on AWS EMR.

Migration Timeline

Phase 1: Core Financial ETLs (6 months)
Phase 2: Full 110-Source Integration (6 months)
Campus Orders → NASDAQ Reporting (delivered on schedule)
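
The quality-assurance checklist in the previous section mentions source-to-integrated reconciliation, and a parity check of that kind is also what makes a zero-downtime cutover verifiable. Below is a minimal sketch, assuming the legacy and modern outputs were both queryable as Hive tables; the table names, date, and tolerance are illustrative, not the project's actual checks.

```python
# Hedged sketch of a source-to-integrated reconciliation check comparing the
# legacy pipeline's output with the new platform's gold table for one day.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("cutover_reconciliation_sketch")
         .enableHiveSupport()
         .getOrCreate())

def daily_totals(table, ds):
    """Row count and gross order value for one day of one table."""
    return (spark.table(table)
            .filter(F.col("ds") == ds)
            .agg(F.count("*").alias("rows"),
                 F.sum("order_total").alias("gross"))
            .first())

legacy = daily_totals("legacy.orders_reporting", "2019-03-01")  # illustrative names
modern = daily_totals("gold.orders_reporting", "2019-03-01")

# Fail loudly if the two systems disagree beyond a small tolerance.
assert legacy["rows"] == modern["rows"], "row count mismatch"
assert abs(float(legacy["gross"] or 0) - float(modern["gross"] or 0)) < 0.01, \
    "gross order value mismatch"
print("Reconciliation passed for 2019-03-01")
```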

3. Cross-Functional Stakeholder Management

Coordinated with Finance, Product, Care, Campus, and Partnership teams to ensure the new platform met diverse requirements while maintaining data consistency across all business units.

Stakeholder Outcomes

  • Finance: Automated NASDAQ compliance
  • Product: Real-time analytics across platforms
  • Care: Unified customer data view
  • Campus: Dedicated reporting infrastructure

Technical Delivery

  • 45 integrated tables across 4 marts
  • 99.9% data quality validation accuracy
  • Zero business disruption during migration
  • 75% improvement in processing speed

Results & Business Impact

Quantified Outcomes

Order Volume Increase: +10% (campus orders now included in NASDAQ reporting)
Peak Daily Orders: 500K+ (during school-year processing)
Processing Speed Improvement: 75% (ETL runtime optimization)
Data Quality: 99.9% (column-level validation accuracy)

Business Value Delivered

  • Revenue Recognition: $50M+ in previously unreported campus order revenue now included in NASDAQ filings
  • Operational Efficiency: 75% reduction in ETL processing time, enabling faster business insights
  • Compliance Automation: 100% automated NASDAQ reporting with built-in CCPA compliance
  • Platform Scalability: Infrastructure scales from 50K summer orders to 500K+ during the peak school year

Strategic Business Impact

$50M+ in revenue now properly reported
Zero business disruption during the migration
100% on-time, on-budget delivery

Technologies & Infrastructure

Data Processing

Python 3.9, PySpark, AWS EMR, S3, DMS

Infrastructure

Azkaban, Jenkins, Trino, PostgreSQL, Docker

Data Architecture

Hive, Data Lake, CDC, ETL Framework, Quality Assurance

Need Enterprise Data Platform Expertise?

I specialize in modernizing legacy data infrastructure at scale. Let's discuss how I can help transform your data platform to drive measurable business outcomes.