The Choco Effect: How a Dog Transformed My Running Data


A decade of fitness tracking reveals an unexpected truth about consistency, companionship, and the stories hiding in our data


The Perfect Tracker’s Paradox

I was the model quantified-self runner: 2,593 workouts logged over 14 years. Every run tracked, every mile recorded, every pace calculated. MapMyRun dutifully collected it all while I… never actually looked at what it was telling me.

That changed when I finally exported my data and discovered something remarkable: I could pinpoint the exact month my life changed. Not through memory or photos, but through the dramatic shift in my running patterns: June 2018. The month I became a different kind of runner.

My pace from 2011-2025: each workout is colored by average pace (jogs, walks, or a bit of both). Notice how the density of workouts increases after Choco's arrival 🐾

The numbers were so dramatic I initially thought I’d made a data processing error:

That’s not improvement. That’s transformation. A 4.5x increase in consistency that happened virtually overnight and never reverted.

But the frequency change was just the beginning. The nature of my workouts had fundamentally shifted:

Bimodal Distribution Emerges: The violin plot shows the pace distribution before and after June 2018. Before, paces cluster tightly around 9.3 min/mi. After, they are bimodal, with peaks at 10 min/mi (runs) and 24 min/mi (walks). Duration and distance show a similar split. 🐾

Before Choco:

After Choco:

Something had fundamentally changed about how I exercised. And that something had four legs and a tail.


Meet Choco: The Data Scientist I Didn’t Know I Needed

🐕 The Technical Details of Dog-Driven Data

Choco, my Labrador Retriever, didn’t just join my workouts—he restructured them entirely. The data reveals two distinct activity profiles post-June 2018:

Profile 1: “Real Runs” (14% of workouts)

  • Pace: 8-12 min/mile
  • Distance: 3-8 miles
  • Pattern: Early morning, while Choco sleeps

Profile 2: “Choco Adventures” (76% of workouts)

  • Pace: 20-28 min/mile
  • Distance: 1-3 miles
  • Pattern: Any time, because every walk counts

The remaining 10%? That’s where it gets interesting—transition zones where I clearly couldn’t decide if we were running or walking.
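The pace ranges quoted above can be turned into a simple rule-based labeler. This is a minimal sketch, not the method used for the figures in this post: the function name, the label strings, the handling of paces outside both ranges, and the sample paces are all assumptions made for illustration.

```python
from collections import Counter


def label_workout(pace_min_per_mile: float) -> str:
    """Label a workout by average pace, using the ranges quoted in the post."""
    if 8 <= pace_min_per_mile <= 12:
        return "run"
    if 20 <= pace_min_per_mile <= 28:
        return "walk"
    if 12 < pace_min_per_mile < 20:
        return "bit of both"  # the transition zone between the two profiles
    return "other"  # faster than 8 or slower than 28 min/mile (assumed bucket)


# Made-up paces in minutes per mile, for illustration only.
sample_paces = [9.3, 10.0, 24.0, 16.5, 27.9]
labels = [label_workout(p) for p in sample_paces]

# Share of each label, comparable in spirit to the percentages above.
counts = Counter(labels)
shares = {label: n / len(labels) for label, n in counts.items()}
print(labels)
print(shares)
```

Hard thresholds like these are easy to read but treat a 12.1 min/mile jog and a 19.9 min/mile stroll as the same thing, which is exactly why the transition zone deserves a closer look.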

Here’s what actually happened: In June 2018, I adopted a rescue dog who had his own ideas about exercise. Suddenly, my rigid “training runs” exploded into a spectrum of activities:

My carefully curated workout data became beautifully chaotic—and unexpectedly revealing.

[VISUALIZATION: The Consistency Revolution] Calendar heatmap showing workout frequency by day, 2015-2025. Sparse dots before June 2018 transform into an almost-daily pattern after. The “Choco line” is clearly visible.


The Paradox of Imperfect Data

The traditional data quality expert in me initially saw problems:

But the human in me saw the real story: Choco didn’t mess up my data—he revealed what actually drives exercise consistency.

📊 The Numbers Behind the Transformation

Consistency Metrics:

  • Longest streak pre-Choco: 14 days
  • Longest streak post-Choco: 247 days
  • Monthly variation pre-Choco: ±8.7 workouts
  • Monthly variation post-Choco: ±3.2 workouts
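For readers who want to compute similar consistency metrics on their own export, here is a rough sketch. It assumes a streak means consecutive calendar days with at least one workout, and that the ± figures are a standard deviation of workouts per month; the post does not spell out either definition. The function names and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import pstdev


def longest_streak(workout_dates):
    """Longest run of consecutive calendar days with at least one workout."""
    best = current = 0
    previous = None
    for day in sorted(set(workout_dates)):
        if previous is not None and day - previous == timedelta(days=1):
            current += 1
        else:
            current = 1  # gap (or first day): start a new streak
        best = max(best, current)
        previous = day
    return best


def monthly_count_spread(workout_dates):
    """Standard deviation of workouts per month; empty months count as zero."""
    counts = Counter((d.year, d.month) for d in workout_dates)
    if not counts:
        return 0.0
    (year, month), last = min(counts), max(counts)
    per_month = []
    while (year, month) <= last:
        per_month.append(counts.get((year, month), 0))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return pstdev(per_month)


# Made-up dates for illustration: a 3-day streak in June, nothing in July,
# and two workouts on the same August day.
sample_dates = [
    date(2018, 6, 1), date(2018, 6, 2), date(2018, 6, 3), date(2018, 6, 5),
    date(2018, 8, 10), date(2018, 8, 10),
]
streak = longest_streak(sample_dates)
spread = monthly_count_spread(sample_dates)
print(streak, round(spread, 2))
```

Counting empty months as zero matters: dropping them would make a sporadic runner look more consistent than they were.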

Behavioral Changes:

  • Morning workouts: 85% → 62% (dogs don’t care about your schedule)
  • Weekend activity: 2x increase (every day is workout day with a dog)
  • Seasonal consistency: Winter dropoff eliminated (dogs need walks year-round)

The insights hiding in this “messy” data were profound:

  1. Consistency beats intensity: My average pace slowed by about 10 minutes/mile, but my fitness improved because I was moving almost every day.

  2. Perfect is the enemy of good: When every walk “counted,” I stopped skipping workouts because they wouldn’t be “real runs.”

  3. External motivation works: Choco’s needs created a consistency no training plan ever achieved.


The Technical Challenge: Can Machines Learn the Difference?

This discovery led to an intriguing question: If the difference between my runs and dog walks is so clear in the data, can machine learning identify them without labels?

[INTERACTIVE ELEMENT: Predict the Activity Type] Quiz showing 5 workout metrics. Reader guesses “Run” or “Dog Walk” before revealing the answer and ML prediction.

The bimodal distribution in my pace data suggests clear clusters:

But here’s where it gets interesting: there’s a fuzzy middle ground where runs became walks, or walks became runs. These edge cases might reveal the most about how life happens in the margins of our planned activities.
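One label-free way to probe those clusters is plain k-means with two centroids on pace alone. The sketch below is only an illustration and not necessarily the approach Episode 2 will take; the function names, the `fuzzy_band` parameter, and the sample paces are all made up.

```python
def two_means_1d(values, iterations=50):
    """Split 1-D values into two clusters with plain k-means (k=2)."""
    if not values:
        raise ValueError("need at least one value")
    low, high = min(values), max(values)  # start the centroids at the extremes
    for _ in range(iterations):
        midpoint = (low + high) / 2
        fast = [v for v in values if v <= midpoint]
        slow = [v for v in values if v > midpoint]
        if not fast or not slow:
            break  # everything landed in one cluster; nothing left to update
        new_low, new_high = sum(fast) / len(fast), sum(slow) / len(slow)
        if (new_low, new_high) == (low, high):
            break  # converged
        low, high = new_low, new_high
    return low, high


def fuzzy_paces(values, low, high, fuzzy_band=0.2):
    """Values within fuzzy_band * (high - low) of the midpoint between centroids."""
    midpoint = (low + high) / 2
    margin = fuzzy_band * (high - low)
    return [v for v in values if abs(v - midpoint) <= margin]


# Made-up paces in minutes per mile: some runs, some walks, one in between.
paces = [9.0, 9.5, 10.0, 10.5, 23.0, 24.0, 25.0, 16.0]
run_center, walk_center = two_means_1d(paces)
ambiguous = fuzzy_paces(paces, run_center, walk_center)
print(run_center, walk_center, ambiguous)
```

Note that the in-between workout still gets assigned to a cluster and pulls that centroid toward it; flagging it separately is what keeps the fuzzy middle visible instead of silently absorbed.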

Coming in Episode 2: Building a classifier to automatically identify workout types, and discovering what the “unclassifiable” workouts reveal about the beautiful messiness of real life.


Your Data Has Stories Too

The Choco Effect taught me that the most interesting insights often hide in what we consider “data quality issues.” Those inconsistencies, outliers, and sudden changes? They’re life happening.

🛠️ Try This With Your Own Data

Quick Analysis Checklist:

  1. Export your fitness data (Strava, Garmin, Apple Health, etc.)
  2. Look for sudden changes in:
    • Frequency patterns
    • Average metrics (pace, distance, duration)
    • Workout time distributions
  3. Ask yourself: What life change might explain this?
  4. Check if categories or labels changed around the same time

SQL Starter Query:

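-- Find your "Choco moment": the months with the largest
-- month-over-month change in workout count.
WITH monthly_stats AS (
  SELECT
    DATE_TRUNC('month', workout_date) AS month,
    COUNT(*) AS workout_count,
    AVG(distance) AS avg_distance,
    AVG(pace) AS avg_pace
  FROM workouts
  GROUP BY 1
),
monthly_change AS (
  -- Compute the change in its own CTE so ABS() can reference it below;
  -- some databases reject an output alias inside an ORDER BY expression.
  SELECT
    month,
    workout_count,
    avg_distance,
    avg_pace,
    workout_count - LAG(workout_count) OVER (ORDER BY month) AS change
  FROM monthly_stats
)
SELECT month, workout_count, avg_distance, avg_pace, change
FROM monthly_change
WHERE change IS NOT NULL  -- the first month has nothing to compare against
ORDER BY ABS(change) DESC
LIMIT 10;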
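The query above ranks single month-over-month jumps, which can be dominated by one unusual month. If you are looking for a change that stuck, a complementary sketch is to find the split point that best divides the monthly counts into two flat segments. This is a simple least-squares level-shift search, offered as an assumption about what a useful follow-up might look like; the function name, the `min_months` parameter, and the sample counts are hypothetical.

```python
def best_level_shift(monthly_counts, min_months=3):
    """Return (split_index, mean_before, mean_after) for the split that
    minimizes the squared error around each segment's own mean."""
    if min_months < 1:
        raise ValueError("min_months must be at least 1")

    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    candidates = range(min_months, len(monthly_counts) - min_months + 1)
    if not candidates:
        raise ValueError("series too short for the chosen min_months")

    split = min(
        candidates,
        key=lambda i: sse(monthly_counts[:i]) + sse(monthly_counts[i:]),
    )
    before, after = monthly_counts[:split], monthly_counts[split:]
    return split, sum(before) / len(before), sum(after) / len(after)


# Made-up monthly workout counts, in chronological order.
counts_by_month = [5, 7, 4, 6, 5, 24, 26, 22, 25, 27]
split, mean_before, mean_after = best_level_shift(counts_by_month)
print(split, mean_before, mean_after)
```

Here `split` is the index of the first month in the "after" segment, so it maps directly onto a row of the monthly results if they are sorted by month.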

The real lesson isn’t about dogs or running. It’s that our data tells stories we don’t expect. My “failed” attempt at maintaining pristine running data became a beautiful record of life change. The metrics got messy, but my habits got better.

What stories are hiding in your perfectly tracked imperfect life?


What’s Next

Episode 2: “Teaching Machines to Spot Dog Walks” - Can unsupervised learning identify workout types based purely on pace, distance, and duration patterns? More importantly, what do the edge cases teach us about the fuzzy boundaries in our categorized lives?

Episode 3: “The Weather Excuse Myth” - Combining workout data with historical weather reveals surprising patterns about what actually affects exercise consistency (spoiler: it’s not rain).




Barbara is a data scientist who discovered that the best insights come from imperfect data. Her dog Choco is a better personal trainer than any app, though he refuses to wear a fitness tracker. Follow their data adventures at [barbhs.com].