
30-second version: Beehive Analytics transforms scattered inspection photos into a searchable, analyzable knowledge base. Upload photos, and the system automatically extracts timestamps, correlates weather conditions, runs computer vision to detect hive components, and renders everything on an interactive timeline—so you can finally answer questions like “what did inspections look like before the swarm?” or “how does brood pattern correlate with temperature?”

2-minute version: When bees moved into our owl box in 2020, I did what any curious person would do: I started documenting. Four years later, I had hundreds of photos across phones, cameras, and cloud storage, all with names like "IMG_2847.jpg" and no reliable way to search, compare, or learn from them.

The friction wasn't just organizational. I wanted to answer questions that required correlating multiple data sources: What were conditions like before the swarm? Is there a pattern between weather and foraging activity? Which frames showed early signs of problems I missed at the time?

This platform addresses that gap by treating inspection photos as structured data rather than digital clutter. The system extracts EXIF metadata (using a multi-library approach to handle diverse camera formats), enriches it with historical weather data via the Open-Meteo API, runs Google Cloud Vision to detect hive components and bee activity, and stores everything in a queryable format.

The result is an interactive timeline where each inspection becomes a node with rich context (GPS coordinates, temperature, precipitation, dominant colors, detected objects), all without manual data entry. What was once a folder of mystery photos becomes a structured record of hive health over time.

Problem

Beekeepers capture inspection photographs but lack systematic tools to:

  • Track patterns over time with objective data rather than memory
  • Correlate visual observations with environmental conditions
  • Search historical data efficiently (“show me all inspections before swarm events”)
  • Make data-driven decisions about seasonal preparation and intervention timing

Current approaches require manual documentation. Patterns that span seasons—queen health trends, forage availability cycles, early pest indicators—rarely make it into a system of record because the overhead is too high.

The Real Friction

| Challenge | Impact |
| --- | --- |
| Photos scattered across devices | Can’t find specific inspections when needed |
| Unhelpful filenames (IMG_2847.jpg) | No searchability without manual renaming |
| Weather context lost | “Was it hot that day?” requires separate lookup |
| Visual patterns invisible | Subtle changes over months go unnoticed |
| Manual logging abandoned | Too much friction → incomplete records |

How It Works

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Upload    │───▶│    EXIF     │───▶│     GPS     │
│   Photos    │    │ Extraction  │    │ Validation  │
└─────────────┘    └─────────────┘    └─────────────┘
                                            │
       ┌────────────────────────────────────┘
       ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Weather   │◀───│  Computer   │◀───│   Color     │
│ Integration │    │   Vision    │    │  Analysis   │
└─────────────┘    └─────────────┘    └─────────────┘
       │
       ▼
┌─────────────┐    ┌─────────────┐
│  Structured │───▶│  Timeline   │
│   Storage   │    │    View     │
└─────────────┘    └─────────────┘

| Stage | What Happens |
| --- | --- |
| Photo Upload | Multi-format support; inspection grouping wizard |
| EXIF Extraction | Three-library cascade (PIL → ExifRead → PyExifTool) handles diverse camera formats |
| GPS Validation | Coordinate normalization; location verification |
| Color Analysis | Dominant palette extraction for honeycomb health indicators |
| Computer Vision | Google Cloud Vision API: label detection, object localization, confidence scoring |
| Weather Integration | Open-Meteo API lookup by GPS + timestamp → temperature, precipitation, wind |
| Structured Storage | JSON/CSV with storage abstraction (local → S3 → GCS ready) |
| Timeline Rendering | Interactive Plotly timeline, calendar heatmap, photo gallery |

Multi-Library EXIF Strategy

Different cameras encode EXIF data differently (smartphones vs DSLRs vs action cameras). A single library fails on ~30% of real-world photos. The solution: cascade through three libraries, ordered from fastest to most comprehensive.

```python
def extract_metadata(image_path):
    """Try multiple libraries with fallback."""
    # Fast path: PIL handles most smartphone photos
    metadata = try_pil_extraction(image_path)
    if has_required_fields(metadata):
        return metadata

    # Fallback: ExifRead for more formats
    metadata = merge(metadata, try_exifread(image_path))
    if has_required_fields(metadata):
        return metadata

    # Last resort: PyExifTool (most comprehensive)
    return merge(metadata, try_exiftool(image_path))
```

This approach reduced metadata extraction failures from 30% to under 5%.
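
The cascade relies on two helpers, `has_required_fields` and `merge`, that are not shown. A minimal sketch of how they might look, assuming the required fields are a timestamp plus GPS coordinates (the field names here are hypothetical) and that values found by an earlier library should win over later ones:

```python
# Hypothetical field names; the real pipeline may require a different set.
REQUIRED_FIELDS = ("timestamp", "latitude", "longitude")


def has_required_fields(metadata: dict) -> bool:
    """Return True when every required field is present and non-empty."""
    return all(metadata.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def merge(primary: dict, fallback: dict) -> dict:
    """Combine two extraction results, keeping values already found.

    Values from `primary` win; `fallback` only fills missing or empty fields.
    """
    merged = dict(fallback)
    merged.update({k: v for k, v in primary.items() if v not in (None, "")})
    return merged
```

With this shape, each step of the cascade only adds information, so a fast library that found the timestamp but not the GPS position is not overwritten by a slower one.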

Weather Data Integration

Open-Meteo provides historical weather data with:

  • 40+ year archive
  • Hourly granularity
  • Free tier covering typical usage (50-100 inspections/month)
  • No authentication overhead

For each photo with valid GPS and timestamp, the system fetches:

  • Temperature (°C)
  • Precipitation (mm)
  • Cloud cover (%)
  • Wind speed (km/h)
  • Weather condition codes

Caching prevents redundant API calls for photos from the same location/day.
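
As an illustration of the lookup and caching described here, the sketch below builds a request for Open-Meteo's historical archive endpoint and memoizes responses per location and day. The endpoint and hourly variable names follow Open-Meteo's documentation as I understand it and should be checked against the live API; rounding coordinates to two decimals for the cache key and the `WeatherCache` class itself are assumptions, not the project's actual code.

```python
from urllib.parse import urlencode

# Assumed endpoint and variable names; verify against the Open-Meteo docs.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"
HOURLY_VARS = ("temperature_2m", "precipitation", "cloud_cover",
               "wind_speed_10m", "weather_code")


def build_weather_url(lat: float, lon: float, day: str) -> str:
    """Build an archive request covering one calendar day (YYYY-MM-DD)."""
    params = {
        "latitude": f"{lat:.2f}",
        "longitude": f"{lon:.2f}",
        "start_date": day,
        "end_date": day,
        "hourly": ",".join(HOURLY_VARS),
        "timezone": "UTC",
    }
    return f"{ARCHIVE_URL}?{urlencode(params, safe=',')}"


class WeatherCache:
    """Memoize responses per (rounded latitude, rounded longitude, day)."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: url -> parsed response
        self._store = {}

    def get(self, lat: float, lon: float, day: str):
        # Two decimals is roughly 1 km, so photos from one apiary share a key.
        key = (round(lat, 2), round(lon, 2), day)
        if key not in self._store:
            self._store[key] = self._fetch(build_weather_url(lat, lon, day))
        return self._store[key]
```

Passing the fetch function in keeps the cache testable without network access; in production it could wrap `urllib.request.urlopen` and `json.load`.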

What Shipped

Core Features

  • Interactive Timeline: Chronological inspection history with Plotly; click-to-select inspections
  • Calendar View: Multiple modes (day grid, resource timeline, list); color-coded by photo count
  • Photo Gallery: Grid-based browsing organized by inspection date
  • Bulk Import Wizard: 4-step process supporting local directories, AWS S3, URL lists
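
The gallery and calendar views both depend on grouping photos by inspection date. A minimal sketch of that grouping, assuming one inspection per calendar day and a hypothetical record shape with `filename` and `taken_at` keys:

```python
from collections import defaultdict
from datetime import datetime


def group_by_inspection_date(photos: list[dict]) -> dict[str, list[str]]:
    """Map each calendar date (YYYY-MM-DD) to the photos taken that day."""
    groups: dict[str, list[str]] = defaultdict(list)
    for photo in photos:
        day = photo["taken_at"].date().isoformat()
        groups[day].append(photo["filename"])
    # Sort by date so the timeline and gallery render chronologically.
    return dict(sorted(groups.items()))


def photo_counts(groups: dict[str, list[str]]) -> dict[str, int]:
    """Per-day photo counts, the value a calendar view could color by."""
    return {day: len(files) for day, files in groups.items()}
```

Keying on ISO date strings keeps the result directly serializable to JSON and sortable without further parsing.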

Data Extraction

  • Multi-Library EXIF: PIL/ExifRead/PyExifTool cascade for 95%+ success rate
  • GPS Normalization: Format validation and coordinate standardization
  • Color Palette Analysis: Dominant color extraction using ColorThief; honeycomb health indicators
  • Timezone Handling: Multi-format timestamp parsing with timezone support
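
EXIF typically stores GPS positions as degrees, minutes, and seconds plus a hemisphere reference (N/S/E/W), so normalization usually means converting to signed decimal degrees and rejecting implausible values. The sketch below shows one way to do that; the function names and the decision to treat (0, 0) as a missing fix are assumptions rather than the project's actual rules.

```python
from typing import Optional


def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees/minutes/seconds plus hemisphere to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    if ref.upper() in ("S", "W"):
        value = -value
    return round(value, 6)


def validate_coordinates(lat: float, lon: float) -> Optional[tuple[float, float]]:
    """Return (lat, lon) when plausible, otherwise None."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    if lat == 0 and lon == 0:
        # Assumed rule: (0, 0) usually means the camera had no GPS fix.
        return None
    return (lat, lon)
```

For example, 45° 31' 23.16" N converts to roughly 45.5231, and 122° 40' 35.4" W to roughly -122.6765.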

API Integration

  • Google Cloud Vision: Label detection, object localization, image properties with custom confidence thresholds optimized for agricultural imagery
  • Open-Meteo Weather: Historical weather correlation by location and timestamp
  • Rate Limiting: Retry mechanisms for API resilience
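
The retry mechanism is not described in detail, so the following is a generic sketch of exponential backoff with jitter rather than the project's implementation. The function name, the default exception types, and the delay schedule are all assumptions.

```python
import random
import time


def call_with_retry(func, *, attempts=4, base_delay=1.0,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...); jitter of up to
            # half the delay avoids synchronized retries.
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists so tests can inject a no-op and run instantly; a call such as `call_with_retry(lambda: client.annotate(image))` would wrap a hypothetical API client.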

Infrastructure

  • Storage Abstraction: Pluggable backends (local → S3 → GCS) without code changes
  • Session State Management: Multi-page Streamlit navigation with persistent state
  • Docker Deployment: Production image with health checks; Cloud Run ready
  • CSV/JSON Export: External analysis compatibility (R/Python)
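
To illustrate the export step, here is a small standard-library sketch that writes flat inspection records to CSV and JSON. The record fields are hypothetical, and nested values (such as label lists) would need flattening before CSV export.

```python
import csv
import io
import json


def export_csv(records: list[dict]) -> str:
    """Serialize flat records to CSV, filling missing fields with ''."""
    if not records:
        return ""
    # Preserve first-seen column order across all records.
    fieldnames = list(dict.fromkeys(key for r in records for key in r))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def export_json(records: list[dict]) -> str:
    """Serialize records to JSON; non-JSON types fall back to str()."""
    return json.dumps(records, indent=2, default=str)
```

Both outputs load directly into pandas (`read_csv`, `read_json`) or R for further analysis.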

Architecture

| Layer | Components | Technology |
| --- | --- | --- |
| UI | Multi-page app, visualizations | Streamlit 1.44+, Plotly 6.0 |
| Processing | Metadata extraction, analysis | PIL, ColorThief, ExifRead, PyExifTool |
| API | External integrations | Google Cloud Vision 3.7, Open-Meteo |
| Storage | Data persistence | JSON/CSV, S3-compatible abstraction |

Why Streamlit?

The choice of Streamlit over traditional web frameworks was deliberate:

  • 80% faster development for data-centric UIs
  • Built-in session state handles complex multi-step workflows
  • Native Pandas/Plotly integration without custom API layers
  • Single Python codebase simplifies deployment and maintenance

Trade-offs accepted: less UI customization, Streamlit-specific patterns required.

Storage Abstraction Pattern

The storage layer uses a simple abstraction that enables seamless migration:

```python
from typing import Protocol


class StorageBackend(Protocol):
    """Interface every storage backend implements."""

    def read(self, key: str) -> bytes: ...
    def write(self, key: str, data: bytes) -> None: ...
    def list(self, prefix: str) -> list[str]: ...


# Implementations
class LocalStorage(StorageBackend): ...
class S3Storage(StorageBackend): ...
class GCSStorage(StorageBackend): ...
```

Benefits:

  • Zero breaking changes during migration
  • Atomic write operations with backup
  • Version control friendly formats
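
To make "atomic write operations with backup" concrete, here is a sketch of how a local backend could satisfy the `read`/`write`/`list` interface. It is an assumption about the approach, not the project's code: the `.bak` naming and the temp-file-then-rename strategy are illustrative choices.

```python
import os
import shutil
import tempfile
from pathlib import Path


class LocalStorage:
    """Filesystem backend with the same read/write/list shape."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def read(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

    def write(self, key: str, data: bytes) -> None:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        if target.exists():
            # Keep the previous version beside the file as a simple backup.
            shutil.copy2(target, target.with_name(target.name + ".bak"))
        # Write to a temp file in the same directory, then rename over the
        # target; os.replace is atomic within a single filesystem.
        fd, tmp_path = tempfile.mkstemp(dir=str(target.parent))
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(data)
            os.replace(tmp_path, target)
        except BaseException:
            os.unlink(tmp_path)
            raise

    def list(self, prefix: str) -> list[str]:
        keys = (p.relative_to(self.root).as_posix()
                for p in self.root.rglob("*") if p.is_file())
        return sorted(k for k in keys
                      if k.startswith(prefix) and not k.endswith(".bak"))
```

Because readers only ever see either the old file or the fully written new one, an interrupted write cannot leave a half-written JSON record behind.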

Implementation Notes

Testing Strategy

Risk-based approach targeting high-value coverage:

| Risk Level | Examples | Testing Approach |
| --- | --- | --- |
| High | Core flows, session state, API calls | Full test coverage with mocks |
| Medium | Secondary features, UI logic | Standard unit tests |
| Low | Styling, configuration | Minimal/manual testing |

All API tests use mocks—no real API calls in test suite.
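
A mocked API test in this style might look like the sketch below. The function under test, `fetch_temperature`, and the client's `get_weather` method are hypothetical stand-ins; only `unittest` and `unittest.mock` are real.

```python
import unittest
from unittest.mock import Mock


def fetch_temperature(client, lat, lon, day):
    """Hypothetical code under test: asks a weather client for one day."""
    response = client.get_weather(lat=lat, lon=lon, day=day)
    return response["temperature_c"]


class FetchTemperatureTest(unittest.TestCase):
    def test_uses_client_without_network(self):
        # The Mock stands in for the real API client, so no request is sent.
        client = Mock()
        client.get_weather.return_value = {"temperature_c": 18.5}

        result = fetch_temperature(client, 45.52, -122.68, "2020-05-01")

        self.assertEqual(result, 18.5)
        client.get_weather.assert_called_once_with(
            lat=45.52, lon=-122.68, day="2020-05-01"
        )
```

Asserting on the call arguments as well as the return value catches regressions in how requests are built, not just in how responses are parsed.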

Key Design Decisions

  1. Multi-library EXIF extraction addresses real-world camera diversity
  2. Weather integration at ingest time enriches data before storage
  3. Storage abstraction enables cloud migration without refactoring
  4. Confidence thresholds tuned for agricultural imagery (bees, honeycomb, frames)
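
As a sketch of the fourth decision, per-label confidence thresholds could be applied as below. The `description` and `score` keys mirror the fields on Cloud Vision label annotations, but the threshold values and label names are invented for illustration; the tuned values are not given in this write-up.

```python
# Hypothetical thresholds; the project's tuned values are not published here.
DEFAULT_THRESHOLD = 0.70
LABEL_THRESHOLDS = {"bee": 0.50, "honeycomb": 0.60, "beehive": 0.60}


def filter_labels(labels: list[dict]) -> list[dict]:
    """Keep labels whose score meets the threshold for that label."""
    kept = []
    for label in labels:
        name = label["description"].lower()
        threshold = LABEL_THRESHOLDS.get(name, DEFAULT_THRESHOLD)
        if label["score"] >= threshold:
            kept.append(label)
    return kept
```

Lower thresholds for domain labels accept weaker detections of bees and comb, while the stricter default filters out generic labels the model is less sure about.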

What’s Next

Current Phase (Storage Optimization)

  • Cloud storage integration (S3/GCS) for production deployment
  • Backup and sync strategies

Future Enhancements

  • Graph database for relationship queries (“inspections before swarm events”)
  • Custom CV model trained on beekeeping-specific imagery
  • Mobile-friendly capture interface
  • Multi-hive comparison views
  • Community data sharing (anonymized patterns)
