Scripts Directory
Automation scripts for maintaining the Jekyll site.
Sitemap Generator
File: generate_sitemap.py
Automatically generates an interactive Mermaid diagram sitemap by scanning the site’s content structure.
Quick Start
# Install dependencies (one time)
pip install -r scripts/requirements.txt
# Run the generator
python scripts/generate_sitemap.py
What It Does
- Scans all Jekyll content directories:
- _posts/: Blog posts
- _projects/: Project showcase items
- _thinking/: Essay collection
- _resources/: Templates and guides
- data-stories/: Technical narratives
- _pages/: Static pages
- Parses YAML front matter from each markdown file to extract:
- Title
- Permalink/URL
- Tags and categories
- Status indicators
- Auto-detects content status:
- 🚧 WIP — Work in progress items
- 📌 Pinned — Featured/foundational content
- ✅ Active — Live/production projects
- Generates Mermaid diagram with:
- Hierarchical structure (Home → Collections → Items)
- Clickable nodes (links to live site)
- Color-coded content types
- Status indicators
- Updates two files:
- SITEMAP.md (repository documentation)
- _pages/site-architecture.md (live site page)
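The scan-and-parse steps above can be sketched with the standard library alone. This is a simplified illustration, not the script's actual code. The troubleshooting note about the `frontmatter` module suggests the real script relies on the python-frontmatter package, whereas this sketch only reads flat `key: value` lines. `split_front_matter`, `scan_collection` and the trimmed `CONTENT_DIRS` are hypothetical.

```python
from pathlib import Path

# Trimmed, hypothetical version of CONTENT_DIRS from the Configuration section.
CONTENT_DIRS = {"posts": "_posts", "projects": "_projects"}


def _clean(value: str) -> str:
    """Strip surrounding quotes and trailing '# comments' from a scalar value."""
    value = value.strip()
    if value[:1] in ('"', "'"):
        end = value.find(value[0], 1)  # position of the matching closing quote
        return value[1:end] if end != -1 else value[1:]
    return value.split(" #", 1)[0].strip()


def split_front_matter(text: str) -> dict:
    """Parse flat 'key: value' lines from a leading '---' block.

    Values stay plain strings (a flow list like [a, b] is not split) and
    indented nested keys are skipped. {} means "could not parse".
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # front matter must start on the first line
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        if ":" in line and not line.startswith((" ", "\t", "#")):
            key, _, value = line.partition(":")
            meta[key.strip()] = _clean(value)
    return {}  # no closing delimiter


def scan_collection(site_root: Path, name: str) -> list:
    """Parse every markdown file in one collection directory."""
    items = []
    for path in sorted((site_root / CONTENT_DIRS[name]).glob("*.md")):
        meta = split_front_matter(path.read_text(encoding="utf-8"))
        if not meta:
            print(f"Warning: Could not parse {path}")
            continue
        items.append({"path": str(path), **meta})
    return items
```

Splitting on the first colon only is what lets a quoted title such as "Project: Phase 1" survive intact.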
Configuration
Edit these constants at the top of generate_sitemap.py:
# Base URL for your live site
SITE_URL = "https://barbhs.com"
# Directories to scan
CONTENT_DIRS = {
"posts": "_posts",
"projects": "_projects",
# ...
}
# Status detection keywords
STATUS_KEYWORDS = {
"wip": ["wip", "work in progress", "draft"],
"pinned": ["pinned", "featured"],
"active": ["active", "live"]
}
Front Matter Examples
The script looks for these fields in your markdown front matter:
Explicit status:
---
title: "My Project"
permalink: /projects/my-project/
status: wip # Auto-detected as 🚧 WIP
---
Keyword detection in title/excerpt:
---
title: "My Project (WIP)" # Auto-detected as 🚧 WIP
permalink: /projects/my-project/
---
Pinned content:
---
title: "Important Post"
status: pinned # Auto-detected as 📌 Pinned
---
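The precedence these examples imply (an explicit status field first, then keywords in the title or excerpt) can be sketched as below. The keyword lists are copied from STATUS_KEYWORDS under Configuration. Whole-word matching and the wip, pinned, active checking order are assumptions of this sketch, chosen so that a title like "interactive" does not trigger "active". `detect_status` is a hypothetical name.

```python
import re
from typing import Optional

# Copied from the STATUS_KEYWORDS constant shown under Configuration.
STATUS_KEYWORDS = {
    "wip": ["wip", "work in progress", "draft"],
    "pinned": ["pinned", "featured"],
    "active": ["active", "live"],
}
STATUS_ICONS = {"wip": "🚧 WIP", "pinned": "📌 Pinned", "active": "✅ Active"}


def _mentions(text: str, keyword: str) -> bool:
    # Whole-word match, so "interactive" does not count as "active".
    return re.search(rf"\b{re.escape(keyword)}\b", text) is not None


def detect_status(meta: dict) -> Optional[str]:
    """Return 'wip', 'pinned', 'active', or None for one item's front matter."""
    explicit = str(meta.get("status") or "").strip().lower()
    for status, keywords in STATUS_KEYWORDS.items():
        if explicit and explicit in keywords:  # 1. explicit status field wins
            return status
    text = f"{meta.get('title', '')} {meta.get('excerpt', '')}".lower()
    for status, keywords in STATUS_KEYWORDS.items():  # 2. keyword fallback
        if any(_mentions(text, k) for k in keywords):
            return status
    return None
```

With the three front matter examples above, this returns "wip", "wip" and "pinned" respectively.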
Output Example
graph TB
Home[🏠 Home Page]
Home --> Projects[📊 Projects]
Projects --> PROJ1[My Project<br/>🚧 WIP]
click Home "https://barbhs.com" "Visit Home"
click Projects "https://barbhs.com/projects/" "View Projects"
click PROJ1 "https://barbhs.com/projects/my-project/" "My Project"
classDef wip fill:#f8d7da,stroke:#842029
class PROJ1 wip
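A sketch of how diagram text like the example above could be assembled, assuming each item is a dict with title, permalink and an optional status (as the parsing and status steps would produce). It hard-codes the single Home → Projects branch from the example, and `build_diagram` and `CLASS_STYLES` are hypothetical names. Real titles containing brackets, parentheses or quotes would need Mermaid's quoted-label syntax, which this sketch omits.

```python
SITE_URL = "https://barbhs.com"
STATUS_LABELS = {"wip": "🚧 WIP", "pinned": "📌 Pinned", "active": "✅ Active"}
# Only the wip style appears in the example; pinned/active styles would go here.
CLASS_STYLES = {"wip": "fill:#f8d7da,stroke:#842029"}


def build_diagram(items: list) -> str:
    """Render Home -> Projects -> items with click links and status classes."""
    nodes = ["graph TB", "Home[🏠 Home Page]", "Home --> Projects[📊 Projects]"]
    clicks = [
        f'click Home "{SITE_URL}" "Visit Home"',
        f'click Projects "{SITE_URL}/projects/" "View Projects"',
    ]
    class_lines, used_styles = [], []
    for i, item in enumerate(items, start=1):
        node = f"PROJ{i}"
        title = item["title"]
        status = item.get("status")
        label = title
        if status in STATUS_LABELS:
            label += "<br/>" + STATUS_LABELS[status]  # status on a second line
        nodes.append(f"Projects --> {node}[{label}]")
        url = SITE_URL + item["permalink"]
        clicks.append(f'click {node} "{url}" "{title}"')
        if status in CLASS_STYLES:
            class_lines.append(f"class {node} {status}")
            if status not in used_styles:
                used_styles.append(status)
    class_defs = [f"classDef {s} {CLASS_STYLES[s]}" for s in used_styles]
    return "\n".join(nodes + clicks + class_defs + class_lines)
```

Called with the single item {"title": "My Project", "permalink": "/projects/my-project/", "status": "wip"}, it reproduces the Output Example line for line.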
When to Run
Run the script whenever you:
- Add new blog posts
- Create new projects/essays/resources
- Change content structure
- Update content status (WIP → Active, etc.)
Troubleshooting
Error: “No module named ‘frontmatter’”
- Run:
pip install -r scripts/requirements.txt
Warning: “Could not parse [file]”
- Check that the file has valid YAML front matter
- Ensure front matter is at the top of the file
- Verify YAML syntax (proper indentation, quotes)
Sitemap not updating on site
- The script updates markdown files only
- Jekyll needs to rebuild the site
- If using GitHub Pages, push changes to trigger rebuild
- If local, run
bundle exec jekyll serve
Future Enhancements
Ideas for extending the script:
- Group blog posts by series
- Add year/month nodes for blog archive
- Generate topic-based alternate views
- Include post counts in collection nodes
- Add interactive filtering options
- Generate multiple diagram layouts
Contributing
When modifying the script:
- Keep comments detailed enough to be blog-article-ready
- Add examples for new features
- Update this README
- Test with your actual content
- Verify both SITEMAP.md and site-architecture.md update correctly
Metadata Validation Script
File: validate_metadata.py
Ensures consistent, high-quality metadata across all content collections by validating front matter fields, excerpt quality, and taxonomy standards.
Quick Start
# Install dependencies (one time)
pip install pyyaml
# Validate all collections
python scripts/validate_metadata.py
# Validate specific collection
python scripts/validate_metadata.py --collection posts
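The --collection flag can be pictured as an argparse option whose choices are collection names without underscores, consistent with the troubleshooting advice to pass posts rather than _posts. This is an assumed design, not the script's source: the `COLLECTIONS` mapping lists only four collections, and `parse_args` and `collection_dirs` are hypothetical names.

```python
import argparse
from pathlib import Path

# Hypothetical name -> directory mapping; other collections omitted for brevity.
COLLECTIONS = {
    "posts": "_posts",
    "projects": "_projects",
    "thinking": "_thinking",
    "resources": "_resources",
}


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Validate front matter metadata.")
    parser.add_argument(
        "--collection",
        choices=sorted(COLLECTIONS),
        help="collection name without the leading underscore, e.g. posts",
    )
    return parser.parse_args(argv)


def collection_dirs(args: argparse.Namespace, site_root: Path = Path(".")) -> list:
    """Directories to validate: the requested one, or every existing collection."""
    if args.collection:
        path = site_root / COLLECTIONS[args.collection]
        if not path.is_dir():  # e.g. not running from the site root
            raise SystemExit(f"Collection path does not exist: {path}")
        return [path]
    return [site_root / d for d in COLLECTIONS.values() if (site_root / d).is_dir()]
```

In this sketch, passing _posts is rejected by argparse's choices check before any file is read.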
What It Checks
All Content:
- ✓ Required fields present (title, excerpt, tags, dates)
- ⚠️ Recommended fields (subtitles, header images, last_modified_at)
- ✓ Excerpt length (ideal: 150-300 characters)
- ✓ Tag formatting (hyphens instead of spaces)
- ✓ Date formatting (YYYY-MM-DD or full timestamp)
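The date rule could be checked with datetime.strptime against a short list of accepted layouts. Which timestamp layouts the real script accepts is not documented here; the list below (a bare date, or date plus time with optional seconds and UTC offset, in the style Jekyll's docs use) is an assumption, and `is_valid_date` is a hypothetical name. The value is passed through str() first because PyYAML turns an unquoted 2022-09-16 into a datetime.date.

```python
from datetime import datetime

# Assumed accepted layouts; adjust to match the real script's standard.
DATE_FORMATS = (
    "%Y-%m-%d",
    "%Y-%m-%d %H:%M",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M:%S %z",  # e.g. 2022-09-16 10:30:00 -0500
    "%Y-%m-%d %H:%M:%S%z",   # str() of a timezone-aware datetime
)


def is_valid_date(value) -> bool:
    """True if the value (string or date object) matches an accepted layout."""
    text = str(value).strip()
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue  # try the next layout
    return False
```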
Collection-Specific Requirements:
| Collection | Required | Recommended |
|---|---|---|
| Posts | title, date, excerpt, tags, categories | subtitle, header.overlay_image, header.teaser, stack |
| Projects | title, permalink, excerpt, tags, stack, status, header | header.teaser, header.actions, docs_url |
| Thinking | title, date, excerpt, tags, categories, permalink, header | subtitle, header.overlay_image, teaser |
| Resources | title, permalink, excerpt, date, tags, format, level | subtitle, download_url, cognitive_principle |
| Data Stories | layout, title, excerpt, permalink, date, tags, stack, header | header.teaser, last_modified_at |
| Snippets | title, date, status, source_type, source_title, highlight | takeaway, tags, topics, impact |
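The table translates into a per-collection lookup of required and recommended fields, with dotted names such as header.teaser resolved against nested front matter. The sketch transcribes the table as written; the dictionary keys (data-stories in particular), the helper names, and the rule that an empty string, list or mapping counts as missing are assumptions.

```python
from typing import Any

# Transcribed from the table above: (required, recommended) per collection.
SCHEMA = {
    "posts": (["title", "date", "excerpt", "tags", "categories"],
              ["subtitle", "header.overlay_image", "header.teaser", "stack"]),
    "projects": (["title", "permalink", "excerpt", "tags", "stack", "status", "header"],
                 ["header.teaser", "header.actions", "docs_url"]),
    "thinking": (["title", "date", "excerpt", "tags", "categories", "permalink", "header"],
                 ["subtitle", "header.overlay_image", "teaser"]),
    "resources": (["title", "permalink", "excerpt", "date", "tags", "format", "level"],
                  ["subtitle", "download_url", "cognitive_principle"]),
    "data-stories": (["layout", "title", "excerpt", "permalink", "date", "tags", "stack", "header"],
                     ["header.teaser", "last_modified_at"]),
    "snippets": (["title", "date", "status", "source_type", "source_title", "highlight"],
                 ["takeaway", "tags", "topics", "impact"]),
}


def get_field(meta: dict, dotted: str) -> Any:
    """Resolve 'header.teaser' as meta['header']['teaser']; None if absent."""
    value: Any = meta
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def is_missing(value: Any) -> bool:
    # Assumption: empty strings, lists and mappings count as missing.
    return value is None or (isinstance(value, (str, list, dict)) and not value)


def check_fields(meta: dict, collection: str):
    """Return (missing_required, missing_recommended) for one file."""
    required, recommended = SCHEMA[collection]
    return ([f for f in required if is_missing(get_field(meta, f))],
            [f for f in recommended if is_missing(get_field(meta, f))])
```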
Validation Rules
Excerpt Quality:
- ❌ Too short (< 150 chars): Insufficient for previews
- ✅ Ideal (150-300 chars): Perfect for search results
- ⚠️ Too long (> 300 chars): Gets truncated
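A minimal sketch of the excerpt rule using the 150 and 300 character thresholds above. The "too short" message mirrors the Example Output further down; the "too long" wording and the name `excerpt_issue` are hypothetical.

```python
from typing import Optional

EXCERPT_MIN, EXCERPT_MAX = 150, 300  # thresholds from the rules above


def excerpt_issue(excerpt: Optional[str]) -> Optional[str]:
    """Return a report line for an out-of-range excerpt, or None if it is fine."""
    if not excerpt:
        return None  # a missing excerpt is reported by the required-field check
    n = len(excerpt.strip())
    if n < EXCERPT_MIN:
        return f"Excerpt too short ({n} chars, recommend {EXCERPT_MIN}-{EXCERPT_MAX})"
    if n > EXCERPT_MAX:
        return f"Excerpt too long ({n} chars, recommend {EXCERPT_MIN}-{EXCERPT_MAX})"
    return None
```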
Tag Standards:
# ❌ Bad - spaces cause URL issues
tags: [data science, machine learning]
# ✅ Good - hyphens create clean URLs
tags: [data-science, machine-learning]
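Detecting the bad case means looking for whitespace inside each tag. This sketch reproduces the report line from the Example Output below and adds a hypothetical `hyphenate` helper that suggests the fixed form. It only handles spaces; whether the real script normalizes anything else is not documented here.

```python
from typing import Optional


def tags_with_spaces(tags) -> list:
    """Return the tags that contain whitespace; accepts a list or a single string."""
    if isinstance(tags, str):
        tags = [tags]
    return [t for t in (tags or []) if any(ch.isspace() for ch in str(t).strip())]


def hyphenate(tag: str) -> str:
    """Suggest a URL-friendly form: 'data science' -> 'data-science'."""
    return "-".join(str(tag).split())


def tag_issue(tags) -> Optional[str]:
    bad = tags_with_spaces(tags)
    return f"Tags with spaces (use hyphens): {bad}" if bad else None
```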
Stack vs Tags:
- tags: Concepts (e.g., iot, tutorial, data-visualization)
- stack: Technologies (e.g., Python, Arduino, AWS RDS)
Snippet Status:
# ✅ Valid statuses
status: inbox
status: garden
Example Output
================================================================================
METADATA VALIDATION REPORT
================================================================================
Total files checked: 42
Files with issues: 3
Files OK: 39
────────────────────────────────────────────────────────────────────────────────
⚠️ POSTS (3 files with issues)
────────────────────────────────────────────────────────────────────────────────
📄 _posts/2022-09-16-example.md
• Missing recommended: subtitle, header.overlay_image
• Excerpt too short (45 chars, recommend 150-300)
• Tags with spaces (use hyphens): ['data science']
================================================================================
⚠️ Found issues in 3 files
================================================================================
When to Run
Run this script:
- Before committing major content changes
- Monthly to catch metadata drift
- After adding new content
- When updating metadata standards
Troubleshooting
“YAML parsing error”
- Check for unescaped special characters
- Ensure proper indentation (spaces, not tabs)
- Quote strings with colons:
title: "Project: Phase 1"
“Collection path does not exist”
- Use the collection name without the underscore: posts, not _posts
- Run from the site root directory
Part of the dagny099.github.io repository.
Maintained by Barbara Hidalgo-Sotelo.