📚 Personal Reading Analytics

A zero-infrastructure reading analytics pipeline that tracks personal engineering blog consumption and generates insights via Go, Python, and Google Gemini.

Architecture Blueprint

     ┌───────────────┐
     │  Google Sheet │
     └───────────────┘
             ▲
             │ (Read/Write Articles & Sources)
             ▼
     ┌───────────────┐     (Fetch Metadata)     ┌──────────────────────┐
     │  Python       │ ───────────────────────> │ RSS/Web Blogs        │
     │  (Extractor)  │                          └──────────────────────┘
     └───────────────┘
             │
             │ (Log events)
             ▼
     ┌───────────────┐
     │  MongoDB      │
     └───────────────┘
             ▲
             │ (Query stats)
             ▼
     ┌───────────────┐
     │  Go Metrics   │
     └───────────────┘
             │
             │ (Generate JSON)
             ▼
     ┌───────────────┐
     │ Go Dashboard  │ ───────────────────────> [ Static HTML/CSS ]
     │ (Generator)   │                          (GitHub Pages)
     └───────────────┘

Core Components

Extractor

Universal, configuration-driven Python extractor that automates RSS/web metadata collection, logging event traces to MongoDB and syncing current state to Google Sheets.
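
As a rough illustration of what "configuration-driven" could look like, the sketch below maps each source to the RSS tags that hold its metadata and parses a feed with the Python standard library only. The `SOURCES` mapping, the field names, and the `extract_articles` helper are hypothetical and are not taken from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-source configuration: which child tags of each
# <item> element hold the metadata we want to keep.
SOURCES = {
    "example-blog": {"title": "title", "link": "link", "date": "pubDate"},
}


def extract_articles(source_name: str, rss_xml: str) -> list[dict]:
    """Parse an RSS 2.0 document and return one dict per <item>."""
    fields = SOURCES[source_name]
    root = ET.fromstring(rss_xml)
    articles = []
    for item in root.iter("item"):
        article = {"source": source_name}
        for key, tag in fields.items():
            # findtext returns None when the tag is absent; fall back to "".
            article[key] = (item.findtext(tag) or "").strip()
        articles.append(article)
    return articles


# Tiny inline feed so the sketch runs without network access.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>Scaling Queues</title><link>https://example.com/a</link>
        <pubDate>Mon, 06 Jan 2025 00:00:00 GMT</pubDate></item>
  <item><title>Untitled draft</title><link>https://example.com/b</link></item>
</channel></rss>"""

articles = extract_articles("example-blog", SAMPLE_FEED)
```

Adding a source in this sketch means adding one entry to `SOURCES` rather than writing a new parser, which is the main appeal of the configuration-driven approach.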

Metrics

Go metrics calculator that aggregates reading volumes, read rates, category distributions, and backlog age profiles into structured JSON output.
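
The project's calculator is written in Go; the Python sketch below only illustrates the kind of arithmetic involved (totals, read rate, category counts, backlog age). The record fields (`category`, `read`, `added`) and the output keys are assumptions made for this example.

```python
import json
from collections import Counter
from datetime import date


def calculate_metrics(articles: list[dict], today: date) -> dict:
    """Aggregate a list of article records into summary metrics."""
    total = len(articles)
    read = sum(1 for a in articles if a["read"])
    unread = [a for a in articles if not a["read"]]
    # Backlog age: days since each unread article was added.
    ages = [(today - a["added"]).days for a in unread]
    return {
        "total": total,
        "read": read,
        "read_rate": round(read / total * 100, 1) if total else 0.0,
        "by_category": dict(Counter(a["category"] for a in articles)),
        "oldest_unread_days": max(ages, default=0),
    }


# Hypothetical sample data.
SAMPLE = [
    {"category": "databases", "read": True, "added": date(2025, 1, 1)},
    {"category": "databases", "read": True, "added": date(2025, 1, 5)},
    {"category": "networking", "read": True, "added": date(2025, 1, 8)},
    {"category": "networking", "read": False, "added": date(2025, 1, 10)},
]

metrics = calculate_metrics(SAMPLE, today=date(2025, 1, 20))
metrics_json = json.dumps(metrics, indent=2, sort_keys=True)
```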

Dashboard

Go static site generator that combines HTML templates with compiled Tailwind CSS to produce responsive analytics and historical progression dashboards.

Validation & Resiliency

Reproducibility

Standardized Go environments and local Python virtual environments bootstrapped predictably via Makefile automation.

Verification

Dual-layer testing architecture verifying data parsing logic via Go unit tests and RSS/web extraction routines via Python pytest suites.

Telemetry

Event-sourced pipeline that pushes extraction telemetry, database updates, and failure logs directly to MongoDB Atlas for decoupled auditing.
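
One way to picture the event-sourced approach is an append-only log where every extraction attempt, successful or not, leaves a record. The sketch below uses an in-memory stand-in instead of a real MongoDB collection; the event shape, the event type names, and the `record_extraction` helper are hypothetical.

```python
from datetime import datetime, timezone


def build_event(event_type: str, source: str, payload: dict) -> dict:
    """Create one immutable event document with a UTC timestamp."""
    return {
        "event_type": event_type,
        "source": source,
        "payload": payload,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


class InMemoryEventLog:
    """Append-only stand-in for a database collection (assumption)."""

    def __init__(self):
        self.events = []

    def insert_one(self, document: dict) -> None:
        self.events.append(document)


def record_extraction(log, source: str, fetch) -> list:
    """Run fetch() and log either a success or a failure event."""
    try:
        articles = fetch()
    except Exception as exc:
        log.insert_one(build_event("extraction_failed", source, {"error": str(exc)}))
        return []
    log.insert_one(build_event("extraction_succeeded", source, {"count": len(articles)}))
    return articles


def failing_fetch():
    raise RuntimeError("timeout")


log = InMemoryEventLog()
ok_result = record_extraction(log, "example-blog", lambda: [{"title": "A"}])
failed_result = record_extraction(log, "example-blog", failing_fetch)
```

Because events are only ever appended, failures stay visible for later auditing instead of being overwritten by the next successful run.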

Design Trade-offs

Humble Pivots

Caching Google Sheets to Avoid API Rate Limits

Originally, Go metrics queried Google Sheets directly during page builds. This routinely hit API rate limits. Pivoted to keeping Google Sheets as the single source of truth (SSOT) while logging sync events to MongoDB, so the Go metrics step queries the cached data for stability and speed.


Integrating Generative AI for Qualitative Trend Analysis (ADR 002)

Initially, the system only surfaced quantitative metrics (totals, read rates). Pivoted to integrating Google Gemini (GenAI) to analyze the differences (deltas) between pipeline runs, generating qualitative learning insights.
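
A minimal sketch of the delta step, assuming each pipeline run produces a flat dictionary of numeric metrics. The key names, the `compute_deltas` and `build_prompt` helpers, and the prompt wording are hypothetical; the actual call to Gemini is deliberately left out.

```python
def compute_deltas(previous: dict, current: dict) -> dict:
    """Return current minus previous for every metric that changed."""
    deltas = {}
    for key in sorted(set(previous) | set(current)):
        before = previous.get(key, 0)  # a missing metric counts as 0
        after = current.get(key, 0)
        if after != before:
            deltas[key] = after - before
    return deltas


def build_prompt(deltas: dict) -> str:
    """Format the deltas as a short text prompt for a language model."""
    lines = [f"- {name}: {change:+}" for name, change in deltas.items()]
    header = "Summarize what changed in my reading habits since the last run:"
    return "\n".join([header, *lines])


# Hypothetical snapshots from two consecutive runs.
previous_run = {"total": 100, "read": 60, "unread": 40}
current_run = {"total": 105, "read": 62, "unread": 43}

deltas = compute_deltas(previous_run, current_run)
prompt = build_prompt(deltas)
```

Sending only the deltas rather than the full metrics keeps the prompt small and focuses the model on what changed between runs.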


Decoupling Dashboard Generation from Live Database Queries

Originally, the Go static site generator queried MongoDB Atlas directly during compilation to retrieve historical trends. Pivoted to reading pre-computed local daily JSON metrics files, allowing the dashboard generator to build instantly and offline without database network dependencies.
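
The project's generator is written in Go; the Python sketch below only illustrates the idea of building history from local files instead of a database. It assumes one file per day named `YYYY-MM-DD.json`, which is a naming convention invented for this example.

```python
import json
import tempfile
from pathlib import Path


def load_daily_metrics(metrics_dir: Path) -> list[tuple[str, dict]]:
    """Load every *.json file in metrics_dir, oldest first.

    ISO dates sort chronologically when sorted as plain strings,
    so sorting the paths is enough to order the history.
    """
    history = []
    for path in sorted(metrics_dir.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            history.append((path.stem, json.load(handle)))
    return history


# Demo with a temporary directory so the sketch needs no real data.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "2025-01-02.json").write_text(json.dumps({"total": 105}), encoding="utf-8")
    (folder / "2025-01-01.json").write_text(json.dumps({"total": 100}), encoding="utf-8")
    history = load_daily_metrics(folder)
```

Since nothing here touches the network, a build along these lines would work offline and would not fail when the database is unreachable.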

Objective Clarity

Serves strictly as a personal, static, read-only dashboard. Does not support multi-user authenticated views, write operations outside of the extraction scheduler, or high-concurrency database queries.

Verifiable Outputs

Go Test Suite (make test-go)

go test -v ./cmd/... ./internal/...
=== RUN   TestCalculateTopReadRateSource
--- PASS: TestCalculateTopReadRateSource (0.00s)
=== RUN   TestCalculateMostUnreadSource
--- PASS: TestCalculateMostUnreadSource (0.00s)
=== RUN   TestCalculateThisMonthArticles
--- PASS: TestCalculateThisMonthArticles (0.00s)
PASS
ok  	github.com/victoriacheng15/personal-reading-analytics/internal/metrics	0.003s
=== RUN   TestGetTemplatesDir
=== RUN   TestGetTemplatesDir/finds_templates_directory_from_primary_path
=== RUN   TestGetTemplatesDir/finds_templates_directory_from_relative_path
=== RUN   TestGetTemplatesDir/returns_error_when_templates_directory_not_found
--- PASS: TestGetTemplatesDir (0.00s)
    --- PASS: TestGetTemplatesDir/finds_templates_directory_from_primary_path (0.00s)
    --- PASS: TestGetTemplatesDir/finds_templates_directory_from_relative_path (0.00s)
    --- PASS: TestGetTemplatesDir/returns_error_when_templates_directory_not_found (0.00s)
=== RUN   TestLoadEvolutionData
=== RUN   TestLoadEvolutionData/loads_evolution_data_successfully
=== RUN   TestLoadEvolutionData/returns_error_when_file_missing
--- PASS: TestLoadEvolutionData (0.00s)
    --- PASS: TestLoadEvolutionData/loads_evolution_data_successfully (0.00s)
    --- PASS: TestLoadEvolutionData/returns_error_when_file_missing (0.00s)
PASS
ok  	github.com/victoriacheng15/personal-reading-analytics/internal/web	0.010s

Python Test Suite (make test-py)

.venv/bin/pytest script/
============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-8.3.4, pluggy-1.5.1
configfile: pyproject.toml
plugins: cov-6.0.0
collected 3 items

script/test_extractor.py ...                                             [100%]

============================== 3 passed in 0.32s ===============================