Extractor
A zero-infrastructure reading analytics pipeline that tracks personal engineering blog consumption and generates insights via Go, Python, and Google Gemini.
┌───────────────┐
│ Google Sheet  │
└───────────────┘
        ▲
        │ (Read/Write Articles & Sources)
        ▼
┌───────────────┐     (Fetch Metadata)      ┌──────────────────────┐
│ Python Extr   │ ───────────────────────>  │    RSS/Web Blogs     │
│ (Extraction)  │                           └──────────────────────┘
└───────────────┘
        │
        │ (Log events)
        ▼
┌───────────────┐
│   MongoDB     │
└───────────────┘
        ▲
        │ (Query stats)
        ▼
┌───────────────┐
│  Go Metrics   │
└───────────────┘
        │
        │ (Generate JSON)
        ▼
┌───────────────┐
│ Go Dashboard  │ ───────────────────────>  [ Static HTML/CSS ]
│  (Generator)  │                              (GitHub Pages)
└───────────────┘
Universal, configuration-driven Python extractor that automates RSS/web metadata collection, logging event traces to MongoDB and syncing current state to Google Sheets.
Go metrics calculator that aggregates reading volumes, read rates, category distributions, and backlog age profiles into a structured JSON database.
Go static site generator that renders HTML templates with compiled Tailwind CSS to produce responsive analytics and historical progression dashboards.
Standardized Go environments and local Python virtual environments bootstrapped predictably via Makefile automation.
Dual-layer testing architecture verifying data parsing logic via Go unit tests and RSS/web extraction routines via Python pytest suites.
Event-sourced pipeline that pushes extraction telemetry, database updates, and failure logs directly to MongoDB Atlas for decoupled auditing.
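The event-sourced logging described above can be illustrated with a minimal Python sketch. It is not the project's actual code: the field names (`event_type`, `source`, `payload`, `timestamp`) and the helper names are assumptions made for illustration. The sink is any object with an `insert_one` method, which is the method a pymongo collection exposes; an in-memory stand-in is used here so the example runs without a database.

```python
from datetime import datetime, timezone
from typing import Any, Protocol


class EventSink(Protocol):
    """Anything with an insert_one method, e.g. a pymongo Collection."""

    def insert_one(self, document: dict[str, Any]) -> Any: ...


def build_event(event_type: str, source: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Build one event document. All field names here are hypothetical."""
    return {
        "event_type": event_type,  # e.g. "extraction", "sheet_update", "failure"
        "source": source,
        "payload": payload,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def log_event(sink: EventSink, event_type: str, source: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Append an event to the sink. Events are only inserted, never updated."""
    event = build_event(event_type, source, payload)
    sink.insert_one(event)
    return event


class InMemorySink:
    """Stand-in for a MongoDB collection so the sketch runs offline."""

    def __init__(self) -> None:
        self.events: list[dict[str, Any]] = []

    def insert_one(self, document: dict[str, Any]) -> None:
        self.events.append(document)


sink = InMemorySink()
log_event(sink, "extraction", "example-blog", {"articles_found": 3})
log_event(sink, "failure", "example-blog", {"error": "timeout"})
print([event["event_type"] for event in sink.events])
```

Because every run only appends documents, the audit trail stays decoupled from the current state held in Google Sheets: the sheet can be edited freely while the event log remains a record of what each run did.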
Originally, the Go metrics calculator queried Google Sheets directly during page builds, which routinely hit API rate limits. Pivoted to keeping Google Sheets as the single source of truth (SSOT) while logging sync events to MongoDB, so Go queries the cached data for stability and speed.
Initially, the system only surfaced quantitative metrics (totals, read rates). Pivoted to integrating Google Gemini (GenAI) to analyze the differences (deltas) between pipeline runs, generating qualitative learning insights.
Originally, the Go static site generator queried MongoDB Atlas directly during compilation to retrieve historical trends. Pivoted to reading pre-computed local daily JSON metrics files, allowing the dashboard generator to build instantly and offline without database network dependencies.
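The last two decisions, reading pre-computed daily JSON files and analyzing deltas between runs, can be sketched together. The project's generator is written in Go; this Python version is only an illustration of the idea. The `YYYY-MM-DD.json` file naming and the metric keys (`total_articles`, `read_count`) are assumptions, and the call to Gemini is omitted: the resulting delta is simply the kind of input such a step might receive.

```python
import json
import tempfile
from pathlib import Path


def load_daily_metrics(metrics_dir: Path) -> list[tuple[str, dict]]:
    """Read every *.json file in metrics_dir, ordered by file name.

    Assumes (hypothetically) that files are named YYYY-MM-DD.json, so a
    lexicographic sort is also a chronological sort.
    """
    snapshots = []
    for path in sorted(metrics_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            snapshots.append((path.stem, json.load(fh)))
    return snapshots


def numeric_delta(previous: dict, current: dict) -> dict:
    """Return current - previous per key; a key missing on one side counts as 0.

    Keys whose values are not numeric are skipped.
    """
    delta = {}
    for key in sorted(previous.keys() | current.keys()):
        before, after = previous.get(key, 0), current.get(key, 0)
        if isinstance(before, (int, float)) and isinstance(after, (int, float)):
            delta[key] = after - before
    return delta


# Demo with two hypothetical daily snapshots written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    metrics_dir = Path(tmp)
    (metrics_dir / "2025-01-01.json").write_text(
        json.dumps({"total_articles": 100, "read_count": 40}), encoding="utf-8"
    )
    (metrics_dir / "2025-01-02.json").write_text(
        json.dumps({"total_articles": 104, "read_count": 43}), encoding="utf-8"
    )

    snapshots = load_daily_metrics(metrics_dir)
    (_, previous), (latest_date, current) = snapshots[-2:]
    delta = numeric_delta(previous, current)

print(latest_date, delta)  # 2025-01-02 {'read_count': 3, 'total_articles': 4}
```

Nothing in this sketch touches the network, which is the point of the pivot: once the daily files exist on disk, a build needs only the local filesystem.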
Serves strictly as a personal, static, read-only dashboard. Does not support multi-user authenticated views, write operations outside of the extraction scheduler, or high-concurrency database queries.
go test -v ./cmd/... ./internal/...
=== RUN TestCalculateTopReadRateSource
--- PASS: TestCalculateTopReadRateSource (0.00s)
=== RUN TestCalculateMostUnreadSource
--- PASS: TestCalculateMostUnreadSource (0.00s)
=== RUN TestCalculateThisMonthArticles
--- PASS: TestCalculateThisMonthArticles (0.00s)
PASS
ok github.com/victoriacheng15/personal-reading-analytics/internal/metrics 0.003s
=== RUN TestGetTemplatesDir
=== RUN TestGetTemplatesDir/finds_templates_directory_from_primary_path
=== RUN TestGetTemplatesDir/finds_templates_directory_from_relative_path
=== RUN TestGetTemplatesDir/returns_error_when_templates_directory_not_found
--- PASS: TestGetTemplatesDir (0.00s)
    --- PASS: TestGetTemplatesDir/finds_templates_directory_from_primary_path (0.00s)
    --- PASS: TestGetTemplatesDir/finds_templates_directory_from_relative_path (0.00s)
    --- PASS: TestGetTemplatesDir/returns_error_when_templates_directory_not_found (0.00s)
=== RUN TestLoadEvolutionData
=== RUN TestLoadEvolutionData/loads_evolution_data_successfully
=== RUN TestLoadEvolutionData/returns_error_when_file_missing
--- PASS: TestLoadEvolutionData (0.00s)
    --- PASS: TestLoadEvolutionData/loads_evolution_data_successfully (0.00s)
    --- PASS: TestLoadEvolutionData/returns_error_when_file_missing (0.00s)
PASS
ok github.com/victoriacheng15/personal-reading-analytics/internal/web 0.010s
.venv/bin/pytest script/
============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-8.3.4, pluggy-1.5.1
configfile: pyproject.toml
plugins: cov-6.0.0
collected 3 items
script/test_extractor.py ... [100%]
============================== 3 passed in 0.32s ===============================