PageWatch - Affordable Visual and Text Diff Monitor for Journalists and Archivists
The Wayback Machine is great for finding what a site looked like in 2018 — not so great for getting a Slack alert the moment a government agency quietly edits a policy page. PageWatch monitors your URL list daily, generates pixel-level visual diffs and text change extracts, and sends you a digest before your morning coffee.
Difficulty
intermediate
Category
Analytics
Market Demand
High
Revenue Score
7/10
Platform
Web App
Vibe Code Friendly
No
Hackathon Score
🏆 7/10
Validated by Real Pain
— seeded from real developer complaints
Researchers and journalists describe needing to track webpage changes over time for archival and investigative purposes, with current options being either expensive enterprise tools or fragile DIY scripts — and express clear willingness to pay for an affordable automated alternative.
What is it?
Journalists, researchers, and compliance teams routinely need to track when specific web pages change — think press release pages, policy documents, pricing pages, or competitor landing pages. The current options are either free-but-manual (Wayback Machine), expensive-enterprise (Versionista at $99/month per site), or DIY Puppeteer scripts that break constantly. PageWatch is a $19/month web app that accepts a list of URLs, scrapes them on a daily schedule via Puppeteer running on Railway, stores full-page screenshots and HTML snapshots in Cloudflare R2, and generates visual diffs using Pixelmatch and text diffs using diff-match-patch. Users get a clean diff digest via email or Slack. This is fully buildable with Next.js, Supabase, Railway, and Resend in under 2 weeks — the entire tech stack is stable, cheap, and well-documented as of April 2026.
Why now?
Railway cron jobs and Cloudflare R2 make running scheduled Puppeteer scrapers trivially cheap in April 2026, eliminating the $200+/month infrastructure cost that previously made this a VC-backed product category — now it is within reach of a solo developer in a two-week build.
- Daily scheduled Puppeteer scrape with full-page screenshot and HTML snapshot stored in R2.
- Visual diff overlay using Pixelmatch that highlights exactly which pixels changed.
- Text diff extraction using diff-match-patch, with changed sentences highlighted in the email digest.
- Slack and email alerts with a diff summary, sent within 1 hour of detecting a change.
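The text-diff feature above could be sketched as follows. This is a minimal stand-in for what diff-match-patch does far more robustly (it produces character-level diffs rather than comparing sentence sets); all function names here are illustrative, not part of any library.

```javascript
// Minimal sketch of changed-sentence extraction between two HTML snapshots.
// Production code would use diff-match-patch; this stand-in just compares
// sentence sets to illustrate the idea.

// Strip tags and collapse whitespace to get comparable text.
function extractText(html) {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Naive sentence split on ., !, ? followed by whitespace.
function splitSentences(text) {
  return text.split(/(?<=[.!?])\s+/).filter(Boolean);
}

// Return sentences present in the new snapshot but not in the old one.
function changedSentences(oldHtml, newHtml) {
  const oldSet = new Set(splitSentences(extractText(oldHtml)));
  return splitSentences(extractText(newHtml)).filter((s) => !oldSet.has(s));
}

const before = "<p>Emissions limit is 50 ppm. Effective 2024.</p>";
const after = "<p>Emissions limit is 80 ppm. Effective 2024.</p>";
console.log(changedSentences(before, after)); // ["Emissions limit is 80 ppm."]
```

A set comparison like this misses reordered or slightly rewrapped sentences, which is exactly why the brief reaches for diff-match-patch instead of rolling its own.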
Target Audience
Investigative journalists, academic researchers, compliance analysts, and competitive intelligence teams — estimated 200k potential users globally, with journalism orgs and research teams as the highest-value segment.
Example Use Case
Marco, an investigative journalist covering environmental policy, monitors 40 EPA and industry website pages. PageWatch alerts him at 7am that a key emissions standards page was quietly updated overnight, giving him a 6-hour scoop before other outlets notice.
User Stories
- As an investigative journalist, I want a Slack alert when a monitored page changes, so that I can break stories before competitors.
- As a compliance analyst, I want a visual diff overlay of yesterday vs today, so that I can verify no unauthorized content changes occurred.
- As a researcher, I want to export my full change history as CSV, so that I can include page edit timelines in my academic paper.
Acceptance Criteria
Scrape Job: done when Puppeteer captures a full-page screenshot of any public URL without crashing. Visual Diff: done when Pixelmatch generates a diff image highlighting changed regions once more than 0.1% of pixels differ. Alert Email: done when Resend delivers the diff digest within 60 minutes of change detection. Billing Gate: done when users exceeding their URL limit see an upgrade prompt and cannot add more URLs without upgrading.
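The 0.1% changed-pixel gate in the Visual Diff criterion can be sketched like this. Pixelmatch itself does this comparison on RGBA buffers (with anti-aliasing detection and a per-pixel color threshold); this hand-rolled stand-in only counts raw per-pixel differences, and the function names are assumptions for illustration.

```javascript
// Sketch of the 0.1% changed-pixel alert gate. Operates on raw RGBA
// buffers of equal dimensions, like the ones Pixelmatch consumes.

// Fraction of pixels whose RGBA values differ between two snapshots.
function changedPixelRatio(a, b, width, height) {
  if (a.length !== b.length) throw new Error("snapshot sizes differ");
  let changed = 0;
  for (let i = 0; i < width * height; i++) {
    const o = i * 4; // 4 bytes per pixel: R, G, B, A
    if (a[o] !== b[o] || a[o + 1] !== b[o + 1] ||
        a[o + 2] !== b[o + 2] || a[o + 3] !== b[o + 3]) {
      changed++;
    }
  }
  return changed / (width * height);
}

// Fire an alert only when more than 0.1% of pixels changed.
function shouldAlert(a, b, width, height) {
  return changedPixelRatio(a, b, width, height) > 0.001;
}
```

The threshold matters in practice: without it, a one-pixel ad rotation or a rendered timestamp would page a journalist every single day.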
Is it worth building?
$19/month x 80 users = $1,520 MRR at month 3. $49/month x 100 power users = $4,900 MRR at month 6. Assumes 3% conversion of free trial signups via journalism community outreach.
Unit Economics
CAC: $15 via journalism community outreach and ProductHunt. LTV: $342 (18 months at $19/month). Payback: 1 month. Gross margin: 88%.
Business Model
Tiered subscription — $19/month for 50 URLs, $49/month for 200 URLs, $99/month for 1,000 URLs.
Monetization Path
14-day free trial with 10 URLs, converts to paid when trial ends or URL limit exceeded.
Revenue Timeline
First dollar: week 3 via trial conversion. $1k MRR: month 3. $5k MRR: month 7.
Estimated Monthly Cost
Railway for Puppeteer workers: $30, Cloudflare R2: $15, Supabase: $25, Vercel: $20, Resend: $10, Stripe fees: ~$25. Total: ~$125/month at launch.
Profit Potential
Sustainable indie product at $3k–$8k MRR. Journalism orgs and compliance teams will pay $99+/month without blinking.
Scalability
High — add team accounts, API access for power users, webhook integrations, and Slack bot for newsroom workflows.
Success Metrics
20 beta users in week 2. 5 paid conversions by end of week 3. 80% trial-to-paid conversion among users who receive at least one meaningful diff alert during the trial.
Launch & Validation Plan
Post in r/journalism and r/DataHoarder asking about web monitoring workflows, DM 15 investigative journalists on Twitter offering free beta, collect 20 signups before writing scraper code.
Customer Acquisition Strategy
First customer: DM 20 investigative journalists and academic researchers on Twitter offering 3 months free in exchange for weekly feedback — journalism community responds well to tools that give them a scoop edge. Ongoing: ProductHunt launch, r/journalism, NICAR conference community, SEO targeting 'website change monitor' and 'Versionista alternative'.
What's the competition?
Competition Level
Medium
Similar Products
Versionista ($99/month per site, expensive), Distill.io (browser-only, no visual diff archive), Wayback Machine (manual, no alerts).
Competitive Advantage
80% cheaper than Versionista, visual diff is more intuitive than text-only tools, ships in 2 weeks not 2 years.
Regulatory Risks
GDPR compliance required for EU users. Scraping public websites is generally legal but Terms of Service violations on some sites are possible — document that users are responsible for scraping only pages they have rights to monitor.
What's the roadmap?
Feature Roadmap
V1 (launch): URL monitoring, visual diff, text diff, email alerts, Stripe billing. V2 (month 2-3): Slack webhook alerts, diff history archive, CSV export. V3 (month 4+): team accounts, API access, custom scrape frequency, Notion integration.
Milestone Plan
Phase 1 (Week 1): Puppeteer scraper, R2 storage, and Pixelmatch diff working end-to-end locally. Phase 2 (Week 2): Next.js dashboard live, Stripe billing, Resend alerts deployed to Railway. Phase 3 (Month 2): 20 paying users, ProductHunt launch, first $1k MRR.
How do you build it?
Tech Stack
Next.js, Puppeteer on Railway for scheduled scraping, Cloudflare R2 for screenshot storage, Pixelmatch for visual diff, diff-match-patch for text diff, Supabase for database, Resend for email alerts, Stripe for billing — build with Cursor for scraper logic, v0 for dashboard UI.
Suggested Frameworks
Puppeteer, Pixelmatch, diff-match-patch
Time to Ship
2 weeks
Required Skills
Puppeteer headless scraping, Pixelmatch image comparison, Next.js API routes, Supabase cron jobs, Cloudflare R2 storage.
Resources
Puppeteer docs, Pixelmatch GitHub, diff-match-patch npm package, Railway cron job docs, Cloudflare R2 quickstart.
MVP Scope
URL management page, Puppeteer scraper worker on Railway, R2 screenshot storage, Pixelmatch diff generator, diff-match-patch text extractor, email digest via Resend, Supabase schema, Stripe billing, and a dashboard with diff history view.
Core User Journey
Sign up -> add 10 URLs -> receive first diff alert email within 24 hours -> upgrade to paid before trial ends.
Architecture Pattern
User adds URLs -> stored in Supabase -> Railway cron fires daily -> Puppeteer screenshots each URL -> R2 stores screenshots -> Pixelmatch diffs against last snapshot -> changes stored in Postgres -> Resend sends digest email.
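The pipeline above could be orchestrated roughly as follows. Every helper here is a placeholder for the real integration (Puppeteer capture, R2 upload, Pixelmatch/text comparison, Resend digest); only the control flow is meant to be illustrative, and all names are assumptions.

```javascript
// Sketch of the daily cron job, in the order described above. The `deps`
// object stands in for the real Puppeteer, Supabase, R2, Pixelmatch, and
// Resend integrations so the control flow can be shown in isolation.
async function runDailyScrape(deps, urls) {
  const changes = [];
  for (const url of urls) {
    const snapshot = await deps.capture(url);      // Puppeteer screenshot + HTML
    const previous = await deps.lastSnapshot(url); // latest row from Postgres
    await deps.store(url, snapshot);               // upload to R2, insert row
    if (!previous) continue;                       // first run: nothing to diff
    const diff = deps.compare(previous, snapshot); // pixel ratio + text changes
    if (diff.pixelRatio > 0.001) {
      changes.push({ url, ...diff });
    }
  }
  if (changes.length > 0) {
    await deps.sendDigest(changes);                // Resend email / Slack webhook
  }
  return changes;
}
```

Keeping the integrations behind a `deps` object like this also makes the worker testable without a browser, which matters for a scraper the brief itself warns is fragile.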
Data Model
User has many MonitoredURLs. MonitoredURL has many Snapshots. Snapshot has screenshot URL, HTML content, timestamp. Diff belongs to two Snapshots and stores pixel diff percentage and text change summary.
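The data model above might translate into a Supabase schema along these lines. Table and column names are illustrative assumptions, not a prescribed design; the row-level-security policy reflects the per-user isolation required in the Security Requirements section.

```sql
-- Illustrative Supabase schema for the data model above; all names are
-- assumptions, not part of the brief.
create table monitored_urls (
  id uuid primary key default gen_random_uuid(),
  user_id uuid not null references auth.users (id),
  url text not null,
  created_at timestamptz not null default now()
);

create table snapshots (
  id uuid primary key default gen_random_uuid(),
  monitored_url_id uuid not null references monitored_urls (id),
  screenshot_url text not null,   -- R2 object key
  html_content text not null,
  captured_at timestamptz not null default now()
);

create table diffs (
  id uuid primary key default gen_random_uuid(),
  old_snapshot_id uuid not null references snapshots (id),
  new_snapshot_id uuid not null references snapshots (id),
  pixel_diff_pct numeric not null, -- fraction of pixels changed
  text_change_summary text
);

-- Per-user row isolation, as required by the security section.
alter table monitored_urls enable row level security;
create policy "owner only" on monitored_urls
  using (auth.uid() = user_id);
```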
Integration Points
Stripe for payments, Resend for email digests, Supabase for database, Railway for cron scraper workers, Cloudflare R2 for screenshot storage, Slack API for optional alert webhook.
V1 Scope Boundaries
V1 excludes: login-gated page monitoring, API access, team accounts, mobile app, custom scrape frequency under 24 hours.
Success Definition
A paying journalist finds a meaningful page change via PageWatch before any other outlet reports it, tells their editor, and renews their subscription without prompting.
Challenges
Puppeteer scraping is fragile — sites with heavy JS rendering, bot detection (Cloudflare), or login walls will fail silently, which destroys user trust faster than any pricing issue.
Avoid These Pitfalls
Do not promise monitoring for login-gated or JS-heavy pages in V1 — bot detection will cause silent failures and user churn. Do not build team features before 20 paying solo users. Your first 10 paying customers will require manual outreach — do not wait for SEO traffic to kick in.
Security Requirements
Supabase Auth with Google OAuth, RLS on all user-owned URL and snapshot rows, rate limiting 60 req/min per IP, GDPR data deletion endpoint, no scraped content shared across user accounts.
Infrastructure Plan
Vercel for Next.js dashboard, Railway for Puppeteer cron workers, Supabase for Postgres, Cloudflare R2 for screenshot storage, GitHub Actions for CI, Sentry for scraper error alerts.
Performance Targets
50 DAU at launch, 500 scrape jobs/day. Diff generation under 3 seconds per URL. Dashboard page load under 2 seconds. Email delivery within 60 minutes of change detection.
Go-Live Checklist
- ☐ Security audit complete
- ☐ Payment flow tested end-to-end
- ☐ Sentry live on Railway worker
- ☐ Vercel Analytics configured
- ☐ Custom domain with SSL active
- ☐ Privacy policy and terms published
- ☐ 5 beta journalists signed off
- ☐ Rollback plan for Railway worker documented
- ☐ ProductHunt launch post drafted
How to build it, step by step
1. Run npx create-next-app pagewatch and scaffold a Supabase project with urls, snapshots, and diffs tables.
2. Build the URL management dashboard page using v0, with add, edit, and delete URL functionality.
3. Write a Puppeteer scraper script that takes a full-page screenshot and captures the HTML content.
4. Deploy the Puppeteer worker to Railway with a daily cron trigger.
5. Set up a Cloudflare R2 bucket and write an upload function that stores screenshots keyed by URL and timestamp.
6. Implement Pixelmatch comparison between the last two screenshots and store the diff percentage in Postgres.
7. Implement a diff-match-patch text diff between the last two HTML extracts and extract the changed sentences.
8. Build a Resend email template with the diff summary and a link to the visual diff image.
9. Add Stripe billing with $19, $49, and $99 monthly tiers and URL count gates.
10. Deploy the dashboard to Vercel, configure the Railway cron, set up Sentry for scraper error tracking, and launch.
Generated
April 8, 2026
Model
claude-sonnet-4-6