PageWatch - Affordable Visual and Text Diff Monitor for Journalists and Archivists
The Wayback Machine is great for finding what a site looked like in 2018 — not so great for getting a Slack alert the moment a government agency quietly edits a policy page. PageWatch monitors your URL list daily, generates pixel-level visual diffs and text change extracts, and sends you a digest before your morning coffee.
Difficulty
intermediate
Category
Analytics
Market Demand
High
Revenue Score
7/10
Platform
Web App
Vibe Code Friendly
No
Hackathon Score
🏆 7/10
Validated by Real Pain
— seeded from real developer complaints
Researchers and journalists describe needing to track webpage changes over time for archival and investigative purposes, with current options being either expensive enterprise tools or fragile DIY scripts — and express clear willingness to pay for an affordable automated alternative.
What is it?
Journalists, researchers, and compliance teams routinely need to track when specific web pages change — think press release pages, policy documents, pricing pages, or competitor landing pages. The current options are either free-but-manual (Wayback Machine), expensive-enterprise (Versionista at $99/month per site), or DIY Puppeteer scripts that break constantly. PageWatch is a $19/month web app that accepts a list of URLs, scrapes them on a daily schedule via Puppeteer running on Railway, stores full-page screenshots and HTML snapshots in Cloudflare R2, and generates visual diffs using Pixelmatch and text diffs using diff-match-patch. Users get a clean diff digest via email or Slack. This is fully buildable with Next.js, Supabase, Railway, and Resend in under 2 weeks — the entire tech stack is stable, cheap, and well-documented as of April 2026.
Why now?
Railway cron jobs and Cloudflare R2 make running scheduled Puppeteer scrapers trivially cheap in April 2026, eliminating the $200+/month infrastructure cost that previously made this a VC-backed product category — now it is within reach of a solo developer in a two-week build.
- Daily scheduled Puppeteer scrape with full-page screenshot and HTML snapshot stored in R2.
- Visual diff overlay using Pixelmatch that highlights exactly which pixels changed.
- Text diff extraction using diff-match-patch, with changed sentences highlighted in the email digest.
- Slack and email alerts with a diff summary, sent within 1 hour of detecting a change.
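The text-diff feature above could be sketched as follows. This is a minimal stand-in for what diff-match-patch does far more robustly (it produces character-level diffs rather than comparing sentence sets); all function names here are illustrative, not part of any library.

```javascript
// Minimal sketch of changed-sentence extraction between two HTML snapshots.
// Production code would use diff-match-patch; this stand-in just compares
// sentence sets to illustrate the idea.

// Strip tags and collapse whitespace to get comparable text.
function extractText(html) {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Naive sentence split on ., !, ? followed by whitespace.
function splitSentences(text) {
  return text.split(/(?<=[.!?])\s+/).filter(Boolean);
}

// Return sentences present in the new snapshot but not in the old one.
function changedSentences(oldHtml, newHtml) {
  const oldSet = new Set(splitSentences(extractText(oldHtml)));
  return splitSentences(extractText(newHtml)).filter((s) => !oldSet.has(s));
}

const before = "<p>Emissions limit is 50 ppm. Effective 2024.</p>";
const after = "<p>Emissions limit is 80 ppm. Effective 2024.</p>";
console.log(changedSentences(before, after)); // ["Emissions limit is 80 ppm."]
```

A set comparison like this misses reordered or slightly rewrapped sentences, which is exactly why the brief reaches for diff-match-patch instead of rolling its own.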
Target Audience
Investigative journalists, academic researchers, compliance analysts, and competitive intelligence teams — estimated 200k potential users globally, with journalism orgs and research teams as the highest-value segment.
Example Use Case
Marco, an investigative journalist covering environmental policy, monitors 40 EPA and industry website pages. PageWatch alerts him at 7am that a key emissions standards page was quietly updated overnight, giving him a 6-hour scoop before other outlets notice.
User Stories
- As an investigative journalist, I want a Slack alert when a monitored page changes, so that I can break stories before competitors.
- As a compliance analyst, I want a visual diff overlay of yesterday vs today, so that I can verify no unauthorized content changes occurred.
- As a researcher, I want to export my full change history as CSV, so that I can include page edit timelines in my academic paper.
Acceptance Criteria
Scrape Job: done when Puppeteer captures a full-page screenshot of any public URL without crashing. Visual Diff: done when Pixelmatch generates a diff image highlighting changed regions once more than 0.1% of pixels differ. Alert Email: done when Resend delivers the diff digest within 60 minutes of change detection. Billing Gate: done when users exceeding their URL limit see an upgrade prompt and cannot add more URLs without upgrading.
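The 0.1% changed-pixel gate in the Visual Diff criterion can be sketched like this. Pixelmatch itself does this comparison on RGBA buffers (with anti-aliasing detection and a per-pixel color threshold); this hand-rolled stand-in only counts raw per-pixel differences, and the function names are assumptions for illustration.

```javascript
// Sketch of the 0.1% changed-pixel alert gate. Operates on raw RGBA
// buffers of equal dimensions, like the ones Pixelmatch consumes.

// Fraction of pixels whose RGBA values differ between two snapshots.
function changedPixelRatio(a, b, width, height) {
  if (a.length !== b.length) throw new Error("snapshot sizes differ");
  let changed = 0;
  for (let i = 0; i < width * height; i++) {
    const o = i * 4; // 4 bytes per pixel: R, G, B, A
    if (a[o] !== b[o] || a[o + 1] !== b[o + 1] ||
        a[o + 2] !== b[o + 2] || a[o + 3] !== b[o + 3]) {
      changed++;
    }
  }
  return changed / (width * height);
}

// Fire an alert only when more than 0.1% of pixels changed.
function shouldAlert(a, b, width, height) {
  return changedPixelRatio(a, b, width, height) > 0.001;
}
```

The threshold matters in practice: without it, a one-pixel ad rotation or a rendered timestamp would page a journalist every single day.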
Is it worth building?
$19/month x 80 users = $1,520 MRR at month 3. $49/month x 100 power users = $4,900 MRR at month 6. Assumes 3% conversion of free trial signups via journalism community outreach.
Unit Economics
CAC: $15 via journalism community outreach and ProductHunt. LTV: $342 (18 months at $19/month). Payback: 1 month. Gross margin: 88%.
Business Model
Tiered subscription — $19/month for 50 URLs, $49/month for 200 URLs, $99/month for 1,000 URLs.
Monetization Path
14-day free trial with 10 URLs, converts to paid when trial ends or URL limit exceeded.
Revenue Timeline
First dollar: week 3 via trial conversion. $1k MRR: month 3. $5k MRR: month 7.
Estimated Monthly Cost
Railway for Puppeteer workers: $30, Cloudflare R2: $15, Supabase: $25, Vercel: $20, Resend: $10, Stripe fees: ~$25. Total: ~$125/month at launch.
Profit Potential
Sustainable indie product at $3k–$8k MRR. Journalism orgs and compliance teams will pay $99+/month without blinking.
Scalability
High — add team accounts, API access for power users, webhook integrations, and Slack bot for newsroom workflows.
Success Metrics
20 beta users in week 2. 5 paid conversions by end of week 3. 80% trial-to-paid conversion among users who receive at least one meaningful diff alert during the trial.
Launch & Validation Plan
Post in r/journalism and r/DataHoarder asking about web monitoring workflows, DM 15 investigative journalists on Twitter offering free beta, collect 20 signups before writing scraper code.
Customer Acquisition Strategy
First customer: DM 20 investigative journalists and academic researchers on Twitter offering 3 months free in exchange for weekly feedback — journalism community responds well to tools that give them a scoop edge. Ongoing: ProductHunt launch, r/journalism, NICAR conference community, SEO targeting 'website change monitor' and 'Versionista alternative'.
What's the competition?
Competition Level
Medium
Similar Products
Versionista ($99/month per site, expensive), Distill.io (browser-only, no visual diff archive), Wayback Machine (manual, no alerts).
Competitive Advantage
80% cheaper than Versionista, visual diff is more intuitive than text-only tools, ships in 2 weeks not 2 years.
Regulatory Risks
GDPR compliance required for EU users. Scraping public websites is generally legal but Terms of Service violations on some sites are possible — document that users are responsible for scraping only pages they have rights to monitor.
What's the roadmap?
Feature Roadmap
V1 (launch): URL monitoring, visual diff, text diff, email alerts, Stripe billing. V2 (month 2-3): Slack webhook alerts, diff history archive, CSV export. V3 (month 4+): team accounts, API access, custom scrape frequency, Notion integration.
Milestone Plan
Phase 1 (Week 1): Puppeteer scraper, R2 storage, and Pixelmatch diff working end-to-end locally. Phase 2 (Week 2): Next.js dashboard live, Stripe billing, Resend alerts deployed to Railway. Phase 3 (Month 2): 20 paying users, ProductHunt launch, first $1k MRR.
How do you build it?
Tech Stack
Next.js, Puppeteer on Railway for scheduled scraping, Cloudflare R2 for screenshot storage, Pixelmatch for visual diff, diff-match-patch for text diff, Supabase for database, Resend for email alerts, Stripe for billing — build with Cursor for scraper logic, v0 for dashboard UI.
Suggested Frameworks
Puppeteer, Pixelmatch, diff-match-patch
Time to Ship
2 weeks
Required Skills
Puppeteer headless scraping, Pixelmatch image comparison, Next.js API routes, Supabase cron jobs, Cloudflare R2 storage.
Resources
Puppeteer docs, Pixelmatch GitHub, diff-match-patch npm package, Railway cron job docs, Cloudflare R2 quickstart.
MVP Scope
URL management page, Puppeteer scraper worker on Railway, R2 screenshot storage, Pixelmatch diff generator, diff-match-patch text extractor, email digest via Resend, Supabase schema, Stripe billing, and a dashboard with diff history view.
Core User Journey
Sign up -> add 10 URLs -> receive first diff alert email within 24 hours -> upgrade to paid before trial ends.
Architecture Pattern
User adds URLs -> stored in Supabase -> Railway cron fires daily -> Puppeteer screenshots each URL -> R2 stores screenshots -> Pixelmatch diffs against last snapshot -> changes stored in Postgres -> Resend sends digest email.
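The pipeline above could be orchestrated roughly as follows. Every helper here is a placeholder for the real integration (Puppeteer capture, R2 upload, Pixelmatch/text comparison, Resend digest); only the control flow is meant to be illustrative, and all names are assumptions.

```javascript
// Sketch of the daily cron job, in the order described above. The `deps`
// object stands in for the real Puppeteer, Supabase, R2, Pixelmatch, and
// Resend integrations so the control flow can be shown in isolation.
async function runDailyScrape(deps, urls) {
  const changes = [];
  for (const url of urls) {
    const snapshot = await deps.capture(url);      // Puppeteer screenshot + HTML
    const previous = await deps.lastSnapshot(url); // latest row from Postgres
    await deps.store(url, snapshot);               // upload to R2, insert row
    if (!previous) continue;                       // first run: nothing to diff
    const diff = deps.compare(previous, snapshot); // pixel ratio + text changes
    if (diff.pixelRatio > 0.001) {
      changes.push({ url, ...diff });
    }
  }
  if (changes.length > 0) {
    await deps.sendDigest(changes);                // Resend email / Slack webhook
  }
  return changes;
}
```

Keeping the integrations behind a `deps` object like this also makes the worker testable without a browser, which matters for a scraper the brief itself warns is fragile.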
Data Model
User has many MonitoredURLs. MonitoredURL has many Snapshots. Snapshot has screenshot URL, HTML content, timestamp. Diff belongs to two Snapshots and stores pixel diff percentage and text change summary.
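The data model above might translate into a Supabase schema along these lines. Table and column names are illustrative assumptions, not a prescribed design; the row-level-security policy reflects the per-user isolation required in the Security Requirements section.

```sql
-- Illustrative Supabase schema for the data model above; all names are
-- assumptions, not part of the brief.
create table monitored_urls (
  id uuid primary key default gen_random_uuid(),
  user_id uuid not null references auth.users (id),
  url text not null,
  created_at timestamptz not null default now()
);

create table snapshots (
  id uuid primary key default gen_random_uuid(),
  monitored_url_id uuid not null references monitored_urls (id),
  screenshot_url text not null,   -- R2 object key
  html_content text not null,
  captured_at timestamptz not null default now()
);

create table diffs (
  id uuid primary key default gen_random_uuid(),
  old_snapshot_id uuid not null references snapshots (id),
  new_snapshot_id uuid not null references snapshots (id),
  pixel_diff_pct numeric not null, -- fraction of pixels changed
  text_change_summary text
);

-- Per-user row isolation, as required by the security section.
alter table monitored_urls enable row level security;
create policy "owner only" on monitored_urls
  using (auth.uid() = user_id);
```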
Integration Points
Stripe for payments, Resend for email digests, Supabase for database, Railway for cron scraper workers, Cloudflare R2 for screenshot storage, Slack API for optional alert webhook.
V1 Scope Boundaries
V1 excludes: login-gated page monitoring, API access, team accounts, mobile app, custom scrape frequency under 24 hours.
Success Definition
A paying journalist finds a meaningful page change via PageWatch before any other outlet reports it, tells their editor, and renews their subscription without prompting.
Challenges
Puppeteer scraping is fragile — sites with heavy JS rendering, bot detection (Cloudflare), or login walls will fail silently, which destroys user trust faster than any pricing issue.
Avoid These Pitfalls
Do not promise monitoring for login-gated or JS-heavy pages in V1 — bot detection will cause silent failures and user churn. Do not build team features before 20 paying solo users. Your first 10 paying customers will require manual outreach — do not wait for SEO traffic to kick in.
Security Requirements
Supabase Auth with Google OAuth, RLS on all user-owned URL and snapshot rows, rate limiting 60 req/min per IP, GDPR data deletion endpoint, no scraped content shared across user accounts.
Infrastructure Plan
Vercel for Next.js dashboard, Railway for Puppeteer cron workers, Supabase for Postgres, Cloudflare R2 for screenshot storage, GitHub Actions for CI, Sentry for scraper error alerts.
Performance Targets
50 DAU at launch, 500 scrape jobs/day. Diff generation under 3 seconds per URL. Dashboard page load under 2 seconds. Email delivery within 60 minutes of change detection.
Go-Live Checklist
- ☐ Security audit complete
- ☐ Payment flow tested end-to-end
- ☐ Sentry live on Railway worker
- ☐ Vercel Analytics configured
- ☐ Custom domain with SSL active
- ☐ Privacy policy and terms published
- ☐ 5 beta journalists signed off
- ☐ Rollback plan for Railway worker documented
- ☐ ProductHunt launch post drafted
How to build it, step by step
1. Run npx create-next-app pagewatch and scaffold a Supabase project with urls, snapshots, and diffs tables.
2. Build the URL management dashboard page using v0, with add, edit, and delete URL functionality.
3. Write a Puppeteer scraper script that takes a full-page screenshot and captures the HTML content.
4. Deploy the Puppeteer worker to Railway with a daily cron trigger.
5. Set up a Cloudflare R2 bucket and write an upload function that stores screenshots keyed by URL and timestamp.
6. Implement Pixelmatch comparison between the last two screenshots and store the diff percentage in Postgres.
7. Implement a diff-match-patch text diff between the last two HTML extracts and extract the changed sentences.
8. Build a Resend email template with the diff summary and a link to the visual diff image.
9. Add Stripe billing with $19, $49, and $99 monthly tiers and URL count gates.
10. Deploy the dashboard to Vercel, configure the Railway cron, set up Sentry for scraper error tracking, and launch.
Generated
April 8, 2026
Model
claude-sonnet-4-6