CodingIdeas.ai

RefParse — Upload Any Reference Letter Layout, Get Clean Structured Data in 30 Seconds

HR teams at recruiting agencies receive reference letters in 40 different layouts and spend 20 minutes per letter manually extracting the same five fields into their ATS. RefParse uses Claude vision to normalize every layout into structured JSON and CSV in under 30 seconds. One upload, zero reformatting, and your recruiter gets their lunch break back.

Difficulty

beginner

Category

NLP & Text AI

Market Demand

High

Revenue Score

7/10

Platform

Web App

Vibe Code Friendly

No

Hackathon Score

🏆 7/10

Validated by Real Pain

— sourced from real community discussions

Reddit: real demand

HR teams handling reference letters in many different layouts are using fragile manual extraction or custom regex scripts because no off-the-shelf tool handles arbitrary layout variations without template setup.

What is it?

Reference letter chaos is a real daily pain for recruiting agencies — every employer formats them differently, every school uses a different template, and the ATS wants a clean data row that none of them produce naturally. RefParse accepts PDF or image uploads, runs them through Claude vision with a structured extraction prompt, and outputs name, dates, relationship, key competencies, and contact info as JSON or downloadable CSV. Volume plans let agencies batch-upload 50 letters at once. Why buildable now: Claude vision API handles multi-layout document understanding far better than traditional OCR or regex approaches, making this a weekend-shippable product rather than a 3-month ML project. Recruiting agencies already pay $50-$200/month for document tools and this fills a gap none of them cover.

Why now?

Claude vision API now handles multi-layout document extraction without per-template training, collapsing what was a 3-month ML project into a weekend build as of mid-2025.

  • Multi-layout extraction — handles any PDF or image format without template setup.
  • Structured output as JSON and one-click CSV download ready for ATS import.
  • Batch upload: drop 50 PDFs at once and download a single merged CSV.
  • Confidence score per field so reviewers know which extractions to spot-check.
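The merged CSV export could be sketched roughly as follows. This is a minimal illustration, not the product's actual exporter: the column order, the LetterRecord type, and the function names are assumptions, and the sample values in the usage note are made up.

```typescript
// Column order is an assumption; adjust it to whatever the target ATS import expects.
const CSV_COLUMNS = ["name", "dates", "relationship", "competencies", "contact"] as const;

type CsvColumn = (typeof CSV_COLUMNS)[number];
type LetterRecord = Record<CsvColumn, string>;

// Quote a cell when it contains a comma, a double quote, or a line break (RFC 4180 style).
function escapeCsvCell(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Merge any number of extracted records into one CSV string with a header row.
function toMergedCsv(records: LetterRecord[]): string {
  const header = CSV_COLUMNS.join(",");
  const rows = records.map((record) =>
    CSV_COLUMNS.map((column) => escapeCsvCell(record[column])).join(","),
  );
  return [header, ...rows].join("\r\n");
}
```

Calling toMergedCsv with one record whose competencies value is "Leadership, mentoring" yields a header row plus one data row in which that cell is wrapped in double quotes, so the embedded comma does not split the column on import.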

Target Audience

HR and recruiting agencies processing 50+ reference letters monthly, roughly 80k agencies across the US, UK, and AU markets.

Example Use Case

A recruiting agency processing 200 reference letters monthly saves 65 hours of manual extraction, reduces data entry errors by 90%, and justifies the $29/month cost inside the first day of use.
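The 65-hour figure follows from the numbers quoted earlier: about 20 minutes of manual work per letter against about 30 seconds per automated extraction. A quick sketch of that arithmetic (the function name is illustrative, and the inputs are this page's estimates rather than measurements):

```typescript
// Hours saved per month when each letter drops from manual minutes to automated seconds.
function hoursSavedPerMonth(
  lettersPerMonth: number,
  manualMinutesPerLetter: number,
  automatedSecondsPerLetter: number,
): number {
  const savedMinutesPerLetter = manualMinutesPerLetter - automatedSecondsPerLetter / 60;
  return (lettersPerMonth * savedMinutesPerLetter) / 60;
}

// 200 letters x (20 min - 0.5 min) = 3,900 min = 65 hours
console.log(hoursSavedPerMonth(200, 20, 30)); // 65
```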

User Stories

  • As a recruiting agency coordinator, I want to upload a batch of 50 reference letters at once, so that I get a single merged CSV to import into our ATS without manual data entry.
  • As an HR manager, I want to see a confidence score per extracted field, so that I know which ones need a human spot-check before going into our system.
  • As an agency owner, I want to process reference letters regardless of their format or layout, so that I do not need to set up templates for every new employer we work with.

Done When

  • Extraction: done when user uploads a PDF and sees name, dates, relationship, and contact fields populated correctly on screen within 30 seconds.
  • Batch: done when user drops 5 PDFs and receives a single merged CSV file with all records on one click.
  • Confidence scores: done when each extracted field displays a green or amber confidence badge the user can see without clicking.
  • Billing: done when Stripe checkout upgrades user to Pro and their document counter resets to 500 immediately.
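The green or amber badge could be derived from a per-field score along these lines. The 0.85 cutoff and the names below are assumptions, since this page does not say where green ends and amber begins; the score itself would presumably be reported by the model and is worth calibrating against real letters.

```typescript
type Badge = "green" | "amber";

// Hypothetical cutoff, not taken from the source.
const GREEN_THRESHOLD = 0.85;

function badgeFor(confidence: number): Badge {
  // Missing or out-of-range scores are treated as needing a human look.
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) return "amber";
  return confidence >= GREEN_THRESHOLD ? "green" : "amber";
}
```

Defaulting malformed scores to amber keeps the failure mode conservative: a reviewer sees an extra spot-check rather than a falsely reassuring green badge.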

Is it worth building?

$29/month x 50 agencies = $1,450 MRR at month 2. $29/month x 200 agencies = $5,800 MRR at month 6. Math assumes 2% conversion on cold outreach to recruiting agency email lists.

Unit Economics

CAC: $15 via cold outreach. LTV: $348 (12 months at $29/month). Payback: under 1 month. Gross margin: 82%.
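The MRR, LTV, and payback figures can be reproduced with a few lines of arithmetic. This is a simplified sketch with illustrative function names: payback here is CAC divided by monthly price and ignores gross margin, and the 82% margin figure is not derived.

```typescript
const PRO_PRICE_USD = 29; // Pro plan price quoted on this page

function monthlyRecurringRevenue(payingAgencies: number, priceUsd: number = PRO_PRICE_USD): number {
  return payingAgencies * priceUsd;
}

function lifetimeValue(priceUsd: number, retainedMonths: number): number {
  return priceUsd * retainedMonths;
}

// Simplification: months of revenue needed to recover acquisition cost, before margin.
function paybackMonths(cacUsd: number, priceUsd: number): number {
  return cacUsd / priceUsd;
}

console.log(monthlyRecurringRevenue(50));  // 1450
console.log(monthlyRecurringRevenue(200)); // 5800
console.log(lifetimeValue(PRO_PRICE_USD, 12)); // 348
console.log(paybackMonths(15, PRO_PRICE_USD) < 1); // true
```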

Business Model

SaaS subscription

Monetization Path

Free tier: 20 documents per month. Pro at $29/month: 500 documents. Agency at $79/month: unlimited plus API access.
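A plan-limit check for these tiers might look like the sketch below. The limits come from the pricing above; the type and function names are hypothetical, and a real implementation would read the usage count from the database rather than take it as an argument.

```typescript
type Plan = "free" | "pro" | "agency";

const MONTHLY_DOCUMENT_LIMIT: Record<Plan, number> = {
  free: 20,
  pro: 500,
  agency: Number.POSITIVE_INFINITY, // "unlimited" in the pricing above
};

function remainingDocuments(plan: Plan, usedThisMonth: number): number {
  return Math.max(0, MONTHLY_DOCUMENT_LIMIT[plan] - usedThisMonth);
}

// True when the whole upload (single file or batch) fits in the remaining allowance.
function canProcess(plan: Plan, usedThisMonth: number, batchSize: number = 1): boolean {
  return remainingDocuments(plan, usedThisMonth) >= batchSize;
}
```

Checking the full batch size up front avoids processing half of a 50-letter upload before hitting the cap, which is also the natural moment to show the upgrade prompt.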

Revenue Timeline

First dollar: week 2 via cold outreach beta. $1k MRR: month 3. $5k MRR: month 8.

Estimated Monthly Cost

Claude API: $35, Vercel: $20, Supabase: $25, Stripe fees: $15. Total: ~$95/month at launch.

Profit Potential

Solid lifestyle business at $3k-$8k MRR with near-zero support burden.

Scalability

High — expand to recommendation letters for universities, character references for legal firms, and API resale to ATS vendors.

Success Metrics

Week 1: 5 agency signups. Month 1: 3 paid conversions. Month 3: less than 10% monthly churn.

Launch & Validation Plan

Cold-email 30 recruiting agencies offering free extraction of 50 letters in exchange for a 15-minute feedback call before writing any production code.

Customer Acquisition Strategy

First customer: find 20 small recruiting agencies on LinkedIn who posted about manual HR processes and DM them a free 50-document trial. Ongoing: r/recruiting, HR Twitter communities, cold email sequences to agency owner lists, SEO on 'reference letter parser automation' keywords.

What's the competition?

Competition Level

Low

Similar Products

Docparser (requires manual templates per layout), Nanonets (expensive ML platform, overkill), Rossum (enterprise pricing, 6-month onboarding) — RefParse fills the zero-template small-agency gap.

Competitive Advantage

Zero template setup for any layout — competitors like Docparser require manual template mapping per document type which takes hours per employer.

Regulatory Risks

GDPR and CCPA apply since letters contain personal data. Data must be encrypted at rest, retention policy required, deletion endpoint mandatory.

What's the roadmap?

Feature Roadmap

V1 (launch): PDF upload, batch upload, Claude vision extraction, JSON and CSV export, confidence scores. V2 (month 2-3): extraction history, field editing. V3 (month 4+): webhook API, ATS direct push integrations.

Milestone Plan

Phase 1 (Week 1): extraction API, upload UI, CSV download, 3 agency beta users. Phase 2 (Week 2): Stripe billing, auth, batch mode, 10 beta users processing real letters. Phase 3 (Month 2): SEO landing page, cold outreach campaign, 25 paying agencies.

How do you build it?

Tech Stack

Next.js, Claude API vision, Supabase, Stripe, pdf-parse npm package — build with Cursor for extraction pipeline, v0 for upload UI.

Suggested Frameworks

Anthropic Claude SDK, pdf-parse, Supabase JS

Time to Ship

1 week

Required Skills

Claude vision API, PDF parsing with pdf-parse, Next.js file upload, CSV generation.

Resources

Anthropic vision API docs, pdf-parse npm docs, Supabase storage guide, Stripe subscriptions.

MVP Scope

app/page.tsx (landing + upload CTA), app/dashboard/page.tsx (upload history + CSV download), app/api/extract/route.ts (PDF parse + Claude vision call + JSON output), app/api/batch/route.ts (multi-file handler), lib/db/schema.ts (users, documents, extractions), lib/claude.ts (vision extraction prompt wrapper), components/UploadZone.tsx (drag-and-drop), components/ResultTable.tsx (structured field display), .env.example.

Core User Journey

Sign up -> upload reference letter PDF -> see structured fields extracted in 30 seconds -> download CSV -> upgrade when free tier hits limit.

Architecture Pattern

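User uploads PDF -> Supabase Storage -> /api/extract reads the file and uses pdf-parse to pull any embedded text layer (pdf-parse returns text, not page images) -> the PDF or image is base64-encoded and passed to Claude vision API -> structured JSON returned -> saved in Supabase -> ResultTable renders fields -> CSV download generated on demand.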
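One way the Claude call in this flow might be assembled is shown below as a plain request body, with no SDK import so the shape is easy to inspect. The content blocks follow the Anthropic Messages API format for base64 documents and images as I understand it; the prompt wording, the max_tokens value, and all names are assumptions, and the model id is left for the caller to supply.

```typescript
// Content blocks in the Messages API shape for base64 PDFs and images.
type FileBlock =
  | { type: "document"; source: { type: "base64"; media_type: "application/pdf"; data: string } }
  | { type: "image"; source: { type: "base64"; media_type: "image/png" | "image/jpeg"; data: string } };

type TextBlock = { type: "text"; text: string };

interface ExtractionRequest {
  model: string;
  max_tokens: number;
  messages: Array<{ role: "user"; content: Array<FileBlock | TextBlock> }>;
}

// Hypothetical prompt wording; the field list comes from the product description.
const EXTRACTION_PROMPT =
  "Extract name, dates, relationship, competencies, and contact from this reference letter. " +
  'Reply with JSON only, one key per field, each shaped as {"value": string, "confidence": number from 0 to 1}.';

function buildExtractionRequest(
  base64File: string,
  mediaType: "application/pdf" | "image/png" | "image/jpeg",
  model: string, // caller supplies the model id; none is hard-coded here
): ExtractionRequest {
  // PDFs go in a document block, images in an image block.
  const fileBlock: FileBlock =
    mediaType === "application/pdf"
      ? { type: "document", source: { type: "base64", media_type: mediaType, data: base64File } }
      : { type: "image", source: { type: "base64", media_type: mediaType, data: base64File } };

  return {
    model,
    max_tokens: 1024, // assumption: five short fields fit comfortably
    messages: [{ role: "user", content: [fileBlock, { type: "text", text: EXTRACTION_PROMPT }] }],
  };
}
```

The returned object is what would be handed to the SDK's messages.create call inside lib/claude.ts. Sending the PDF as a document block means scanned letters do not depend on pdf-parse finding a text layer.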

Data Model

User has many Documents. Document has one Extraction with many Fields. Extraction has a confidenceScore and status.
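These relationships could be written down as TypeScript types roughly as follows. Names such as LetterDocument and AgencyUser, the status values, and the storage fields are assumptions. The roll-up rule (overall score equals the weakest field score) is one possible choice, not something this page specifies.

```typescript
// Status values are an assumption; the source only says an Extraction has a status.
type ExtractionStatus = "pending" | "complete" | "failed";

interface ExtractedField {
  key: string;        // e.g. "name", "relationship"
  value: string;
  confidence: number; // per-field score in the range 0..1
}

interface Extraction {
  id: string;
  documentId: string;
  status: ExtractionStatus;
  confidenceScore: number; // overall score for the extraction
  fields: ExtractedField[];
}

interface LetterDocument {
  id: string;
  userId: string;
  storagePath: string;     // location of the upload in Supabase Storage
  uploadedAt: string;      // ISO 8601 timestamp
  extraction?: Extraction; // absent until extraction has run
}

interface AgencyUser {
  id: string;
  email: string;
  documents: LetterDocument[];
}

// One possible roll-up: the extraction is only as trustworthy as its weakest field.
function overallConfidence(fields: ExtractedField[]): number {
  return fields.length === 0 ? 0 : Math.min(...fields.map((field) => field.confidence));
}
```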

Integration Points

Claude API vision for extraction, pdf-parse for PDF to text, Supabase Storage for file storage, Supabase Postgres for extraction records, Stripe for billing, Resend for email.

V1 Scope Boundaries

V1 excludes: ATS direct push, custom field mapping, mobile app, multi-user agency accounts, webhook API.

Success Definition

A recruiting agency uploads their messiest reference letter batch, gets a clean CSV, imports it into their ATS without editing a single cell, and upgrades to the Agency plan without contacting support.

Challenges

Distribution to HR agencies is slower than B2C — cold outreach conversion is 2-4% and decision cycles are 2-3 weeks. Do not rely on inbound SEO alone in the first 90 days.

Avoid These Pitfalls

Do not build ATS direct integration in V1 — CSV download covers 100% of use cases and saves 3 weeks. Do not skip confidence scores or agencies will not trust automated output. Acquiring first 10 paying agencies takes 3x longer than building — budget time accordingly.

Security Requirements

Supabase Auth with Google OAuth, RLS on all document tables, files auto-deleted after 30 days, 50 req/min rate limit, GDPR deletion endpoint, encrypted storage in Supabase.
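Two of these requirements, the 30-day auto-delete and the 50 req/min limit, reduce to small pieces of logic. The sketch below is illustrative only: the function names are hypothetical, and the in-memory fixed-window counter is an assumption that would not hold across serverless instances, where a shared store would be needed.

```typescript
const RETENTION_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True once a file has been stored for the full retention period and should be deleted.
function isPastRetention(uploadedAt: Date, now: Date, retentionDays: number = RETENTION_DAYS): boolean {
  return now.getTime() - uploadedAt.getTime() >= retentionDays * MS_PER_DAY;
}

// Fixed-window limiter: at most `limit` requests per key in each `windowMs` window.
function createRateLimiter(limit: number, windowMs: number) {
  const windows = new Map<string, { start: number; count: number }>();

  return function allow(key: string, nowMs: number): boolean {
    const current = windows.get(key);
    if (!current || nowMs - current.start >= windowMs) {
      windows.set(key, { start: nowMs, count: 1 }); // first request of a new window
      return true;
    }
    if (current.count >= limit) return false;
    current.count += 1;
    return true;
  };
}
```

A limiter for the stated policy would be created with createRateLimiter(50, 60000) and keyed by user id. A scheduled job could pass each document's upload time through isPastRetention and delete the matches from storage.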

Infrastructure Plan

Vercel for Next.js, Supabase for Postgres and Storage and Auth, GitHub Actions for CI, Sentry for errors.

Performance Targets

50 DAU at launch, extraction API response under 5 seconds per document, page load under 2s LCP, batch of 10 under 45 seconds.
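Meeting the batch target depends on running extractions in parallel without flooding the API. A minimal bounded-concurrency helper might look like this; the name runWithConcurrency and the choice of limit are assumptions, and a production version would also want per-item error handling, since here a single rejected worker rejects the whole batch.

```typescript
// Run `worker` over every item with at most `limit` calls in flight; results keep input order.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let nextIndex = 0;

  // Each lane pulls the next unclaimed index until none remain.
  async function lane(): Promise<void> {
    while (nextIndex < items.length) {
      const index = nextIndex++; // claimed synchronously, so lanes never share an index
      results[index] = await worker(items[index] as T, index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

As a rough check against the targets above: with 10 letters at up to 5 seconds each and a limit of 5, the batch finishes in about two waves, or roughly 10 seconds, assuming the API's own rate limits permit five parallel calls.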

Go-Live Checklist

  • Security audit complete.
  • Payment flow tested end-to-end.
  • Sentry error tracking live.
  • Monitoring dashboard configured.
  • Custom domain with SSL live.
  • Privacy policy covering personal data published.
  • 3 agency beta users signed off.
  • Rollback plan documented.
  • Cold outreach sequence drafted for 50 agencies.

First Run Experience

On first run: a pre-extracted demo reference letter is shown in the result table with all fields populated. User can immediately download the demo CSV without uploading anything. No manual config required: demo PDF pre-loaded, Claude key is server-side.

How to build it, step by step

  1. Run npx create-next-app with Tailwind and App Router.
  2. Define the schema in lib/db/schema.ts with users, documents, extractions, and fields tables.
  3. Install @anthropic-ai/sdk, pdf-parse, @supabase/supabase-js, and stripe.
  4. Build the UploadZone component with a drag-and-drop file input using v0.
  5. Build the /api/extract route: read the uploaded PDF, run pdf-parse for any text layer, base64-encode the file, and call Claude vision with the structured extraction prompt.
  6. Build the ResultTable component that displays extracted fields with confidence badges.
  7. Add a CSV export endpoint that serializes extraction results.
  8. Build the /api/batch route for multi-file parallel processing.
  9. Add Stripe checkout for the Pro tier, plus the Stripe webhook.
  10. Deploy to Vercel and walk the full journey from PDF upload to CSV download without any manual setup.
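The extraction route needs to turn the model's text reply into validated fields before anything is saved or rendered. A hedged sketch of that parsing step follows. It assumes the prompt asked for one JSON key per field, each shaped as an object with value and confidence; that format, the key list, and the function name are assumptions, not something this page specifies.

```typescript
interface ParsedField {
  value: string;
  confidence: number;
}
type ParsedExtraction = Record<string, ParsedField>;

// Field keys are taken from the product description; the exact spelling is an assumption.
const REQUIRED_KEYS = ["name", "dates", "relationship", "competencies", "contact"];

const FENCE = "`".repeat(3); // Markdown code fence marker

function parseExtraction(raw: string): ParsedExtraction {
  // Models sometimes wrap JSON in a Markdown code fence; strip it if present.
  let cleaned = raw.trim();
  if (cleaned.startsWith(FENCE)) {
    cleaned = cleaned.slice(cleaned.indexOf("\n") + 1); // drop the opening fence line
    if (cleaned.endsWith(FENCE)) cleaned = cleaned.slice(0, -FENCE.length);
  }

  const data: unknown = JSON.parse(cleaned); // throws on invalid JSON
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("Expected a JSON object");
  }

  const record = data as Record<string, unknown>;
  const result: ParsedExtraction = {};
  for (const key of REQUIRED_KEYS) {
    const field = record[key] as { value?: unknown; confidence?: unknown } | null | undefined;
    if (!field || typeof field.value !== "string" || typeof field.confidence !== "number") {
      throw new Error(`Missing or malformed field: ${key}`);
    }
    // Clamp the score into 0..1 so downstream badge logic can rely on the range.
    result[key] = { value: field.value, confidence: Math.min(1, Math.max(0, field.confidence)) };
  }
  return result;
}
```

Throwing on a missing field, rather than filling in an empty string, lets the route mark the extraction as failed and surface it for review instead of writing a silently incomplete row to the CSV.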

Generated

June 10, 2026

Model

claude-sonnet-4-6

Disclaimer: Ideas on this site are AI-generated and may contain inaccuracies. Revenue estimates, market demand figures, and financial projections are illustrative assumptions only — not financial advice. Do your own research before making any business or investment decisions. Technology availability, pricing, and market conditions change rapidly; always verify details independently.