LabelSnap - Zero-Shot Product Attribute Classifier for Catalog Teams

Q: Who can build LabelSnap - Zero-Shot Product Attribute Classifier for Catalog Teams?

This is an intermediate-level project. The target users are e-commerce catalog managers and Shopify Plus operators at brands with 1k–100k SKUs (est. 50k businesses in this range on Shopify alone).

E-commerce catalog managers spend 40% of their week manually tagging 'material', 'fit', 'occasion', and 'style' on thousands of SKUs. LabelSnap uses a zero-shot NLP classifier (a model already fine-tuned for natural language inference) to auto-tag any product description against custom attribute schemas, so the user supplies no training data. The April 2026 vibe-coding wave finally made this fast enough to matter.

Difficulty

intermediate

What is it?

Product attribute tagging is the unglamorous backbone of every e-commerce search experience, and it is done manually at 90% of mid-size retailers. LabelSnap ingests a CSV of product descriptions, applies a zero-shot multi-label classifier using HuggingFace's facebook/bart-large-mnli model, and outputs a tagged CSV with confidence scores — all configurable via a simple attribute schema JSON the user defines. The SaaS layer is a Next.js dashboard where catalog managers upload CSVs, define their label schema, review low-confidence tags, and export. No ML background required. Built on HuggingFace Inference API (no GPU needed), FastAPI for the classification service, and Supabase for job state. Comparable products charge $500/month; LabelSnap targets the underserved 10–500 SKU/day tier at $49/month.
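As a rough sketch of how the attribute schema JSON might drive classification, the snippet below builds one zero-shot request body per attribute and keeps the labels that score above a threshold. The schema shape, the function names, and the 0.5 threshold are assumptions for illustration. The payload mirrors the `inputs` / `candidate_labels` / `multi_label` request shape used for zero-shot classification on the Inference API, which should be verified against the current docs; the HTTP call itself is left out.

```python
import json

# Hypothetical attribute schema: attribute name -> candidate labels.
SCHEMA_JSON = """
{
  "fit": ["slim", "regular", "oversized"],
  "occasion": ["casual", "formal", "sport"]
}
"""


def build_payloads(description, schema):
    """Return one zero-shot request body per attribute in the schema."""
    return {
        attribute: {
            "inputs": description,
            "parameters": {"candidate_labels": labels, "multi_label": True},
        }
        for attribute, labels in schema.items()
    }


def pick_labels(response, threshold=0.5):
    """Keep (label, score) pairs at or above the threshold.

    `response` is assumed to look like {"labels": [...], "scores": [...]},
    with the two lists aligned by position.
    """
    return [
        (label, score)
        for label, score in zip(response["labels"], response["scores"])
        if score >= threshold
    ]


schema = json.loads(SCHEMA_JSON)
payloads = build_payloads("Relaxed cotton tee for weekend wear", schema)

# A made-up response, standing in for what the API would return.
fake_response = {"labels": ["casual", "sport", "formal"], "scores": [0.91, 0.34, 0.05]}
selected = pick_labels(fake_response)
```

Sending one request per attribute keeps each candidate-label list short, at the cost of more API calls per SKU.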

Why now?

HuggingFace Inference API now serves bart-large-mnli at sub-second latency with no GPU provisioning, making zero-shot classification cheap enough for a $49/month product for the first time.

▸Zero-shot multi-label classification using bart-large-mnli — no training data or ML setup required from the user
▸Custom attribute schema builder where users define their own label taxonomies as JSON
▸Confidence threshold reviewer so catalog managers only manually correct low-confidence tags
▸CSV import and export with original columns preserved plus new label columns appended
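The CSV round trip and the confidence threshold from the list above could look roughly like this. It is a minimal sketch using only the standard library; the column naming scheme, the `needs_review` flag, and the 0.7 threshold are assumptions, not part of the original spec.

```python
import csv
import io


def append_labels(csv_text, predictions, attribute, threshold=0.7):
    """Return CSV text with label, confidence, and review columns appended.

    `predictions` is a list of (label, confidence) pairs, one per data row,
    in the same order as the rows of `csv_text`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    new_columns = [attribute, f"{attribute}_confidence", f"{attribute}_needs_review"]
    fieldnames = list(reader.fieldnames) + new_columns

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, (label, confidence) in zip(reader, predictions):
        row[attribute] = label
        row[f"{attribute}_confidence"] = f"{confidence:.2f}"
        # Rows below the threshold are flagged for manual review.
        row[f"{attribute}_needs_review"] = "yes" if confidence < threshold else "no"
        writer.writerow(row)
    return out.getvalue()


source = "sku,description\nA1,Linen summer dress\nB2,Wool overcoat\n"
tagged = append_labels(source, [("linen", 0.93), ("wool", 0.55)], "material")
```

The original columns come first and keep their order, so the export can be re-imported wherever the source file came from.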

Target Audience

E-commerce catalog managers and Shopify Plus operators at brands with 1k–100k SKUs (est. 50k businesses in this range on Shopify alone)

Example Use Case

Sophie, a catalog manager at a mid-size fashion brand, uploads 2,000 product descriptions on Monday morning, configures her label schema (occasion, fit, fabric), and has all 2,000 SKUs tagged by the time she finishes her coffee — instead of tagging manually for the rest of the week.

User Stories

▸As a catalog manager, I want to upload a CSV and get all SKUs auto-tagged, so that I stop spending 20 hours a week on manual attribute entry.
▸As a merchandising lead, I want to define my own label schema without any ML knowledge, so that classifications match our internal taxonomy.
▸As a catalog team member, I want to review and correct low-confidence tags in the UI, so that export quality is always production-ready.

Done When

✓Upload: done when user uploads a CSV and sees a job progress bar that updates without page refresh
✓Schema builder: done when user adds, removes, and renames attributes and the change persists on page reload
✓Review table: done when user sees each SKU row with predicted labels, confidence percentage, and an editable dropdown override
✓Export: done when user clicks Download and receives the original CSV with new label columns appended and confidence scores included

Is it worth building?

$49/month x 80 catalog managers = $3,920 MRR at month 3. $149/month x 20 larger teams = $2,980 MRR. Total: $6,900 MRR at month 4, which requires 100 paying users; this is achievable via a Shopify app store listing.

Unit Economics

CAC: $20 via LinkedIn outreach + Shopify app store. LTV: $588 (12 months at $49/month). Payback: 1 month. Gross margin: 85%.
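The figures above can be checked with a few lines of arithmetic. This sketch assumes LTV is plain revenue over 12 months (as the stated $588 implies) and that payback is measured against gross profit rather than revenue; both choices are assumptions about how the numbers were derived.

```python
CAC = 20.0             # acquisition cost per customer, in dollars
PRICE = 49.0           # monthly subscription price, in dollars
LIFETIME_MONTHS = 12   # customer lifetime used in the LTV figure
GROSS_MARGIN = 0.85

ltv = PRICE * LIFETIME_MONTHS              # revenue-based LTV
ltv_to_cac = ltv / CAC
margin_per_month = PRICE * GROSS_MARGIN    # gross profit per customer per month
payback_months = CAC / margin_per_month    # months of gross profit to recover CAC

print(f"LTV: ${ltv:.0f}, LTV/CAC: {ltv_to_cac:.1f}, payback: {payback_months:.2f} months")
```

Under these assumptions the payback lands inside the first billing month, which is consistent with the stated one-month payback when rounded up to a whole billing cycle.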

Business Model

SaaS subscription — $49/month for 5k SKUs/month, $149/month for 50k SKUs/month

Monetization Path

Free tier: 100 SKUs one-time. Paid triggers on upload > 100 rows or schema > 5 attributes.
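The paywall trigger described above reduces to a single check. A minimal sketch, with hypothetical names; it does not model the one-time nature of the free allowance, which would need a per-user usage counter.

```python
FREE_ROW_LIMIT = 100        # free tier: up to 100 rows
FREE_ATTRIBUTE_LIMIT = 5    # free tier: up to 5 schema attributes


def requires_paid_plan(row_count, attribute_count):
    """Return True when an upload exceeds either free-tier limit."""
    return row_count > FREE_ROW_LIMIT or attribute_count > FREE_ATTRIBUTE_LIMIT
```

For example, `requires_paid_plan(100, 5)` is false, while 101 rows or a sixth attribute would each trigger the upgrade prompt.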

Revenue Timeline

First dollar: week 3 via LinkedIn outreach beta upgrade. $1k MRR: month 2. $5k MRR: month 5.

Estimated Monthly Cost

HuggingFace Inference API: $30, Supabase: $25, Vercel: $20, Celery worker on fly.io: $20, Stripe fees: ~$15. Total: ~$110/month at launch.

Profit Potential

$5k–$15k MRR within 6 months via Shopify ecosystem.

Scalability

High — Shopify app listing unlocks distribution, team plans unlock B2B, fine-tuned models can be offered as upsell.

Success Metrics

50 signups week 1, 10 paid month 1, average 500 SKUs classified per active user per week.

Launch & Validation Plan

Post in r/ecommerce and r/shopify asking how people handle attribute tagging, and aim for 20 DM conversations that confirm the pain point before writing a line of code.

Customer Acquisition Strategy

First customer: DM 15 Shopify Plus agencies on LinkedIn offering 3 months free for one client catalog. Ongoing: Shopify app store listing, r/shopify, ProductHunt, cold email to catalog@[brand].com addresses.

What's the competition?

Competition Level

Medium

What's the roadmap?

Feature Roadmap

V1 (launch): CSV upload, zero-shot tagging, schema builder, confidence review, export. V2 (month 2-3): Shopify direct sync, bulk API, team seats. V3 (month 4+): fine-tuned model upsell, image attribute classification.

Milestone Plan

Phase 1 (Week 1-2): classifier service, job queue, CSV upload — done when 200-row CSV is fully tagged. Phase 2 (Week 3): dashboard, review UI, Stripe billing, Vercel deploy. Phase 3 (Month 2): Shopify app store submission, 10 paying users.

How do you build it?

Tech Stack

Next.js, FastAPI, HuggingFace Inference API (bart-large-mnli), Supabase, Celery for job queue, Stripe — build with Cursor for FastAPI service, v0 for dashboard UI

Suggested Frameworks

HuggingFace Transformers, FastAPI, Celery

Time to Ship

3 weeks

Required Skills

HuggingFace Inference API, FastAPI, CSV parsing, Next.js dashboard, Stripe.

Resources

HuggingFace zero-shot-classification docs, FastAPI background tasks docs, Supabase storage docs.

MVP Scope

app/page.tsx (landing + upload UI), app/dashboard/page.tsx (job list and review), app/api/jobs/route.ts (job creation), services/classifier.py (FastAPI + HuggingFace inference), lib/db/schema.ts (Drizzle schema), components/SchemaBuilder.tsx (attribute label editor), components/ReviewTable.tsx (confidence review grid), .env.example (HF_API_KEY, SUPABASE_URL, STRIPE_KEY)

Core User Journey

Architecture Pattern

User uploads CSV -> Supabase Storage -> Celery job enqueued -> FastAPI worker calls HuggingFace Inference API per row -> results written to Supabase -> dashboard polls job status -> user downloads tagged CSV.
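The worker step in the flow above can be sketched as a plain loop that classifies rows and records progress for the dashboard to poll. The job dictionary fields and the injected `classify` and `save_result` callables are hypothetical stand-ins for the classifier request and the Supabase write, so the loop runs without any network access.

```python
def run_job(job, rows, classify, save_result):
    """Classify rows one at a time and update job progress as it goes.

    `classify` stands in for the call to the classifier service and
    `save_result` for the database write; both are injected so the loop
    can be exercised without any network access.
    """
    job["status"] = "running"
    job["total"] = len(rows)
    job["processed"] = 0
    try:
        for row in rows:
            save_result(row["sku"], classify(row["description"]))
            job["processed"] += 1  # the dashboard would read this when polling
        job["status"] = "done"
    except Exception as exc:
        job["status"] = "failed"
        job["error"] = str(exc)
    return job


results = {}
job = run_job(
    {"id": "job-1", "status": "queued"},
    [
        {"sku": "A1", "description": "Linen summer dress"},
        {"sku": "B2", "description": "Wool overcoat"},
    ],
    classify=lambda text: {"material": "linen" if "Linen" in text else "wool"},
    save_result=results.__setitem__,
)
```

In a real deployment this function body would live inside a queue task, and the `processed` / `total` pair is what a progress bar would be computed from.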

Data Model

User has many Jobs. Job has one SchemaConfig and many SKUResults. SKUResult has labels array and confidence scores. SchemaConfig has attributes array.
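One way to picture these relationships is as plain dataclasses. This is an illustrative sketch only: the field names beyond those mentioned above (`id`, `user_id`, `status`, `sku`) are assumptions, and the real tables would be defined in the database schema rather than in Python.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaConfig:
    attributes: List[str]  # attribute names as defined by the user


@dataclass
class SKUResult:
    sku: str
    labels: List[str]
    confidences: List[float]  # aligned with `labels` by position


@dataclass
class Job:
    id: str
    user_id: str
    schema: SchemaConfig  # a Job has exactly one SchemaConfig
    status: str = "queued"
    results: List[SKUResult] = field(default_factory=list)


@dataclass
class User:
    id: str
    jobs: List[Job] = field(default_factory=list)


user = User(id="u1")
job = Job(id="j1", user_id=user.id, schema=SchemaConfig(attributes=["occasion", "fit", "fabric"]))
job.results.append(SKUResult(sku="A1", labels=["casual"], confidences=[0.91]))
user.jobs.append(job)
```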

Integration Points

HuggingFace Inference API for zero-shot classification, Supabase for job state and file storage, Stripe for billing, Celery for async job queue, Resend for job-complete email notification.

V1 Scope Boundaries

V1 excludes: image-based classification, direct Shopify sync, team accounts, fine-tuned model upsell, bulk API access.

Success Definition

A catalog manager at a Shopify store finds LabelSnap via the app store, tags 500 SKUs without any founder help, and renews at month two.

Challenges

The hardest non-technical problem is convincing catalog managers their current Excel workflow is slower — they are resistant to new tools even when the ROI is obvious. Distribution via Shopify app store requires a review process that can take 2–4 weeks.

Avoid These Pitfalls

Do not let users define more than 20 labels in V1 — bart-large-mnli accuracy degrades sharply above 20 candidate labels. Do not skip the confidence reviewer UI — without it, users have no way to audit output quality and will churn.
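The 20-label cap is easy to enforce at the point where a schema is saved. A minimal sketch, assuming the cap applies to a single list of candidate labels sent in one classification request; the function name and the blank/duplicate cleanup are additions for illustration.

```python
MAX_LABELS = 20  # V1 cap on candidate labels


def validate_labels(labels):
    """Return a cleaned label list, or raise ValueError if it breaks the cap."""
    cleaned = []
    for label in labels:
        name = label.strip()
        if name and name not in cleaned:  # drop blanks and exact duplicates
            cleaned.append(name)
    if not cleaned:
        raise ValueError("at least one label is required")
    if len(cleaned) > MAX_LABELS:
        raise ValueError(f"{len(cleaned)} labels given; V1 allows at most {MAX_LABELS}")
    return cleaned
```

Running the same check in both the schema builder and the classifier service would keep an oversized label list from reaching the model through either path.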

Security Requirements

Supabase Auth with Google OAuth. RLS on all Job and SKUResult rows by user_id. Uploaded CSVs stored in private Supabase bucket. Rate limit: 10 job submissions/hour per user. GDPR: CSV deletion endpoint exposed in account settings.
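The submission rate limit could be prototyped with a sliding window per user. This sketch keeps state in process memory, so it only holds for a single server instance; a shared store would be needed once there is more than one. The class name and the injectable clock are assumptions made for testability.

```python
import time
from collections import defaultdict, deque

MAX_SUBMISSIONS = 10     # per user
WINDOW_SECONDS = 3600    # one hour


class SubmissionLimiter:
    """Sliding-window limiter kept in process memory (single-instance only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._history = defaultdict(deque)  # user_id -> submission timestamps

    def allow(self, user_id):
        now = self._clock()
        history = self._history[user_id]
        # Forget submissions that have aged out of the window.
        while history and now - history[0] >= WINDOW_SECONDS:
            history.popleft()
        if len(history) >= MAX_SUBMISSIONS:
            return False
        history.append(now)
        return True
```

A job-creation handler would call `allow(user_id)` first and reject the request when it returns false.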

Infrastructure Plan

Next.js on Vercel, FastAPI classifier on fly.io, Celery worker on fly.io, Supabase for DB and file storage, Sentry for error tracking, for roughly $110/month at launch.

Performance Targets

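Target: 500-row CSV fully classified in under 3 minutes. Dashboard job status polling every 5 seconds. Page load under 2s. No Redis is needed for status updates at V1; polling job status from Supabase is sufficient. Celery itself still requires a message broker (commonly Redis or RabbitMQ), so the queue's broker has to be chosen separately.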
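A quick back-of-the-envelope check shows what the 3-minute target implies for the worker. The per-request latency and the one-request-per-attribute pattern are assumptions; substitute measured values before relying on the result.

```python
import math

ROWS = 500
TIME_BUDGET_SECONDS = 3 * 60
ATTRIBUTES_PER_ROW = 5        # assumption: one request per attribute per row
LATENCY_PER_REQUEST = 0.5     # assumption: seconds per request

requests_total = ROWS * ATTRIBUTES_PER_ROW
sequential_seconds = requests_total * LATENCY_PER_REQUEST
# Smallest number of parallel requests that fits inside the time budget.
min_concurrency = math.ceil(sequential_seconds / TIME_BUDGET_SECONDS)
per_row_budget = TIME_BUDGET_SECONDS / ROWS

print(f"{requests_total} requests, {sequential_seconds:.0f}s sequential, "
      f"concurrency >= {min_concurrency}, {per_row_budget:.2f}s per row")
```

With these inputs a strictly sequential worker would miss the target, so some parallelism (or one request covering several attributes) is needed.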

Go-Live Checklist

☐Security audit complete.
☐Stripe billing tested end-to-end.
☐Sentry live on classifier service.
☐fly.io health check passing.
☐Custom domain configured.
☐Privacy policy published.
☐5 catalog managers beta-tested.
☐Rollback plan: previous fly.io release.
☐ProductHunt and r/shopify posts drafted.

First Run Experience

On first run, the dashboard shows a sample 10-row fashion product CSV pre-loaded with a 5-label demo schema. The user can immediately click Run Classification on the demo data and see results in under 30 seconds. No manual configuration is required: the HuggingFace API key is pre-configured for demo mode, and Stripe only activates on real uploads above 100 rows.

How to build it, step by step

1. Define data schema: Job, SKUResult, SchemaConfig tables in Supabase with Drizzle.
2. Run npx create-next-app labelsnap --typescript --tailwind --app.
3. Build services/classifier.py FastAPI app with a POST /classify endpoint calling HuggingFace bart-large-mnli zero-shot pipeline.
4. Build Celery worker that reads CSV rows from Supabase Storage and calls classifier.py for each row.
5. Build app/api/jobs/route.ts to create jobs and upload CSV to Supabase Storage.
6. Build components/SchemaBuilder.tsx as a dynamic form for adding custom label names.
7. Build components/ReviewTable.tsx showing SKU, predicted labels, confidence bars, and editable override.
8. Build app/dashboard/page.tsx with job list, status polling, and CSV export button.
9. Add Stripe checkout for plan selection, gating uploads above 100 rows.
10. Verify: upload a 200-row product CSV, define a 5-label schema, confirm all rows are tagged and downloadable within 3 minutes.

Generated

April 22, 2026

Model

claude-sonnet-4-6

Disclaimer: Ideas on this site are AI-generated and may contain inaccuracies. Revenue estimates, market demand figures, and financial projections are illustrative assumptions only — not financial advice. Do your own research before making any business or investment decisions. Technology availability, pricing, and market conditions change rapidly; always verify details independently.