How Wayfair Uses GPT Models to Process a Catalog of 40 Million Products

Core Background: The Unique Challenges of Home Furnishing E-Commerce

Wayfair, one of the largest home furnishing e-commerce platforms in the United States, is leveraging OpenAI's large language models to reshape its product catalog management system. In a recent presentation, Wayfair's technology leaders detailed how they apply AI capabilities to process product data at the scale of 40 million items, and the far-reaching impact of this initiative.

Challenges facing Wayfair

Unlike standardized branded products, Wayfair's product line has extremely distinctive characteristics — unbranded, highly differentiated, style-driven, and emotionally purchased. This means traditional structured data processing methods are virtually inadequate. Every piece of furniture, every light fixture, every curtain needs precise descriptions across multiple dimensions including style, material, and intended setting — dimensions that are often subjective and ambiguous.

Home furnishing is widely recognized as one of the hardest categories to standardize in e-commerce. Unlike consumer electronics (which can be precisely described through objective parameters like processor model and memory capacity), the core selling points of home products often rely on aesthetic judgment and scene imagination. For example, distinguishing between a chair's "Mid-Century Modern" style and "Scandinavian" style involves multi-layered semantic understanding spanning design history, material choices, and visual language. This subjectivity causes traditional rule-based classification systems (such as fixed attribute enumeration + manual annotation) to fail almost entirely when facing massive SKU volumes. As a purely online home furnishing platform, Wayfair — unlike IKEA with its proprietary brand system and unified product naming conventions — sources products from thousands of independent suppliers with wildly inconsistent data quality, further amplifying the complexity of the challenge.

The uniqueness of unbranded, differentiated categories

Catalog Enrichment: A Core AI Application in E-Commerce Product Management

What Is Catalog Enrichment?

Catalog Enrichment is a critical process in e-commerce that involves supplementing, standardizing, and optimizing the raw product information provided by suppliers. For Wayfair, the product data submitted by suppliers is often incomplete — it may lack precise style classifications, miss key attribute descriptions, or use inconsistent formatting.

Within e-commerce technology stacks, Catalog Enrichment is a core component of Product Information Management (PIM). Traditional enrichment workflows typically rely on three approaches: manual annotation teams (expensive and slow), rule-based automation scripts (inflexible and unable to handle semantic ambiguity), and early machine learning models (such as text classifiers based on TF-IDF or shallow neural networks, with limited accuracy). Before large language models emerged, the common industry practice was to combine all three approaches, but even then, for a non-standardized catalog at Wayfair's scale, there was always an irreconcilable tension between coverage and accuracy. The breakthrough of large language models lies in their zero-shot and few-shot learning capabilities — without needing to train a separate classifier for each category, the model can understand the subtle differences between "Bohemian style" and "Farmhouse style" through carefully designed prompts, achieving a quantum leap in engineering efficiency.

Catalog enrichment project

Wayfair's goal is clear: ensure every product is presented both accurately and completely. Accuracy means not misleading consumers, while completeness means extracting and supplementing as many valuable product attributes as possible to help users make better purchasing decisions.

Presenting product information accurately and completely

Why Is Classifying 40 Million SKUs a "Gnarly Problem"?

Wayfair's technical team calls these their "gnarliest problems," and for good reason:

Massive scale: With 40 million SKUs, any manual approach is simply unrealistic
Non-standardized: Home furnishing categories lack a unified industry-standard classification system
Highly subjective: Where is the boundary between "modern minimalist" and "Nordic style"? Such judgments require semantic understanding, not simple rules
Multimodal information: The system needs to simultaneously understand text descriptions and product images

As the Wayfair team put it, this is "not something that we would ever have even tried to do manually." Before AI entered the picture, problems like these were essentially unsolvable.

Technical Implementation: A Scalable Product Processing Pipeline Powered by the OpenAI API

Wayfair chose to drive its catalog enrichment workflow through OpenAI's API. This technology choice reflects several important engineering decisions:

First, calling an API rather than training a custom model. Wayfair opted not to train its own model from scratch, instead directly leveraging the general-purpose capabilities of OpenAI's large models. For processing 40 million products, this approach offers clear advantages in cost-effectiveness and iteration speed. This Model-as-a-Service (MaaS) strategy reflects a major trend in enterprise AI adoption — companies outsource the complexity of compute and model training to specialized AI providers while focusing internally on business logic and Prompt Engineering optimization.

However, at the scale of 40 million SKUs, API calls themselves become a complex engineering challenge. First, there's cost control: based on GPT-4's token pricing, if each product's enrichment requires processing approximately 2,000 tokens (including input product descriptions, image descriptions, and structured attribute outputs), a single full-catalog processing run could cost hundreds of thousands of dollars. Second, there's throughput management: OpenAI's API has rate limits, and large-scale calls require designing asynchronous queues, retry mechanisms, and batch processing strategies. Additionally, there's the issue of output consistency: large language model outputs are inherently stochastic (controlled by the temperature parameter), and the same product may receive different classification results across multiple calls, requiring the engineering team to design voting mechanisms or confidence thresholds to ensure output stability.

Second, the model is the core driving force. The team explicitly stated that "the model is what's powering us," making clear that AI is not an auxiliary tool but the core engine of the entire enrichment workflow.

Third, a continuously evolving technical roadmap. Wayfair also mentioned their anticipation for OpenAI Codex, planning to direct it at those "gnarliest problems that we haven't found solutions for yet," suggesting they are exploring broader applications of AI coding capabilities within their e-commerce tech stack. Codex excels at translating natural language into code, which means Wayfair may be exploring "meta-programming" capabilities such as using AI to automatically generate data processing pipelines and write classification rule scripts, further reducing the engineering team's investment in repetitive tasks.

Business Value: A Multi-Stakeholder Win for Consumers, Suppliers, and the Platform

The value of this AI application isn't one-directional — it creates a chain reaction across the entire ecosystem:

For consumers: More accurate product descriptions mean better search experiences, more precise recommendations, and lower return rates. There is a direct causal relationship between product description accuracy and return rates, which is especially significant in home furnishing e-commerce. According to the National Retail Federation (NRF), the average U.S. e-commerce return rate in 2023 was approximately 17.6%, and home furnishing categories tend to have even higher rates due to "product doesn't match expectations" issues. Industry research suggests that every 10% improvement in product page information completeness can reduce return rates by approximately 2-3 percentage points. For a platform like Wayfair with annual revenue exceeding $12 billion, even a 1 percentage point reduction in return rates could save tens of millions of dollars in reverse logistics costs and product losses.
For suppliers: Even when suppliers submit incomplete raw data, AI can help fill in the gaps, lowering the barrier to listing products. This is especially important for small and medium-sized home furnishing manufacturers who often lack professional e-commerce operations teams to write high-quality product descriptions — AI's automatic enrichment capability essentially provides them with a "free product information optimization service."
For the platform: Standardized, structured product data is the foundation for all downstream systems including search, recommendations, and advertising. When the system can accurately understand that a sofa is "genuine leather, three-seater, modern minimalist style," it can more precisely match users' search intent and browsing preferences, thereby improving conversion rates.

Industry Implications: LLMs Solving E-Commerce's "Structural Problems"

Wayfair's case provides an important reference for the entire e-commerce industry: the most valuable application scenarios for large language models are often not the "nice-to-have" features, but rather the "structural problems" that were previously impossible to solve.

Catalog enrichment for 40 million non-standardized products was a nearly impossible task under traditional technology paradigms. The semantic understanding capabilities of GPT-series models precisely bridge the enormous gap between rule engines and manual annotation. This leap "from impossible to possible" is where AI's true transformative power lies.

Wayfair's case marks a paradigm shift in e-commerce data processing from "rule-driven" to "semantics-driven." Under the traditional paradigm, product classification relied on predefined decision trees: if the description contains "oak," tag the material as "oak"; if it contains "minimalist," tag the style as "minimalist." The fatal flaw of this approach is its inability to handle synonyms, implied semantics, and cross-lingual expressions — a supplier might describe a minimalist product using "clean lines and neutral tones," which a rule engine cannot capture. The semantic understanding capabilities of large language models solve exactly this problem. The deeper significance is that this capability makes "long-tail attribute" extraction possible — fine-grained attributes that were previously abandoned due to insufficient ROI (such as "suitable for small apartments," "easy to assemble," "pet-friendly") can now be extracted in bulk at extremely low marginal cost, opening entirely new possibilities for personalized recommendations and precision marketing. This also explains why the Wayfair team called it "something we would never have even tried" — not because it was technically impossible, but because it wasn't feasible under cost and efficiency constraints.

For other companies facing similar challenges — whether in non-standardized product e-commerce, content platforms, or supply chain management — Wayfair's approach is well worth studying and learning from.

How Wayfair Uses GPT Models to Process a Catalog of 40 Million Products

Core Background: The Unique Challenges of Home Furnishing E-Commerce

Catalog Enrichment: A Core AI Application in E-Commerce Product Management

What Is Catalog Enrichment?

Why Is Classifying 40 Million SKUs a "Gnarly Problem"?

Technical Implementation: A Scalable Product Processing Pipeline Powered by the OpenAI API

Business Value: A Multi-Stakeholder Win for Consumers, Suppliers, and the Platform

Industry Implications: LLMs Solving E-Commerce's "Structural Problems"

Key Takeaways

Related articles

OpenCode In-Depth Review: Hands-On with a Free Open-Source AI Coding Assistant

Codex AI Coding Agent Explained: What's the Real Difference from ChatGPT?

Databricks Open-Sources Omni: A Meta-Framework for Unified Management of All AI Agents