# How AI Shopping Agents Discover Products: The Technical Guide to Product Data Optimization

*Last updated: March 2026*

AI shopping agents do not browse your website. They do not scroll through category pages, admire hero banners, or click "Add to Cart." They query structured data feeds, knowledge graphs, and machine-readable product attributes --- and if your catalog is not optimized for that pipeline, your products are invisible.

This is not a ranking problem. In traditional SEO, poor optimization means lower positions in search results. In agentic commerce, poor product data means **zero visibility** --- agents cannot recommend what they cannot parse. Products with comprehensive schema markup appear in AI-generated shopping recommendations 3-5x more frequently than those without (Google Merchant Center data, 2026). Meanwhile, traffic from AI platforms to US ecommerce sites grew 4,700% year-over-year according to Adobe Analytics.

The stakes are clear. Here is exactly how to make your product catalog agent-ready.

---

## How AI Agents Find Products

AI agents generally do not fetch and render your pages at the moment a shopper asks a question. Instead, they operate on a three-layer discovery pipeline:

**Layer 1 --- Data Ingestion.** Agents pull from pre-indexed sources: Google Merchant Center feeds, Shopify Catalog syndication, schema.org JSON-LD markup on product pages, and proprietary knowledge graphs. The data is structured, standardized, and queryable before a consumer ever asks a question.

**Layer 2 --- Semantic Matching.** When a consumer asks "breathable formal wear for a beach wedding," the agent converts that query into a vector embedding and matches it against product attributes, descriptions, and review text using semantic similarity. This is not keyword matching --- it is meaning matching.

**Layer 3 --- Ranking and Recommendation.** Products are ranked by structural completeness (how many required attributes are populated), semantic density (how rich and descriptive the product data is), trust signals (verified reviews, GTINs, accurate inventory), and personalization signals (user history, preferences, context).

Modern agent architectures use a "squad" model where specialized sub-agents handle intent parsing, product search, comparison, personalization, and transaction execution. Each sub-agent depends on clean, structured data to function.
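
The semantic matching step in Layer 2 can be illustrated with a minimal sketch. It assumes products and queries have already been converted to vectors; the three-dimensional vectors below (formality, breathability, warmth) are hand-assigned for illustration only, whereas a production agent would presumably use a learned embedding model with hundreds of dimensions. The function names are hypothetical.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction, so treat it as no match.
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec: list[float],
                       product_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (product_id, score) pairs sorted from best to worst match."""
    scored = [(pid, cosine_similarity(query_vec, vec))
              for pid, vec in product_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Assumption: toy vectors in the order [formality, breathability, warmth].
query = [0.9, 0.9, 0.1]  # "breathable formal wear for a beach wedding"
catalog = {
    "linen-suit": [0.8, 0.9, 0.2],
    "wool-overcoat": [0.7, 0.1, 0.9],
    "board-shorts": [0.0, 0.9, 0.1],
}

ranking = rank_by_similarity(query, catalog)
for product_id, score in ranking:
    print(f"{product_id}: {score:.3f}")
```

With these toy vectors the linen suit ranks first and the wool overcoat last, even though no keyword is compared at any point.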

---

## Google Shopping Graph: The Backbone of Agent Discovery

Google's Shopping Graph is the largest product knowledge graph in the world:

- **50+ billion product listings** indexed globally
- **2+ billion updates per hour** for price, availability, and attribute changes
- Integrates data from merchant feeds, website crawls, schema.org markup, manufacturer databases, and user-generated content

Google's AI surfaces --- Gemini, AI Mode in Search, and AI Overview shopping panels --- all query the Shopping Graph to answer product questions with real-time pricing and availability. The Shopping Graph also powers the Universal Commerce Protocol (UCP), the open standard co-developed with Shopify that enables agents to discover, negotiate, and transact with merchants programmatically.

### How Merchants Feed Into the Shopping Graph

| Channel | Method | Update Frequency |
|---------|--------|-----------------|
| Google Merchant Center | XML, CSV, or Content API feed submission | At least hourly (recommended) |
| Schema.org markup | Crawled from product pages | Depends on crawl schedule |
| Content API for Shopping | Programmatic feed management | Real-time capable |
| Manufacturer Center | Brand-level authoritative data | As needed |
| UCP integration | Agent discovery endpoint at `/.well-known/ucp` | Real-time |

Merchants not present in Google Merchant Center are at a significant disadvantage for AI-driven discovery across Google's surfaces. According to McKinsey's 2026 AI Commerce Index, 34% of US shoppers have already used an AI agent for purchase decisions --- and that number is accelerating.

---

## Required Product Attributes for Agent Visibility

Agents evaluate products on attribute completeness before they evaluate anything else. A product with missing required fields is not merely ranked lower; it is filtered out entirely.

### Critical Attributes

| Attribute | Purpose | Agent Impact |
|-----------|---------|-------------|
| `name` (structured title) | Primary matching signal | Brand + Model + Size + Color format required |
| `description` | Semantic matching via NLP | Must be conversational, not keyword-stuffed |
| `gtin` / `mpn` | Cross-merchant product matching | Without this, agents cannot verify your product exists elsewhere |
| `brand` | Brand-specific queries | Required for brand authority signals |
| `price` + `priceCurrency` | Comparison shopping | Missing currency causes checkout failures |
| `availability` | Inventory filtering | Agents immediately exclude out-of-stock items |
| `material` | Attribute-based filtering | "organic cotton" vs. "cotton" matters for semantic matching |
| `color`, `size` | Variant differentiation | Must be populated as separate attributes, not only embedded in the title |
| `aggregateRating` + `review` | Trust and ranking signal | Review sentiment feeds recommendation algorithms |
| `image` (high-resolution) | Multi-modal agent understanding | Multiple angles, descriptive ALT text |
| Use-case descriptions | Contextual matching | "morning runs in mild weather" matches intent queries |
| Sustainability certifications | Emerging filter criterion | Eco-labels, carbon footprint data increasingly weighted |

### The GTIN Imperative

GTINs (Global Trade Item Numbers) deserve special emphasis. AI agents use GTINs to perform cross-merchant product matching --- identifying the same product sold by different retailers to enable true price comparison. Products without GTINs are treated as unverifiable unique items, reducing their inclusion in comparison results and lowering trust scores. If your products have GTINs or UPCs, populating them is non-negotiable.
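
Before submitting GTINs, it can help to confirm that each one is well-formed. The sketch below applies the standard GS1 check-digit rule (weights of 3 and 1 alternating from the rightmost data digit) to GTIN-8, GTIN-12 (UPC-A), GTIN-13, and GTIN-14 codes. The function names are illustrative, and a passing checksum only shows the number is well-formed, not that it is registered to your product.

```python
def gtin_check_digit(body: str) -> int:
    """Compute the GS1 check digit for a GTIN given without its final digit."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10

def is_valid_gtin(code: str) -> bool:
    """True if code is an 8, 12, 13, or 14 digit string with a correct check digit."""
    if not (code.isascii() and code.isdigit()):
        return False
    if len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])

# Commonly used sample codes: an EAN-13 and a UPC-A.
for sample in ("4006381333931", "036000291452", "4006381333932"):
    print(sample, is_valid_gtin(sample))
```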

---

## Schema.org JSON-LD Markup Guide

JSON-LD is the preferred structured data format for all major AI systems. It is embedded in a `<script type="application/ld+json">` element on each product page, typically in the `<head>`, and provides a machine-readable description of your product that agents can parse without rendering the page.

### Comprehensive Product Markup Example

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Patagonia Better Sweater Quarter-Zip Fleece - Men's Midweight Layering Jacket",
  "description": "A warm, breathable quarter-zip fleece made from 100% recycled polyester. Ideal for cool-weather hikes, casual layering, or office wear. Features a stand-up collar, zippered left-chest pocket, and flat-seam construction to reduce bulk under a shell. Warmer than a standard hoodie, lighter than a down jacket.",
  "image": [
    "https://example.com/images/better-sweater-front.jpg",
    "https://example.com/images/better-sweater-back.jpg",
    "https://example.com/images/better-sweater-detail.jpg"
  ],
  "sku": "PAT-BS-QZ-BLU-L",
  "gtin13": "0191338877654",
  "mpn": "25523-NENA-L",
  "brand": {
    "@type": "Brand",
    "name": "Patagonia"
  },
  "material": "100% recycled polyester fleece",
  "color": "New Navy",
  "size": "Large",
  "weight": {
    "@type": "QuantitativeValue",
    "value": "510",
    "unitCode": "GRM"
  },
  "offers": {
    "@type": "Offer",
    "price": "139.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock",
    "itemCondition": "https://schema.org/NewCondition",
    "seller": {
      "@type": "Organization",
      "name": "Example Outdoor Store"
    },
    "priceValidUntil": "2026-06-30",
    "shippingDetails": {
      "@type": "OfferShippingDetails",
      "deliveryTime": {
        "@type": "ShippingDeliveryTime",
        "handlingTime": {
          "@type": "QuantitativeValue",
          "minValue": 0,
          "maxValue": 1,
          "unitCode": "DAY"
        },
        "transitTime": {
          "@type": "QuantitativeValue",
          "minValue": 2,
          "maxValue": 5,
          "unitCode": "DAY"
        }
      }
    }
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "842",
    "bestRating": "5"
  },
  "review": [
    {
      "@type": "Review",
      "author": { "@type": "Person", "name": "Alex R." },
      "datePublished": "2026-01-15",
      "reviewBody": "Perfect weight for spring hiking in the Pacific Northwest. Breathable enough for uphill sections, warm enough at rest stops.",
      "reviewRating": {
        "@type": "Rating",
        "ratingValue": "5"
      }
    }
  ],
  "hasMerchantReturnPolicy": {
    "@type": "MerchantReturnPolicy",
    "returnPolicyCategory": "https://schema.org/MerchantReturnFiniteReturnWindow",
    "merchantReturnDays": 60,
    "returnMethod": "https://schema.org/ReturnByMail"
  }
}
```
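
Markup like the example above is usually generated from catalog data rather than written by hand. The following is a minimal sketch of that step, assuming a hypothetical internal product record with the keys `title`, `description`, `sku`, `brand`, `price`, `currency`, and `in_stock`; your own data model will differ, and only a subset of the properties shown above is emitted.

```python
import json

def build_product_jsonld(product: dict) -> dict:
    """Map a hypothetical internal product record to a schema.org Product."""
    availability = "InStock" if product["in_stock"] else "OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["title"],
        "description": product["description"],
        "sku": product["sku"],
        "brand": {"@type": "Brand", "name": product["brand"]},
        "offers": {
            "@type": "Offer",
            "price": f"{product['price']:.2f}",
            "priceCurrency": product["currency"],
            "availability": f"https://schema.org/{availability}",
        },
    }

def render_script_tag(data: dict) -> str:
    """Serialize the markup and wrap it in a JSON-LD script element."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so text inside the JSON cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

example_record = {
    "title": "Example Quarter-Zip Fleece",
    "description": "Warm, breathable fleece for cool-weather hikes.",
    "sku": "EX-QZ-BLU-L",
    "brand": "Example Brand",
    "price": 139,
    "currency": "USD",
    "in_stock": True,
}

tag = render_script_tag(build_product_jsonld(example_record))
print(tag)
```

Generating the markup from the same record that feeds your product feed is one way to keep page markup and feed data from drifting apart.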

### Schema Types to Implement

| Schema Type | Purpose | Priority |
|------------|---------|----------|
| `Product` | Core product data (name, description, SKU, GTIN) | Required |
| `Offer` | Pricing, availability, currency, seller | Required |
| `AggregateRating` | Star ratings and review count | Required |
| `Review` | Individual customer review text and rating | High |
| `MerchantReturnPolicy` | Return windows and conditions | High |
| `ShippingDeliveryTime` | Delivery estimates | High |
| `FAQPage` | Common product questions and answers | High |
| `BreadcrumbList` | Site navigation context | Medium |
| `Organization` | Brand identity and trust signals | Medium |

---

## Product Feed Formats and Refresh Frequency

AI agent platforms accept product data through structured feeds. The format matters less than the completeness and freshness of the data inside.

### Accepted Formats

| Format | Use Case | Platform Support |
|--------|----------|-----------------|
| CSV / TSV | Simple catalogs, spreadsheet-managed | Google Merchant Center, ChatGPT Shopping |
| XML (RSS/Atom) | Complex catalogs, automated pipelines | Google Merchant Center, legacy systems |
| JSON | API-driven catalogs, modern architectures | ChatGPT Shopping, UCP endpoints |
| Content API | Programmatic real-time updates | Google Content API for Shopping |
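
As an illustration of the simplest format in the table, the sketch below writes a tab-separated feed with Python's standard `csv` module. The column names and value formats (`in_stock`, `139.00 USD`) are believed to follow Google Merchant Center's product data specification, but they should be checked against the current specification for each platform. The input record keys are hypothetical.

```python
import csv
import io

# Assumption: a minimal subset of feed columns; real feeds carry many more.
FEED_COLUMNS = ["id", "title", "description", "link", "image_link",
                "availability", "price", "brand", "gtin"]

def write_feed(products: list[dict]) -> str:
    """Render product records as a tab-separated feed with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FEED_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for p in products:
        writer.writerow({
            "id": p["sku"],
            "title": p["title"],
            "description": p["description"],
            "link": p["url"],
            "image_link": p["image_url"],
            "availability": "in_stock" if p["in_stock"] else "out_of_stock",
            "price": f"{p['price']:.2f} {p['currency']}",
            "brand": p["brand"],
            "gtin": p["gtin"],
        })
    return buffer.getvalue()

feed = write_feed([{
    "sku": "EX-QZ-BLU-L",
    "title": "Example Quarter-Zip Fleece",
    "description": "Warm, breathable fleece for cool-weather hikes.",
    "url": "https://example.com/products/quarter-zip",
    "image_url": "https://example.com/images/quarter-zip-front.jpg",
    "in_stock": True,
    "price": 139,
    "currency": "USD",
    "brand": "Example Brand",
    "gtin": "4006381333931",
}])
print(feed)
```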

### Refresh Frequency Matters

ChatGPT Shopping accepts feed updates as frequently as every **15 minutes**. Google's Shopping Graph processes 2 billion updates per hour. Agents that encounter stale data --- a product listed as in-stock that is actually sold out, or a price that changed hours ago --- reduce the merchant's trust score. Over time, agents deprioritize feeds from merchants with a history of data staleness.

**Recommended update cadence by catalog size:**

| Catalog Size | Minimum Refresh | Recommended Refresh |
|-------------|----------------|-------------------|
| Under 1,000 SKUs | Every 6 hours | Every 1 hour |
| 1,000 - 50,000 SKUs | Every 2 hours | Every 30 minutes |
| 50,000+ SKUs | Every 1 hour | Every 15 minutes |

For high-velocity categories (fashion drops, flash sales, limited editions), real-time updates via API are strongly recommended over batch feed uploads.
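
The cadence table can be encoded as a small helper for monitoring jobs. This is a sketch only: the function names are hypothetical, and because the table's tiers share the 50,000 boundary, the code assumes a catalog of exactly 50,000 SKUs belongs to the largest tier.

```python
from datetime import timedelta

def refresh_intervals(sku_count: int) -> tuple[timedelta, timedelta]:
    """Return (minimum, recommended) refresh intervals for a catalog size."""
    if sku_count < 0:
        raise ValueError("sku_count must be non-negative")
    if sku_count < 1_000:
        return timedelta(hours=6), timedelta(hours=1)
    if sku_count < 50_000:
        return timedelta(hours=2), timedelta(minutes=30)
    # Assumption: exactly 50,000 SKUs is treated as the "50,000+" tier.
    return timedelta(hours=1), timedelta(minutes=15)

def is_feed_stale(age: timedelta, sku_count: int) -> bool:
    """True if the last refresh is older than the minimum cadence allows."""
    minimum, _recommended = refresh_intervals(sku_count)
    return age > minimum

print(refresh_intervals(12_000))
print(is_feed_stale(timedelta(hours=3), 12_000))
```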

---

## API Latency Requirements

When agents query your product data through UCP endpoints or direct APIs, response time determines whether your products are included in results.

| Metric | Target | Consequence of Missing |
|--------|--------|----------------------|
| Product discovery response | **< 200ms** | Agents skip slow merchants entirely |
| Checkout session creation | **< 500ms** | Cart abandonment by agent |
| Inventory check | **< 100ms** | Agent shows stale availability |
| Error rate | **< 1%** | Trust score penalties, eventual delisting |
| Uptime | **99.9%+** | Removed from agent recommendations |

These are not aspirational targets --- they are filtering thresholds. According to data from Google's UCP implementation guide, agents operating under latency budgets will drop merchant responses that arrive after the timeout and proceed with results from faster merchants. A 600ms response time does not mean a lower ranking; it means exclusion from that query entirely.

**Infrastructure recommendations:** Edge functions (Cloudflare Workers, Vercel Edge, AWS Lambda@Edge) for low-latency product discovery endpoints. Redis for product data caching with webhook-driven invalidation on inventory changes. Separate rate limits for agent traffic versus human traffic.
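
To make the filtering behavior concrete, here is a minimal simulation of an agent that fans out to several merchants under a fixed latency budget. Everything in it is an assumption for illustration: the merchant names, the simulated delays, and the 200ms budget taken from the table above. It does not reproduce any real agent's implementation.

```python
import asyncio

async def query_merchant(name: str, delay_s: float) -> dict:
    """Simulated merchant endpoint; delay_s stands in for network and processing time."""
    await asyncio.sleep(delay_s)
    return {"merchant": name, "products": [f"{name}-sku-1"]}

async def discover(merchants: dict[str, float], budget_s: float = 0.2) -> list[dict]:
    """Query all merchants concurrently and keep only responses within the budget."""
    async def bounded(name: str, delay_s: float):
        try:
            return await asyncio.wait_for(query_merchant(name, delay_s),
                                          timeout=budget_s)
        except asyncio.TimeoutError:
            # Too slow: excluded from this query rather than ranked lower.
            return None

    responses = await asyncio.gather(
        *(bounded(name, delay) for name, delay in merchants.items())
    )
    return [r for r in responses if r is not None]

# A 50ms merchant and a 600ms merchant competing under a 200ms budget.
results = asyncio.run(discover({"fast-shop": 0.05, "slow-shop": 0.6}))
print([r["merchant"] for r in results])
```

Only the fast merchant appears in the output; the 600ms response is never seen by the ranking step at all.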

---

## Attribute Fill Rate Benchmarks

Attribute fill rate measures the percentage of possible product data fields that are populated in your feed. It is one of the strongest predictors of agent visibility.

| Fill Rate | Agent Visibility | Typical Outcome |
|-----------|-----------------|----------------|
| Below 60% | Minimal | Products rarely surfaced; treated as low-quality listings |
| 60% - 80% | Partial | Included in broad queries but filtered out of specific ones |
| 80% - 95% | Good | Competitive visibility in most agent queries |
| **95%+** | Optimal | Maximum discovery rate; eligible for top recommendation slots |

The target for competitive merchants is **95%+ fill rate** across all required and recommended attributes. This means every product in your catalog should have: a structured title, detailed description, brand, GTIN or MPN, price with currency, availability status, at least one high-resolution image, material, color, size (where applicable), aggregate rating, shipping details, and return policy.
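
Fill rate is straightforward to measure once the expected attributes are listed. The sketch below uses a hypothetical attribute list that mirrors the paragraph above and maps the result onto the benchmark tiers. Because the tier ranges in the table share their boundaries, the code assumes each lower bound is inclusive (for example, exactly 80% counts as "Good").

```python
# Assumption: hypothetical field names mirroring the attributes listed above.
EXPECTED_ATTRIBUTES = [
    "title", "description", "brand", "gtin_or_mpn", "price", "currency",
    "availability", "image", "material", "color", "size",
    "aggregate_rating", "shipping", "return_policy",
]

def fill_rate(product: dict, attributes: list[str] = EXPECTED_ATTRIBUTES) -> float:
    """Fraction of expected attributes that are present and non-empty."""
    filled = sum(1 for name in attributes
                 if product.get(name) not in (None, "", [], {}))
    return filled / len(attributes)

def visibility_tier(rate: float) -> str:
    """Map a fill rate (0.0 to 1.0) to the benchmark tiers in the table."""
    if rate >= 0.95:
        return "Optimal"
    if rate >= 0.80:
        return "Good"
    if rate >= 0.60:
        return "Partial"
    return "Minimal"

def catalog_fill_rate(products: list[dict]) -> float:
    """Average fill rate across a catalog (0.0 for an empty catalog)."""
    if not products:
        return 0.0
    return sum(fill_rate(p) for p in products) / len(products)

complete = {name: "value" for name in EXPECTED_ATTRIBUTES}
missing_two = {**complete, "material": "", "return_policy": None}
print(visibility_tier(fill_rate(complete)))
print(visibility_tier(fill_rate(missing_two)))
```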

Merchants who fully optimized their feeds and implemented UCP support report an average **22% increase in AI-attributable revenue** within 90 days, according to early UCP adopter data cited by Google.

---

## Semantic Density: Why Descriptions Must Be Written for Machines

Semantic density is the richness of descriptive language that enables agents to match products to natural language queries. It is the difference between a product that matches one query and a product that matches dozens.

### Low vs. High Semantic Density

| Low Semantic Density | High Semantic Density |
|---------------------|----------------------|
| "Blue t-shirt, size M" | "Navy blue crew-neck t-shirt in 100% organic cotton with a relaxed fit. Lightweight and breathable for warm-weather layering. Machine washable. Available in men's medium." |
| "Running shoes" | "Lightweight neutral running shoes with responsive foam cushioning for daily training runs on pavement. 8mm heel-to-toe drop, breathable mesh upper, reflective accents for low-light visibility." |
| "Cool jacket" | "Water-resistant softshell jacket in charcoal grey with a fleece-lined interior. Blocks wind on exposed ridgelines while remaining packable enough for a daypack. Four-way stretch fabric allows unrestricted movement." |

High-density descriptions work because they draw on elements such as:

- **Material composition** ("100% organic cotton," "responsive foam cushioning")
- **Use-case context** ("warm-weather layering," "daily training runs on pavement")
- **Comparative language** ("lighter than down," "warmer than fleece")
- **Sensory descriptors** ("breathable," "buttery-soft," "crisp poplin")
- **Occasion and activity** ("morning runs," "beach wedding," "office wear")

Stores implementing semantic search with rich product descriptions see up to a **30% increase in conversions** according to industry benchmarks, and users complete shopping tasks **158% faster** with AI-powered semantic search compared to keyword search.

Avoid keyword stuffing. Agents trained on large language models can detect unnatural text and penalize it. Write descriptions that read like a knowledgeable salesperson explaining the product to a specific customer.

---

## Real-Time Inventory and Pricing Accuracy

Stale data is the fastest way to lose agent trust. When an agent recommends a product that turns out to be out of stock or priced differently than advertised, the platform downgrades the merchant's reliability score.

**Requirements for agent-ready inventory management:**

- Real-time inventory sync that pushes changes to feeds and APIs with sub-second latency
- Atomic checkout operations that verify stock, calculate tax, apply discounts, and process payment in a single API call
- 10-minute checkout hold periods to prevent overselling during active agent sessions
- Webhook-driven cache invalidation --- when inventory changes, feeds and API caches must update immediately
- Price consistency between schema markup on product pages and submitted feed data (agents cross-reference these sources)
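
The last requirement, consistency between page markup and feed data, lends itself to an automated audit. The sketch below compares two hypothetical snapshots keyed by SKU, one extracted from product-page markup and one from the submitted feed, and reports every SKU where price, currency, or availability disagree. The record shape is an assumption made for illustration.

```python
from decimal import Decimal

def find_mismatches(page_offers: dict[str, dict],
                    feed_rows: dict[str, dict]) -> list[str]:
    """Return SKUs whose price, currency, or availability differ between sources."""
    mismatched = []
    for sku, feed in feed_rows.items():
        page = page_offers.get(sku)
        if page is None:
            # In the feed but missing from page markup: also an inconsistency.
            mismatched.append(sku)
            continue
        same = (
            # Decimal avoids float rounding and treats "89.99" and "89.990" as equal.
            Decimal(page["price"]) == Decimal(feed["price"])
            and page["currency"] == feed["currency"]
            and page["in_stock"] == feed["in_stock"]
        )
        if not same:
            mismatched.append(sku)
    return mismatched

page_offers = {
    "SKU-1": {"price": "89.99", "currency": "USD", "in_stock": True},
    "SKU-2": {"price": "139.00", "currency": "USD", "in_stock": True},
}
feed_rows = {
    "SKU-1": {"price": "94.99", "currency": "USD", "in_stock": True},
    "SKU-2": {"price": "139.0", "currency": "USD", "in_stock": True},
    "SKU-3": {"price": "20.00", "currency": "USD", "in_stock": False},
}

mismatches = find_mismatches(page_offers, feed_rows)
print(mismatches)
```

Running a check like this on a schedule surfaces drift before an agent's own cross-referencing does.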

According to Google's Shopping Graph documentation, price and availability are among the most frequently updated attributes, with the graph processing these changes across its 50 billion listings at a rate of 2 billion updates per hour. Merchants whose data lags behind this standard are at a measurable disadvantage.

---

## Product Data Optimization Checklist

Use this checklist to audit your catalog's readiness for AI agent discovery.

### Structural Completeness

- [ ] All products have structured titles in Brand + Model + Key Attribute + Use Case format
- [ ] GTIN, MPN, or brand identifier populated for every product
- [ ] Category taxonomy aligned with Google Product Category standards
- [ ] Variant data (size, color, material) stored as separate attributes, not only embedded in titles
- [ ] Price and `priceCurrency` populated in every Offer object
- [ ] Availability status updated in real time
- [ ] High-resolution images with descriptive ALT text (multiple angles)
- [ ] Shipping details and delivery time estimates included
- [ ] Return policy structured as `MerchantReturnPolicy` schema

### Semantic Density

- [ ] Descriptions include material composition, not just generic terms
- [ ] Use-case and occasion context in every description
- [ ] Comparative language where applicable ("warmer than X, lighter than Y")
- [ ] Sensory and quality descriptors beyond basic specifications
- [ ] FAQ schema on product pages answering top 3-5 customer questions
- [ ] No keyword stuffing --- text reads naturally

### Technical Infrastructure

- [ ] Schema.org JSON-LD implemented on all product pages
- [ ] Google Merchant Center feed active and error-free
- [ ] Feed refresh interval of 1 hour or less
- [ ] API response times under 200ms for discovery, under 500ms for checkout
- [ ] Error rate below 1% on all agent-facing endpoints
- [ ] `robots.txt` allows OAI-SearchBot, Googlebot, PerplexityBot, and ClaudeBot
- [ ] UCP manifest published at `/.well-known/ucp` (if applicable)
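
The `robots.txt` item in this checklist can be verified with Python's standard `urllib.robotparser`. The sketch below parses a sample file and reports which of the four crawlers named above would be blocked from a product URL. The sample rules and the URL are made up for illustration, and the standard-library parser may not interpret every directive exactly as each crawler does.

```python
from urllib.robotparser import RobotFileParser

AGENT_CRAWLERS = ["OAI-SearchBot", "Googlebot", "PerplexityBot", "ClaudeBot"]

def blocked_crawlers(robots_txt: str, url: str) -> list[str]:
    """Return the crawlers from AGENT_CRAWLERS that may not fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AGENT_CRAWLERS if not parser.can_fetch(bot, url)]

# Illustrative rules: one crawler blocked site-wide, checkout hidden from all.
sample_robots = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /checkout/
"""

blocked = blocked_crawlers(sample_robots,
                           "https://example.com/products/quarter-zip")
print(blocked)
```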

### Trust Signals

- [ ] Verified customer reviews with structured Review schema
- [ ] Aggregate rating populated with review count
- [ ] Consistent data between schema markup and submitted feeds
- [ ] Accurate shipping timelines and costs
- [ ] Sustainability certifications with verification data

### Feed Management

- [ ] Product feed submitted to Google Merchant Center
- [ ] Product feed submitted to ChatGPT Shopping (chatgpt.com/merchants)
- [ ] Attribute fill rate measured and above 95%
- [ ] Feed validation tools run weekly to catch errors
- [ ] Supplemental feeds configured for enhanced attributes

---

## Frequently Asked Questions

**What is the single most important thing I can do to improve agent visibility?**

Populate your Google Merchant Center feed with complete, accurate data and implement Schema.org JSON-LD on every product page. These two actions cover the majority of agent discovery pipelines. Products with comprehensive schema markup appear in AI shopping recommendations 3-5x more frequently than those without.

**Do AI agents use paid advertising or sponsored placements?**

No. ChatGPT Shopping explicitly states there are no paid placements --- products are recommended based on trusted signals across the web, clarity, credibility, and usefulness. Google's AI Mode in Search queries the Shopping Graph rather than the ad auction. This means organic product data quality is the primary lever for visibility, not ad spend.

**How often should I update my product feeds?**

As frequently as possible. ChatGPT Shopping accepts updates every 15 minutes. Google's Shopping Graph processes 2 billion updates per hour. At minimum, update feeds hourly. For high-velocity categories (fashion, electronics, limited releases), use API-based real-time updates rather than batch feed uploads.

**What happens if my product data is inconsistent between my website and my feeds?**

Agents cross-reference data sources. If your schema markup says a product costs $89.99 but your Merchant Center feed says $94.99, agents flag the inconsistency and may reduce your trust score or exclude the product entirely. Maintain a single source of truth for pricing and availability that propagates to all channels simultaneously.

**Do I need to implement UCP to be discovered by AI agents?**

Not yet, but it is becoming increasingly important. UCP is currently in rolling access with a waitlist, and Shopify stores get native support. For immediate visibility, focus on Google Merchant Center feeds and schema.org markup. For medium-term competitive advantage, plan for UCP implementation --- merchants who optimized feeds and implemented UCP report a 22% increase in AI-attributable revenue within 90 days.

**How do I measure whether AI agents are recommending my products?**

Dedicated AEO/GEO monitoring tools are emerging in 2026. Platforms like Semrush (AI Visibility Toolkit), Scrunch, AthenaHQ, and SE Ranking now offer AI citation tracking that monitors which AI platforms mention your brand and products. Google Merchant Center is adding UCP-specific analytics, and Shopify provides agentic channel reporting in its admin dashboard. Start by tracking agent traffic volume, discovery rates (target: 95%+), and conversion rates from AI-referred sessions.

**What is the difference between AEO and traditional SEO?**

Traditional SEO optimizes for ranking in search engine results pages (blue links). AEO (Agent Engine Optimization) optimizes for being cited and recommended in AI-generated answers and agent shopping sessions. The key technical differences: AEO requires structured data feeds (not just meta tags), near-real-time data freshness (not periodic updates), and conversational product descriptions optimized for semantic matching (not keyword density). AEO does not replace SEO --- it builds on it. Brands excelling at AEO in 2026 typically have strong traditional SEO foundations.

---

*Sources: Google Shopping Graph documentation, McKinsey 2026 AI Commerce Index, Adobe Analytics, OpenAI Product Feed Specification, Google UCP Developer Guide, Shopify Engineering, Search Engine Journal, RetailDive.*