Voice Commerce in the Agentic Era: Alexa+, Gemini, and the $186B Opportunity


Last updated: March 2026

Voice commerce is no longer the overhyped promise that failed to meet its $40 billion projections back in 2022. Powered by large language models and agentic AI architectures, voice-driven shopping has entered a new phase – one where AI agents do not just listen for commands but actively orchestrate multi-step purchase journeys across smart speakers, vehicles, televisions, and messaging platforms. The global voice commerce market is projected to reach $147 billion to $186 billion by 2030, growing at a compound annual rate above 20%. For commerce leaders and brand strategists, this is the window to get voice-ready before agents decide which products consumers never see.


The Market: $62-70 Billion in 2025 and Accelerating

The numbers tell a clear story of renewed momentum. The global voice commerce market sits between $62 billion and $70 billion in 2025, up from $42 billion to $53 billion in 2024, depending on the research firm. Grand View Research and Global Industry Analysts project the market will reach $147 billion to $186 billion by 2030, with a CAGR between 20% and 24.6%. Roots Analysis extends the forecast further, estimating $636 billion by 2035.
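The quoted growth rates can be sanity-checked with the standard compound annual growth rate formula. The sketch below applies it to the 2025 and 2030 figures above over five years. The function name is illustrative, and pairing specific low and high estimates is an assumption, since each research firm works from its own baseline.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.2 means 20%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in billions of USD, taken from the figures quoted above.
low_2025, high_2025 = 62, 70
low_2030, high_2030 = 147, 186

# Assumed pairings of 2025 and 2030 estimates; the firms do not publish them this way.
pairings = [(low_2025, low_2030), (high_2025, high_2030), (low_2025, high_2030)]

for start, end in pairings:
    print(f"${start}B -> ${end}B: {cagr(start, end, 5):.1%} per year")
```

Pairing the low 2025 estimate with the high 2030 estimate implies a tripling in five years, or about 24.6% a year, which lines up with the top of the quoted CAGR range.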

The infrastructure is already in place. Over 8 billion voice assistants are in active use worldwide as of 2026. Smart speaker ownership has reached approximately 75% of US households – roughly 98 million households. More than 50% of American consumers are expected to use voice-based assistants for regular online purchases in 2026.

What changed between the hype cycle of 2017-2020 and today? The underlying intelligence. The rule-based intent-matching systems that powered early Alexa and Google Assistant could handle “reorder paper towels” but collapsed under anything more complex. LLM-powered agents understand natural, conversational speech. They handle multi-turn dialogues, follow context across a session, and use external tools to compare prices, check inventory, and complete checkout.


Amazon Alexa+: Free with Prime, 600 Million Devices, Shopping Hub

Amazon made the most significant move in voice commerce in February 2026 by making Alexa+ free for all Prime members. The upgrade is available across all Alexa-enabled devices, and the installed base is staggering: over 600 million devices worldwide.

Alexa+ is powered by more than 70 models, including Amazon’s homegrown Nova models and technology from an Anthropic partnership. An Amazon-OpenAI investment deal announced in February 2026 suggests OpenAI models may also power parts of the system.

The shopping capabilities go well beyond simple reordering:

  • Deal tracking and auto-purchase: Alexa+ monitors prices on items in a user’s cart and alerts them when prices hit a target threshold. Users can opt in to have Alexa purchase automatically when the price is right.
  • Delivery consolidation: Users can add items to upcoming deliveries until packages leave the warehouse, reducing shipping waste and cost.
  • Personalized gift recommendations: The system generates gift ideas based on purchase history and recipient preferences, displayed visually on Echo Show screens while narrated by voice.
  • Real-time delivery tracking: Echo Show devices now function as interactive shopping hubs with live package tracking on screen.
  • Automotive integration: The 2026 BMW iX3 ships with Alexa+ as its voice assistant, extending voice commerce into vehicles.
  • Personality modes: Amazon introduced three distinct AI personality modes for different interaction styles.
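The deal-tracking behavior described in the first bullet comes down to a simple decision rule: wait, alert, or buy, with buying allowed only after explicit opt-in. The sketch below illustrates that rule only. `PriceWatch` and `evaluate_watch` are hypothetical names, the prices are made up, and nothing here reflects Amazon's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class PriceWatch:
    item: str
    target_price: float          # act when the price is at or below this
    auto_purchase: bool = False  # off unless the user explicitly opts in


def evaluate_watch(watch: PriceWatch, current_price: float) -> str:
    """Return the action for one price observation: 'wait', 'alert', or 'purchase'."""
    if current_price > watch.target_price:
        return "wait"
    return "purchase" if watch.auto_purchase else "alert"


# Illustrative watches and observed prices (invented data).
watches = [
    PriceWatch("espresso machine", target_price=199.00),
    PriceWatch("paper towels", target_price=18.50, auto_purchase=True),
]
observed = {"espresso machine": 189.99, "paper towels": 19.25}

for watch in watches:
    print(watch.item, "->", evaluate_watch(watch, observed[watch.item]))
```

Keeping `auto_purchase` off by default mirrors the opt-in framing above: the agent may notify freely, but spending requires prior consent.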

The transformation of Echo Show devices into full shopping hubs is particularly notable. These screens address voice commerce’s historic weakness – the inability to see products – while preserving hands-free convenience. Users initiate browsing with voice and see product cards, reviews, and images on screen. It is a hybrid model that combines the best of both modalities.


Google Gemini Enterprise: Multimodal Shopping Agents

Google’s approach to voice commerce is inseparable from its broader agentic commerce strategy. In January 2026, the company announced Gemini Enterprise for Customer Experience, which includes a Shopping agent that delivers end-to-end commerce solutions by connecting frontend interfaces – chat, voice, and visual – directly to backend tools.

The defining feature is multimodal input. Google’s Shopping agent understands images, video, and voice simultaneously. A user can photograph a handwritten recipe and the agent will parse the handwriting, identify the ingredients needed, and add them to a shopping cart. This is not a voice-only experience – it is voice-plus-vision, where each modality reinforces the other.

Google also announced new protocols and tools designed specifically for retailers to succeed in what it calls the “agentic shopping era.” These include the Universal Commerce Protocol (UCP), co-developed with Shopify, which provides a standardized way for AI agents to discover products, check inventory, apply discounts, and complete checkout across any merchant that implements it.

For brands, Google’s approach means that voice commerce is not a standalone channel. It is one input modality within a broader agent-mediated commerce system that also includes text search, image search, and AI-powered recommendations in Google Shopping and the Gemini app.


SoundHound Amelia 7: Voice Commerce in Vehicles and Living Rooms

At CES 2026, SoundHound AI unveiled Amelia 7, an agentic AI system designed for vehicles and smart TVs. The system enables AI agents to order food, make restaurant reservations, pay for parking, and book tickets – all via voice from car dashboards and television interfaces.

SoundHound’s positioning captures a broader industry shift. The company describes voice as “the new middleware of commerce” – not just an input method but the orchestration layer through which AI agents coordinate across multiple services, payment rails, and APIs. This framing moves voice from a user interface concern to an infrastructure concern, which has significant implications for how brands think about voice readiness.

The vehicle use case is particularly compelling. Drivers cannot safely use screens while the vehicle is moving, so voice is not a convenience in a car; it is the only safe commerce interface. With BMW integrating Alexa+ and SoundHound deploying across automotive platforms, the car is becoming a significant commerce surface.


From “Hands-Free Convenience” to “Agent-Led Commerce”

The narrative around voice commerce has fundamentally shifted. The old model (2018-2024) was command-response: “Alexa, reorder paper towels.” The interaction was single-task, reactive, rule-based, voice-only, and limited to smart speakers.

The agentic model (2025-2026 and beyond) is different across every dimension:

  • Interaction: Conversational and multi-turn. “I am hosting a dinner party for eight people – help me plan.”
  • Scope: Multi-step orchestration. The agent plans the menu, identifies ingredients, compares prices across retailers, places orders, and schedules delivery.
  • Intelligence: LLM-powered reasoning with tool use, not rigid intent matching.
  • Modality: Voice combined with vision, text, and augmented reality.
  • Agency: Proactive. The agent sends deal alerts, makes anticipatory suggestions, and can auto-purchase within user-defined parameters.
  • Commerce surfaces: Vehicles, TVs, phones, wearables, and smart displays – not just smart speakers.
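The "Scope" bullet describes multi-step orchestration: take a list of needs, compare prices across retailers, and assemble orders. A minimal sketch of the comparison step follows. The retailer names, prices, and function names are all hypothetical, and the hard-coded quotes stand in for the live tool calls a real agent would make.

```python
# Hypothetical price quotes per retailer; a real agent would fetch these via tool calls.
QUOTES = {
    "retailer_a": {"pasta": 2.49, "tomatoes": 3.99, "basil": 2.29},
    "retailer_b": {"pasta": 2.19, "tomatoes": 4.49, "basil": 1.99},
}


def cheapest_offer(item: str, quotes: dict) -> tuple[str, float]:
    """Return (retailer, price) for the lowest quote on one item."""
    offers = [(retailer, prices[item]) for retailer, prices in quotes.items() if item in prices]
    if not offers:
        raise LookupError(f"no retailer carries {item!r}")
    return min(offers, key=lambda offer: offer[1])


def plan_order(items: list[str], quotes: dict) -> dict:
    """Group the cheapest offers by retailer and total the basket."""
    by_retailer: dict[str, list[str]] = {}
    total = 0.0
    for item in items:
        retailer, price = cheapest_offer(item, quotes)
        by_retailer.setdefault(retailer, []).append(item)
        total += price
    return {"by_retailer": by_retailer, "total": round(total, 2)}


plan = plan_order(["pasta", "tomatoes", "basil"], QUOTES)
print(plan)
```

Choosing purely on price is a simplification. An agent would also weigh delivery windows, shipping costs from splitting a basket, and user preferences.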

This shift explains why voice commerce failed the first time and why it is succeeding now. The technology was not ready. Rule-based systems could not handle the complexity of real shopping decisions. LLMs can.


Multimodal Is the Winning Pattern

Pure voice commerce has a structural limitation: customers cannot see the product. This makes voice-only interfaces best suited for repeat purchases of familiar items – grocery and consumer packaged goods account for 48% of voice purchases. Complex product comparisons in categories like fashion and electronics remain difficult without visual support.

The solution is multimodal commerce, which combines voice, image, video, text, and augmented reality inputs so AI agents can understand user intent more accurately than any single modality alone.

The business results validate this approach:

  • Sephora reported a 35% increase in average order value through its voice assistant compared to its website.
  • Kroger’s smart shopping list improved customer retention rates by 28%.
  • One regional grocery chain saw 67% of delivery customers switch to voice ordering within six months, with customer retention jumping from 58% to 87%.

The platforms enabling multimodal voice commerce include Voiceflow (no-code platform for building chat and voice AI agents), Sierra AI (used by Wayfair, Gap Inc., ASOS, Casper, and Deliveroo for AI voice agents), and specialized e-commerce voice platforms like Ringly.io, Brilo AI, CloudTalk, and Retell AI, many of which offer direct Shopify and WooCommerce integrations.

Augmented reality adds another dimension. Voice commands combined with AR allow users to virtually place furniture in rooms or try on clothing before purchasing. The agent handles the commerce transaction while AR handles the product evaluation – each modality doing what it does best.


Voice Spoofing and the $200 Million Security Challenge

Voice commerce introduces security risks that do not exist in traditional e-commerce. Deloitte predicts that generative AI could enable fraud losses of $40 billion in the US by 2027. Deepfake-enabled fraud losses already exceeded $200 million in Q1 2025 alone.

The threat vectors are specific to voice:

  • Voice spoofing and deepfakes: Generative AI can clone a person’s voice from a few seconds of audio, potentially bypassing voice biometric authentication.
  • Unauthorized purchases: Children, bystanders, or even television audio can trigger purchases on always-listening devices.
  • “Always listening” surveillance concerns: High-profile breaches and GDPR scrutiny make consumers wary of linking financial data to voice devices.

Mitigation strategies are evolving. Voice biometrics are becoming standard for high-value transactions. Pre-saved payment methods mean users are not speaking card numbers aloud. Multi-factor confirmation (voice command plus screen tap, for example) adds security without sacrificing convenience.
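The multi-factor idea above can be expressed as a small authorization policy: every order needs a voice match, and orders above a threshold also need a tap on a paired screen. This is a sketch under stated assumptions. The $50 threshold, the `PurchaseRequest` fields, and the `authorize` function are hypothetical and not drawn from any platform's documented behavior.

```python
from dataclasses import dataclass

HIGH_VALUE_THRESHOLD = 50.00  # assumed cutoff in USD; a real policy would be configurable


@dataclass
class PurchaseRequest:
    amount: float
    voice_match: bool       # speaker matched the enrolled voice profile
    screen_confirmed: bool  # user tapped "confirm" on a paired screen


def authorize(request: PurchaseRequest) -> bool:
    """Low-value orders need a voice match; high-value orders also need a screen tap."""
    if not request.voice_match:
        return False
    if request.amount >= HIGH_VALUE_THRESHOLD:
        return request.screen_confirmed
    return True


print(authorize(PurchaseRequest(12.99, voice_match=True, screen_confirmed=False)))
print(authorize(PurchaseRequest(249.00, voice_match=True, screen_confirmed=False)))
```

Because a cloned voice can defeat the biometric check on its own, the second factor is deliberately a different channel (touch), so spoofed audio alone cannot complete a high-value purchase.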

For commerce platforms, the security question is not optional. A significant segment of consumers cites data security concerns as the primary reason for not adopting voice commerce. Solving security is not just a technical requirement – it is a conversion driver.


WhatsApp Voice Notes + AI Transcription: Voice-to-Order for Emerging Markets

Voice notes and voice messages are already a dominant communication mode on WhatsApp, particularly in Brazil, India, and other emerging markets. This creates a natural bridge to voice commerce that does not require smart speakers or dedicated hardware.

The intersection points are clear:

  • Voice-to-order: Users send voice messages that an AI agent transcribes and converts into structured orders. No typing required, no app to download, no new interface to learn.
  • Multimodal WhatsApp flows: Combining voice notes with product images and interactive buttons creates a richer checkout experience than either modality alone.
  • Accessibility: Voice input on WhatsApp lowers barriers for users with lower literacy levels or those who simply prefer speaking to typing – a significant population in emerging markets.
  • Agentic orchestration: An AI agent on WhatsApp can manage the full commerce journey – discovery, recommendation, cart management, payment, and delivery tracking – via voice and text, similar to what Alexa+ and Gemini do on smart speakers but on the world’s most widely used messaging platform.
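The voice-to-order step above has two stages: speech-to-text, then conversion of the transcript into structured line items. The sketch below covers only the second stage and assumes the transcript already exists. It uses a deliberately simple rule-based parser as a stand-in for the LLM a production agent would likely use; the vocabulary tables and the `parse_order` function are hypothetical.

```python
import re

# Tiny illustrative vocabularies; a real system would need far broader coverage.
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
UNITS = {"kilo", "kilos", "kg", "bottle", "bottles", "pack", "packs"}


def parse_order(transcript: str) -> list[dict]:
    """Turn a transcribed voice note into structured line items."""
    items = []
    cleaned = transcript.lower().strip(" .!?")
    # Split on commas and the word "and" to get one phrase per item.
    for phrase in re.split(r",|\band\b", cleaned):
        words = phrase.split()
        if not words:
            continue
        quantity = 1
        if words[0].isdigit():
            quantity, words = int(words[0]), words[1:]
        elif words[0] in NUMBER_WORDS:
            quantity, words = NUMBER_WORDS[words[0]], words[1:]
        unit = None
        if words and words[0] in UNITS:
            unit, words = words[0], words[1:]
            if words and words[0] == "of":
                words = words[1:]
        if words:
            items.append({"product": " ".join(words), "quantity": quantity, "unit": unit})
    return items


order = parse_order("Two kilos of rice, 1 bottle of olive oil and a pack of eggs.")
for line in order:
    print(line)
```

The parser breaks on product names that contain "and" and on conversational filler, which is exactly the kind of input where an LLM-based extractor earns its place. The structured output shape is the part worth keeping.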

This approach has a structural advantage over smart speaker commerce: WhatsApp already has 2 billion+ users, many of whom regularly send voice notes. There is no hardware purchase required and no new behavior to adopt. The voice commerce interface already exists in their daily messaging habits.


Getting Products Voice-Ready: An Optimization Guide

As AI agents increasingly mediate product discovery – whether through Alexa+, Gemini, SoundHound, or WhatsApp – brands need to optimize their product data for voice-driven interactions. Voice search queries are conversational and long-tail, requiring different strategies than text-based SEO.

1. Use natural language in product descriptions. Write the way people speak. Instead of “Organic Cold-Pressed Extra Virgin Olive Oil 500mL,” include conversational phrases like “organic olive oil for cooking” that match how people ask voice assistants for products.

2. Structure product data for machine readability. AI agents need structured, machine-readable catalogs – not beautifully designed HTML pages. Implement schema markup (Product, Offer, AggregateRating) so agents can parse specifications, pricing, availability, and reviews programmatically.

3. Optimize for question-based queries. Voice searches are often phrased as questions: “What is the best olive oil for salad dressing?” Create FAQ content and product descriptions that directly answer common questions in your category.

4. Maintain real-time inventory and pricing data. Agents need live data. Stale inventory or outdated pricing causes failed transactions and erodes the agent’s trust in your product feed. API-first architectures that serve real-time data will outperform cached catalog pages.

5. Build for repeat purchase workflows. Grocery and CPG dominate voice commerce because they are repeat purchases. If your product is consumable or replenishable, ensure your data supports easy reordering – consistent product identifiers, subscription options, and predictable sizing.

6. Provide clear trust signals. Verified merchant identity, transparent return policies, and shipping guarantees help agents recommend your products with confidence. Agents are optimizing for customer satisfaction, not just price.

7. Support multiple commerce protocols. Implement compatibility with UCP, ACP, or MCP depending on your platform stack. These protocols are how AI agents discover, evaluate, and transact with merchants. Brands that are not protocol-compatible risk becoming invisible to agent-mediated commerce.
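Steps 2 and 4 above can be sketched together: emit schema.org markup (Product, Offer, AggregateRating) from a catalog record, but only when the price and stock data are fresh. The schema.org types and properties used here are real; the catalog record, SKU, rating figures, five-minute freshness budget, and function names are hypothetical values chosen for illustration.

```python
import json
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=5)  # assumed freshness budget, not a published standard


def is_fresh(updated_at: datetime, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """True when the price and stock data are recent enough to trust."""
    return now - updated_at <= max_age


def product_jsonld(record: dict) -> dict:
    """Map an internal catalog record to schema.org Product markup."""
    availability = "InStock" if record["in_stock"] else "OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": record["name"],
        "description": record["description"],
        "sku": record["sku"],
        "offers": {
            "@type": "Offer",
            "price": f"{record['price']:.2f}",
            "priceCurrency": record["currency"],
            "availability": f"https://schema.org/{availability}",
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": record["rating"],
            "reviewCount": record["review_count"],
        },
    }


# Invented example record; the timestamp is fixed so the example is reproducible.
now = datetime(2026, 3, 8, 12, 0, tzinfo=timezone.utc)
record = {
    "name": "Organic extra virgin olive oil, 500 mL",
    "description": "Organic olive oil for cooking and salad dressing.",
    "sku": "EXAMPLE-OO-500",
    "price": 12.99,
    "currency": "USD",
    "in_stock": True,
    "rating": 4.6,
    "review_count": 212,
    "updated_at": now - timedelta(minutes=2),
}

if is_fresh(record["updated_at"], now):
    print(json.dumps(product_jsonld(record), indent=2))
else:
    print("Record is stale; refresh before publishing")
```

The description field uses the conversational phrasing recommended in step 1, while the structured fields carry the precise specification, so both human-style queries and machine parsing are served by the same record.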


Frequently Asked Questions

How big is the voice commerce market in 2025? The global voice commerce market is estimated between $62 billion and $70 billion in 2025, depending on the research firm. This represents growth from $42 billion to $53 billion in 2024. The market is projected to reach $147 billion to $186 billion by 2030, with a CAGR of 20% to 24.6%.

Is Alexa+ free? Yes. Amazon made Alexa+ free for all Amazon Prime members in February 2026. It is available on all Alexa-enabled devices, with an installed base of over 600 million devices worldwide. Alexa+ is powered by more than 70 AI models including Amazon’s Nova models and technology from Anthropic.

What is the difference between voice commerce and agentic voice commerce? Traditional voice commerce (2018-2024) was command-response: a user gives a single instruction and the assistant executes it. Agentic voice commerce (2025-2026 onward) involves AI agents that handle multi-step tasks autonomously – planning, comparing, negotiating, and purchasing across multiple services and payment rails. The agent is proactive, multimodal, and capable of reasoning, not just pattern-matching against a fixed set of commands.

Which product categories work best for voice commerce? Grocery and consumer packaged goods account for 48% of voice purchases and are the strongest category. Repeat purchases of familiar items – where visual inspection is not required – are the sweet spot. Electronics, travel, and home goods are expected to follow in 2027-2028, with apparel and luxury adopting agent-mediated commerce by 2029-2030.

What are the main security risks of voice commerce? The primary risks include voice spoofing and deepfake attacks (deepfake-enabled fraud exceeded $200 million in Q1 2025), unauthorized purchases triggered by bystanders or ambient audio, and privacy concerns around always-listening devices. Mitigation strategies include voice biometrics, multi-factor confirmation, and pre-saved payment methods that eliminate the need to speak financial details aloud.

Can voice commerce work on WhatsApp? Yes. WhatsApp voice notes combined with AI transcription create a natural voice-to-order pathway, especially in emerging markets like Brazil and India where voice messages are already a dominant communication mode. An AI agent can transcribe voice notes, convert them into structured orders, and manage the full commerce journey – discovery, recommendation, payment, and tracking – without requiring smart speakers or dedicated hardware.

How do I optimize my products for voice search and AI agents? Focus on natural language product descriptions, structured data markup (schema.org), real-time inventory and pricing APIs, question-based content that matches conversational queries, and compatibility with commerce protocols like UCP, ACP, or MCP. Agents need machine-readable data, not visual design – prioritize data quality and API accessibility over storefront aesthetics.


Voice commerce’s first wave failed because the technology was not ready. Its second wave is succeeding because LLM-powered agents have fundamentally changed what “voice shopping” means – from rigid command execution to intelligent, multimodal, proactive commerce orchestration. The brands that prepare their product data, adopt commerce protocols, and meet consumers where they already speak – whether on Echo Show screens, car dashboards, or WhatsApp voice notes – will capture disproportionate value in a market headed toward $186 billion.

Hexagon Team

Published March 8, 2026
