WebMCP: Teaching AI Agents to Interact with Your Web App

Pawel Kubiak

2026-03-02 11 min read

#WebMCP
#AI
#Web Standards
#Chrome

AI agents can browse the web. They can read content, click buttons, and fill forms. But they do it by guessing: analyzing the DOM, inferring what elements do, and hoping their assumptions are correct.

This works until it doesn't. A button labeled "Submit" might actually cancel an action. A form field that looks like it accepts a phone number might expect a specific format. The agent has no way to know for certain.

WebMCP changes this. It's a proposed web standard that lets websites expose structured tools to AI agents. Instead of guessing what a button does, the agent can call a function with explicit parameters and get a predictable result.

WebMCP is currently an early preview feature from Google Chrome. While the standard is being developed, users can interact with WebMCP-enabled websites through a Chrome extension that provides AI agent capabilities.

A real game changer would be enabling agents to use WebMCP without requiring the browser or application to be open, allowing them to operate independently.

See It in Action

Here's what shopping with an AI agent looks like when a website supports WebMCP:

In this demo, I'm using natural language to:

Search for an Angular hoodie
Add 2 items to my basket
Navigate to checkout
Fill in shipping details

The AI agent handles all the interactions reliably because the website explicitly tells it what actions are available and how to use them.

How It Works (Non-Technical Overview)

Think of WebMCP like a restaurant menu for your website.

Without WebMCP, an AI agent is like a customer trying to order food by describing what they want and hoping the waiter understands. "I'd like something with chicken... maybe grilled?" The waiter has to guess what dish matches that description.

With WebMCP, your website provides a clear menu: "Here are the dishes we offer, here's what's in each one, and here's exactly how to order them." The AI agent can see all available options and place orders precisely.

sequenceDiagram
    participant User
    participant AI Agent
    participant Website
    participant WebMCP Tools

    User->>AI Agent: "Find Angular hoodie"
    AI Agent->>WebMCP Tools: Discover available tools
    WebMCP Tools-->>AI Agent: search_product, add_product_to_basket, proceed_checkout...
    AI Agent->>WebMCP Tools: Call search_product("Angular hoodie")
    WebMCP Tools->>Website: Execute search
    Website-->>WebMCP Tools: Return product results
    WebMCP Tools-->>AI Agent: Structured product data
    AI Agent-->>User: "Found 3 Angular hoodies"

For businesses, this means:

E-commerce: Customers can shop using AI assistants that reliably find products, compare options, and complete purchases
SaaS applications: Users can automate complex workflows through natural language commands
Customer service: AI agents can help users navigate your application and complete tasks accurately

The Problem with Screen-Scraping

When an AI agent interacts with a website today, it's essentially performing sophisticated screen-scraping:

Parse the HTML to find interactive elements
Infer what each element does based on labels and context
Simulate user interactions (clicks, form fills)
Hope the result matches expectations

This approach has fundamental limitations:

Ambiguity: A "Next" button could mean "next page" or "next step" or "skip"
Hidden functionality: Features behind nested menus or complex UI states are hard to discover
Fragility: UI changes break agent interactions
Performance: Rendering full pages and parsing DOM is slow
Token inefficiency: Agents must process entire HTML documents, consuming significant tokens for context that may not be relevant
Validation: No way to know if form data is valid before submission

What WebMCP Provides

WebMCP bridges the gap between web applications and AI agents by providing a contract for interaction. Websites can explicitly publish their capabilities as tools.

The standard defines two APIs:

1. Imperative API (JavaScript)

window.navigator.modelContext.registerTool({
  name: "search_product",
  description: "Search for products by name or category",
  inputSchema: {
    type: "object",
    properties: {
      query: { 
        type: "string",
        description: "Search query to match product titles"
      },
      category: { 
        type: "string",
        enum: ["Apparel", "Accessories", "Books"]
      }
    }
  },
  execute: async ({ query, category }) => {
    // Your search logic here
    const results = await searchProducts(query, category);
    return { 
      content: [{ 
        type: "text", 
        text: JSON.stringify(results) 
      }] 
    };
  }
});

2. Declarative API (HTML Annotations)

Transform standard HTML forms into tools using attributes:

<form 
  toolname="fill_payment_form"
  tooldescription="Fill shipping address and payment details"
  toolautosubmit
  action="/checkout">
  
  <input 
    type="text" 
    name="fullName"
    toolparamtitle="Full Name"
    toolparamdescription="Recipient's complete legal name" />
  
  <select 
    name="country"
    toolparamtitle="Country"
    toolparamdescription="Two-letter country code (US or CA)">
    <option value="US">United States</option>
    <option value="CA">Canada</option>
  </select>
  
  <button type="submit">Complete Purchase</button>
</form>

The browser automatically converts these annotations into structured tool definitions that agents can discover and invoke.

What This Feels Like in Practice

With WebMCP integrated, AI agents can interact with applications naturally:

User: "I'd like to find an Angular hoodie, can you open product details page?"

Agent:

Calls search_product tool with { query: "Angular hoodie" }
Finds matching products
Navigates to product details page

User: "Ok, great can you add 2 items to the basket"

Agent:

Calls add_product_to_basket with { productId: "...", quantity: 2 }
Gets confirmation with updated basket total
Informs user

User: "Ok let's proceed to the checkout"

Agent:

Calls proceed_checkout to navigate to checkout page
Confirms navigation successful

User: "Fill the shipping details for John Doe at 123 Main St, Anytown, CA 90210, US, with John Doe as the cardholder, and finally, take me to checkout to complete the purchase"

Agent:

Calls fill_payment_form with shipping and payment details
Form is filled and validated automatically
User completes payment and submits successfully

The difference from traditional screen-scraping is profound:

Reliability: Tools have explicit contracts
Performance: No need to render and parse full pages
Discoverability: Agents know exactly what's possible
Validation: Errors are caught before submission
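
To make the conversation above more concrete, here is a rough sketch of what a tool like add_product_to_basket might look like. The `prices` table, the `basket` map, the product ID, and the result shape are hypothetical stand-ins of mine, not part of WebMCP or of the demo app. The tool is defined as a plain object so it can be exercised outside a browser, and it is registered only if the API is present.

```javascript
// Hypothetical in-memory state; a real app would use its own store or backend.
const prices = new Map([["hoodie-1", 49.99]]);
const basket = new Map(); // productId -> quantity

// Wraps a payload in the text-content shape used by the examples in this post.
function asText(payload) {
  return { content: [{ type: "text", text: JSON.stringify(payload) }] };
}

const addProductToBasketTool = {
  name: "add_product_to_basket",
  description:
    "Add a product to the basket and return the updated basket total. " +
    "Use after search_product has returned a productId.",
  inputSchema: {
    type: "object",
    properties: {
      productId: { type: "string", description: "ID returned by search_product" },
      quantity: { type: "integer", minimum: 1, description: "Number of items to add" }
    },
    required: ["productId", "quantity"]
  },
  execute: async ({ productId, quantity }) => {
    // Report bad input as a structured error the agent can read and act on.
    if (!prices.has(productId) || !Number.isInteger(quantity) || quantity < 1) {
      return asText({ success: false, message: "Unknown product or invalid quantity" });
    }
    basket.set(productId, (basket.get(productId) ?? 0) + quantity);

    // Recompute the total from the basket contents.
    let total = 0;
    for (const [id, qty] of basket) total += prices.get(id) * qty;

    return asText({
      success: true,
      data: { basketTotal: Number(total.toFixed(2)) },
      message: `Added ${quantity} item(s) to the basket`
    });
  }
};

// Register only when the experimental API exists (a no-op elsewhere).
globalThis.navigator?.modelContext?.registerTool?.(addProductToBasketTool);
```

Returning the basket total in the same response saves the agent a second call just to confirm what happened.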

Tool Design Best Practices

Through implementation, I learned several principles for effective tool design:

1. Be Explicit About Capabilities

Good:

description: `Search for products by name or category.
Returns product title, price, discount, and availability. Use when user wants to browse or find specific items.`

Bad:

description: 'Search products'

The agent needs context about when to use the tool and what data it returns.

2. Accept Raw User Input

Don't make the agent do math or transformations:

Good:

properties: {
  expiryDate: {
    type: 'string',
    description: 'Card expiry in MM/YY format (e.g., "12/28")'
  }
}

Bad:

properties: {
  expiryMonth: { type: 'number' },
  expiryYear: { type: 'number' }
}
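
If the tool accepts the raw string, any conversion your backend needs can happen inside the tool instead of in the agent. A minimal sketch of that idea follows; `parseExpiry` is a hypothetical helper, and it assumes two-digit years belong to the 2000s.

```javascript
// Hypothetical helper: converts "MM/YY" into a numeric month and four-digit year.
// Returns null for malformed input so the tool can report a clear error.
function parseExpiry(expiryDate) {
  const match = /^(0[1-9]|1[0-2])\/(\d{2})$/.exec(String(expiryDate).trim());
  if (!match) return null;
  return {
    month: Number(match[1]),
    year: 2000 + Number(match[2]) // assumption: "28" means 2028
  };
}
```

Inside `execute`, a `null` result can be turned into an error message that tells the agent which format is expected.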

3. Provide Meaningful Enums

Explain the why, not just the what:

Good:

category: {
  type: 'string',
  enum: ['Apparel', 'Accessories', 'Books'],
  description: `Product category. Use "Apparel" for clothing and wearables,
  "Accessories" for bags and gear, "Books" for printed materials.`
}

4. Return Structured Data

Always return JSON with a consistent structure:

return {
  content: [{
    type: 'text',
    text: JSON.stringify({
      success: true,
      data: { /* ... */ },
      message: 'Human-readable summary'
    })
  }]
};
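
One way to keep that structure consistent across many tools is a pair of small helpers. This is only a sketch: `toolSuccess` and `toolFailure` are names I made up, and the `success` / `data` / `message` envelope is the convention from the example above, not something WebMCP requires.

```javascript
// Hypothetical helpers that wrap a payload in the same envelope every time.
function toolSuccess(data, message) {
  return {
    content: [{ type: "text", text: JSON.stringify({ success: true, data, message }) }]
  };
}

function toolFailure(message) {
  // Failures keep the same keys so the agent can always read `success` first.
  return {
    content: [{ type: "text", text: JSON.stringify({ success: false, data: null, message }) }]
  };
}
```

A tool's `execute` can then end with `return toolSuccess(results, "Found 3 products")` or `return toolFailure("Category not recognized")`.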

5. Sync UI State

WebMCP tools run only while the browser and the app are open, so the user can watch what the agent does.

When tools modify application state, update the UI:

execute: async (params) => {
  // Update application state
  searchState.setSearchState(params.query, params.category);
  
  // Perform action
  const results = await search(params);
  
  // Return results
  return { content: [{ type: 'text', text: JSON.stringify(results) }] };
}

This ensures users see what the agent is doing.

Current Limitations

WebMCP is an early preview with some constraints:

Browser Support

Currently only available in Chrome 146+ behind a flag:

Navigate to chrome://flags/#enable-webmcp-testing
Enable "WebMCP for testing"
Restart Chrome
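
Since most visitors will not have the flag enabled, it seems sensible to guard registration with a feature check. The sketch below assumes only the `navigator.modelContext.registerTool` call shown earlier; `registerToolIfSupported` is a hypothetical helper name, and the navigator object is passed in so the function can be tested outside a browser.

```javascript
// Registers a tool only when the experimental API is present.
// Returns true if the tool was registered, false otherwise.
function registerToolIfSupported(nav, tool) {
  const modelContext = nav?.modelContext;
  if (!modelContext || typeof modelContext.registerTool !== "function") {
    return false; // flag disabled or unsupported browser: the site works as before
  }
  modelContext.registerTool(tool);
  return true;
}
```

In the page itself this would be called as `registerToolIfSupported(globalThis.navigator, searchTool)`, where `searchTool` is whatever tool object you define.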

Browsing Context Required

Tools must run in a visible browser tab. There's no "headless" mode where agents can call tools without opening the site.

No Discovery Mechanism

There's no built-in way for agents to discover which sites provide tools without visiting them. Search engines or directories may eventually fill this gap.

UI Synchronization Complexity

Developers must ensure the UI reflects state changes from tool calls. In complex applications, this requires careful state management.
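
One pattern that can help is routing both UI events and tool calls through the same store, so there is a single source of truth. The following is a minimal, framework-free sketch; `createStore`, `searchStore`, and `executeSearch` are hypothetical names, and a real app would more likely use its framework's own state primitives.

```javascript
// Minimal observable store (hypothetical): one source of truth for UI and tools.
function createStore(initialState) {
  let state = initialState;
  const listeners = new Set();
  return {
    get: () => state,
    set(patch) {
      state = { ...state, ...patch };
      listeners.forEach((listener) => listener(state));
    },
    subscribe(listener) {
      listeners.add(listener);
      return () => listeners.delete(listener); // call to unsubscribe
    }
  };
}

const searchStore = createStore({ query: "", category: null });

// Stand-in for the UI: a real app would re-render the search box here.
const rendered = [];
searchStore.subscribe((s) => rendered.push(`${s.query} [${s.category ?? "all"}]`));

// A tool's execute writes to the same store the UI's event handlers would use.
async function executeSearch({ query, category = null }) {
  searchStore.set({ query, category });
  return { content: [{ type: "text", text: JSON.stringify({ success: true }) }] };
}
```

Because the tool never touches the DOM directly, the UI stays in sync as long as it renders from the store.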

What This Means for Web Developers

If you build web applications, WebMCP offers a new way to make your app accessible to AI agents.

Instead of hoping agents can figure out your UI, you can provide explicit interfaces. Users with AI assistants will get reliable, fast interactions with your application.

For e-commerce sites, this could mean:

Agents can search products and compare prices accurately
Checkout flows work reliably without form-filling errors
Complex filters and options are discoverable

For SaaS applications:

Agents can trigger actions hidden in nested menus
Complex workflows become automatable
Users can interact through natural language

Comparison with MCP

You might be wondering how WebMCP relates to the Model Context Protocol (MCP) I wrote about previously.

MCP is a server-side protocol. Developers deploy MCP servers that expose tools to AI agents. The agent connects to your server and calls functions remotely.

WebMCP is a client-side protocol. Developers annotate their web applications with tool definitions. The agent interacts with your site through the browser.

Key differences:

Aspect	MCP	WebMCP
Deployment	Requires server infrastructure	Works in existing web apps
Context	Server-side (Node.js, Python)	Client-side (browser)
Use Case	APIs, databases, services	Web applications, forms, UI
Setup	Deploy and maintain servers	Add annotations to HTML/JS
Discovery	Agent connects to known servers	Agent visits websites

They're complementary. MCP is great for backend services and data access. WebMCP is perfect for web application interactions.

What's Next

WebMCP is a proposed standard, currently in early preview. The Chrome team is gathering feedback on:

API ergonomics and developer experience
Use cases and real-world applications
Tooling ecosystem (debugging, testing, validation)
Implementation issues and friction points

If you're interested in shaping how AI agents interact with web applications, now is the time to experiment and provide feedback.

The future of web development might include designing not just for human users, but for AI agents that help them. WebMCP is one step toward making that future more reliable and developer-friendly.

In the next article, I'll show you how to implement WebMCP in an Angular application with practical examples and architecture patterns.

Have you experimented with WebMCP? I'd love to hear about your experience.
