UX Challenges with MCPs

MCPs (Model Context Protocol integrations) are everywhere now. The concept is compelling: expose an app’s data and tools to an LLM so you can interact with the app through natural language.

I observed a few interesting things when I started testing various MCP implementations across Figma, Jira, Google Drive, WhatsApp and more. The use cases are real, and MCPs are definitely useful. But watching how people actually configure and use them reveals two fundamental challenges with the current approach.

First, configuration is unintuitive. MCPs work like IFTTT: you need to establish connections on both the app side and the LLM side before anything functions. This creates setup friction that most users won’t tolerate.
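
To make the two-sided setup concrete, here is a minimal sketch of the app side, assuming the MCP TypeScript SDK’s quickstart pattern (the search_tickets tool is hypothetical). Even once this exists, nothing works until the user separately registers the server in their LLM client’s configuration.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// The "app side": a server that exposes one tool over stdio.
const server = new McpServer({ name: "task-tracker", version: "1.0.0" });

server.tool(
  "search_tickets",                          // hypothetical tool, for illustration
  "Search open tickets by keyword",
  { query: z.string() },
  async ({ query }) => ({
    content: [{ type: "text", text: `Results for "${query}"...` }],
  })
);

await server.connect(new StdioServerTransport());

// The "LLM side" is a separate step: the user still has to add this server
// (command, args, credentials) to their client's MCP settings before the
// model can see or call it.
```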

Second, the UX approach doesn’t feel right for the future of apps with natural language capabilities. The way MCPs bolt conversational AI onto existing tools feels like a bridge solution rather than how apps will naturally evolve. The interaction patterns aren’t optimized for mass-market usage.

To understand why these limitations matter, it’s important to see the types of workflows that MCPs help with today. Let’s dive in:


Orchestration workflows

Picture this: You’re a product manager three days before quarterly planning. You need budget constraints scattered across dozens of Google Docs, blocker patterns hiding in Jira tickets, and team sentiment buried in Slack threads. Traditionally, this means tab chaos, copy and paste marathons, and trying to remember which folder contains that crucial spreadsheet from two months ago.

This is MCP’s sweet spot. The workflow becomes conversational: “Show me all Q2 planning docs that mention budget issues,” then “Create a summary of sprint blockers grouped by team,” followed by “Draft an update for leadership using the latest meeting notes.” Each request builds on context that the AI can navigate far better than you can by clicking through interfaces.

MCP dominates in orchestration work:

✅ Discrete tasks with clear start and end points

✅ Information synthesis across multiple scattered sources

✅ Utility automation that eliminates navigation overhead

✅ Workflows where a few seconds of latency actually improves decision quality

The magic happens because these workflows have natural breathing room. You ask something, the system processes, you get a thoughtful response, then you decide what happens next. The cognitive burden of managing multiple information sources just evaporates.

Real examples that work beautifully:

Notion and Obsidian: “Find all project retrospectives mentioning technical debt and summarize the common themes”

Jira and Linear: “Show me which epics are behind schedule and group the blockers by type”

Drive and Dropbox: “Locate all vendor contracts modified this month and share them with the legal team”

Email and Slack: “Create a project status update using our latest standup notes and send it to stakeholders”

This transforms routine information work because natural language eliminates the need to remember complex file hierarchies, reconstruct search logic across sessions, or context switch between multiple interfaces. The AI becomes your universal translator for scattered digital chaos.


Iteration workflows

Now imagine a completely different scenario: You’re designing a mobile interface in Figma. You select three components, nudge them pixel by pixel until the spacing feels right, then realize the visual weight is off and swap one component for a variant. This entire sequence takes maybe twenty seconds but involves dozens of micro-decisions, each informed by immediate visual feedback.

This is where MCPs crash into reality. You can’t pause mid-flow to articulate “I want to adjust the spacing between these elements to improve visual hierarchy while maintaining overall balance.” By the time you’ve formulated that sentence, the creative moment is gone.

MCP struggles in iteration work:

⛔️ Continuous state evolution with rapid-fire updates

⛔️ Spatial or visual reasoning that depends on real-time feedback

⛔️ Micro-adjustments based on immediate tactile or visual response

⛔️ Flow states where any interruption breaks the creative spell

Creative work runs on what cognitive scientists call embodied cognition: your hands and eyes collaborate faster than conscious thought. When you’re debugging code, you’re scanning, pattern-matching, following intuitive hunches that emerge from deep familiarity with the codebase.

The core UX problems that kill the experience:

⛔️ State staleness: The AI sees snapshots while you’re working with live, evolving state

⛔️ Feedback delay: Violates the 100-millisecond rule that makes interactions feel real-time

⛔️ Context-switching overhead: The mental friction of translating visual intent into linguistic description

⛔️ Control indirection: The loss of the direct manipulation that makes creative tools feel responsive

The fundamental mismatch is temporal. Creative workflows operate on 100-millisecond feedback cycles. Current MCP implementations operate on multi-second request-response cycles. This creates what interaction designers call temporal friction: tools that can’t keep pace with thought.


Hybrid workflows

Reality refuses to fit into clean categories. Most modern tools blend both patterns, creating what I call workflow gradients: spaces where the optimal AI interaction shifts based on what you’re trying to accomplish moment to moment.

Consider VS Code. When you’re architecting a new feature, saying “Create a user authentication module with OAuth integration and error handling” works perfectly. This is classic orchestration territory: discrete, goal-oriented, with clear success criteria. But minutes later, when you’re debugging why the authentication token isn’t parsing correctly, you’re in pure iteration mode: stepping through code line by line, setting breakpoints, inspecting variables in real time.

Or take Figma’s recent MCP integration, which lets AI generate designs using Dev Mode context. When you say “Create a landing page hero section using our design system tokens,” that’s orchestration: the AI synthesizes existing patterns and creates new assets. But the moment you start adjusting that generated hero section, tweaking spacing, swapping components, refining visual hierarchy, you’ve shifted into iteration mode, where MCP can’t follow your real-time creative decisions.

Hybrid environments demand nuanced integration approaches:

Same tool, same user, completely different interaction needs. One moment you want to delegate a well-defined task. The next, you need the AI to observe and assist without interrupting your detective work. This creates fascinating design challenges: How do you build AI that knows when to speak up versus when to stay quiet? When to offer suggestions versus when to simply watch and learn?


The framework: Orchestration versus iteration

MCPs work brilliantly when your work involves discrete requests with clear outcomes: finding files, synthesizing information, automating workflows. They struggle when your work requires continuous manipulation with real-time feedback: adjusting layouts, debugging code, refining designs. The difference comes down to whether your tasks have natural stopping points or flow as uninterrupted streams of micro-decisions.

The contrast is crystal clear:

Orchestration: “Make this happen” (delegate and coordinate)

Iteration: “Help me refine this” (continuous improvement cycles)


Technical constraints and UX problems

Current MCP implementations aren’t just limited by engineering constraints; those constraints compound into user experience problems that erode trust over time.

Imagine opening a complex Figma file with hundreds of components, then asking AI to help optimize your design system. What happens next resembles a digital telephone game: the AI can only “see” a compressed version of your file, filtered through token limits and serialization choices completely outside your control. You’re looking at a rich, layered composition. The AI sees flattened lists of objects with properties. This mismatch creates what cognitive scientists call common ground erosion. You and the AI stop talking about the same thing, even though you both think you are.
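
To see why this happens, consider what a snapshot-style serializer has to do. A deliberately naive, hypothetical sketch (not Figma’s actual API): walk the layer tree, flatten it into property strings, and cut it to a context budget.

```typescript
// Hypothetical document node, standing in for a design file's layer tree.
interface DocNode {
  id: string;
  type: string;
  name: string;
  children?: DocNode[];
}

// Flatten the layered composition into a list of property strings, then
// truncate to fit a context budget. Everything past the cutoff, and everything
// about how the file reached this state, never reaches the model.
function serializeSnapshot(root: DocNode, maxEntries: number): string {
  const flat: string[] = [];
  const walk = (node: DocNode) => {
    flat.push(`${node.type} "${node.name}" (${node.id})`);
    node.children?.forEach(walk);
  };
  walk(root);
  return flat.slice(0, maxEntries).join("\n");
}
```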

Context window limitations create compound UX debt:

⛔️ Users must mentally translate between their complete view and the model’s partial understanding

⛔️ Response quality becomes inconsistent based on which context gets preserved through compression

⛔️ False confidence emerges when the model appears to “understand” but misses crucial details

Current MCP implementations also work like a series of photographs rather than continuous film. Each time you interact with the AI, it sees a snapshot of your current state but misses everything between snapshots. The careful selection you made, the three options you tried and discarded, the moment you realized the whole approach needed changing. All invisible to the AI.

Current implementation problems that hurt the experience:

⛔️ Polling overhead: Constant re-serialization of state creates latency

⛔️ Event blindness: The model misses intermediate steps in your workflow

⛔️ Temporal gaps: Responses are based on outdated application state

This creates a peculiar conversation where one participant has amnesia about everything except final results. These limitations reflect current implementation choices rather than fundamental protocol constraints. The MCP specification supports richer interaction patterns, including bidirectional communication and streaming updates. However, these limitations affect real users today and shape the practical experience of working with MCP-enabled tools.
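
The difference between the two shapes shows up directly in client code. A rough sketch of both patterns, assuming the MCP TypeScript SDK client and a hypothetical app://document/current resource (method names follow the current SDK, but exact signatures have shifted between versions, and the server must support resource subscriptions):

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
import { ResourceUpdatedNotificationSchema } from "@modelcontextprotocol/sdk/types.js";

const client = new Client({ name: "demo-client", version: "1.0.0" });
await client.connect(
  new StdioClientTransport({ command: "node", args: ["server.js"] })
);

const uri = "app://document/current"; // hypothetical resource

// Snapshot pattern: re-read (and re-serialize) the whole document every turn.
const snapshot = await client.readResource({ uri });

// Streaming pattern: have the server push change notifications instead, so the
// client refreshes only what changed rather than rebuilding its view from scratch.
await client.subscribeResource({ uri });
client.setNotificationHandler(ResourceUpdatedNotificationSchema, async (note) => {
  const updated = await client.readResource({ uri: note.params.uri });
  // merge `updated` into the model's working context here
});
```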


Implementation strategies and the path forward

For orchestration environments: Double down on MCP sophistication with richer schema definitions, better cross-tool orchestration, improved natural language interfaces, and enhanced permission controls.
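
What “richer schema definitions” can look like in practice: a sketch of a tool whose parameter descriptions and constraints give the model enough context to orchestrate without guessing. The tool and its fields are hypothetical; the registration pattern assumes the MCP TypeScript SDK.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const server = new McpServer({ name: "planning-tools", version: "1.0.0" });

// Hypothetical tool: the descriptions constrain the model's arguments and
// document the tool for whoever maintains it.
server.tool(
  "summarize_blockers",
  "Summarize current sprint blockers across tracked projects for a status update",
  {
    project: z.string().describe("Project key, e.g. the Jira project prefix"),
    groupBy: z.enum(["team", "severity", "epic"]).describe("How to group the blockers"),
    sinceDays: z.number().int().min(1).max(90).describe("Look-back window in days"),
  },
  async ({ project, groupBy, sinceDays }) => ({
    content: [
      {
        type: "text",
        text: `Blockers for ${project} (last ${sinceDays} days), grouped by ${groupBy}...`,
      },
    ],
  })
);
```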

For iteration environments: Explore embedded intelligence approaches with local inference, streaming context, progressive enhancement where AI observes without interrupting, and contextual suggestions triggered by user actions.
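
Embedded intelligence is less standardized; here is one entirely hypothetical sketch of the “observe without interrupting” piece: buffer edit events and surface a suggestion only after the user pauses.

```typescript
// Entirely hypothetical: an in-app observer that batches edit events and only
// asks for a suggestion once the user has been idle for a moment.
type EditEvent = { targetId: string; property: string; value: unknown };

class AmbientAssistant {
  private buffer: EditEvent[] = [];
  private timer?: ReturnType<typeof setTimeout>;

  constructor(
    private suggest: (recentEdits: EditEvent[]) => void,
    private idleMs = 2000 // placeholder threshold: only speak up after ~2s of inactivity
  ) {}

  observe(event: EditEvent) {
    this.buffer.push(event);
    clearTimeout(this.timer);
    this.timer = setTimeout(() => this.suggest(this.buffer.splice(0)), this.idleMs);
  }
}
```

An editor would call observe() on every property change; the suggest callback decides whether anything is worth showing, and it is where local inference fits if a round trip to a remote model is too slow.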

For hybrid environments: Develop mode-aware AI that adapts interaction patterns, providing passive observation during manipulation phases and proactive assistance during planning phases.
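
Mode awareness is similarly speculative today. A toy sketch of the routing decision, with thresholds that are pure placeholders:

```typescript
type Mode = "orchestration" | "iteration";

interface ActivitySignal {
  editsPerSecond: number;         // direct-manipulation events: drags, nudges, keystrokes
  msSinceExplicitRequest: number; // time since the user last asked for something in words
}

// Placeholder heuristic: sustained manipulation with no fresh request looks like
// iteration; a recent explicit request with little background activity looks like
// orchestration.
function inferMode(signal: ActivitySignal): Mode {
  if (signal.editsPerSecond > 1 && signal.msSinceExplicitRequest > 5_000) {
    return "iteration";
  }
  return "orchestration";
}

function assistancePolicy(mode: Mode) {
  return mode === "iteration"
    ? { interrupt: false, observe: true, actOn: "pauses and explicit asks" }
    : { interrupt: true, observe: true, actOn: "the request just issued" };
}
```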

The limitation isn’t with MCP as a protocol. It’s with the assumption that all AI assistance should flow through natural language requests. The future likely involves multiple interaction modalities working together: ambient intelligence that observes without interrupting, learns user patterns, and provides contextual suggestions at precisely the right moments; multimodal interaction where users switch between direct manipulation, voice commands, and natural language depending on what the task actually requires; and predictive assistance that anticipates user needs based on workflow patterns rather than waiting for explicit requests.


Design principles

  1. Match interaction patterns to workflow type: Use conversational interfaces for orchestration work, embedded intelligence for iteration work
  2. Preserve direct manipulation: AI should enhance spatial and tactile interaction, never replace it
  3. Minimize context switching: Keep AI assistance within the primary work surface where possible
  4. Respect flow states: Avoid interruptions during high-concentration work periods
  5. Maintain user agency: Provide suggestions and automation while preserving user control over final decisions

MCPs represent a mature solution for a specific class of problems: orchestration workflows like information retrieval, synthesis, and utility automation. The critical mistake is assuming this pattern generalizes to iteration workflows, where continuous refinement and real-time feedback are essential.

Product teams should evaluate their core user workflows against this framework: Are users primarily orchestrating complex tasks or iterating on existing work? Do tasks require real-time feedback, or can they tolerate request-response latency? The answers should drive integration architecture, not technology availability or industry hype.

The goal isn’t making AI conversational everywhere. It’s making intelligence available in whatever interaction model best serves user intent. Sometimes that’s natural language. Sometimes it’s embedded observation. Often it’s a thoughtful combination of both.

MCPs have earned their place in the AI integration toolkit. But they’re one tool among many, not a universal solution. Understanding when to use them and when to look elsewhere will determine whether AI truly enhances human creativity or just adds sophisticated friction to getting work done.
