What's the difference between a button drawn on paper and one drawn on a screen? Interactivity. The same principle is about to change the way we interact with LLMs. Today, most of your interactions with LLMs are via text: you type a message, and the LLM gives you a text response.
Taken from ChatGPT
They say history repeats itself; that to predict the future, you must peer into the past. This harks back to the old days of the web, when websites were mostly just paragraphs of text. But the thing that sets a website apart from a paper document is interactivity. You could draw a button in a notebook and it would never be anything more than a rectangle. But draw a button on a screen, and you could use it to do anything: make a sound, launch a nuke, or just call a friend. With the advent of JavaScript, interfaces moved from being static to dynamic and interactive.
I believe LLM interfaces are moving in the same direction: from just plain text, to richer & more interactive experiences.
The main reason is that text is just not the best option for everything. Imagine a version of Amazon that was text-only: everything rendered in monospace with a label beside it, and instead of clicking buttons, you had to type commands to perform any action.
That's where we are with LLMs. Today, despite the relative success of MCP in making LLMs more powerful, we are still stuck with text-only interfaces. From Shopify Engineering, on their journey of adding visual components to an LLM assistant:
…the default text-only flow sets a ceiling on the user experience. For commerce, visual context isn't just helpful—it's essential. A product isn't just a SKU and price. It's images showing different angles, color swatches you can click, size selectors that update availability…
Taken from claude.ai
LLMs today can look at your calendar. They can add or modify events on your behalf. They have all the information they need to build the perfect interface, customized to exactly what you want and how you like it. And yet, we have to resort to complex text instructions like "Add an event, mark it as important. Schedule it from 4:00PM to 5:00PM. Send me an email 10 minutes in advance."
Think of how much better it would be if the LLM could show you even a very simple visual form, pre-filled with sensible defaults, so you could take a quick look or make changes before confirming. Instead of typing "Actually, reschedule it from 3:00PM to 5:00PM", you could just drag a slider.
Taken from mcpui.dev
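To make this concrete, here is a minimal sketch in TypeScript, not tied to any particular library such as mcp-ui: a calendar tool returns both a plain-text result and a small declarative form spec. The `uiForm` shape, its field types, and the `scheduleEventTool` handler are all hypothetical; the point is that the LLM fills in the defaults and the chat client renders the form for a quick review.

```typescript
// Hypothetical shape for a tool result that carries both plain text
// (for text-only clients) and a declarative form the client can render.
interface UIFormField {
  name: string;
  label: string;
  type: "text" | "time" | "slider" | "checkbox";
  value: string | number | boolean; // sensible default filled in by the LLM
  min?: number;                     // slider bounds, e.g. minutes before the event
  max?: number;
}

interface ScheduleEventResult {
  text: string; // fallback for clients that can only render text
  uiForm?: {
    title: string;
    fields: UIFormField[];
    confirmLabel: string;
  };
}

// Hypothetical tool handler: the LLM has already extracted the details from
// the conversation, so the form exists only for the user to glance at and tweak.
function scheduleEventTool(args: { title: string; start: string; end: string }): ScheduleEventResult {
  return {
    text: `Scheduled "${args.title}" from ${args.start} to ${args.end}.`,
    uiForm: {
      title: "New event",
      fields: [
        { name: "title", label: "Title", type: "text", value: args.title },
        { name: "start", label: "Starts", type: "time", value: args.start },
        { name: "end", label: "Ends", type: "time", value: args.end },
        { name: "important", label: "Mark as important", type: "checkbox", value: true },
        { name: "reminder", label: "Email reminder (minutes before)", type: "slider", value: 10, min: 0, max: 60 },
      ],
      confirmLabel: "Add to calendar",
    },
  };
}

// Example: the model calls the tool with what it inferred from the chat,
// and the client shows the pre-filled form instead of asking follow-up questions.
console.log(scheduleEventTool({ title: "Design review", start: "4:00PM", end: "5:00PM" }));
```

A declarative spec like this keeps the client in control of rendering: the LLM only proposes values, and nothing happens until the user hits confirm.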
There is a certain charm to text- or voice-only interfaces. Think of the assistant on your phone, and how you can tell it to set an alarm and it just does it without showing you a single button. On even smaller devices such as smart watches, limited screen space makes it hard to build a good interface at all. There will always be situations where not presenting an interface is the best, most productive option.
But nobody (in their right mind) is going to be, say, editing code on their Apple Watch. Simpler tasks are better off without interfaces, but complex tasks always benefit from visual components. A chart aids in understanding complex data. A picture captures complex emotions. Similarly, an interface makes complex information easier to understand.
What I'm proposing is not a radical new way to build applications. I'm not saying that websites should just be embeddable in the LLM's response. Instead, LLMs and the interfaces you already have should work together. Ask an LLM where a feature in your app is? Instead of "It's under Preferences > New Projects > Defaults", open the relevant dropdown. Want to order a pizza? Instead of "Which variant? Margherita, tandoori paneer, pepperoni, …", show a little popup with a variant selector. Of course, there have to be limits: LLMs should only be allowed to perform certain actions, e.g. navigation, after the user's express consent.
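As a rough sketch of the in-app side, assuming a hypothetical action registry (none of these names come from a real framework): the assistant's tool calls map to UI actions, and anything that changes what is on screen is gated behind an explicit consent prompt.

```typescript
// Hypothetical set of UI actions an in-app assistant is allowed to trigger.
type UIAction =
  | { kind: "navigate"; path: string }          // e.g. open Preferences > New Projects
  | { kind: "showPicker"; options: string[] };  // e.g. a pizza variant selector

// Stand-in for a confirmation dialog; a real app would block on user input here.
async function requireConsent(description: string): Promise<boolean> {
  console.log(`Assistant wants to: ${description}. Allow?`);
  return true; // assume the user approved, for the sake of this sketch
}

// The string returned here goes back to the LLM as the tool result, so it
// knows whether the interface change actually happened.
async function performUIAction(action: UIAction): Promise<string> {
  if (action.kind === "navigate") {
    // Navigation changes what the user sees, so it requires express consent.
    const ok = await requireConsent(`open ${action.path}`);
    if (!ok) return "The user declined.";
    // app.router.goTo(action.path); // hypothetical call into the app's router
    return `Opened ${action.path} for the user.`;
  }
  // Showing a picker is non-destructive: render it and let the user choose.
  // app.ui.showPicker(action.options); // hypothetical call into the app's UI layer
  return `Showing a picker with options: ${action.options.join(", ")}`;
}

performUIAction({ kind: "navigate", path: "Preferences > New Projects > Defaults" }).then(console.log);
```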
The user should be able to use your app without needing an LLM. But when they do reach for the LLM assistant, be ready to offer visual components as well.