Make it Multimodal

I can't stop talking to my computer...

Tadhg O'Leary
February 14, 2025

I've got a confession to make…

I talk to my computer.

Don't worry, this isn't some Joaquin Phoenix in Her situation. But with AI, I can now research, search the web, and send emails - all using my voice.

Up until recently, most of the ways we interacted with software were of the point-and-click variety. But recent AI products are different. LLMs make multimodal experiences possible. Not just possible, but actually enjoyable.

Multimodal products allow for different types of user input - text, video, image, voice. I can talk to ChatGPT and have it generate text, or I can put text into a tool like RunwayML and have it generate a video.

But it’s not just about the initial user input. The entire experience can be multimodal, with AI generating different types of content on the fly. In the middle of a ChatGPT thread, I can ask it to generate an image, and it will. It can seamlessly blend text, visuals, and even voice in a single conversation.

This is a massive shift in how products work. It’s the thing I’m most excited about with AI. All of my favourite recent products have had a multimodal element to them. Here are some of the best ones I’ve found:

1. ChatGPT

Starting with the big dawg first.

ChatGPT dictation has been an absolute game changer for me. I rarely type anything into ChatGPT anymore; I dictate to it and engage in a conversation. And ChatGPT goes further than voice-to-text: it can generate images, it can speak back to you, all that good stuff.

2. Wispr Flow

After ChatGPT, this is the multimodal product that I've been using the most. Wispr Flow allows you to talk to your computer and have it generate text directly in tools like your email, LinkedIn, social media, etc. This makes responding to people so much easier and has made email way less of a pain.

3. Boardy

Boardy has been the biggest mind fuck experience for me lately. It doesn’t even really have a UI (it’s an agent that works across platforms). But it definitely has an interesting UX.

You message Boardy on LinkedIn → The AI actually calls your phone and has a conversation with you → It follows up by email with recommendations of people to connect with.

4. Cursor

Cursor is an AI code editor that generates code from natural language prompts. You can go from chat to working software without writing a line of code.

What's even more useful about it for me is the ability to give Cursor different types of input and have it generate code based on that. I can share a video or an image of a website I really like and ask it to build a website in the same style as that. It's game-changing for prototyping and fast product exploration.

5. Tldraw

Tldraw is the most enjoyable product on this list for me. It's like an interactive whiteboard with AI built in. The combination of free-hand drawing with a prompt-able agent makes for a really unique experience.

Here's a cool use case: I can sketch out a rough user flow on the whiteboard, then ask Tldraw AI to generate a full storyboard based on that drawing.

6. Cove

Cove is kinda, sorta like an AI-first Miro. It’s similar to Tldraw but a lot more structured.

I’ve been using it a lot recently for research. It gives you a big canvas and allows you to create separate cards that you can prompt with AI. It can generate tables of data from a single prompt, and you can upload images, PDFs, etc. to be analysed.

Let’s say I’m researching a startup idea. I could have one card called “competition” where I prompt AI to research competitors. Another called “MVP ideas”. You get the drift.

Honorable mentions

There are some very cool text-to-video and text-to-image products out there. Things like Pika, RunwayML, and OpenAI Sora. These look class, but to be honest, I haven't used any of them more than once. No point pretending I have.

Product idea

EdTech is a space where I think there's a massive opportunity to build multimodal learning experiences. People learn in different ways. Some people like interactive learning, some people like text, some people like video guides.

Wouldn't it be unreal to have a product that takes the same syllabus/content and adapts the mode of learning to the needs and preferences of the user?

Takeaways for product people

If you're building product, the best thing to do is to just start using these tools as much as you can. Start playing around with things like Wispr Flow, Replit Agent, or any products you find interesting that are multimodal. You’ll learn a lot more by playing around with products than reading posts like this lol.

Then, if you want to make your product more multimodal, the place I’d probably start is by thinking through the areas of your product that have a heavy user input burden. You know, where users are writing a lot of text. Consider opportunities for allowing other modes of input, like voice or images.

Obviously, it’s not as simple as that. It assumes your product is properly integrated with an LLM to begin with etc. But coming purely from the perspective of improving the user experience - this is where I’d start.

Any other examples of multimodal products? If you’ve got good ones, please send them my way.
