March 7, 2024

Balancing AI inputs with outputs

AiUX thought of the day: there’s an inverse relationship between how much information a prompt needs to contain and how predictable the outcome is. This applies to more than just chatbots… 👀 👇

Let’s say I wanted AI to translate the United States Constitution into French. My prompt can be as simple as “Translate the US Constitution into French,” and ChatGPT returns “Le Préambule de la Constitution des États-Unis.”

[1] “Translate into French” is a clear directive
[2] The source is known and fixed

Now let’s say I asked it to translate a message into French asking my mom to pick up dinner. It returns a polite message in French, but it took a guess at what I was trying to say. The message it created was, roughly, “Can you pick up dinner on your way back, please?” Cool, I could send that to my mom.

Now, imagine instead that I was making a request to a subcontractor. It’s probably a good idea to include the specific message I want translated. The prompt gets longer.

What if I wanted to email the message in three languages: English, Spanish, and French? Now, my prompt:
[1] Should include the specific text
[2] Needs to specify the languages
[3] Probably needs to specify whether the Spanish should be from Spain, Mexico, Argentina, etc. (would people even think about this?)
[4] Would need to be email length

Ok, it’s getting longer. That’s a lot to ask a user to remember, AND to type.
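To make that concrete, here’s a rough sketch in TypeScript comparing the two prompts side by side. The wording of both is my own invention for illustration, not from any actual product; the point is just how much a user would have to remember and type to cover the list above.

```typescript
// The "known, fixed source" case: one short line is enough.
const simplePrompt = "Translate the US Constitution into French.";

// The multilingual email case: every requirement has to be spelled out by
// hand (hypothetical wording, just to show how the prompt grows).
const specifiedPrompt = [
  "Translate the message below into English, Spanish, and French.",
  "For the Spanish version, use Latin American Spanish (Mexico), not Spanish from Spain.",
  "Keep each version short enough to send as an email.",
  'Message: "Can you pick up dinner on your way back tonight, please?"',
].join("\n");
```

One of those is something a person will happily type every time; the other isn’t.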

But what if you also wanted to make sure it follows your brand voice? Or what if this was being used in a product experience, where the text should change depending on what happened earlier, or on the relationship between you and the recipient?

What if you also wanted to generate an image or a visual to go along with it… etc.

---

The more unknowns there are on the other side of your prompt, and the more predictable the result needs to be, the more information you need to provide upfront. The “ease” of AI just got really complicated.

That’s where patterns like tuners can be so helpful! https://lnkd.in/gfwK2hxP

We don’t need to clutter the interface if someone can have a positive experience with a basic prompt. This is great for first-time users, simple interfaces, or smaller tasks (like “summarize this”).

The complexity of the interaction needs to match the complexity of the input. This will be especially important when the thing being generated has a specific purpose, or needs to match something the user already has in their head, like a website.
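Here’s a minimal sketch of how a tuner-style setup could work, using made-up names like TunerSettings and buildPrompt (this isn’t from the linked post): the person types only the core message, and the interface controls fill in the rest of the long prompt behind the scenes.

```typescript
// Hypothetical tuner-style controls: each field maps to a UI element
// (dropdown, toggle, slider) rather than text the user has to type.
interface TunerSettings {
  languages: string[];          // e.g. ["English", "Spanish", "French"]
  spanishVariant?: "Spain" | "Mexico" | "Argentina";
  brandVoice?: string;          // e.g. "friendly, plainspoken"
  format?: "email" | "chat" | "sms";
}

// The interface assembles the long prompt so the person doesn't have to.
function buildPrompt(message: string, settings: TunerSettings): string {
  const parts = [
    `Translate the message below into: ${settings.languages.join(", ")}.`,
  ];
  if (settings.spanishVariant) {
    parts.push(`For Spanish, use the variant spoken in ${settings.spanishVariant}.`);
  }
  if (settings.brandVoice) {
    parts.push(`Match this brand voice: ${settings.brandVoice}.`);
  }
  if (settings.format) {
    parts.push(`Keep it at an appropriate length for the ${settings.format} format.`);
  }
  parts.push(`Message: "${message}"`);
  return parts.join("\n");
}

// Usage: the user only types the message; the tuners carry the rest.
const prompt = buildPrompt("Can you pick up dinner on your way back?", {
  languages: ["English", "Spanish", "French"],
  spanishVariant: "Mexico",
  format: "email",
});
```

The prompt that reaches the model is just as long as before; the difference is that dropdowns and toggles carry it instead of the person’s memory.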

Midjourney does a great job with this by allowing the user to upload reference images. That one step can give the AI hundreds of tokens’ worth of context it can use to get a better understanding of what the person is going for. A picture speaks a thousand words.

There are reasons that image references are not the best solution. But the concept of tools that make it easy to add more context is sound.

As people are getting used to this technology, let the interface carry some of the load.