Thinker Taja: Masking flaws of AI as benefits

AI is booming and will soon become the next big thing. Before that happens, it is important to help it along and hold its hand, hiding the obvious imperfections that are yet to be solved.

Chatbots, analytics tools, androids – they are all possible to make but insanely hard to perfect. The more AI needs to interact with humans, the more it can fail in the eyes of its audience. Speech, shape recognition and empathy are easy for us humans because they come naturally to us. This is not the case with computers.

Consider how weirdly online language translators render some sentences. Add understanding of previous context to the mix and the complexity grows dramatically. No wonder chatbots are hard to pull off. For that reason, many teams look for other ways around the problem:

make chatbots stupid on purpose
prepare scenarios and turn the chatbot into a long series of conditions
focus only on one very specific feature or topic
use humans as Wizards of Oz who pretend to be an AI

Basic concept – Simple yet complex survey

A challenge landed on my desk somewhat by accident. I was tasked with looking at a simple survey gathering feedback about an AI conference. The design was a little bit off, some of the features were clunky and the flow was not as clear as it could be.

The objective of the survey was to gather feedback from users and turn it into NPS (Net Promoter Score) values. Anything beyond that was not a primary business goal.
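For readers unfamiliar with NPS, here is a minimal sketch of how individual ratings are conventionally rolled up into a single Net Promoter Score. It assumes the standard definition: ratings on a scale of 0 to 10, with 9 and 10 counted as promoters and 0 to 6 as detractors. The function name and the sample ratings are made up for illustration and are not taken from the project.

```python
def net_promoter_score(ratings):
    """Return the NPS (between -100 and 100) for a list of 0-10 ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")

    promoters = sum(1 for r in ratings if r >= 9)   # ratings of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # ratings of 0 to 6
    # Passives (7 and 8) only count towards the total.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical ratings collected for one question.
sample_ratings = [10, 10, 9, 7, 3]
print(net_promoter_score(sample_ratings))
```

In the survey, each question produced one such rating per user, whether typed in directly or deduced by the AI from the answer.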

The whole flow for users started with picking a language. Once it was selected, the first question would appear in that language. There were multiple text boxes, icons, buttons and other elements in the view. Each question had its own separate view.

The view with the first question looked like this:

an introduction to the survey at the top
unlabelled icons working as buttons – one started recording, the other stopped it
a box where the user's recorded voice appeared as text
the text of the first question
a button that let the AI analyse the answer
an input with the NPS value
a button that let the user proceed to the next question

Pressing the analyse button took the response from the text input, put the NPS value deduced by the AI into the NPS input and displayed a text informing the user of the analysis and decision, showing the same value that was put into the input. The user could change or fill in the NPS value from the very beginning, so the whole analysis could be skipped.

The other views – the remaining questions – followed a similar template, but since the intro was not needed again, the question took its place. As a result, the question sat in a different location on those views than on the first one.

Users could ignore the AI and only pick an NPS value. Or they could ignore the speech-to-text functionality and type their answer, which would then be analysed. If they wanted to use all of the features, they would listen to the AI read the question from the paragraph, answer by voice using the clunky interface, wait for the answer to be analysed, correct the NPS value or keep it as it was, and move on to the next question.

To make it even less consistent, some questions had these features and some did not. The very second question was read out by the AI, but the user could no longer reply by voice or have their text answer analysed and parsed into a number. This was problematic: just as users had figured out the system on the first question, their newly found toy was taken away as they moved to the next one. There, they were left with half of the features and a big pile of confusion.

It was not just the interface that suffered. The AI's method of analysing text was imperfect. Stating a number from the NPS scale would make the AI pick a different number. Different phrases such as “good” and “awesome” were both translated into the same, extremely positive value. The same went for “bad” and “horrible”: according to the AI, both represented the same extremely low value.

The AI did not comprehend context, so for questions like “how likely would you…” it would not correctly analyse “very likely” as an answer. It would simply translate it into the default average value, even though the answer is clearly positive once context is taken into account.

After a chat with the engineers, it became clear that the strength of the AI lay in long answers, where it could attach emotions to words and phrases. And yet even that would still be imperfect.

With all this mess it seemed weird to even use these features. They were not needed to fill in the survey, could be skipped and were pretty much in the way. But they could be salvaged and put to use if done right.

So let's recap:

numbers were evaluated as a different NPS value (fun fact – a number written as a word and the same number written as a digit would be evaluated differently)
short phrases and words ended up as one of the extremes, or as the average when the AI did not understand them (e.g. “hfjfhfjfjfhhf”)
long sentences were the AI's stronger point, but still weak compared to expectations
there was a clear inconsistency in which features were available for each question
the features did not really have any reason for being there
the interface was unintuitive and hard to understand
there were lots of cool features; they just lacked structure and a reason for being used

New approach – make it a game

To make the task even harder for me, the clients were not actually sure how, where and when the survey would be filled in. This became the first major question we needed to settle. It became imperative to learn what device and platform would be used, when the users would take the survey, who the users would be, what the goal of the survey was, and so on. I was also curious why NPS had been chosen as the way to evaluate feedback.

I got in touch with the clients, and after some back and forth we agreed that the survey would primarily be filled in by attendees during the conference. The plan was to have hostesses carrying tablets and asking attendees to fill in the survey. There was also a secondary channel: sending the survey via mail after the conference for additional feedback.

The second struggle was the structure – the logic of the user experience. This was an AI conference, so using AI in the survey sounded fitting, but with so many problems it lacked proper justification. Why is this bad AI doing something in my survey? What’s its purpose?

If I had not been sitting down when it hit me, I would have lost my footing and fallen. The key problem with AI is that people expect too much from it. They rarely understand that AI needs to learn from millions of examples before it is even useful. Why not be open about this crucial detail and put it to use? After all, it is an AI conference: some attendees will understand it and others came to learn about it. As for us, it was the break we needed.

“Hello, I am AI and I want to learn new things. I can help you fill in the survey, will you help me as well?” That was a dumb first draft of the intro. Users no longer had to use a bad AI; they were teaching a new AI how to improve. The scattered talking and listening features were replaced with a simulated chat. Users were now communicating with this AI in a controlled environment, as if it were their friend.

To make users feel more motivated and thrilled, we planned small elements of gamification to provide extrinsic motivation. A few stats told people they were the first to fill in the survey – the first to build this community of players – or informed them how many people had filled it in before them, to let them know they belonged to a big community of fellow attendees. There were also other stats related to each question.

Badges were introduced to reward people for making things easier for the AI. Users never knew how important long sentences were for the analysis feature. What they knew was that a shiny badge awaited them if they gave a genuine, longer answer and not just a single word.
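As a rough illustration of the badge idea, the check below rewards an answer once it reaches a minimum number of words. This is only a sketch: the function name and the five-word threshold are assumptions made up for this example, and the actual rule used in the survey is not described here.

```python
def earns_long_answer_badge(answer, min_words=5):
    """Return True when the answer has at least `min_words` words.

    `min_words` is a hypothetical threshold chosen for illustration.
    """
    # str.split() with no arguments splits on any run of whitespace,
    # so extra spaces do not inflate the word count.
    return len(answer.split()) >= min_words


print(earns_long_answer_badge("good"))
print(earns_long_answer_badge("The talks were great but the venue was too crowded"))
```

A word count says nothing about whether an answer is genuine, so a real rule would likely need more than this single check.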

Besides helping the computer, badges were also there to familiarise users with the concepts of gamification and to reward them for filling in the survey. At first I thought about making them visible from the start to let people know what they could achieve. Later on, my teammate on the project proposed showing the badges and stats after the first question was answered. This would reveal the gamification aspect of the survey gradually, so that users were eased into it rather than overwhelmed from the start.

A few questions did not have this analysis in the old version due to time restrictions, as it needed to be coded for each specific question. This would no longer be a problem in the new version, so we could enable the feature even for those questions. Users were not relying on the AI; they were helping it learn. If the AI took a bad guess, or even admitted that it was not sure and asked the user to help it pick a value, that would be a perfectly reasonable situation. The expectations had shifted massively.
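The "take a guess or ask for help" behaviour can be sketched as a simple threshold on how sure the AI is. Everything here is an assumption for illustration: the post does not say whether the analysis exposed a confidence value, and the function name, the 0.6 threshold and the message wording are all hypothetical.

```python
def respond_to_answer(guess, confidence, threshold=0.6):
    """Decide whether the chatbot proposes its NPS guess or asks for help.

    `confidence` is a hypothetical value between 0.0 and 1.0 that the
    analysis step is assumed to return alongside its guess.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")

    if confidence >= threshold:
        # Sure enough: propose the value, but let the user correct it.
        return {
            "action": "propose",
            "value": guess,
            "message": f"I think that is a {guess}. Did I get it right?",
        }

    # Not sure: admit it and hand the decision to the user.
    return {
        "action": "ask",
        "value": None,
        "message": "I am not sure yet. Could you help me pick a value?",
    }


print(respond_to_answer(9, 0.9)["message"])
print(respond_to_answer(5, 0.2)["message"])
```

Either branch leaves the final value with the user, which matches the framing of the user as teacher.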

The premise of the user experience moved from a simple survey to an opportunity for users to teach an AI something new, talk with it, be part of a community with other attendees and play a game – all at the same time.

But there were still use cases where all these fancy features were useless. So instead of letting people choose at each question, we split them right at the beginning. On the intro screen they were asked whether they wanted to use the AI and help it learn, or fill in a simple survey without AI. The questions were the same for both options. The difference was that one had no additional gimmicks, while the other had chatting with the AI and gamification.

This was an interesting concept to toy with. While it was being discussed, we were also solving a third struggle – content and copy. Some of the questions were redundant, duplicated or just way too long, requiring an explanation of what they were actually asking. So we sat down with the team and the clients to rethink the questions, their purpose in the grand scheme of things and the correct way to ask them.

Collaboration and designing UI

The master plan was formulated and approved. The next step was putting it on paper as a user interface. How should it look, and how could we enforce the flow and make it a great, intuitive experience for users?

I called a meeting with designers, PMs and frontend developers. The goal was to brainstorm some designs and small design features that would enhance the overall feel. I was open to suggestions not only about layout but also about gamification elements, animations and ways of interacting with the survey. It was an incredible session with a dozen ideas as an outcome.

After our big session, a visual designer and I sat down to pick a way to go and polish it. We had multiple sessions – brainstorming, rethinking, analysing.

We created drafts. We sent them for approval. We created a black and white prototype. And after carefully studying the visual standards and branding for the AI, the designer crafted stunning pixel-perfect mockups and turned them into a prototype.

There were some worries that the default chatbot branding of our specific AI did not really look like a chatbot. But we were not there to do branding strategy; we came to create an epic experience for users filling in the survey. So we trusted the branding and adjusted the bits we could to make it fit our needs.

Conclusion

The project was well received by the clients, the users and the management. There was even an initiative to do something similar in other surveys – to start thinking about them as more than just surveys. We all received praise for our work, responsiveness, proactivity and creativity. And yet I could not stop hearing the irony of this success, as I was aware of the problems that had occurred and the regrets I held about this very project.

Thinker Taja

Saturday, 4 November 2017
