Multimodal interaction: this term has existed in the field of human-computer interaction for decades. It refers to accomplishing a single task through multiple interaction methods. Many small-scale interactions are already multimodal, such as typing: you can type on a physical keyboard, type on a touch-screen keyboard, use a stylus or handwriting to input, or use voice input. However, there is still a distance between us and truly seamless multimodal interaction.
Everyone is disabled at some specific time. Visual impairment for a disabled person can be blindness; for an ordinary person, it can be not seeing the world clearly when you have just woken up in the early morning. Hearing impairment for a disabled person can be deafness; for an ordinary person, it can be being unable to hear your family's words in a noisy environment. Speech impairment for a disabled person can be aphasia; for an ordinary person, it can be being unable to communicate with locals when traveling abroad.
The interaction between humans and machines is composed of two parts: input and output. Any input or output method relies on a certain perceptual or motor ability of the user.
Today, the common output modalities are vision, hearing, and touch, and the common input modalities are touch (tap) and voice. Most mainstream interaction methods are based on visual output plus touch input. Even operations that have reached multimodal interaction, such as typing, still use vision + touch as the primary mode, with the other modes only assisting: you still have to press the voice-input button before you can start typing by voice.
A future life with multimodal interaction
The alarm clock at my bedside rings and wakes me up in the early morning, when I cannot see the world clearly (visual impairment), so I tell the alarm clock to "turn off" to stop it. The system in my house detects that I have gotten up but my eyes are still sleepy, so it starts to broadcast the current time, weather, news, and my schedule. Since I am a little confused (cognitive impairment) right after getting up, it uses simple and clear words to report the news. While I brush my teeth with an electric toothbrush, I cannot hear clearly (hearing impairment), so the system switches from voice reading to displaying text in the mirror, and I read the news with my eyes. Because I am brushing my teeth, it is hard for me to use my left hand to control the system (physical impairment), and I cannot speak either (speech impairment). So the system adjusts the UI on the mirror, making the controls super large and easy for me to operate.
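To make the scenario above a bit more concrete, here is a minimal sketch of how such a system might pick its input and output channels from the user's current context. Everything here (the Context fields, choose_output, choose_input, and the channel names) is a hypothetical illustration of the idea, not an existing API or the design of any real product.

```python
# Hypothetical modality selection for the morning-routine scenario above.
from dataclasses import dataclass

@dataclass
class Context:
    can_see: bool      # e.g. eyes still sleepy -> False
    can_hear: bool     # e.g. electric toothbrush running -> False
    can_speak: bool    # e.g. mouth busy brushing -> False
    hands_free: bool   # e.g. one hand holding the toothbrush -> False

def choose_output(ctx: Context) -> str:
    """Pick an output channel the user can currently perceive."""
    if ctx.can_hear:
        return "speech"            # read the news aloud
    if ctx.can_see:
        return "mirror_display"    # show text on the bathroom mirror
    return "haptic"                # fall back to vibration cues

def choose_input(ctx: Context) -> str:
    """Pick an input channel the user can currently produce."""
    if ctx.can_speak:
        return "voice"
    if ctx.hands_free:
        return "touch"
    return "large_touch_targets"   # enlarged UI for one-handed, imprecise taps

# Brushing teeth: eyes fine, ears and mouth busy, one hand occupied.
ctx = Context(can_see=True, can_hear=False, can_speak=False, hands_free=False)
print(choose_output(ctx), choose_input(ctx))  # mirror_display large_touch_targets
```

The point of the sketch is simply that the switching in the scenario can be described as a fallback order over modalities, driven by whatever impairments the system detects at that moment.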
Current multimodal interaction is still limited by technical problems, but I believe that a life with seamless multimodal interaction will come to us in the future.
Nice job using a scenario to illustrate how this system could work across multiple modes and support all levels of need.
That is really fun! Thank you for bringing up the term multimodal, which is new to me. I think this multimodal system is believable and approachable in the future.
This would be very interesting. I have noticed a couple of releases of apps and headphones that allow you to translate a conversation. This would be great for communicating with someone who speaks a different language. Maybe then more unification will happen among all people.
I like how you gave examples of what a person's day would look like, from getting up to brushing their teeth, through the use of multimodal interaction.