Joyce Chai: Enhancing Language Understanding through Salience Modeling in Multimodal Human Computer Interaction (12 October 2006)
We had the pleasure of welcoming Joyce Chai to MOCHI last Wednesday. Her presentation (described here) gave an excellent overview of efforts to improve the accuracy of speech recognition by drawing on multiple modes of input, particularly non-verbal modalities.
She described her research into improving the language models used for recognition (bigram models combined with Bayes' theorem). These models make a best guess at what words a person is speaking by estimating how likely certain words are to follow one another in a given order, based on a corpus of example utterances.
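For readers who haven't met bigram models before, here is a minimal sketch of the general idea, not Dr. Chai's implementation. The toy corpus, the function names, and the add-one smoothing are all assumptions made for illustration. In a full recognizer, Bayes' theorem would combine a score like this with an acoustic score for each candidate transcription; only the language-model half is shown.

```python
from collections import Counter

def train_bigram(corpus):
    """Count single words and adjacent word pairs over example utterances."""
    unigrams, bigrams = Counter(), Counter()
    for utterance in corpus:
        tokens = ["<s>"] + utterance.lower().split()  # <s> marks the utterance start
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sentence_prob(sentence, unigrams, bigrams):
    """Multiply the bigram estimates along a candidate word sequence."""
    tokens = ["<s>"] + sentence.lower().split()
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word, unigrams, bigrams)
    return p

# Toy corpus (made up for this example)
corpus = ["show me the red house", "show me the price", "how big is the house"]
unigrams, bigrams = train_bigram(corpus)

# Two candidates a recognizer might confuse acoustically
likely = sentence_prob("show me the house", unigrams, bigrams)
unlikely = sentence_prob("show knee the house", unigrams, bigrams)
print(likely > unlikely)
```

The candidate whose word pairs were seen in the corpus gets the higher score, which is how the model helps choose between similar-sounding guesses.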
She then described her system, which used a person's gestures (and eye movements) to identify what the person might be speaking about. Knowing what the person is gesturing at, Dr. Chai can feed that information into the model, so that words from the relevant subset of utterances are given a higher probability of being what the speaker said.
In other words, knowing what I'm pointing at allows the model to better guess what I'm saying. She calls this improved model a salience-driven bigram model. Read more in the papers on her site.
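To make the intuition concrete, here is a rough sketch of how gesture information could be folded into a bigram probability. This is not the formulation from Dr. Chai's papers: the linear interpolation, the weight `lam`, the entity names, and every number below are hypothetical, chosen only to show the effect.

```python
def salience_word_prob(word, salience, entity_words):
    """Sum over entities of (salience weight) * (share of the entity's words equal to word)."""
    total = 0.0
    for entity, weight in salience.items():
        words = entity_words.get(entity, [])
        if words:
            total += weight * words.count(word) / len(words)
    return total

def salience_bigram_prob(prev, word, base_probs, salience, entity_words,
                         lam=0.5, floor=1e-4):
    """Interpolate a plain bigram probability with the salience term (an assumed scheme)."""
    base = base_probs.get((prev, word), floor)  # unseen pairs fall back to a small floor
    return (1 - lam) * base + lam * salience_word_prob(word, salience, entity_words)

# Hypothetical plain bigram probabilities: the two words are equally likely after "the"
base_probs = {("the", "house"): 0.2, ("the", "mouse"): 0.2}

# Hypothetical words associated with each on-screen entity
entity_words = {"house_3": ["house", "red", "price"], "lake_1": ["lake", "water"]}

# Hypothetical salience weights: the user is pointing at house_3
salience = {"house_3": 0.9, "lake_1": 0.1}

p_house = salience_bigram_prob("the", "house", base_probs, salience, entity_words)
p_mouse = salience_bigram_prob("the", "mouse", base_probs, salience, entity_words)
print(p_house > p_mouse)
```

Without the gesture the two words tie; once the pointed-at house is marked as salient, "house" wins.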