
A. Drosopoulos, T. Mpalomenos, S. Ioannou, K. Karpouzis and S. Kollias
Emotionally-rich Man-machine Interaction Based on Gesture Analysis
Human-Computer Interaction International 2003, 22 - 27 June, Crete, Greece, vol. 4, pp. 1372 - 1376
Current Man-Machine Interaction (MMI) systems offer advanced and intuitive means of receiving input and communicating output and other messages to their users. Such interfaces give less technology-aware or handicapped people the opportunity to use computers more efficiently and thus overcome related fears and preconceptions. In this context, most emotion- and expression-related facial and body gestures are considered to be universal, in the sense that they are recognized across different cultures. Therefore, introducing an "emotional dictionary" that includes descriptions and perceived meanings of facial expressions and body gestures can enhance the multilinguality of MMI applications without the need for text or speech translation. Despite the progress in related research, our intuition of what a human expression or emotion actually represents is still based on trying to mimic the way the human mind works when it tries to recognize such an emotion. While a lot of effort has been invested in examining different aspects of human expression individually, recent research has shown that this task can benefit from taking multimodal information into account. Facial and hand gestures, as well as body pose, constitute a powerful means of non-verbal human communication. Analysing such gestures is a complex task involving motion modelling and analysis, pattern recognition, machine learning and psycholinguistic studies. Besides feeding an emotion analysis system, gestures and pose can also assist multi-user environments, where communication is traditionally text-based. Thus, simple chat or e-commerce applications can be transformed into powerful virtual meeting rooms where different users interact, with or without the presence of avatars that take part in this process by taking into account the perceived expressions of the users.
To achieve natural interactivity, it is necessary to track and interpret the behaviour of people who interact with a virtual environment, analysing the implicit messages that they convey through their facial expressions, gestures and emotionally coloured speech. As a consequence, it will be possible to generate avatars that follow the behaviour, i.e., the profile, emotional state, actions, choices and reactions of people in a virtual environment, thus modelling their presence in it in a natural way. This paper extends the authors' work on emotion analysis based on facial cues to combined hand and facial gesture analysis for emotionally enriched human-machine interaction; the proposed technologies are tested in real HCI problems, involving interaction of users with their own PC workstation. Our approach includes detection and tracking of the user's face, using face position differences between successive frames and pose estimation algorithms. Hand/body gesture extraction is then used to improve the emotion recognition process. Several types of gestures can be recognized: straight-line and turning-in-place motion gestures are used as an indication of the users' interest in a specific item and its inspection. Tactile gestures, such as nudges, may also be viewed as an indication of the users' interest, but in this case they express the inability or unwillingness of the users to pick up and inspect the specific item. The first step towards gesture extraction is the detection and modelling of the user's arms and hands in the captured image. Then, segmentation and motion analysis are performed, followed by normalization of the results, so as to reflect the fact that the arm and hand form a hierarchical structure of rigid objects. These results are important not only for synthesis purposes, but also for interpreting the users' emotional state or interest.
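As a rough illustration of how position differences between successive frames can be turned into a coarse motion cue, the following Python sketch computes the displacement of a tracked face (or hand) centre and labels the track by its mean displacement. It is not the authors' implementation: the function names, the three labels and the pixel thresholds are assumptions made for this example only.

```python
from math import hypot


def displacements(centres):
    """Euclidean distance between tracked centres in successive frames.

    `centres` is a list of (x, y) pixel positions, one per frame.
    """
    return [
        hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(centres, centres[1:])
    ]


def motion_level(centres, still_below=1.0, rapid_above=15.0):
    """Label a track as 'still', 'moderate' or 'rapid'.

    The label is based on the mean displacement in pixels per frame.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    steps = displacements(centres)
    if not steps:
        # Fewer than two frames: no motion can be measured.
        return "still"
    mean_step = sum(steps) / len(steps)
    if mean_step < still_below:
        return "still"
    if mean_step > rapid_above:
        return "rapid"
    return "moderate"
```

A label of this kind is only a low-level cue; mapping it to interest or irritation would still depend on the application scenario and the gesture vocabulary in use.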
For example, lack of motion on the part of users may be interpreted as lack of curiosity about, or attraction to, what they are viewing at that moment, while rapid sequential moves may indicate irritation. Depending on the application scenario, hand gestures can be classified into several categories. Although hand gestures are complicated to model because their meanings depend on people and cultures, a gesture vocabulary may be predefined in many applications, so that the ambiguity can be limited. In order to label emotional or expressive states, we use the activation-evaluation space model, a simple representation capable of capturing a wide range of significant issues in emotion. This model makes use of the fact that humans are influenced by feelings that are centrally concerned with positive or negative evaluations of people or events, while rating emotional states in terms of the associated activation level, i.e. the strength of the person's disposition to take some action rather than none. This arrangement provides a way of describing emotional states that is more tractable than using words, but which can be tr
11 February 2003

© 00 The Image, Video and Multimedia Systems Laboratory - v1.12