Behavioral Overlays for Non-Verbal Communication Expression on a Humanoid Robot

Andrew G. Brooks

Robotic Life Group

MIT Media Laboratory

77 Massachusetts Avenue, Cambridge, MA, USA

zoz@media.mit.edu

Ronald C. Arkin

Mobile Robot Laboratory and GVU Center, College of Computing, Georgia Institute of Technology

Atlanta, GA, USA

arkin@cc.gatech.edu

July 26, 2006

Abstract

This research details the application of non-verbal communication display behaviors to an autonomous humanoid robot, including the use of proxemics, which to date has been seldom explored in the field of human-robot interaction. In order to allow the robot to communicate information non-verbally while simultaneously fulfilling its existing instrumental behavior, a “behavioral overlay” model that encodes this data onto the robot’s pre-existing motor expression is developed and presented. The state of the robot’s system of internal emotions and motivational drives is used as the principal data source for non-verbal expression, but in order for the robot to display this information in a natural and nuanced fashion, an additional para-emotional framework has been developed to support the individuality of the robot’s interpersonal relationships with humans and of the robot itself. An implementation on the Sony QRIO is described which overlays QRIO’s existing EGO architecture and situated schema-based behaviors with a mechanism for communicating this framework through modalities that encompass posture, gesture and the management of interpersonal distance.

1 Introduction

When humans interact with other humans, they use a variety of implicit mechanisms to share information about their own state and the state of the interaction. Expressed over the channel of the physical body, these mechanisms are collectively known as non-verbal communication or “body language”. It has not been proven that humans respond in precisely the same way to the body language of a humanoid robot as they do to that of a human. Nor have the specific requirements that the robot must meet in order to ensure such a response been empirically established. It has, however, been shown that humans will apply a social model to a sociable robot (Breazeal, C. 2003), and will in many cases approach interactions with electronic media holding a set of preconceived expectations based on their experiences of interactions with other humans (Reeves, B. and Nass, C. 1996). If these social equivalences extend to the interpretation of human-like body language displayed by a robot, it is likely that there will be corresponding benefits associated with enabling robots to successfully communicate in this fashion. For a robot such as the Sony QRIO (Figure 1), whose principal function is interaction with humans, we identify three such potential benefits.

First is the practical benefit of increasing the data bandwidth available for the “situational awareness” of the human, by transmitting more information without adding additional load to existing communication mechanisms. If it is assumed that it is beneficial for the human to be aware of the internal state of the robot, yet there are cases in which it is detrimental for the robot to interrupt other activities (e.g. dialog) in order to convey this information, an additional simultaneous data channel is called for. Non-verbal communication is an example of such a channel and, provided that the cues are implemented according to cultural norms and are convincingly expressible by the robot, adds the advantage of requiring no additional training for the human to interpret.

The second benefit is the forestalling of miscommunication within the expanded available data bandwidth. The problem raised if humans do indeed have automatic and unconscious expectations of receiving state information through bodily signals is that certain states are represented by null signals, and humans interacting with a robot that does not communicate non-verbally (or which does so intermittently or ineffectively) may misconstrue lack of communication as deliberate communication of such a state. For example, failing to respond to personal verbal communications with attentive signals, such as eye contact, can communicate coldness or indifference. If a humanoid robot is equipped with those sorts of emotions, it is imperative that we try to ensure that such “false positive” communications are avoided to the fullest extent possible.

The third potential benefit is an increased probability that humans will be able to form bonds with the robot that are analogous to those formed with other humans, for example affection and trust. We believe that the development of such relations requires that the robot appear “natural”: that its actions can be seen as plausible in the context of the internal and external situations in which they occur. In other words, if a person collaborating with the robot can “at a glance” gain a perspective of not just what the robot is doing but why it is doing it, and what it is likely to do next, we think that he or she will be more likely to apply emotionally significant models to the robot. A principal theory concerning how humans come to be so skilled at modeling other minds is Simulation Theory, which states that humans model the motivations and goals of an observed agent by using their own cognitive structures to mentally simulate the situation of the observee (Davies, M. and Stone, T. 1995, Gordon, R. 1986, Heal, J. 2003). This suggests that it is likely that the more the observable behavior of the robot displays its internal state by referencing the behaviors the human has been conditioned to recognize (the more it “acts like” a human in its “display behaviors”), the more accurate the human’s mental simulation of the robot can become.

With these benefits in mind as ultimate goals, we hereby report on activities towards the more immediate goal of realizing the display behaviors themselves, under three specific constraints. First, such displays should not restrict the successful execution of “instrumental behaviors”, the tasks the robot is primarily required to perform. Second, the application of body language should be tightly controlled to avoid confusing the human: it must be expressed when appropriate, and suppressed when not. Third, non-verbal communication in humans is subtle and complex; the robot must similarly be able to use the technique to represent a rich meshing of emotions, motivations and memories. To satisfy these requirements, we have developed the concept of behavioral “overlays” for incorporating non-verbal communication displays into pre-existing robot behaviors. First, overlays provide a practical mechanism for modifying the robot’s pre-existing activities “on-the-fly” with expressive information rather than requiring the design of specific new activities to incorporate it. Second, overlays permit the presence or absence of body language, and the degree to which it is expressed, to be controlled independent of the underlying activity. Third, the overlay system can be driven by an arbitrarily detailed model of these driving forces without forcing this model to be directly programmed into every underlying behavior, allowing even simple activities to become more nuanced and engaging.

Figure 1: Sony QRIO, an autonomous humanoid robot designed for entertainment and interaction with humans, shown here in its standard posture.

A brief summary of the contributions of this research follows:

1. This work broadens and reappraises the use of bodily expressiveness in humanoid robots, particularly in the form of proxemics, which has hitherto been only minimally considered due to safety considerations and the relative scarcity of mobile humanoid platforms.

2. This work introduces the concept of a behavioral overlay for non-verbal communication that both encodes state information into the physical output of ordinary behaviors without requiring modification to the behaviors themselves, and increases non-verbal bandwidth by injecting additional communicative behaviors in the absence of physical resource conflicts.

3. This paper further develops the behavioral communications overlay concept into a general model suitable for application to other robotic platforms and information sources.

4. In contrast with much prior work, the research described here provides more depth to the information that is communicated non-verbally, giving the robot the capability of presenting its internal state as interpreted via its own individuality and interpersonal memory rather than simply an instantaneous emotional snapshot.

5. Similarly, while expressive techniques such as facial feature poses are now frequently used in robots to communicate internal state and engage the human, this work makes progress towards the use of bodily expression in a goal-directed fashion.

6. Finally, this work presents a functioning implementation of a behavioral overlay system on a real robot, including the design of data structures to represent the robot's individual responsiveness to specific humans and to its own internal model.

2 Proxemics and Body Language

Behavioral researchers have comprehensively enumerated and categorized various forms of non-verbal communication in humans and animals. In considering non-verbal communication for a humanoid robot, we have primarily focused on the management of spatial relationships and personal space (proxemics) and on bodily postures and movements that convey meaning (kinesics). The latter class will be more loosely referred to as “body language”, to underscore the fact that while many kinesic gestures can convey meaning in their own right, perhaps the majority of kinesic contributions to non-verbal communication occurs in a paralinguistic capacity, as an enhancement of concurrent verbal dialog (Dittmann, A. 1978). The use of this term is not intended, however, to imply that these postures and movements form a true language with discrete rules and grammars; but as Machotka and Spiegel point out, they convey coded messages that humans can interpret (Machotka, P. and Spiegel, J. 1982). Taken together, proxemics and body language can reflect some or all of the type of interaction, the relations between participants, the internal states of the participants and the state of the interaction.

2.1 Proxemics

Hall, pioneer of the field of proxemics, identified a number of factors that could be used to analyze the usage of interpersonal space in human-human interactions (Hall, E.T. 1966). State descriptors include the potential for the participants to touch, smell and feel the body heat of one another, and the visual appearance of one another's face at a particular distance (focus, distortion, domination of visual field). The reactions of individuals to particular proxemic situations were documented according to various codes, monitoring aspects such as the amount and type of visual and physical contact, and whether or not the body posture of the subjects was encouraging (“sociopetal”) or discouraging (“sociofugal”) of such contact.

An informal classification was used to divide the continuous space of interpersonal distance into four general zones according to these state descriptors. In order of increasing distance, these are “Intimate”, “Personal”, “Socio-Consultive” and “Public”. Human usage of these spaces in various relationships and situations has been observed and summarized (Weitz, S. 1974), and can be used to inform the construction of a robotic system that follows similar guidelines (subject to variations in cultural norms).

Spatial separation management therefore has practical effects in terms of the potential for sensing and physical contact, and emotional effects in terms of the comfort of the participants with a particular spatial arrangement. What constitutes an appropriate arrangement depends on the nature of the interaction (what kinds of sensing and contact are necessary, and what kinds of emotional states are desired for it), the relationship between the participants (an appropriate distance for a married couple may be different than that between business associates engaged in the same activity), and the current emotional states of the participants (the preceding factors being equal, the shape and size of an individual's ideal personal “envelope” can exhibit significant variation based on his or her feelings at the time, as shown in Figure 2).

To ensure that a robot displays appropriate usage of and respect for personal space, and to allow it to take actions to manipulate it in ways that the human can understand and infer from them the underlying reasoning, requires consideration of all of these factors. In addition, the size of the robot (which is not limited to the range fixed by human biology) may have to be taken into account when considering the proxemics that a human might be likely to find natural or comfortable. See Table 1 for a comparison of several proxemic factors in the case of adult humans and QRIO, and Figure 3 for the general proxemic zones that were selected for QRIO.

Figure 2: Illustration of how an individual's 'personal' space zone may vary in size and shape according to emotion. During fear, the space an individual considers his own might expand, with greater expansion occurring to his rear as he avoids potential threats that he cannot see. During anger, the space an individual considers her own might expand to a greater extent to her front as she directs her confrontational attention to known presences.

Table 1: A comparison of select proxemic factors that differ between adult humans and QRIO (equipped with standard optics).

  Proxemic Factor                            Human       QRIO
  Kinesthetic potential (arms only)          60–75 cm    20 cm
  Kinesthetic potential (arms plus torso)    90–120 cm   25–35 cm
  Minimum face recognition distance          <5 cm       20 cm

There has been little exploration of the use of proxemics in human-robot interaction to date. The reasons for this are perhaps mostly pragmatic in nature. Robotic manipulators, including humanoid upper torsos, can be dangerous to humans and in most cases are not recommended for interactions within distances at which physical contact is possible. In addition, humanoid robots with full mobility are still relatively rare, and those with legged locomotion (complete humanoids) even more so, precluding such investigations. However, some related work exists. The mobile robot 'Chaser' by Yamasaki and Anzai focused on one of the practical effects of interpersonal distance by attempting to situate itself at a distance from the human that was ideal for sensor operation, in this case the collection of speech audio (Yamasaki, N. and Anzai, Y. 1996). This work demonstrated that awareness of personal distance considerations could be acted upon to improve speech recognition performance.

Figure 3: QRIO's proxemic zones in this implementation, selected as a balance between the zones for an adult human and those computed using QRIO's relevant proxemic factors. The demarcation distances between zones represent the midpoint of a fuzzy threshold function, rather than a 'hard' cutoff.
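The “fuzzy threshold” demarcation described in the caption can be made concrete with a soft zone classifier. The following sketch is illustrative only: the boundary distances and logistic blending width are assumptions, not values from this implementation.

    import math

    # Assumed zone boundaries (metres) and fuzzy width; illustrative only.
    ZONES = [("Intimate", 0.25), ("Personal", 0.60), ("Socio-Consultive", 1.20)]
    BLEND = 0.05  # controls how soft each demarcation is

    def zone_memberships(distance):
        """Return a weight for each proxemic zone at the given distance,
        using a logistic fall-off centred on each demarcation distance
        instead of a hard cutoff."""
        weights = {}
        inner = 1.0  # mass not yet assigned past an outer boundary
        for name, boundary in ZONES:
            beyond = 1.0 / (1.0 + math.exp(-(distance - boundary) / BLEND))
            weights[name] = inner * (1.0 - beyond)
            inner *= beyond
        weights["Public"] = inner
        return weights

At 0.58 m, for example, most of the weight falls in 'Personal' with the remainder in 'Socio-Consultive', reflecting the soft boundary at 0.60 m.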

In a similar vein, Kanda et al. performed a human-robot interaction field study in which the robot distinguished concurrently present humans as either participants or observers based on their proxemic distance (Kanda, T., Hirano, T., Eaton, D. and Ishiguro, H. 2004). A single fixed distance threshold was used for the classification, however, and it was left up to the human participants to maintain the appropriate proxemics.

Likhachev and Arkin explored the notion of “comfort zones” for a mobile robot, using attachment theory to inform an emotional model that related the robot's comfort to its spatial distance from an object of attachment (Likhachev, M. and Arkin, R.C. 2000). The results of this work showed that the robot's exploration behavior varied according to its level of comfort; while the published work did not deal with human-robot interaction directly, useful HRI scenarios could be envisaged for cases in which the object of attachment was a human.

More recently, there have been investigations into modifying the spatial behavior of non-humanoid mobile robots in order to make people feel more at ease with the robot. Smith investigated self-adaptation of

Similar efforts have used considerations of humans' personal space to affect robot navigation. Nakauchi and Simmons developed a robot whose goal was to queue up to register for a conference along with human participants (Nakauchi, Y. and Simmons, R. 2000). The robot thus needed to determine how to move to positions that appropriately matched human queuing behavior. Althaus et al. describe a system developed to allow a robot to approach a group of people engaged in a discussion, enter the group by assuming a spatially appropriate position, and then leave and continue its navigation (Althaus, P., Ishiguro, H., Kanda, T., Miyashita, T. and Christensen, H.I. 2004). Christensen and Pacchierotti use proxemics to inform a control strategy for the avoidance behavior exhibited by a mobile robot when forced by a constraining passageway to navigate in close proximity to humans (Christensen, H.I. and Pacchierotti, E. 2005). Pacchierotti et al. then report positive results from a pilot user study, in which subjects preferred the condition in which the robot moved fastest and signaled the most deference to the humans (by moving out of the way earliest and keeping the greatest distance away), though the subjects were all familiar and comfortable with robots (Pacchierotti, E., Christensen, H.I. and Jensfelt, P. 2005). While informed by proxemics theory, these efforts focus on control techniques for applying social appropriateness to navigation activity, rather than utilizing proxemic behavior as part of a non-verbal communication suite.

Another recent experiment, by te Boekhorst et al., gave some consideration to potential effects of the distance between children and a non-humanoid robot on the children's attention to a “pass the parcel” game (te Boekhorst, R., Walters, M., Koay, K.L., Dautenhahn, K. and Nehaniv, C. 2005). No significant effects were recorded; however, the authors admit that the data analysis was complicated by violations of the assumptions underlying the statistical tests, and therefore we believe these results should not be considered conclusive. Data from the same series of experiments was used to point out that the children's initial approach distances were socially appropriate according to human proxemics theory (Walters, M.L., Dautenhahn, K., Koay, K.L., Kaouri, C., te Boekhorst, R., Nehaniv, C.L., Werry, I. and Lee, D. 2005a). A follow-up experiment was conducted using the same non-humanoid robot to investigate the approach distances that adults preferred when interacting with the robot (Walters, M.L., Dautenhahn, K., te Boekhorst, R., Koay, K.L., Kaouri, C., Woods, S., Nehaniv, C.L., Lee, D. and Werry, I. 2005b). A majority, 60%, positioned themselves at a distance compatible with human proxemics theory, whereas the remainder, a significant minority of 40%, assumed positions significantly closer. While these results are generally encouraging in their empirical support of the validity of human-human proxemic theory to human-robot interactions, caution should be observed in extrapolating the results in either direction due to the non-humanoid appearance of the robot.

More recently than the initial submission of this paper, there has been interest shown in the use of proxemics and non-verbal communication by the robotic search and rescue community. In a conference poster, Bethel and Murphy proposed a set of guidelines for affect expression by appearance-constrained (i.e. non-humanoid) rescue robots based on their proxemic zone with respect to a human (Bethel, C.L. and Murphy, R.R. 2006). The goal of this work was once again to improve the comfort level of humans through socially aware behavior.

While not involving robots per se, some virtual environment (VE) researchers have examined reactions to proxemic considerations between humans and humanoid characters within immersive VEs. Bailenson et al., for example, demonstrated that people exhibited similar personal spatial behavior towards virtual humans as they would towards real humans, and this effect was increased the more the virtual human was believed to be the avatar of a real human rather than an agent controlled by the computer (Bailenson, J.N., Blascovich, J., Beall, A.C. and Loomis, J.M. 2003). However, they also encountered the interesting result that subjects exhibited more pronounced avoidance behavior when proxemic boundaries were violated by an agent rather than an avatar; the authors theorize that this is due to the subjects attributing more rationality and awareness of social spatial behavior to a human-driven avatar than a computer-controlled agent, thus “trusting” that the avatar would not walk into their virtual bodies, whereas an agent might be more likely to do so. This may present a lesson for HRI designers: the precise fact that an autonomous robot is known to be under computer control may make socially communicative proxemic awareness (as opposed to simple collision avoidance) particularly important for robots intended to operate in close proximity with humans.

2.2 Body Language

Body language is the set of communicative body motions, or kinesic behaviors, including those that are a reflection of, or are intended to have an influence on, the proxemics of an interaction. Knapp identifies five basic categories:

1. Emblems, which have specific linguistic meaning and are what is most commonly meant by the term 'gestures';

2. Illustrators, which provide emphasis to concurrent speech;

3. Affect Displays, more commonly known as facial expressions and used to represent emotional states;

4. Regulators, which are used to influence conversational turn-taking; and

5. Adaptors, which are behavioral fragments that convey implicit information without being tied to dialog (Knapp, M. 1972).

Dittmann further categorizes body language into discrete and continuous (persistent) actions, with discrete actions further partitioned into categorical (always performed in essentially the same way) and non-categorical (Dittmann, A. 1978). Body posture itself is considered to be a kinesic behavior, inasmuch as motion is required to modify it, and because postures can be modulated by attitude (Knapp, M. 1972). The kinds of body language displays that can be realized on a particular robot of course depend on the mechanical design of the robot itself, and these categorizations of human body language are not necessarily of principal usefulness to HRI designers other than sometimes suggesting implementational details (e.g. the necessity of precisely aligning illustrators with spoken dialog). However, it is useful to examine the broad range of expression that is detailed within these classifications, in order to select those appearing to have the most utility for robotic applications. For example, classified within these taxonomies are bodily motions that can communicate explicit symbolic concepts, deictic spatial references (e.g. pointing), emotional states and desires, likes and dislikes, social status, engagement and boredom. As a result there has been significant ongoing robotics research overlapping with all of the areas thus referenced. For a comprehensive survey of socially interactive robotic research in general, incorporating many of these aspects, see (Fong, T., Nourbakhsh, I. and Dautenhahn, K. 2003).

Emblematic gestures have been widely used for communication on robotic platforms and in the field of animatronics; examples are the MIT Media Lab's 'Leonardo', an expressive humanoid which currently uses emblematic gesture as its only form of symbolic communication and also incorporates kinesic adaptors in the form of blended natural idle motions (Breazeal, C., Brooks, A.G., Gray, J., Hoffman, G., Kidd, C., Lee, H., Lieberman, J., Lockerd, A. and Chilongo, D. 2004), and Waseda University's WE-4RII 'emotion expression humanoid robot', which was also designed to adopt expressive body postures (Zecca, M., Roccella, S., Carrozza, M.C., Cappiello, G., Cabibihan, J.-J., Dario, P., Takanobu, H., Matsumoto, M., Miwa, H., Itoh, K. and Takanishi, A. 2004).

Comprehensive communicative gesture mechanisms have also been incorporated into animated humanoid conversational agents and VE avatars. Kopp and Wachsmuth used a hierarchical kinesic model to generate complex symbolic gestures from gesture phrases, later interleaving them tightly with concurrent speech (Kopp, S. and Wachsmuth, I. 2000, Kopp, S. and Wachsmuth, I. 2002). Guye-Vuilleme et al. provided collaborative VE users with the means to manually display a variety of non-verbal bodily expressions on their avatars using a fixed palette of potential actions (Guye-Vuilleme, A., Capin, T.K., Pandzic, I.S., Thalmann, N.M. and Thalmann, D. 1998).

Similarly, illustrators and regulators have been used to punctuate speech and control conversational turn-taking on interactive robots and animated characters. Aoyama and Shimomura implemented contingent head pose (such as nodding) and automatic filler insertion during speech interactions with Sony QRIO (Aoyama, K. and Shimomura, H. 2005). Extensive work on body language for animated conversational agents has been performed at the MIT Media Lab, such as Thorisson's implementation of a multimodal dialog skill management system on an animated humanoid for face-to-face interactions (Thorisson, K.R. 1996) and Cassell and Vilhjalmsson's work on allowing human-controlled full-body avatars to exhibit communicative reactions to other avatars autonomously (Cassell, J. and Vilhjalmsson, H. 1999).

Robots that communicate using facial expression have also become the subject of much attention, too numerous to summarize here but beginning with well-known examples such as Kismet and the face robots developed by Hara (Breazeal, C. 2000, Hara, F., Akazawa, H. and Kobayashi, H. 2001). In addition to communication of emotional state, some of these robots have used affective facial expression with the aim of manipulating the human, either in terms of a desired emotional state as in the case of Kismet or in terms of increasing desired motivation to perform a collaborative task as in subsequent work on Leonardo (Brooks, A.G., Berlin, M., Gray, J. and Breazeal, C. 2005).

However, much of this related work focuses either directly on social communication through body language as the central research topic rather than the interoperation of non-verbal communication with concurrent instrumental behavior, or on improvements to the interaction resulting from directly integrating non-verbal communication as part of the interaction design process. When emotional models are incorporated to control aspects such as affective display, they tend to be models designed to provide a “snapshot” of the robot's emotional state (for example represented by a number of discrete categories such as the Ekman model (Ekman, P. and Davidson, R.J. 1994)) suitable for immediate communication via the robot's facial actuators but with minimal reference to the context of the interaction. However, recent research by Fridlund has strongly challenged the widely accepted notion that facial expressions are an unconscious and largely culturally invariant representation of internal emotional state, arguing instead that they are very deliberate communications that are heavily influenced by the context in which they are expressed (Fridlund, A. 1994). This is a contention that may be worth keeping in mind concerning robotic body language expression in general.

Given the capabilities of our robotic platform (Sony QRIO) and the relevance of the various types of body language to the interactions envisaged for QRIO, the following aspects were chosen for specific attention:

I. Proxemics and the management of interpersonal distance, including speed of locomotion;

II. Emblematic hand and arm gestures in support of the above;

III. The rotation of the torso during interaction, which in humans reflects the desire for interaction and is thus known as the “sociofugal/sociopetal axis” (facing more squarely represents a sociopetal stance, whereas displaying an angular offset is a sociofugal posture);

IV. The posture of the arms, including continuous measures (arms akimbo, defensive raising of the arms, and rotation of the forearms, which appear sociopetal when rotated outwards and sociofugal when rotated inwards) and discrete postural stances (e.g. arms folded);

V. Head pose and the maintenance of eye contact;

VI. Illustrators, both pre-existing (head nodding) and newly incorporated (attentive torso leaning).

See Figure 15 in Section 6 for examples of some of these poses as displayed by QRIO.

3 Behavioral Overlays

The concept of a behavioral overlay for robots can be described as a motor-level modifier that alters the resulting appearance of a particular output conformation (a motion, posture, or combination of both). The intention is to provide a simultaneous display of information, in this case non-verbal communication, through careful alteration of the motor system in such a way that the underlying behavioral activities of the robot may continue as normal. Just as the behavior schemas that make up the robot's behavioral repertoire typically need not know of the existence or method of operation of one another, behavioral overlays should be largely transparent to the behavior schema responsible for the current instrumental behavior at a given time. Preservation of this level of modularity simplifies the process of adding or learning new behaviors.

As a simplified example, consider the case of a simple 2-DOF output system: the tail of a robotic dog such as AIBO, which can be angled around horizontal and vertical axes. The tail is used extensively for non-verbal communication by dogs, particularly through various modes of tail wagging. Four examples of such communications via the tail from the AIBO motor primitives design specification are the following:

• Tail Wagging Friendly: Amplitude of wag large, height of tail low, speed of wag baseline slow but related to strength of emotion.

• Tail Wagging Defensive: Amplitude of wag small, height of tail high, speed of wag baseline fast but related to strength of emotion.

• Submissive Posture: In cases of low dominance/high submission, height of tail very low (between legs), no wagging motion.

• “Imperious Walking”: Simultaneous with locomotion in cases of high dominance/low submission; amplitude of wag small, height of tail high, speed of wag fast.

One method of implementing these communication modes would be to place them within each behavioral schema, designing these behaviors with the increased complexity of responding to the relevant emotional and instinct models directly. An alternative approach, behavioral overlays, is to allow simpler underlying behaviors to be externally modified to produce appropriate display activity. Consider the basic walking behavior b0, in which the dog's tail wags naturally left and right at height φ0 with amplitude ±θ0 and frequency ω0, with default values derived from the walking step motion rather than the dog's emotional state. The motor state M_T of the tail under these conditions is thus given by:

M_T(t) = (b0_x(t), b0_y(t)) = (θ0 sin(ω0 t), φ0)

Now consider a behavioral overlay vector for the tail, o0 = [α, λ, δ], applied to the active behavior according to the mathematics of a multidimensional overlay coordination function Ω_T, to produce the following overlaid tail motor state M_T^+:

M_T^+(t) = Ω_T(o0, b0(t)) = (α θ0 sin(λ ω0 t), φ0 + δ)

For appropriate construction of o0, the overlay system is now able to produce imperious walking (α < 1, λ > 1, δ > 0) as well as other communicative walk styles not specifically predefined (e.g. “submissive walking”: α = 0, δ < 0), without any modification of the underlying walking behavior. Moreover, the display output will continue to reflect as much as possible the parameters of the underlying activity (in this case the walking motion) in addition to the internal state used to generate the communications overlay (e.g. dominance/submission, emotion).
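In code, the tail overlay Ω_T amounts to scaling the wag parameters before they are sampled. A minimal sketch follows, assuming the wag is represented by amplitude/frequency/height parameters; the numeric defaults are assumptions, not AIBO specification values.

    import math

    def walking_tail_params():
        # Baseline behavior b0: the gait's natural wag parameters (assumed).
        return {"amplitude": 0.5, "frequency": 2.0, "height": 0.0}

    def omega_tail(overlay, params):
        """Overlay coordination function: produces the parameters of
        M_T^+(t) = (alpha*theta0*sin(lambda*omega0*t), phi0 + delta)."""
        alpha, lam, delta = overlay
        return {"amplitude": alpha * params["amplitude"],
                "frequency": lam * params["frequency"],
                "height": params["height"] + delta}

    def tail_motor_state(params, t):
        # Sample the (possibly overlaid) wag at time t.
        return (params["amplitude"] * math.sin(params["frequency"] * t),
                params["height"])

    # Imperious walking: small fast wag held high; submissive: no wag, tail low.
    imperious = omega_tail((0.3, 2.5, 0.4), walking_tail_params())
    submissive = omega_tail((0.0, 1.0, -0.4), walking_tail_params())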

However, in our AIBO example so far, the overlay system is only able to communicate the dog's internal state using the tail when it is already being moved by an existing behavior (in this case walking). It may therefore be necessary to add an additional special type of behavior whose function is to keep the overlay system supplied with motor input. This type of behavior is distinguished by two characteristics: a different stimulus set than normal behaviors (either more or less, including none at all); and its output is treated differently by the overlay system (which may at times choose to ignore it entirely). We refer to such behaviors as “idler” behaviors. In this case, consider idler behavior b1, which simply attempts to continuously wag the tail some amount in order to provide the overlay system with input to be accentuated or suppressed:

b1(t) = (θ1 sin(ω1 t), φ1)

This behavior competes for action selection with the regular “environmental” behaviors as normal, and when active is overlaid by Ω_T in the same fashion. Thus the overlay system, with the addition of one extremely basic behavior, is able to achieve all of the four tail communications displays originally specified above, including variations in degree and combination, by appropriate selection of overlay components based on the robot's internal state. For active behavior b_i and overlay o_j, the tail activity produced is:

M_T^+(t) = Ω_T(o_j, b_i(t))

However, the addition of specialized “idler” behaviors provides additional opportunities for manipulation of the robot's display activity, as these behaviors can be designed to be aware of and communicate with the overlay system, for example to enable the triggering of emblematic gestures. If the robot's normal behaviors are subject at a given time to the stimulus vector [S], the idler behaviors can be thought of as responding to an expanded stimulus vector [S, Ψ], where Ψ is the vector of feedback stimuli from the overlay system. For instance, AIBO's tail idler behavior upon receiving a feedback stimulus ψ_p might interrupt its wagging action to trace out a predefined shape P with the tail tip:

b1(ψ_p, t) = (P_x(t), P_y(t))

In general, then, let us say that for a collection of active (i.e. having passed action selection) environmental behaviors B and idler behaviors I, and an overlay vector O, the overlaid motor state M+ is given according to the model:

M+ = Ω(O, [B(S), I([S, Ψ])])

Figure 4: The behavioral overlay model, shown overlaying active environmental behaviors b1..n and idler behaviors i1..m after action selection has already been performed.

This model is represented graphically in Figure 4. In the idealized modular case, the environmental behaviors need neither communicate directly with nor even be aware of the existence of the behavioral overlay system. For practical purposes, however, a coarse level of influence by the environmental behaviors on the overlay system is required, because a complete determination a priori of whether or not motor modification will interfere with a particular activity is very difficult to make. This level of influence has been accounted for in the model, and the input to Ω from B and I as shown incorporates any necessary communication above and beyond the motor commands themselves.
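A structural sketch of M+ = Ω(O, [B(S), I([S, Ψ])]) follows. All names are hypothetical; behaviors are modeled as callables that have already passed action selection, and per-resource modulation (such as omega_tail above) is supplied as apply_fn.

    class OverlaySystem:
        """Coordinates environmental behaviors B and idler behaviors I
        with an overlay vector O to produce the overlaid motor state M+."""

        def __init__(self, overlay_vector, apply_fn):
            self.overlay = overlay_vector
            self.apply_fn = apply_fn  # per-resource modulation function
            self.feedback = []        # Psi: feedback stimuli for idlers

        def step(self, env_behaviors, idlers, stimuli, t):
            # Environmental behaviors see only the normal stimulus vector S.
            commands = [b(stimuli, t) for b in env_behaviors]
            # Idlers see the expanded vector [S, Psi]; their role is to keep
            # the overlay system supplied with motor input to modulate, or
            # to realize triggered emblematic gestures.
            commands += [i(stimuli + self.feedback, t) for i in idlers]
            # Overlay each motor command unless its schema protected itself.
            return [c if c.get("protected") else self.apply_fn(self.overlay, c)
                    for c in commands]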

Behavioral overlays as implemented in the research described in this paper include such facilities for schemas to communicate with the overlay system when necessary. Schemas may, if desired, protect themselves against modification in cases in which interference is likely to cause failure of the behavior (such as a task, like fine manipulation, that would similarly require intense concentration and suppression of the non-verbal communication channel when performed by a human). Care should, of course, be taken to use this facility sparingly, in order to avoid the inadvertent sending of non-verbal null signals during activities that should not require such concentration, or for which the consequences of failure are not excessively undesirable.

Schemas may also make recommendations to the overlay system that assist it in setting the envelope of potential overlays (such as reporting the characteristic proxemics of an interaction; e.g. whether a speaking behavior takes place in the context of an intimate conversation or a public address). In general, however, knowledge of and communication with the overlay system is not a requirement for execution of a behavior. As QRIO's intentional mechanism is refined to better facilitate high-level behavioral control, the intention system may also communicate with the behavioral overlay system directly, preserving the modularity of the individual behaviors.
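This narrow schema-to-overlay interface might be captured in a structure like the following; the names are hypothetical, since the paper does not specify a message format.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class OverlayAdvice:
        """Optional hints a schema may pass to the overlay system."""
        # Request that motor output pass through unmodified; to be used
        # sparingly, since an unexpressive body can itself read as a signal.
        protect: bool = False
        # Characteristic proxemics of the interaction (e.g. "intimate"
        # conversation vs. "public" address), bounding the overlay envelope.
        proxemic_context: Optional[str] = None

    # A fine-manipulation schema, analogous to a concentrating human:
    fine_manipulation_advice = OverlayAdvice(protect=True)
    public_address_advice = OverlayAdvice(proxemic_context="public")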

Related work of most relevance to this concept is the general body of research related to motion parameterization. The essential purpose of this class of techniques is to describe bodily motions in terms of parameters other than their basic joint-angle time series. Ideally, the new parameters should capture essential qualities of the motion (such as its overall appearance) in such a way that these qualities can be predictably modified or held constant by modifying or holding constant the appropriate parameters. This has advantages both for motion generation (classes of motions can be represented more compactly as parameter ranges rather than clusters of individual exemplars) and motion recognition (novel motions can be matched to known examples by comparison of the parameter values).

Approaches to this technique, as applied to motion generation for animated characters, can differ in their principal focus. One philosophy involves creating comprehensive sets of basic motion templates that can then be used to fashion more complex motions by blending and modifying them with a smaller set of basic parameters, such as duration, amplitude and direction; this type of approach was used under the control of a scripting language to add real-time gestural activity to the animated character OLGA (Beskow, J. and McGlashan, S. 1997). At the other extreme, the Badler research group at the University of Pennsylvania argued that truly lifelike motion requires the use of a large number of parameters concerned with effort and shaping, creating the animation model EMOTE based on Laban Movement Analysis; however, the model is non-emotional and does not address autonomous action generation (Chi, D., Costa, M., Zhao, L. and Badler, N. 2000).

One of the most well known examples of motion parameterization that has inspired extensive attention in both the animation and robotics communities is the technique of “verbs and adverbs” proposed by Rose et al. (Rose, C., Cohen, M. and Bodenheimer, B. 1998). In this method verbs are specific base actions and adverbs are collections of parameters that modify the verbs to produce functionally similar motor outputs that vary according to the specific qualities the adverbs were designed to affect.

This technique allows, for example, an animator to generate a continuous range of emotional expressions of a particular action, from say 'excited waving' to 'forlorn waving', without having to manually create every specific example separately; instead, a single base 'waving' motion would be created, and then parameter ranges that describe the variation from 'excited' to 'forlorn' provide the means of automatically situating an example somewhere on that continuum. While the essential approach is general, it is typically applied at the level of individual actions rather than overall behavioral output, due to the difficulty of specifying a suitable parameterization of all possible motion.
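As a toy illustration of the verbs-and-adverbs idea, a single 'waving' verb can be situated on the excited-forlorn continuum by interpolating hand-authored exemplar parameters; the parameter names and values here are assumptions.

    # Hypothetical exemplar parameter sets for one 'waving' verb.
    EXCITED = {"amplitude": 1.0, "speed": 1.8, "elbow_height": 0.9}
    FORLORN = {"amplitude": 0.3, "speed": 0.6, "elbow_height": 0.4}

    def wave(adverb):
        """adverb = 0.0 yields forlorn waving, 1.0 excited waving;
        intermediate values are synthesized by linear interpolation."""
        return {k: (1.0 - adverb) * FORLORN[k] + adverb * EXCITED[k]
                for k in EXCITED}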

Similarly, the technique of “morphable models” proposed by Giese and Poggio describes motion expressions in terms of pattern manifolds inside which plausible-looking motions can be synthesized and decomposed with linear parameter coefficients, according to the principles of linear superposition (Giese, M.A. and Poggio, T. 2000). In their original example, locomotion gaits such as 'walking' and 'marching' were used as exemplars to define the parameter space, and from this the parameters could be re-weighted in order to synthesize new gaits such as 'limping'. Furthermore, an observed gait could then be classified against the training examples using least-squares estimation in order to estimate its relationship to the known walking styles.

Not surprisingly, motion parameterization techniques such as the above have been shown significant interest by the segment of the robotics community concerned with robot programming by demonstration. Motion parameterization holds promise for the central problem that this approach attempts to solve: extrapolation from a discrete (and ideally small) set of demonstrated examples to a continuous task competency envelope; i.e., knowing what to vary to turn a known spatio-temporal sequence into one that is functionally equivalent but better represents current circumstances that were not prevailing during the original demonstration. The robotics literature in this area, even just concerning humanoid robots, is too broad to summarize, but see (Peters, R.A. II, Campbell, C.C., Bluethmann, W.J. and Huber, E. 2003) for a representative example that uses a verbs-adverbs approach for a learned grasping task, and illustrates the depth of the problem.

Fortunately, the problem that behavioral overlays seek to address is somewhat simpler. In the first place, the task at hand is not to transform a known action that would be unsuccessful if executed under the current circumstances into one that now achieves a successful result; rather, it is to make modifications to certain bodily postures during known successful actions, to a degree that communicates information without causing those successful actions to become unsuccessful. This distinction carries with it the luxury of being able to choose many parameters in advance according to well-described human posture taxonomies such as those referred to in Section 2, allowing algorithmic attention to be concentrated on the appropriate quantity and combination of their application. Furthermore, such a situation, in which the outcome even of doing nothing at all is at least the success of the underlying behavior, has the added advantage that unused bodily resources can be employed in overlay service with a reasonable level of assuredness that they will not cause the behavior to fail.

Secondly, the motion classification task, where it exists at all, is not the complex problem of parameterizing monolithic observed output motion-posture combinations, but simply to attempt to ensure that such conformations, when classified by the human observer's built-in parameterization function, will be classified correctly. In a sense, behavioral overlays start with motor descriptions that have already been parameterized, into the instrumental behavior itself and the overlay information, and the task of the overlay function is to maintain and apply this parameterization in such a fashion that the output remains effective in substance and natural in appearance: a far less ambiguous situation than the reverse case of attempting to separate unconstrained natural behavior into its instrumental and non-instrumental aspects (e.g. 'style' from 'content').

In light of the above, and since behavioral overlays are designed with the intention of affecting all of the robot's behavioral conformations, not just ones that have been developed or exhibited at the time of parameter estimation, the implementation of behavioral overlays described here has focused on two main areas. First, the development of general rules concerning the desired display behaviors and available bodily resources identified in Section 2 that can be applied to a wide range of the robot's activities. And second, the development of a maintenance and releasing mechanism that maps these rules to the space of emotional and other internal information that the robot will use them to express. Details of the internal data representations that provide the robot with the contextual information necessary to make these connections are given in Section 4, and implementation details of the overlay system itself are provided in Section 5.

4 Relationships and Attitudes

An important driving force behind proxemics and body language is the internal state of the individual. QRIO's standard EGO Architecture contains a system of internally-maintained emotions and state variables, and some of these are applicable to non-verbal communication. A brief review follows; please refer to Figure 5 for a block diagram of the unmodified EGO Architecture.

QRIO's emotional model (EM) contains six emotions (ANGER, DISGUST, FEAR, JOY, SADNESS, SURPRISE) plus NEUTRAL, along the lines of the Ekman proposal (Ekman, P. and Davidson, R.J. 1994). Currently QRIO is only able to experience one emotion from this list at a given time, though the non-verbal communication overlay system has been designed in anticipation of the potential for this to change. Emotional levels are represented by continuous-valued variables.

QRIO's internal state model (ISM) is also a system of continuous-valued variables, which are maintained within a certain range by a homeostasis mechanism. Example variables include FATIGUE, INFORMATION, VITALITY and INTERACTION. A low level of a particular state variable can be used to drive QRIO to seek objects or activities that can be expected to increase the level, and vice versa.
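A minimal sketch of one such homeostatically maintained variable follows; the set point, decay rate and drive formulation are illustrative assumptions, not the EGO Architecture's actual dynamics.

    class InternalStateVariable:
        """One ISM variable (e.g. INTERACTION), held near a set point."""

        def __init__(self, name, value=0.5, set_point=0.5, decay=0.02):
            self.name = name
            self.value = value
            self.set_point = set_point
            self.decay = decay

        def update(self, stimulus=0.0):
            # External events perturb the level; homeostasis pulls it back.
            self.value += stimulus + self.decay * (self.set_point - self.value)
            self.value = min(1.0, max(0.0, self.value))

        def drive(self):
            # Positive deficit pushes QRIO towards objects or activities
            # expected to raise this variable, and vice versa.
            return self.set_point - self.value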

For more details of the QRIO emotionally grounded architecture beyond the above summary, see (Sawada, T., Takagi, T., Hoshino, Y. and Fujita, M. 2004). The remainder of this section details additions to the architecture that have been developed specifically to support non-verbal communication overlays. In the pre-existing EGO architecture, QRIO has been designed to behave differently with different individual humans by changing its EM and ISM values in response to facial identification of each human. However, these changes are instantaneous upon recognition of the human; the lack of additional factors distinguishing QRIO's feelings about these specific individual humans limits the amount of variation and naturalness that can be expressed in the output behaviors.

Figure 5: Functional units and interconnections in QRIO's standard EGO Architecture.

In order for QRIO to respond to individual humans with meaningful proxemic and body language displays during personal interactions, QRIO requires a mechanism for preserving the differences between these individual partners: what we might generally refer to as a relationship. QRIO does have a long-term memory (LTM) feature; an associative memory, it is used to remember connections between people and objects in predefined contexts, such as the name of a person's favorite food. To support emotional relationships, which can then be used to influence non-verbal communication display, a data structure to extend this system has been developed.

Each human with whom QRIO is familiar is represented by a single 'Relationship' structure, with several internal variables. A diagram of the structure can be seen in Figure 6. The set of variables chosen for this structure have generally been selected for the practical purpose of supporting non-verbal communication rather than to mirror a particular theoretical model of interpersonal relationships, with exceptions noted below.

Figure 6: An example instantiation of a Relationship structure, showing arbitrarily selected values for the single discrete enumerated variable and each of the five continuous variables.

The discrete-valued 'Type' field represents the general nature of QRIO's relationship with this individual; it provides the global context within which the other variables are locally interpreted. Because a primary use of this structure will be the management of personal space, it has been based on delineations that match the principal proxemic zones set out by Weitz (Weitz, S. 1974). An INTIMATE relationship signifies a particularly close friendly or familial bond in which touch is accepted. A PERSONAL relationship represents most friendly relationships. A SOCIAL relationship includes most acquaintances, such as the relationship between fellow company employees. And a PUBLIC relationship is one in which QRIO may be familiar with the identity of the human but little or no social contact has occurred. It is envisaged that this value will not change frequently, but it could be learned or adapted over time (for example, a PUBLIC relationship becoming SOCIAL after repeated social contact).

All other Relationship variables are continuous-valued quantities, bounded and normalized, with some capable of negative values if a reaction similar in intensity but opposite in nature is semantically meaningful. The 'Closeness' field represents emotional closeness and could also be thought of as familiarity or even trust. The 'Attraction' field represents QRIO's desire for emotional closeness with the human. The 'Attachment' field, based on Bowlby's theory of attachment behavior (Bowlby, J. 1969), is a variable with direct proxemic consequences and represents whether or not the human is an attachment object for QRIO, and if so to what degree. The 'Status' field represents the relative sense of superiority or inferiority QRIO enjoys in the relationship, allowing the structure to represent formally hierarchical relationships in addition to informal friendships and acquaintances. Finally, the Relationship structure has a 'Confidence' field which represents QRIO's assessment of how accurately the other continuous variables in the structure might represent the actual relationship; this provides a mechanism for allowing QRIO's reactions to a person to exhibit differing amounts of variability as their relationship progresses, perhaps tending to settle as QRIO gets to know them better.

In a similar vein, individual humans can exhibit markedly different output behavior under circumstances in which particular aspects of their internal states could be said to be essentially equivalent; their personalities affect the way in which their emotions and desires are interpreted and expressed. There is evidence to suggest that there may be positive outcomes to endowing humanoid robots with perceptible individual differences in the way in which they react to their internal signals and their relationships with people. Studies in social psychology and communication have repeatedly shown that people prefer to interact with people sharing similar attitudes and personalities (e.g. (Blankenship, V., Hnat, S.M., Hess, T.G. and Brown, D.R. 2004, Byrne, D. and Griffit, W. 1969)); such behavior is known as “similarity attraction”. A contrary case is made for “complementary attraction”, in which people seek interactions with other people having different but complementary attitudes that have the effect of balancing their own personalities (Kiesler, D.J. 1983, Orford, J. 1986).

These theories have been carried over into the sphere of interactions involving non-human participants. In the product design literature, Jordan discusses the “pleasurability” of products; two of the categories in which products have the potential to satisfy their users are “socio-pleasure”, relating to inter-personal relationships, and “ideo-pleasure”, relating to shared values (Jordan, P.W. 2000). And a number of human-computer interaction studies have demonstrated that humans respond to computers as social agents with personalities, with similarity attraction being the norm (e.g. (Nass, C. and Lee, K.M. 2001)). Yan et al. performed experiments in which AIBO robotic dogs were programmed to simulate fixed traits of introversion or extroversion, and showed that subjects were able to correctly recognize the expressed trait; however, in this case the preferences observed indicated complementary attraction, with the authors postulating the embodiment of the robot itself as a potential factor in the reversal (Yan, C., Peng, W., Lee, K.M. and Jin, S. 2004).

As a result, if a humanoid robot can best be thought of as a product which seeks to attract and ultimately fulfil the desires of human users, it might be reasonable to predict that humans will be more attracted to and satisfied by robots which appear to match their own personalities and social responses. On the other hand, if a humanoid robot is instead best described as a human-like embodied agent that is already perceived as complementary to humans as a result of its differences in embodiment, it might alternatively be predicted that humans will tend to be more attracted to robots having personalities that are perceptibly different from their own. In either case, a robot possessing the means to hold such attitudes would be likely to have a general advantage in attaining acceptance from humans, and ultimately the choice of the precise nature of an individual robot could be left up to its human counterpart.

To allow QRIO to exhibit individualistic (and thus hopefully more interesting) non-verbal behavior, a system that interprets or filters certain aspects of QRIO's internal model was required. At present this system is intended only to relate directly to the robot's proxemic and body language overlays; no attempt has been made to give the robot anything approaching a complete 'personality'. As such, the data structure representing these individual traits is instead entitled an 'Attitude'. Each QRIO has one such structure; it is envisaged to have a long-to-medium-term effect, in that it is reasonable for the structure to be pre-set and not to change thereafter, but it also might be considered desirable to have the robot be able to change its nature somewhat over the course of a particularly long term interaction (even a lifetime), much as human attitudes sometimes mellow or become more extreme over time. Brief instantiation of temporary replacement Attitude (and Relationship) structures would also provide a potential mechanism for QRIO to expand its entertainment repertoire with 'acting' ability, simply by using its normal mechanisms to respond to interactions as though they featured different participants.

The Attitude structure consists of six continuous-valued, normalized variables in three opposing pairs, as illustrated in Figure 7. Opposing pairs are directly related in that one quantity can be computed from the other; although only three variables are therefore computationally necessary, they are specified in this way to be more intuitively grounded from the point of view of the programmer or behavior designer.

Figure 7: An arbitrary instantiation of an Attitude structure, showing the three opposing pairs of continuous-valued variables, and the ISM, EM and Relationship variables that each interprets.

The 'Extroversion' and 'Introversion' fields are adapted directly from the Extroversion dimension of the Five-Factor Model (FFM) of personality (McCrae, R.R. and Costa, P.T. 1996), and as detailed above were used successfully in experiments with AIBO. In the case of QRIO, these values are used to affect the expression of body language and to interpret internal state desires. High values of Extroversion encourage more overt body language, whereas high Introversion results in more subtle bodily expression. Extroversion increases the effect of the ISM variable INTERACTION and decreases the effect of INFORMATION, whereas Introversion does the reverse.

The 'Aggressiveness' and 'Timidity' fields are more loosely adapted from the FFM; they can be thought of as somewhat similar to a hybrid of Agreeableness and Conscientiousness, though the exact nature is more closely tailored to the specific emotions and non-verbal display requirements of QRIO. Aggressiveness increases the effect of the EM variable ANGER and decreases the effect of FEAR, while Timidity accentuates FEAR and attenuates ANGER. High Timidity makes submissive postures more probable, while high Aggressiveness raises the likelihood of dominant postures and may negate the effect of Relationship Status.

Finally, the 'Attachment' and 'Independence' fields depart from the FFM and return to Bowlby; their only effect is proxemic, as an interpreter for the value of Relationship Attachment. While any human can represent an attachment relationship with the robot, robots with different attitudes should be expected to respond to such relationships in different ways. Relationship Attachment is intended to have the effect of compressing the distance from the human that the robot is willing to stray; the robot's own Independence or Attachment could be used, for example, to alter the fall-off probabilities at the extremes of this range, or to change the 'sortie' behavior of the robot to bias it to make briefer forays away from its attachment object.
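Rendered as data structures, the two frameworks described above might look like the following sketch. Field names follow the text; the concrete types and ranges are assumptions rather than the actual QRIO implementation.

    from dataclasses import dataclass
    from enum import Enum

    class RelationshipType(Enum):
        INTIMATE = 1
        PERSONAL = 2
        SOCIAL = 3
        PUBLIC = 4

    @dataclass
    class Relationship:
        """Per-human structure; continuous fields normalized, some signed."""
        rel_type: RelationshipType  # the discrete 'Type' field
        closeness: float    # emotional closeness / familiarity / trust
        attraction: float   # QRIO's desire for closeness with this human
        attachment: float   # degree to which the human is an attachment object
        status: float       # QRIO's relative superiority (+) or inferiority (-)
        confidence: float   # how settled QRIO believes these estimates are

    @dataclass
    class Attitude:
        """Three opposing pairs; each pair is redundant (one member can be
        computed from the other) but kept explicit for behavior designers."""
        extroversion: float
        introversion: float
        aggressiveness: float
        timidity: float
        attachment: float
        independence: float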

The scalar values within the Relationship and Attitude structures provide the raw material for affecting the robot's non-verbal communication (and potentially many other behaviors). These have been carefully selected in order to be able to drive the output overlays in which we are interested. However, there are many possible ways in which this material could then be interpreted and mathematically converted into the overlay signals themselves. We do not wish to argue for one particular numerical algorithm over another, because that would amount to claiming that we have quantitative answers to questions such as "how often should a robot which is 90% introverted and 65% timid lean away from a person to whom it is only 15% attracted?". We do not make such claims. Instead, we will illustrate the interpretation of these structures through two examples of general data usage models that contrast the variability of overlay generation available to a standard QRIO versus that available to one equipped with Relationships and Attitudes.

Sociofugal/sociopetal axis: We use a scalar value for the sociofugal/sociopetal axis, S, from 0.0 (the torso facing straight ahead) to 1.0 (maximum off-axis torso rotation). Since this represents QRIO's unwillingness to interact, a QRIO equipped with non-verbal communication skills but neither Relationships nor Attitudes might generate this axis based on a (possibly non-linear) function s of its ISM variable INTERACTION, I_INT, and its EM variable ANGER, E_ANG:

S = s(I_INT, E_ANG)

The QRIO equipped with Relationships and Attitudes is able to apply more data towards this computation. The value of the robot's Introversion, A_Int, can decrease the effect of INTERACTION, resulting in inhibition of display of the sociofugal axis according to some combining function f. Conversely, the value of the robot's Aggression, A_Agg, can increase the effect of ANGER, enhancing the sociofugal result according to the combining function g. Furthermore, the robot may choose to take into account the relative Status, R_Sta, of the person with whom it is interacting, in order to politely suppress a negative display. The improved sociofugal axis generation function s+ is thus given by:

S' = s+(f(I_INT, A_Int), g(E_ANG, A_Agg), R_Sta)

Proxemic distance: Even if the set of appropriate proxemic zones P for the type of interaction is specified by the interaction behavior itself, QRIO will need to decide on an actual scalar distance within those zones, D, to stand from the human. A standard QRIO might thus use its instantaneous EM variable FEAR, E_FEA, to influence whether it was prepared to choose a close value or to instead act more warily and stand back, according to another possibly non-linear function d:

D = d(P, E_FEA)

The enhanced QRIO, on the other hand, is able to make much more nuanced selections of proxemic distance. The proxemic Type field of the Relationship, R_Typ, allows the robot to select the most appropriate single zone from the options given in P. The value of the robot's Timidity, A_Tim, increases the effect of FEAR, altering the extent to which the robot displays wariness according to a combining function u. The Closeness of the robot's relationship to the human, R_Clo, can also be communicated by altering the distance it keeps between them, as can its Attraction for the human, R_Attr. If the robot has an Attachment relationship with the human, its strength R_Atta can be expressed by enforcing an upper bound on D, modulated by the robot's Independence A_Ind according to the combining function v. The improved ideal proxemic distance generation function d+ is thus given by:

D' = d+(P, R_Typ, u(E_FEA, A_Tim), R_Clo, R_Attr, v(R_Atta, A_Ind))
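For illustration only, here is one way the combining functions might be realized; the products, clamps and weightings below are invented for exposition and do not reproduce the hand-tuned functions of the actual implementation:

    def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
        return max(lo, min(hi, x))

    def sociofugal_axis(i_int, e_ang, a_int, a_agg, r_sta):
        """S' = s+(f(I_INT, A_Int), g(E_ANG, A_Agg), R_Sta); inputs in [0, 1]."""
        f = i_int * (1.0 - a_int)   # Introversion damps the INTERACTION term
        g = e_ang * (1.0 + a_agg)   # Aggressiveness amplifies the ANGER term
        s = clamp(g - f)            # net unwillingness to interact
        return s * (1.0 - r_sta)    # politely suppress the display toward high Status

    def proxemic_distance(zones, r_typ, e_fea, a_tim, r_clo, r_attr, r_atta, a_ind):
        """D' = d+(P, R_Typ, u(E_FEA, A_Tim), R_Clo, R_Attr, v(R_Atta, A_Ind))."""
        near, far = zones[r_typ]                 # Relationship Type picks one zone from P
        wariness = clamp(e_fea * (1.0 + a_tim))  # u: Timidity accentuates FEAR
        warmth = 0.5 * (r_clo + r_attr)          # Closeness and Attraction pull inward
        d = near + (far - near) * clamp(0.5 + wariness - warmth)
        if r_atta > 0.0:  # v: Attachment bounds the stray distance, loosened by Independence
            d = min(d, far - (far - near) * r_atta * (1.0 - a_ind))
        return d

    # Example call with invented zone bounds in metres:
    zones = {"personal": (0.45, 1.2), "social": (1.2, 3.6)}
    print(proxemic_distance(zones, "social", 0.2, 0.3, 0.8, 0.6, 0.5, 0.4))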

Clearly, the addition of Relationships and Attitudes offers increased scope for variability of non-verbal communications output. More importantly, however, they provide a rich, socially grounded framework for that variability, allowing straightforward implementations to be developed that non-verbally communicate the information in a way that varies predictably with the broader social context of the interaction.

5 System Architecture and Implementation

The QRIO behavioral system, termed the EGO Architecture, is a distributed object-based software architecture based on the OPEN-R modular environment originally developed for AIBO. Objects in EGO are in general associated with particular functions. There are two basic memory objects: Short-Term Memory (STM), which processes perceptual information and makes it available in processed form at a rate of 2 Hz, and Long-Term Memory (LTM), which associates information (such as face recognition results) with known individual humans. The Internal Model (IM) object manages the variables of the ISM and EM. The Motion Controller (MC) receives and executes motion commands, returning a result to the requesting module.

QRIO's actual behaviors are executed in up to three Situated Behavior Layer (SBL) objects: the Normal SBL (N-SBL) manages behaviors that execute at the 2 Hz STM update rate (homeostatic behaviors); the Reflexive SBL (R-SBL) manages behaviors that require responses faster than the N-SBL can provide, and therefore operates at a significantly higher update frequency (behaviors requiring perceptual information must communicate with perceptual modules directly, rather than via STM); and deliberative behavior can be realized in the Deliberative SBL (D-SBL).

Within each SBL, behaviors are organized in a tree-structured network of schemas; schemas perform minimal communication between one another, and compete for activation according to a winner-take-all mechanism based on the resource requirements of individual schemas. For more information about the EGO Architecture and the SBL system of behavior control, please see (Fujita, M., Kuroki, Y., Ishida, T. and Doi, T. 2003).
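The resource-based winner-take-all competition can be pictured with a short sketch (our reading of the mechanism, not the EGO source): schemas bid with their activation levels, and a schema runs only if every actuator resource it needs is still unclaimed by a higher bidder.

    def select_winners(schemas):
        """schemas: list of (name, activation_level, required_resource_set)."""
        taken, winners = set(), []
        for name, al, resources in sorted(schemas, key=lambda s: -s[1]):
            if not (resources & taken):   # all required resources still free?
                winners.append(name)
                taken |= resources
        return winners

    # The tracker and the pointer both need the Head, so only one of them wins:
    print(select_winners([
        ("track_face", 0.9, {"HEAD"}),
        ("point_at",   0.7, {"HEAD", "RIGHT_ARM"}),
        ("walk_to",    0.6, {"LEGS"}),
    ]))  # -> ['track_face', 'walk_to']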

5.1 NVC Object

Because behavioral overlays must be able to be applied to all behavior schemas, and because schemas are intended to perform minimal data sharing (so that schema trees can be easily constructed from individual schemas without having to be aware of the overall tree structure), it is not possible or desirable to completely implement an appropriate overlay system within the SBLs themselves. Instead, to implement behavioral overlays an additional, independent Non-Verbal Communication (NVC) object was added to the EGO Architecture.

Figure 8: Interdependence diagram of the EGO Architecture with NVC.

The NVC object has data connections with several of the other EGO objects, but its most central function as a motor-level overlay object is to intercept and modify motor commands as they are sent to the MC object. Addition of the NVC object therefore involves reconnecting the MC output of the various SBL objects to the NVC, and then connecting the NVC object to the MC. MC responses are likewise routed through the NVC object and then back to the SBLs.

In addition to the motor connection, the NVC object maintains a connection to the IM output (for receiving ISM and EM updates, which can also be used as a 2 Hz interrupt timer), the STM target update (for acquiring information about the location of humans, used in proxemic computations) and a custom message channel to the N-SBL and R-SBL (for receiving special information about the interaction, and sending trigger messages for particular gestures and postures). See Figure 8 for a graphical overview of the EGO Architecture with NVC. In addition to the connections shown, the NVC object manages the behavioral overlay values themselves with reference to the Relationship and Attitude structures; Figure 9 has an overview of the internal workings of NVC.


Figure 9: The non-verbal communication (NVC) module's conceptual internal structure and data interconnections with other EGO modules.

5.2 Overlays, Resources and Timing

The data representation for the behavioral overlays themselves is basic yet flexible. They are divided according to the major resource types Head, Arms, Trunk and Legs. For each joint (or walking parameter value, in the case of the legs) within a resource type, the overlay maintains a value and a flag that allows the value to be interpreted as either an absolute bias (to allow constant changes to the postural conformation) or a relative gain (to accentuate or attenuate incoming motions). In addition, each overlay category contains a time parameter for altering the speed of motion of the resource, which can also be flagged as a bias or a gain. Finally, the legs overlay contains an additional egocentric position parameter that can be used to modify the destination of the robot in the case of walking commands.
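A plausible rendering of this representation (the field names are ours; the actual in-memory layout is not reproduced here):

    from dataclasses import dataclass, field
    from typing import Dict, Tuple

    @dataclass
    class ResourceOverlay:
        """Overlay values for one resource type: Head, Arms, Trunk or Legs."""
        joints: Dict[str, float] = field(default_factory=dict)        # per-joint (or walk-parameter) value
        joint_is_gain: Dict[str, bool] = field(default_factory=dict)  # True: relative gain; False: absolute bias
        time_value: float = 1.0     # speed-of-motion modifier for the resource
        time_is_gain: bool = True   # interpret time_value as a gain or a bias

    @dataclass
    class LegsOverlay(ResourceOverlay):
        """The legs overlay additionally carries an egocentric destination offset."""
        position_offset: Tuple[float, float, float] = (0.0, 0.0, 0.0)  # (x, y, theta)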

Motion commands that are routed through the NVC object consist of up to two parts: a command body, and an option parameter set. Motions that are parameterized (i.e., that have an option part) can be modified directly by the NVC object according to the current values of the overlays that the NVC object is storing. Such types of motions include direct positioning of the head, trunk and arm with explicit joint angle commands; general purpose motions that have been designed with reuse in mind, such as nodding (the parameter specifying the depth of nod); and commands that are intended to subsequently run in direct communication with perceptual systems with the SBL excluded from the decision loop, such as head tracking. Unfortunately, due to the design of the MC system, unparameterized motion commands (i.e., those with just a body) cannot be altered before reaching the MC object; but they can be ignored or replaced with any other single parameterized or unparameterized motion command having the same actuator resource requirements (this possibility is not yet taken advantage of in the current implementation).

Resource management in the EGO Architecture is coarse grained and fixed; it is used for managing schema activation in the SBLs as well as just for preventing direct motion command conflicts. The resource categories in EGO are the Head, Trunk, Right Arm, Left Arm and Legs. Thus a body language motion command that wanted only to adjust the sociofugal/sociopetal axis (trunk rotate), for example, would nevertheless be forced to take control of the entire trunk, potentially blocking an instrumental behavior from executing. The NVC object implements a somewhat finer-grained resource manager by virtue of the overlay system. By being able to choose to modify the commands of instrumental behaviors directly, or not to do so, the effect is as if the resource were managed at the level of individual joints.
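For parameterized commands, the modification step then reduces to a per-joint dispatch on the bias/gain flag, roughly as below (a sketch reusing the ResourceOverlay above; the real MC option format is more involved):

    def apply_overlay(option_angles, overlay):
        """Rewrite a joint-angle option set according to the stored overlay values."""
        out = {}
        for joint, angle in option_angles.items():
            if joint not in overlay.joints:
                out[joint] = angle                          # joint untouched by the overlay
            elif overlay.joint_is_gain.get(joint, True):
                out[joint] = angle * overlay.joints[joint]  # relative gain: accentuate or attenuate
            else:
                out[joint] = angle + overlay.joints[joint]  # absolute bias: constant postural shift
        return out

Unparameterized commands never reach this code path; as noted above, they can only be passed through, dropped, or substituted wholesale.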

This raises, however, the problem of the case of time steps at which instrumental behaviors do not send motion commands yet it is desired to modify the robot's body language; this situation is common. The NVC object skirts this problem by relying on a network of fast-activating idle schemas residing in the N-SBL. Each idle schema is responsible for a single MC resource. On each time step, if no instrumental behavior claims a given resource, the appropriate idle schema sends a null command to the NVC object for potential overlaying. If an overlay is desired, the null command is modified into a genuine action and passed on to the MC; otherwise it is discarded. Since commands from idle and other special overlay schemas are thus treated differently by the NVC object than those from behavioral schemas, the NVC object must keep track of which schemas are which; this is accomplished by a handshaking procedure that occurs at startup time, illustrated graphically in Figure 10. Normal operation of the idle schemas is illustrated in Figure 11.
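In sketch form, the fate of an idle schema's null command is a two-way decision (illustrative; make_posture_command stands in for whatever command constructor the real object uses):

    def make_posture_command(resource, overlay):
        """Hypothetical constructor for a minimal absolute posture command."""
        return {"resource": resource,
                "angles": dict(overlay.joints),
                "speed": overlay.time_value}

    def handle_idle_command(resource, overlay, send_to_mc):
        """Return True if the null command was turned into a genuine action."""
        if overlay.joints or overlay.time_value != 1.0:
            send_to_mc(make_posture_command(resource, overlay))  # overlay pending: act on it
            return True
        return False                                             # nothing to express: discard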

For practical considerations within the EGO Architecture, actions involved in non-verbal communication can be divided into four categories according to two classification axes. First is the timing requirements of the action. Some actions, such as postural shifts, are not precisely timed and are appropriately suited to the 2 Hz update rate of the N-SBL. Others, however, are highly contingent with the activity, such as nodding during dialog, and must reside in the R-SBL. Second is the resource requirements of the action. Some actions, such as gross posture, require only partial control of a resource, whereas others require its total control.

Figure 10: Special schemas register with the NVC object in a handshaking procedure. An acknowledgement from NVC is required because the SBLs do not know when the NVC object is ready to accept messages, so they attempt to register until successful. Within the R-SBL, the parent registers instead of the individual schemas, to preserve as closely as possible the mode of operation of the pre-existing conversational reflex system. The registration system also supports allowing schemas to deliberately change type at any time (e.g. from an idle schema to a behavioral schema), though no use of this extensibility has been made to date.

Figure 11: Normal idle activity of the system in the absence of gesture or reflex triggering. N-SBL idle and gesture schemas are active if resources permit. Active idle schemas send null motor commands to NVC on each time step. The dialog reflex parent is always active, but the individual dialog reflex schemas are inactive.

Partial resource management has just been described above, and it functions identically with commands that are also synchronous and originate in the R-SBL. However, there are also non-verbal communication behaviors that require total resource control, such as emblematic gestures, and these are also implemented with the use of specialized "triggered" schemas: at the N-SBL level these schemas are called gesture schemas, and at the R-SBL level they are called dialog reflex schemas.

Gesture schemas reside within the same subtree of the N-SBL as the idle schemas; unlike the idle schemas, however, they do not attempt to remain active when no instrumental behavior is operating. Instead, they await gesture trigger messages from the NVC object, because the NVC object cannot create motion commands directly; it can only modify them. Upon receiving such a message, a gesture schema determines if it is designed to execute the requested gesture; if so, it increases its activation level (AL), sends the motion command, and then reduces its AL again (Figure 12). The gesture, which may be an unparameterized motion from a predesigned set, is then executed by the MC object as usual.

Figure 12: Triggering of a gesture occurs by NVC sending a trigger message to all registered N-SBL schemas. There is no distinction between idle and gesture schemas, as idle schemas can include gesture-like responses such as postural shifts. Individual active schemas decode the trigger message and choose whether or not to respond with a motor command; in this typical example one gesture schema responds.

Dialog reflex schemas operate in a similar but slightly different fashion. Existing research on QRIO has already resulted in a successful reactive speech interaction system that inserted contingent attentive head nods and speech filler actions into the flow of conversation (Aoyama, K. and Shimomura, H. 2005). It was of course desirable for the NVC system to complement, rather than compete with, this existing system. The prior system used the "intention" mechanism to modify the ALs of dialog reflex schemas, residing in the R-SBL, at appropriate points in the dialog. The NVC object manages these reflexes in almost exactly the same way.

Dialog reflex schemas are grouped under a parent schema which has no resource requirements and is always active. Upon receiving a control message from the NVC object, the parent schema activates its children's monitor functions, which then choose to increase their own ALs, trigger the reflex gesture and then reduce their ALs again. Thus the only practical difference from the previous mechanism is that the reflex schemas themselves manage their own AL values, rather than the triggering object doing it for them. In addition to the existing head nodding dialog reflex, a second dialog reflex schema was added to be responsible for leaning the robot's torso in towards the speaker (if the state of the robot's interest and the Relationship deem it appropriate) in reaction to speech (Figure 13).

Figure 13: Triggering of a dialog reflex occurs by NVC sending an activation message to the active dialog reflex parent, which then signals the self-evaluation functions of its children. Children self-evaluating to active (e.g. speech is occurring) raise their own activation levels and send motor commands if required. The dialog reflex system remains active and periodically self-evaluating until receiving a deactivation message from NVC, so may either be tightly synchronized to non-causal data (e.g. pre-marked outgoing speech) or generally reactive to causal data (e.g. incoming speech).

5.3 Execution Flow

The overall flow of execution of the NVC behavioral overlay system (Figure 14) is thus as follows:

• At system startup, the idle and gesture schemas must register themselves with the NVC object to allow subresource allocation and communication. These schemas set their own AL high to allow them to send messages; when they receive an acknowledgement from NVC, they reduce their ALs to a level that will ensure that instrumental behaviors take priority (Figure 10).

• Also at system startup, the dialog reflex schema parent registers itself with NVC so that NVC knows where to send dialog reflex trigger messages (Figure 10).


Figure 14: Execution flow of the NVC overlay system. Timing and synchronization issues such as responding to changes in the internal model and proxemic state (from STM) and waiting for the human to respond to an emblematic gesture are implicitly managed within the overlay calculation state (i.e. entering this state does not guarantee that any change to the overlays will in fact be made).

• Proxemic and body language activity commences when an N-SBL schema informs NVC that an appropriate interaction is taking place. Prior to this event NVC passes all instrumental motion commands to MC unaltered and discards all idle commands (Figure 11). The N-SBL interaction behavior also passes the ID of the human subject of the interaction to NVC so it can look up the appropriate Relationship.

• NVC generates overlay values using a set of functions that take into account the IM and Attitude of the robot, the Relationship with the human and the proxemic hint of the interaction (if available). This overlay generation phase includes the robot's selection of the ideal proxemic distance, which is converted into a walking overlay if needed.

• Idle commands begin to be processed according to the overlay values. As idle commands typically need only be executed once until interrupted, the NVC object maintains a two-tiered internal subresource manager in which idle commands may hold a resource until requested by a behavior command, but completing behavior commands free the resource immediately.

• To make the robot's appearance more lifelike and less rigid, the regular ISM update is used to trigger periodic re-evaluation of the upper body overlays. When this occurs, resources held by idle commands are freed and the next incoming idle commands are allowed to execute the new overlays.

• STM updates are monitored for changes in the proxemic state caused by movement of the target human, and if necessary the overlays are regenerated to reflect the updated situation. This provides an opportunity for triggering of the robot's emblematic gestures concerned with manipulating the human into changing the proxemic situation, rather than the robot adjusting it directly with its own locomotion. A probabilistic function is executed that depends mainly on the robot's IM and Attitude and the prior response to such gestures, and according to the results an emblematic gesture of the appropriate nature and urgency (e.g. beckoning the human closer, or waving him or her away) is triggered. Trigger messages are sent to all registered idle and gesture schemas, with it up to the schemas to choose whether or not to attempt to execute the proposed action (Figure 12).

• Activation messages are sent from NVC to the dialog reflex schemas when dictated by the presence of dialog markers or the desire for general reactivity of the robot (Figure 13).
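Condensed into code, one 2 Hz pass of this flow might look like the following; this is a deliberately simplified, self-contained paraphrase, and the state keys and thresholds are invented:

    import random

    def nvc_tick(state):
        """One update of the NVC overlay loop; returns overlays to apply, or None."""
        if not state["interaction_active"]:
            return None                                    # pass-through mode (Figure 11)
        # Regenerate overlays from the internal model, Attitude and Relationship.
        desire = state["interaction_desire"] * (0.5 + 0.5 * state["extroversion"])
        overlays = {"trunk_rotate": max(0.0, 1.0 - desire)}
        # Monitor proxemics: prefer an emblematic gesture, walking only as a fallback.
        error = state["human_distance"] - state["ideal_distance"]
        if abs(error) > 0.2:                               # 0.2 m tolerance (invented)
            if random.random() < state["gesture_patience"]:
                overlays["gesture"] = "beckon" if error > 0.0 else "wave_away"
            else:
                overlays["walk"] = error                   # advance (or retreat) by this much
        return overlays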

6 Results

All of the non-verbal communication capabilities set out in Section 2 were implemented as part of the overlay system. To test the resulting expressive range of the robot, we equipped QRIO with a suite of sample Attitudes and Relationships. The Attitude selections comprised a timid, introverted QRIO, an aggressive, agonistic QRIO, and an "average" QRIO. The example Relationships were crafted to represent an "old friend" (dominant attribute: high Closeness), an "enemy" (dominant attribute: low Attraction), and the "Sony President" (dominant attribute: high Status). The test scenario consisted of a simple social interaction in which QRIO notices the presence of a human in the room and attempts to engage him or her in a conversation. The interaction schema was the same across conditions, but the active Attitude and Relationship, as well as QRIO's current EM and ISM state, were concurrently communicated non-verbally. This scenario was demonstrated to laboratory members, who were able to recognize the appropriate changes in QRIO's behavior. Due to the high degree of existing familiarity of laboratory members with QRIO, however, we did not attempt to collect quantitative data from these interactions.

Due to the constraints of our arrangement with Sony to collaborate on their QRIO architecture, formal user studies of the psychological effectiveness of QRIO's non-verbal expressions (as distinct from the design of the behavioral overlay model itself) have been recommended but not yet performed. In lieu of such experiments, the results of this work are illustrated with a comparison between general interactions with QRIO (such as the example above) in the presence and absence of the overlay system. Let us specify the baseline QRIO as equipped with its standard internal system of emotions and instincts, as well as behavioral schema trees for tracking of the human's head, for augmenting dialog with illustrators such as head nodding and torso leaning, and for the conversation itself. The behavioral overlay QRIO is equipped with these same attributes plus the overlay implementation described here.

When the baseline QRIO spots the human, it may begin the conversation activity so long as it has sufficient internal desire for interaction. If so, QRIO commences tracking the human's face and responding to the human's head movements with movements of its own head to maintain eye contact, using QRIO's built-in face tracking and fixation module. When QRIO's built-in auditory analysis system detects that the human is speaking, QRIO participates with its reflexive illustrators such as head nodding, so the human can find QRIO to be responsive. However, it is up to the human to select an appropriate interpersonal distance: if the human walks to the other side of the room, or thrusts his or her face up close to QRIO's, the conversation continues as before. Similarly, the human has no evidence of QRIO's desire for interaction other than the knowledge that it was sufficiently high to allow the conversational behavior to be initiated; and no information concerning QRIO's knowledge of their relationship other than what might come up in the conversation.

Contrast this scenario with that of the behavioral overlay QRIO. When this QRIO spots the human and the conversation activity commences, QRIO immediately begins to communicate information about its internal state and the relationship it shares with the human. QRIO begins to adopt characteristic body postures reflecting aspects such as its desire for interaction and its relative status compared with that of the human; for some examples see Figure 15. If the human is too close or too far away, QRIO may walk to a more appropriate distance (see Figure 3 for specifics concerning the appropriate interaction distances selected for QRIO). Alternatively, QRIO may beckon the human closer or motion him or her away, and then give the human a chance to respond before adjusting the distance itself if he or she does not. This may occur at any time during the interaction: approach too close, for example, and QRIO will back away or gesticulate its displeasure. Repeated cases of QRIO's gestures being ignored reduce its patience for relying on the human's response.

Figure 15: A selection of QRIO body postures generated by the overlay system. From left to right, top to bottom: normal (sociopetal) standing posture; maximum sociofugal axis; defensive arm crossing posture; submissive attentive standing posture; open, outgoing raised arm posture; defensive, pugnacious raised arm posture.

The behavioral overlay QRIO may also respond to the human's speech with participatory illustrators. However, its behavior is again more communicative: if this QRIO desires interaction and has a positive relationship with the human, its nods and torso leans will be more pronounced; conversely, when disengaged or uninterested in the human, these illustrators will be attenuated or suppressed entirely. By adjusting the parameters of its head tracking behavior, the overlay system allows QRIO to observably alter its responsiveness to eye contact. The speed of its movements is also altered: an angry or fearful QRIO moves more rapidly, while a QRIO whose interest in the interaction is waning goes through the motions more sluggishly. All of this displayed internal information is modulated by QRIO's individual attitude, such as its extroversion or introversion. The human, already trained to interpret such signals, is able to continuously update his or her mental model of QRIO and the relationship between them.

It is clear that the latter interaction described above is richer and exhibits more variation. Of course, many of the apparent differences could equally well have been implemented through specifically designed support within the conversational interaction schema tree. However, the overlay system allows these distinguishing features to be made available to existing behaviors such as the conversation schema, as well as future interactions, without such explicit design work on a repeated basis.

For the most part, the overlays generated within this implementation were able to make use of continuous-valued mapping functions from the various internal variables to the postural and proxemic outputs. For example, a continuous element such as the sociofugal/sociopetal axis is easily mapped to a derived measure of interaction desire (see Section 4). However, there were a number of instances in which it was necessary to select between discrete output states. This was due to the high-level motion control interface provided to us, which simplified the basic processes of generating motions on QRIO and added an additional layer of physical safety for the robot, such as prevention of falling down. On the other hand, this made some capabilities of QRIO's underlying motion architecture unavailable to us. Some expressive postures such as defensive arm crossing, for instance, did not lend themselves to continuous modulation, as it could not be guaranteed via the abstraction of the high-level interface that the robot's limbs would not interfere with one another. In these cases a discrete decision function with a probabilistic component was typically used, based on the variables used to derive the continuous postural overlays and with its output allowed to replace those overlay values directly.
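For instance, arm crossing was gated by a function of roughly this shape; the threshold shaping here is hypothetical, and only the discrete, probabilistic character matches the implementation:

    import random

    def choose_arm_posture(defensiveness, timidity):
        """Probabilistically substitute a discrete safe pose for continuous arm overlays."""
        p_cross = min(1.0, defensiveness * (0.5 + timidity))  # invented shaping
        if random.random() < p_cross:
            return "ARMS_CROSSED"   # fixed, self-collision-safe pose from the predesigned set
        return None                 # fall back to the continuous postural overlays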

In addition to discrete arm postures, it was similarly necessary to create several explicit gestures with a motion editor in order to augment QRIO's repertoire, as the high level interface was not designed for parameterization of atomic gestures. (This was presumably a safety feature, as certain parameterizations might affect the robot's stability and cause it to fall over, even when the robot is physically capable of executing the gesture over the majority of the parameter space.) The gestures created were emblematic gestures in support of proxemic activity, such as a number of beckoning gestures using various combinations of one or both arms and the fingers. Similar probabilistic discrete decision functions were used to determine when these gestures should be selected to override continuous position overlays for QRIO to adjust the proxemic state itself.

QRIO's high-level walking interface exhibited a similar trade-off in allowing easy generation of basic walking motions but correspondingly hiding access to underlying motion parameters that could otherwise have been used to produce more expressive postures and complex locomotion behaviors. Body language actions that QRIO is otherwise capable of performing, such as standing with legs akimbo, or making finely-tuned proxemic adjustments by walking along specific curved paths, were thus not available to us at this time. We therefore implemented an algorithm that used the available locomotion features to generate proxemic adjustments that appeared as natural as possible, such as uninterruptible linear walking movements to appropriate positions. The display behaviors implemented in this project should thus be viewed as a representative sample of a much larger behavior space that could be overlaid on QRIO's motor output at lower levels of motion control abstraction.

7 Conclusions and Future Work

This research presented the model of a behavioral overlay system and demonstrated that an implementation of the model could successfully be used to modulate the expressiveness of behaviors designed without non-verbal communication in mind, as well as those specifically created to non-verbally support verbal communication (e.g. dialog illustrators). Despite limitations in the motion controller's support for motion parameterization, a spectrum of display behaviors was exhibited. Future extensions to the motion controller could support increased flexibility in the communicative abilities of the system without major alterations to the basic overlay implementation.

The proxemic component of the behavioral overlay system emphasized that the spatial nature of non-verbal communication through the management of interpersonal distance is not only an important feature of human-robot interaction that is largely yet to be explored, but can also be effectively modulated by means of an overlay. The proxemics in this case were computed simply in terms of egocentric linear distances, so there remains plenty of potential for exploring alternative conceptualizations of proxemic overlays, such as force fields or deformable surfaces. In order to be truly proxemically communicative, future work is required in giving QRIO a better spatial understanding of its environment, not only including physical features such as walls and objects, but also a more detailed model of the spatial activities of the humans within it.

The overlay generating functions used in this project were effective display generators by virtue of being hand-tuned to reflect the general body of research into human postures, gestures and interpersonal spacing, with accommodations made for the differences in body size and motion capability of QRIO. As such, they represent a system that is generically applicable but inflexible to cultural variations and the like. The overlay generation components thus offer an opportunity for future efforts to incorporate learning and adaptation mechanisms into the robot's communicative repertoire. In particular, it would be beneficial to be able to extract more real-time feedback from the human, especially in the case of the proxemics. For example, given the nature of current human-robot interaction, being able to robustly detect a human's discomfort with the proxemics of the situation should be an important priority.

This work introduced significant additional infrastructure concerning QRIO's reactions to its internal states and emotions and to specific humans, in the form of the Attitude and Relationship structures. The Relationship support assists in providing an emotional and expressive component to QRIO's memory, and the Attitude support offers the beginning of an individualistic aspect to the robot. There is a great deal of potential for future work in this area, primarily in the management and maintenance of relationships once instantiated, and in "closing the loop" by allowing the human's activity, as viewed by the robot through the filter of the specific relationship, to have a feedback effect on the robot's emotional state.

Finally, returning to first principles, this paper sought to argue that since it is well supported that non-verbal communication is an important mode of human interaction, it will also be a useful component of interaction between humans and humanoid robots, and that behavioral overlays will be an adequate and scalable method of facilitating this in the long term. Robust HRI studies are now necessary to confirm or reject this conjecture, so that future design work can proceed in the most effective directions.

8 Acknowledgements

The work described in this document was performed at Sony Intelligence Dynamics Laboratories (SIDL) during a summer research internship program. The authors wish to gratefully acknowledge the generous support and assistance provided by Yukiko Hoshino, Kuniaki Noda, Kazumi Aoyama, Hideki Shimomura, the rest of the staff and management of SIDL, and the other summer internship students, in the realization of this research.

References

Althaus, P., Ishiguro, H., Kanda, T., Miyashita, T. and Christensen, H.I.: 2004, Navigation for human-robot interaction tasks, Proc. International Conference on Robotics and Automation, Vol. 2, pp. 1894–1900.

Aoyama, K. and Shimomura, H.: 2005, Real world speech interaction with a humanoid robot on a layered robot behavior control architecture, Proc. International Conference on Robotics and Automation.

Bailenson, J.N., Blascovich, J., Beall, A.C. and Loomis, J.M.: 2003, Interpersonal distance in immersive virtual environments, Personality and Social Psychology Bulletin 29, 819–833.

Beskow, J. and McGlashan, S.: 1997, OLGA - a conversational agent with gestures, Proc. IJCAI-97 Workshop on Animated Interface Agents: Making Them Intelligent.

Bethel, C.L. and Murphy, R.R.: 2006, Affective expression in appearance-constrained robots, Proceedings of the 2006 ACM Conference on Human-Robot Interaction (HRI 2006), Salt Lake City, Utah, pp. 327–328.

Blankenship, V., Hnat, S.M., Hess, T.G. and Brown, D.R.: 2004, Reciprocal interaction and similarity of personality attributes, Journal of Social and Personal Relationships 1, 415–432.

Bowlby, J.: 1969, Attachment & Loss Volume 1: Attachment, Basic Books.

Breazeal, C.: 2000, Sociable Machines: Expressive Social Exchange Between Humans and Robots, PhD thesis, Massachusetts Institute of Technology.

Breazeal, C.: 2003, Towards sociable robots, Robotics and Autonomous Systems 42(3–4), 167–175.

Breazeal, C., Brooks, A.G., Gray, J., Hoffman, G., Kidd, C., Lee, H., Lieberman, J., Lockerd, A. and Chilongo, D.: 2004, Tutelage and collaboration for humanoid robots, International Journal of Humanoid Robots 1(2), 315–348.

Brooks, A.G., Berlin, M., Gray, J. and Breazeal, C.: 2005, Untethered robotic play for repetitive physical tasks, Proc. ACM International Conference on Advances in Computer Entertainment, Valencia, Spain.

Byrne, D. and Griffit, W.: 1969, Similarity and awareness of similarity of personality characteristic determinants of attraction, Journal of Experimental Research in Personality 3, 179–186.

Cassell, J. and Vilhjalmsson, H.: 1999, Fully embodied conversational avatars: Making communicative behaviors autonomous, Autonomous Agents and Multiagent Systems 2, 45–64.

Chi, D., Costa, M., Zhao, L. and Badler, N.: 2000, The EMOTE model for effort and shape, Proc. 27th Annual Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH '00), pp. 173–182.

Christensen, H.I. and Pacchierotti, E.: 2005, Embodied social interaction for robots, in Dautenhahn, K. (ed.), Proceedings of the 2005 Convention of the Society for the Study of Artificial Intelligence and Simulation of Behaviour (AISB-05), Hertfordshire, England, pp. 40–45.

Davies, M. and Stone, T.: 1995, Mental Simulation, Blackwell Publishers, Oxford, UK.

Dittmann, A.: 1978, The role of body movement in communication, Nonverbal Behavior and Communication, Lawrence Erlbaum Associates, Hillsdale.

Ekman, P. and Davidson, R.J.: 1994, The Nature of Emotion, Oxford University Press.

Fong, T., Nourbakhsh, I. and Dautenhahn, K.: 2003, A survey of socially interactive robots, Robotics and Autonomous Systems 42, 143–166.

Fridlund, A.: 1994, Human Facial Expression: An Evolutionary View, Academic Press, San Diego, CA.

Fujita, M., Kuroki, Y., Ishida, T. and Doi, T.: 2003, Autonomous behavior control architecture of entertainment humanoid robot SDR-4X, Proc. International Conference on Intelligent Robots and Systems, Las Vegas, NV, pp. 960–967.

Giese, M.A. and Poggio, T.: 2000, Morphable models for the analysis and synthesis of complex motion pattern, International Journal of Computer Vision 38(1), 59–73.

Gordon, R.: 1986, Folk psychology as simulation, Mind and Language 1, 158–171.

Guye-Vuilleme, A., Capin, T.K., Pandzic, I.S., Thalmann, N.M. and Thalmann, D.: 1998, Non-verbal communication interface for collaborative virtual environments, Proc. Collaborative Virtual Environments (CVE 98), pp. 105–112.

Hall, E.T.: 1966, The Hidden Dimension, Doubleday, Garden City, NY.

Hara, F., Akazawa, H. and Kobayashi, H.: 2001, Realistic facial expressions by SMA driven face robot, Proceedings of IEEE International Workshop on Robot and Human Interactive Communication, pp. 504–510.

Heal, J.: 2003, Understanding Other Minds from the Inside, Cambridge University Press, Cambridge, UK, pp. 28–44.

Jordan, P.W.: 2000, Designing Pleasurable Products: an Introduction to the New Human Factors, Taylor & Francis Books Ltd., UK.

Kanda, T., Hirano, T., Eaton, D. and Ishiguro, H.: 2004, Interactive robots as social partners and peer tutors for children, Human-Computer Interaction 19, 61–84.

Kiesler, D.J.: 1983, The 1982 interpersonal circle: A taxonomy for complementarity in human transactions, Psychological Review 90, 185–214.

Knapp, M.: 1972, Nonverbal Communication in Human Interaction, Holt, Rinehart and Winston, Inc., New York.

Kopp, S. and Wachsmuth, I.: 2000, A knowledge-based approach for lifelike gesture animation, Proc. ECAI-2000.

Kopp, S. and Wachsmuth, I.: 2002, Model-based animation of coverbal gesture, Proc. Computer Animation.

Likhachev, M. and Arkin, R.C.: 2000, Robotic comfort zones, Proceedings of SPIE: Sensor Fusion and Decentralized Control in Robotic Systems III Conference, Vol. 4196, pp. 27–41.

Machotka, P. and Spiegel, J.: 1982, The Articulate Body, Irvington.

McCrae, R.R. and Costa, P.T.: 1996, Toward a new generation of personality theories: Theoretical contexts for the five-factor model, in Wiggins, J.S. (ed.), Five-Factor Model of Personality, Guilford, New York, pp. 51–87.

Nakauchi, Y. and Simmons, R.: 2000, A social robot that stands in line, Proc. International Conference on Intelligent Robots and Systems, Vol. 1, pp. 357–364.

Nass, C. and Lee, K.M.: 2001, Does computer-generated speech manifest personality? Experimental test of recognition, similarity-attraction, and consistency-attraction, Journal of Experimental Psychology: Applied 7(3), 171–181.

Orford, J.: 1986, The rules of interpersonal complementarity: Does hostility beget hostility and dominance, submission?, Psychological Review 93, 365–377.

Pacchierotti, E., Christensen, H.I. and Jensfelt, P.: 2005, Human-robot embodied interaction in hallway settings: A pilot user study, Proceedings of the 2005 IEEE International Workshop on Robots and Human Interactive Communication, pp. 164–171.

Peters, R.A. II, Campbell, C.C., Bluethmann, W.J. and Huber, E.: 2003, Robonaut task learning through teleoperation, Proc. International Conference on Robotics and Automation, Taipei, Taiwan, pp. 2806–2811.

Reeves, B. and Nass, C.: 1996, The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places, Cambridge University Press, Cambridge, England.

Rose, C., Cohen, M. and Bodenheimer, B.: 1998, Verbs and adverbs: Multidimensional motion interpolation, IEEE Computer Graphics and Applications 18(5), 32–40.

Sawada, T., Takagi, T., Hoshino, Y. and Fujita, M.: 2004, Learning behavior selection through interaction based on emotionally grounded symbol concept, Proc. IEEE-RAS/RSJ Int'l Conf. on Humanoid Robots (Humanoids '04).

Smith, C.: 2005, Behavior adaptation for a socially interactive robot, Master's thesis, KTH Royal Institute of Technology, Stockholm, Sweden.

te Boekhorst, R., Walters, M., Koay, K.L., Dautenhahn, K. and Nehaniv, C.: 2005, A study of a single robot interacting with groups of children in a rotation game scenario, Proc. 6th IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA 2005), Espoo, Finland.

Thorisson, K.R.: 1996, Communicative Humanoids: A Computational Model of Psychosocial Dialogue Skills, PhD thesis, MIT Media Laboratory, Cambridge, MA.

Walters, M.L., Dautenhahn, K., Koay, K.L., Kaouri, C., te Boekhorst, R., Nehaniv, C.L., Werry, I. and Lee, D.: 2005a, Close encounters: Spatial distances between people and a robot of mechanistic appearance, Proceedings of the 5th IEEE-RAS International Conference on Humanoid Robots (Humanoids '05), Tsukuba, Japan, pp. 450–455.

Walters, M.L., Dautenhahn, K., te Boekhorst, R., Koay, K.L., Kaouri, C., Woods, S., Nehaniv, C.L., Lee, D. and Werry, I.: 2005b, The influence of subjects' personality traits on personal spatial zones in a human-robot interaction experiment, Proceedings of IEEE Ro-Man 2005, 14th Annual Workshop on Robot and Human Interactive Communication, IEEE Press, Nashville, Tennessee, pp. 347–352.

Weitz, S.: 1974, Nonverbal Communication: Readings With Commentary, Oxford University Press.

Yamasaki, N. and Anzai, Y.: 1996, Active interface for human-robot interaction, Proc. International Conference on Robotics and Automation, pp. 3103–3109.

Yan, C., Peng, W., Lee, K.M. and Jin, S.: 2004, Can robots have personality? An empirical study of personality manifestation, social responses, and social presence in human-robot interaction, 54th Annual Conference of the International Communication Association.

Zecca, M., Roccella, S., Carrozza, M.C., Cappiello, G., Cabibihan, J.-J., Dario, P., Takanobu, H., Matsumoto, M., Miwa, H., Itoh, K. and Takanishi, A.: 2004, On the development of the emotion expression humanoid robot WE-4RII with RCH-1, Proc. IEEE-RAS/RSJ Int'l Conf. on Humanoid Robots (Humanoids '04), Los Angeles, California.
