Synthesizing multimodal utterances for conversational agents
Citations
Hand and Mind: What Gestures Reveal about Thought
Towards a common framework for multimodal generation: the behavior markup language
A conversational agent as museum guide: design and evaluation of a real-world application
SmartBody: behavior realization for embodied conversational agents
References
Hand and Mind: What Gestures Reveal about Thought
Embodied conversational agents
BEAT: the Behavior Expression Animation Toolkit
Frequently Asked Questions (19)
Q2. What are the future works in "Synthesizing multimodal utterances for conversational agents" ?
Concerning future work, it appears natural to further exploit the flexibility and generality of their synthesis model for the automatic planning of multimodal utterances of a wide variety. The authors expect this to yield a coordinated accentuation, e.g. according to an underlying rhythmic pulse, and to include the timing of velocity peaks of single movement phases, which can be taken into account in their approach. Furthermore, the gesture animation model will be further explored with respect to variations of the parameters, e.g. influencing the relationship between trajectory curvature and velocity.
Q3. What does the animation of co-verbal gesture require?
The animation of co-verbal gesture requires a high degree of control and flexibility with respect to shape and time properties while at the same time ensuring naturalness of movement.
Q4. When can a kinematic feedforward controller be created?
To ensure fluent, at least C1-continuous connection to the given boundary conditions, a kinematic feedforward controller cannot be created until the moment of activation of its LMP.
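As a minimal illustration of what a C1-continuous connection at activation time can look like, the sketch below instantiates a trajectory segment only once the boundary position and velocity are known, using cubic Hermite interpolation. This is an assumed stand-in for intuition, not the authors' controller formulation; all numbers are hypothetical.

```python
import numpy as np

def hermite_segment(p0, v0, p1, v1, duration):
    """Cubic Hermite segment matching position AND velocity at both ends,
    giving a C1-continuous connection to the preceding movement."""
    def position(t):
        s = t / duration                    # normalized time in [0, 1]
        h00 = 2*s**3 - 3*s**2 + 1           # Hermite basis functions
        h10 = s**3 - 2*s**2 + s
        h01 = -2*s**3 + 3*s**2
        h11 = s**3 - s**2
        return h00*p0 + h10*duration*v0 + h01*p1 + h11*duration*v1
    return position

# Only at the moment an LMP is activated are the current wrist position
# and velocity known; they become the segment's boundary conditions.
seg = hermite_segment(np.array([0.0, 0.0, 0.0]),   # current position
                      np.array([0.1, 0.0, 0.0]),   # current velocity
                      np.array([0.3, 0.4, 0.2]),   # stroke target (hypothetical)
                      np.zeros(3),                 # come to rest at the target
                      duration=0.6)
print(seg(0.0), seg(0.6))
```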
Q5. What is the effect of the kinematic model on the utterances?
The resulting synthetic utterances achieve cross-modal synchrony even at the syllable level while reproducing natural co-articulation and transition effects.
Q6. What do the authors consider a chunk of speech–gesture production to be?
The authors define chunks of speech–gesture production to be pairs of an intonation phrase and a co-expressive gesture phrase, i.e. complex utterances with multiple gestures are considered to consist of several chunks.
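One way to picture this chunk structure as data is sketched below; the class and field names are illustrative assumptions, not the paper's types.

```python
from dataclasses import dataclass

@dataclass
class IntonationPhrase:
    text: str
    onset: float   # seconds
    end: float

@dataclass
class GesturePhrase:
    stroke_start: float   # seconds; aligned with the affiliated words
    stroke_end: float

@dataclass
class Chunk:
    """One increment of speech-gesture production: an intonation phrase
    paired with a co-expressive gesture phrase."""
    speech: IntonationPhrase
    gesture: GesturePhrase

# A complex utterance with two gestures is then simply two chunks.
utterance = [
    Chunk(IntonationPhrase("This is how it works", 0.0, 1.4),
          GesturePhrase(0.3, 0.9)),
    Chunk(IntonationPhrase("and then it rotates", 1.4, 2.6),
          GesturePhrase(1.7, 2.3)),
]
```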
Q7. What do the authors expect the automatic planning of multimodal utterances to yield?
The authors expect this to yield a coordinated accentuation, e.g. according to an underlying rhythmic pulse, and to include the timing of velocity peaks of single movement phases which can be taken into account in their approach.
Q8. What is the starting point of the authors' approach to synthesizing multimodal utterances?
Their approach to synthesizing multimodal utterances starts from straightforward descriptions of their desired outer form, which are supposed to be generated at higher levels of utterance planning and to be specified in MURML, an XML-based representation language.
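For flavor, here is a schematic MURML-style specification being read with Python's standard XML tools. The tag and attribute names are illustrative stand-ins, not the exact MURML schema.

```python
import xml.etree.ElementTree as ET

# Schematic MURML-style specification: a time tag inside the spoken text
# anchors a gesture with shape constraints. Names are illustrative only.
spec = """
<utterance>
  <specification>This <time id="t1"/> is how it works.</specification>
  <gesture affiliate="t1">
    <constraint slot="HandShape" value="flat"/>
    <constraint slot="PalmOrientation" value="up"/>
  </gesture>
</utterance>
"""

root = ET.fromstring(spec)
for gesture in root.iter("gesture"):
    print("gesture anchored at time tag:", gesture.get("affiliate"))
    for c in gesture.iter("constraint"):
        print("  ", c.get("slot"), "=", c.get("value"))
```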
Q9. How does the gesture planner define the expressive gesture phase?
The gesture planner defines the expressive gesture phase in terms of movement constraints by selecting a lexicalized gesture template in MURML, allocating body parts, expanding abstract movement constraints and resolving deictic references (as described by Kopp and Wachsmuth [7]).
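The four planner steps named above (template selection, body-part allocation, constraint expansion, deixis resolution) could be sketched as follows; the lexicon entry and function are hypothetical, not the authors' implementation.

```python
# Hypothetical lexicon of gesture templates; entries are invented.
LEXICON = {
    "pointing": {"body_parts": ["right_arm"],
                 "constraints": [("HandShape", "index_out"),
                                 ("Target", "@deictic")]},
}

def plan_gesture(name, free_limbs, deictic_target):
    tpl = LEXICON[name]                                        # select template
    limbs = [l for l in tpl["body_parts"] if l in free_limbs]  # allocate parts
    constraints = []
    for slot, value in tpl["constraints"]:                     # expand constraints
        if value == "@deictic":
            value = deictic_target                             # resolve deixis
        constraints.append((slot, value))
    return {"limbs": limbs, "constraints": constraints}

print(plan_gesture("pointing", ["right_arm"], (0.5, 1.2, 0.3)))
```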
Q10. What is the kinematic relationship between amplitude and peak velocity?
Their approach to forming wrist trajectories relies on the well-known observation that complex arm movements consist of subsequently and ballistically performed segments with the following kinematic regularities of the effector trajectory [22]:
* short targeted segments are straight or curvilinear (either C- or S-shaped) and always planar;
* they exhibit a symmetrical bell-shaped velocity profile;
* a quasi-linear relation between amplitude and peak velocity, as well as an approximate logarithmic relation between amplitude and movement duration, holds;
* at any point except points of extreme bending, the movement speed can be estimated from the radius r of the trajectory by the 'law of 2/3': v = k · r^(1/3), where k is a constant velocity gain factor for each segment and assumed to be a parameter of motor control.
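A small numerical sketch of the last two regularities, assuming a minimum-jerk form for the bell-shaped speed profile (the specific profile shape is an assumption; the text above only requires symmetry):

```python
import numpy as np

def bell_velocity_profile(amplitude, duration, n=101):
    """Symmetric bell-shaped speed profile (minimum-jerk form) whose
    time integral equals the segment amplitude."""
    t = np.linspace(0.0, duration, n)
    s = t / duration
    v = (30.0 * amplitude / duration) * s**2 * (1.0 - s)**2
    return t, v

def speed_from_curvature(r, k):
    """'Law of 2/3' as stated above: v = k * r**(1/3), with r the radius
    of curvature and k the per-segment velocity gain factor."""
    return k * r ** (1.0 / 3.0)

_, v1 = bell_velocity_profile(amplitude=0.2, duration=0.6)
_, v2 = bell_velocity_profile(amplitude=0.4, duration=0.6)
print(v2.max() / v1.max())            # = 2: peak velocity scales linearly with amplitude
print(speed_from_curvature(r=0.05, k=0.8))
```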
Q11. What is the role of the motor control layer of Max?
As described in the previous section, the motor control layer of Max is in charge of autonomously creating context-dependent gesture transitions.
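One hedged way to picture a 'context-dependent' transition decision is a simple rule on the pause available before the next gesture; the rule and threshold below are assumptions for illustration only, not Max's actual logic.

```python
def plan_transition(pause_seconds, direct_threshold=1.0):
    """Context-dependent transition choice (rule and threshold assumed):
    co-articulate straight into the next gesture when the pause is short,
    otherwise retract to a rest posture first."""
    if pause_seconds < direct_threshold:
        return "direct_coarticulation"
    return "retract_to_rest"

print(plan_transition(0.4))   # direct_coarticulation
print(plan_transition(2.5))   # retract_to_rest
```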
Q12. What is the kinematic approach for the shoulder and wrist joint?
For the shoulder and wrist joint, the authors apply the approach by Wilhelms and Van Gelder [18] to define the joint limits geometrically in terms of reach cones with varying twist limits.
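A drastically simplified version of such a joint-limit test is sketched below. The actual reach-cone method uses spherical polygons with direction-dependent twist windows; this sketch reduces the cone to a single circular cone and clamps twist to a fixed window.

```python
import numpy as np

def within_reach_cone(direction, axis, half_angle_deg):
    """Simplified limit test: is the limb direction inside a circular
    cone around the given axis? (The real method uses a spherical
    polygon, i.e. a cone with a polygonal cross-section.)"""
    d = direction / np.linalg.norm(direction)
    a = axis / np.linalg.norm(axis)
    angle = np.degrees(np.arccos(np.clip(np.dot(d, a), -1.0, 1.0)))
    return angle <= half_angle_deg

def clamp_twist(twist_deg, lo, hi):
    """Clamp twist about the limb axis to a window; in the real method
    the window varies with the limb direction."""
    return max(lo, min(hi, twist_deg))

print(within_reach_cone(np.array([0.2, -1.0, 0.3]),
                        np.array([0.0, -1.0, 0.0]), half_angle_deg=60.0))
print(clamp_twist(95.0, -45.0, 80.0))
```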
Q13. Why does Cassell say that the problem of creating gesture animations has not been solved so far?
Cassell [10] states that the problem of creating gesture animations and synchronizing them with speech has not been solved so far, 'due in part to the difficulty of reconciling the demands of graphics and speech synthesis software' (p. 16).
Q14. What are the properties of embodied conversational agents?
Such agents are envisioned to have similar properties to humans in face-to-face communication, including the ability to generate simultaneous verbal and non-verbal behaviours.
Q15. Why is the problem of creating gesture animations and synchronizing them with speech difficult?
This can be ascribed, first, to the lack of sufficient means of modulating, e.g. shrinking or stretching, single gesture phases [4] and, secondly, to a behavior execution that runs 'ballistically', i.e. without the possibility of exerting influence, in an animation system whose reliability is sometimes hard to predict.
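To make 'shrinking or stretching single gesture phases' concrete, here is a minimal retiming sketch; the keyframe representation is an assumption for illustration.

```python
def scale_phase(keyframes, factor):
    """Uniformly stretch (factor > 1) or shrink (factor < 1) a single
    gesture phase by rescaling its keyframe timestamps."""
    return [(t * factor, pose) for t, pose in keyframes]

preparation = [(0.0, "rest"), (0.2, "raised"), (0.4, "stroke_start")]
print(scale_phase(preparation, 1.5))   # slow the preparation by 50%
```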
Q16. How are the structures of overt gesture and speech in humans related?
The inherent segmentation of speech–gesture production in humans is reflected in the hierarchical structures of overt gesture and speech and their cross-modal correspondences.
Q17. When can a chunk be uttered, and what does the scheduler then do?
If a chunk can be uttered, i.e. the preceding chunk is Subsiding (see below), the scheduler defines the intra-chunk synchrony as aforementioned and reconciles it with the onsets of the intonation and gesture phrases.
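The gating rule from this answer can be pictured as a small state check; only the Subsiding state is taken from the text, and the other state names are assumptions.

```python
from enum import Enum, auto

class ChunkState(Enum):
    PENDING = auto()
    IN_PREPARATION = auto()
    IN_STROKE = auto()
    SUBSIDING = auto()   # retraction running; the next chunk may start
    DONE = auto()

def can_utter(preceding_state):
    """A chunk may be uttered once the preceding chunk is Subsiding
    (or has already finished)."""
    return preceding_state in (ChunkState.SUBSIDING, ChunkState.DONE)

print(can_utter(ChunkState.SUBSIDING))   # True
print(can_utter(ChunkState.IN_STROKE))   # False
```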
Q18. How are LMPs recombined into a solution of the overall control problem?
To recombine the LMPs for a solution to the overall control problem, LMPs run concurrently and synchronized in an abstract motor control program (MCP) for each limb’s motion (see Figure 4).
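A toy sketch of concurrently running LMPs summed by a per-limb MCP at each animation frame; the interfaces are assumptions, not the paper's API.

```python
class LMP:
    """Toy local motor program: contributes a joint-space value while
    active in its time window."""
    def __init__(self, start, end, fn):
        self.start, self.end, self.fn = start, end, fn
    def active(self, t):
        return self.start <= t <= self.end
    def contribution(self, t):
        return self.fn(t)

class MotorControlProgram:
    """Abstract MCP for one limb: runs its LMPs concurrently and sums
    their contributions at each animation frame."""
    def __init__(self, lmps):
        self.lmps = lmps
    def step(self, t):
        return sum(l.contribution(t) for l in self.lmps if l.active(t))

mcp = MotorControlProgram([
    LMP(0.0, 1.0, lambda t: 10.0 * t),          # e.g. a wrist-trajectory LMP
    LMP(0.5, 1.5, lambda t: 5.0 * (t - 0.5)),   # e.g. a hand-shape LMP
])
print(mcp.step(0.75))   # 8.75: both LMPs contribute at t = 0.75
```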
Q19. What is the role of the motor layer?
At this point, the motor layer is responsible for, first, planning on-the-fly upper-limb animations of the agent that exactly satisfy the given movement and timing constraints.