Project Overview
Innovation
Expected results

Speech translation technologies: PF-STAR will aim to improve on current baselines and to compare different approaches across different application scenarios, in order to help define new research directions and specific target applications for each approach.

Technologies for emotions: prosodic and other linguistic cues. We will provide: baseline results for different parameters; recommendations on where to concentrate more intensive research (classification technology, prosodic features, linguistic features, and units to be classified), based on results from realistic data rather than predefined sentences; a classification of the different emotion classes that can be tuned according to a cost function for different system reactions, so that the overall system performance, rather than the pure recognition rate, can be optimised; and an assessment of the interplay of different linguistic parameters in synthesis. We also expect preliminary results on the quality of the prosodic feature extraction algorithms on children's speech.

Technologies for emotions: synthetic faces. We aim to define and assess a technological baseline for believable virtual agents in the form of talking heads, which produce speech and communicate emotions using audiovisual speech synthesis. A relatively small but varied database of audiovisual emotional speech in dialogue situations will be collected and annotated in the target languages, Italian and Swedish.

Speech technologies for children: The literature suggests that error rates for children are at least 100% greater than for adults (that is, at least double), even for matched training. We will develop baselines for the languages involved (English, Italian, Swedish and German) with the aim of obtaining: a significant increase in recognition rate, using techniques such as 'matched training' (models trained on children) rather than non-matched training (models trained on adults), age-dependent training, and robust methods for spontaneous speech; an understanding of the extent of inter-speaker variability (which is expected to be much greater for children) and of intra-speaker variability relative to adults; and an assessment of the importance of children-specific pronunciation dictionaries and children-specific language models.
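
As an illustration of the cost-tuneable emotion classification mentioned under "prosodic and other linguistic cues", the following is a minimal sketch of choosing a system reaction by minimum expected cost rather than by the most probable class. It is not the project's method: the class names, reactions, probabilities, and cost values are all hypothetical and chosen only to show the idea.

```python
# Hypothetical sketch: cost-sensitive choice of a system reaction.
# Emotion classes, reactions, posteriors, and costs are illustrative assumptions.

def expected_costs(posteriors, cost):
    """Return the expected cost of each system reaction.

    posteriors: dict mapping emotion class -> probability from some classifier
    cost: dict mapping (reaction, true_class) -> penalty for that reaction
    """
    reactions = {reaction for reaction, _ in cost}
    return {
        r: sum(p * cost[(r, c)] for c, p in posteriors.items())
        for r in reactions
    }


def choose_reaction(posteriors, cost):
    """Pick the reaction with the lowest expected cost."""
    costs = expected_costs(posteriors, cost)
    return min(costs, key=costs.get)


# Assumed classifier output: "neutral" is the most probable class.
posteriors = {"neutral": 0.6, "angry": 0.4}

# Assumed cost table: ignoring an angry user is penalised heavily.
cost = {
    ("continue", "neutral"): 0.0,
    ("continue", "angry"): 5.0,
    ("apologise", "neutral"): 1.0,
    ("apologise", "angry"): 0.0,
}

reaction = choose_reaction(posteriors, cost)
print(reaction)
```

With these assumed numbers the expected cost of "continue" is 2.0 and that of "apologise" is 0.6, so the sketch selects "apologise" even though "neutral" is the more probable class. Changing the cost table changes the decision without retraining the classifier, which is one way to read "tuneable according to a cost function".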