Project Info
Project Overview
 


Objectives

  • One of the priorities of IST in FP6 is to build on the European research community so that it can lead the next generation of technologies and applications by making them more user- and people-centred. The goal of this project is to help advance research and lay the foundations for future efforts on the topic of Multilingual and Multisensorial Communication (MMC).

  • PF-STAR addresses the challenging goals of providing advanced technological baselines, (comparative) evaluations and assessment of prospects for three key technological areas: speech-to-speech translation (STST), the detection and expression of emotional states, and core speech technologies for children.
    The project builds on years of research carried out in various national and international research projects, most notably NESPOLE!, C-STAR, Verbmobil and SmartKom.
    The languages considered will be English, German, Italian, Spanish and Swedish.
  • PF-STAR results will consist of technological baselines, assessed with respect to both observed performance and future prospects, and linguistic databases. No demonstrations or showcases are planned and no particular scenario will be addressed; these objectives are already pursued by a number of other projects such as NESPOLE!, C-STAR, SPEECON.

Innovation

  • Technologies and language resources for Multilingual and Multisensorial Communication have been the subject of many research projects funded under FP5 by the IST Programme. Among them we may recall: Coretex, FAME, LC-STAR, NESPOLE!, SPEECON.
    We must improve on, refine, stabilise, and align current achievements to turn them into true technological baselines, which can be delivered to the European research and development community along with careful assessments and evaluations. This is a pioneering activity, since it is designed to pave the way for future research efforts within FP6, by providing them with the necessary information and technological starting points.

  • The contribution of PF-STAR is as follows: by bringing together many relevant actors in the field, we intend to provide the European R&D community with the technological baselines for future research and development efforts, with a strong focus on the common goal of delivering a solution for Multilingual and Multisensorial Communication, an objective of paramount importance in a multilingual environment such as Europe.

Expected results

Speech translation technologies: PF-STAR will target the improvement of current baselines and will compare different approaches across different application scenarios, to help define new research directions and specific target applications for each approach.

Technologies for emotions: prosodic and other linguistic cues. We will provide:

  • baseline results for different parameters;
  • recommendations on where to concentrate research (classification technology, prosodic features, linguistic features, and units to be classified), based on results from realistic data rather than predefined sentences;
  • a classification of the different emotion classes which will be tuneable according to a cost function for different system reactions, so that the overall system performance, rather than the pure recognition rate, can be optimised;
  • an assessment of the interplay of different linguistic parameters in synthesis.

We also expect preliminary results on the quality of the prosodic feature extraction algorithms on children's speech.
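The idea of tuning emotion classification with a cost function over system reactions can be illustrated with a minimal sketch of a minimum-expected-cost decision. This is not the project's method; the emotion classes, reactions, probabilities and cost values below are all hypothetical and chosen only to show how the cheapest reaction can differ from the one implied by the most probable class.

```python
def min_cost_reaction(posteriors, cost):
    """Pick the system reaction with the lowest expected cost.

    posteriors: dict mapping emotion class -> estimated probability
    cost: dict mapping reaction -> dict mapping true emotion class -> cost
    """
    expected = {
        reaction: sum(posteriors[c] * per_class[c] for c in posteriors)
        for reaction, per_class in cost.items()
    }
    best = min(expected, key=expected.get)
    return best, expected


# Hypothetical classifier output: "neutral" is the most probable class.
posteriors = {"neutral": 0.6, "angry": 0.4}

# Hypothetical costs: ignoring an angry user is assumed to be far more
# costly than offering help to a neutral one.
cost = {
    "continue_dialogue": {"neutral": 0.0, "angry": 5.0},
    "offer_human_operator": {"neutral": 1.0, "angry": 0.0},
}

best, expected = min_cost_reaction(posteriors, cost)
print(best, expected)
```

With these assumed numbers the chosen reaction is "offer_human_operator", even though "neutral" has the higher probability. Changing the cost table shifts the decision without retraining the classifier.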

Technologies for emotions: synthetic faces. We aim at the definition and assessment of a technological baseline for believable virtual agents in the form of talking heads, which produce speech and communicate emotions using audiovisual speech synthesis. A relatively small but varied database of audiovisual emotional speech in dialogue situations in the target languages Italian and Swedish will be collected and annotated.

Speech technologies for children: The literature suggests that error rates for children are at least 100% greater than for adults, even for matched training. We will develop baselines for the involved languages (English, Italian, Swedish and German) with the aim of obtaining: a significant increase in recognition rate, by using 'matched training' (models trained on children) rather than non-matched training (models trained on adults), age-dependent training, and robust methods for spontaneous speech, etc.; an understanding of the extent of inter-speaker variability (expected to be much greater for children) and of intra-speaker variability relative to adults; an assessment of the importance of children-specific pronunciation dictionaries and children-specific language models.
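To make the "at least 100% greater" figure concrete: a relative increase of 100% means the children's error rate is at least double the adult one. The short sketch below computes that relative increase; the two error rates are hypothetical values for illustration, not results from the project or the literature.

```python
def relative_increase_percent(adult_error, child_error):
    """Relative increase of the child error rate over the adult one, in percent."""
    return 100.0 * (child_error - adult_error) / adult_error


# Hypothetical error rates in percent, for illustration only.
adult_error = 5.0
child_error = 10.0

increase = relative_increase_percent(adult_error, child_error)
print(increase)
```

Under these assumed values the child error rate is twice the adult rate, which corresponds to the lower bound quoted above.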