Evaluation of singing synthesis: Methodology and case study with concatenative and performative systems

Feugère, Lionel (ORCID: https://orcid.org/0000-0003-0883-5224), d'Alessandro, Christophe, Delalez, Samuel, Ardaillon, Luc and Roebel, Axel (2016) Evaluation of singing synthesis: Methodology and case study with concatenative and performative systems. In: Proceedings Interspeech 2016. International Speech Communication Association, San Francisco, pp. 1245-1249. (doi:10.21437/Interspeech.2016-1248)

23556 FEUGERE_Evaluation_of_Singing_Synthesis_2016.PDF - Published Version
Restricted to Registered users only

Abstract

The special session Singing Synthesis Challenge: Fill-In the Gap aims at comparative evaluation of singing synthesis systems. The task is to synthesize a new couplet for two popular songs. This paper addresses the methodology needed for quality assessment of singing synthesis systems and reports on a case study using two systems with a total of six different configurations. The two synthesis systems are: a concatenative Text-to-Chant (TTC) system, including a parametric representation of the melodic curve; and a Singing Instrument (SI), allowing for real-time interpretation of utterances made of flat-pitch natural voice or diphone-concatenated voice. Absolute Category Rating (ACR) and Paired Comparison (PC) tests are used. Natural and natural-degraded reference conditions are used for calibration of the ACR test. The mean opinion scores (MOS) obtained with ACR show that the TTC ranks below natural voice but above the degraded conditions, while the SI ranks below natural voice and in between the degraded conditions. Singing synthesis quality is thus judged better than auto-tuned or distorted natural voice in some cases. PC results show that: (1) signal processing is an important quality issue, making the difference between systems; (2) diphone concatenation degrades the quality compared to flat-pitch natural voice; (3) automatic melodic modelling is preferred to gestural control for off-line synthesis.
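
As a rough illustration of the two listening-test measures named in the abstract, the sketch below computes a mean opinion score from ACR ratings and a preference rate from paired-comparison choices. It is not the authors' analysis code. The 1-5 rating scale, the normal-approximation confidence interval, the function names, and all sample values are assumptions made for illustration; none of them come from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes ACR ratings on a 1-5 scale (an assumption, not stated in the abstract).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the assumed 1-5 scale")
    mos = mean(ratings)
    # Normal approximation; a t-distribution would be more careful for small panels.
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


def preference_rate(choices, system):
    """Fraction of paired-comparison trials in which `system` was preferred."""
    if not choices:
        raise ValueError("no trials")
    return sum(1 for c in choices if c == system) / len(choices)


# Hypothetical ratings and choices, invented purely for illustration.
hypothetical_ratings = [4, 3, 5, 4, 4]
hypothetical_choices = ["A", "B", "A", "A"]

mos, ci = mean_opinion_score(hypothetical_ratings)
rate_a = preference_rate(hypothetical_choices, "A")
```

In a calibrated ACR test such as the one described, the same computation would be run per condition (natural, degraded, and each synthesis configuration) so that the resulting scores can be ranked against the reference conditions.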

Item Type: Conference Proceedings
Title of Proceedings: Proceedings Interspeech 2016
Additional Information: INTERSPEECH 2016 was held from September 8–12, 2016, San Francisco, USA.
Uncontrolled Keywords: singing synthesis, singing quality assessment, computer music
Subjects: M Music and Books on Music > MT Musical instruction and study
Faculty / School / Research Centre / Research Group: Faculty of Engineering & Science
Faculty of Engineering & Science > Natural Resources Institute
Faculty of Engineering & Science > Natural Resources Institute > Agriculture, Health & Environment Department
Last Modified: 21 Jul 2021 13:07
URI: http://gala.gre.ac.uk/id/eprint/23556
