A distinguished American critic who had his vocal cords removed during surgery for thyroid cancer has been given his natural voice back following the intervention of a Scottish company that specialises in synthesising speech.
Roger Ebert - the first film critic to win the Pulitzer Prize - appeared on the Oprah Winfrey Show on Tuesday where he was able to deliver his Oscar tips and give an emotional account of how he regained his voice.
“It still needs improvement, but at least it still sounds like me,” said Mr Ebert, who uses a keyboard and laptop to voice his words. “In first grade, they said I talk too much. And now I still can.”
After a series of operations in 2006, Mr Ebert’s face-to-face communication was at first restricted to hastily scribbled notes and rudimentary sign language. Then he began using off-the-shelf text-to-speech computer packages that enable users to speak in a computerised version of standard English.
These made him sound like “Robby the Robot”, said Mr Ebert, and in a blog entry last August, he complained: “Eloquence and intonation are impossible. I dream of hearing a voice something like my own.”
During an internet search, he came across CereProc, a spin-out company from Edinburgh University that specialises in synthesising “natural” voices. He trialled samples of former President George W Bush and Arnold Schwarzenegger, the governor of California, and was amazed. “Their Dubya and Arnold wouldn’t fool their wives, but you can certainly tell it’s them,” he said.
When he contacted the company - which occupies a single small office within the university's school of informatics - CereProc told him they could clone his voice if he could provide good quality tapes. Mr Ebert had recorded commentaries for classic films such as Citizen Kane and Casablanca and was able to provide these pure audio tracks. The result is the company's first historical recreation of a human voice.
“Roger doesn't want to sound like everyone - he wants to sound like Roger,” said Matthew Aylett, the company’s chief technical officer. “We took the audio commentaries but it was much more challenging than the normal process when we would control the recording environment.”
For its business clients, CereProc records five hours of material to compile a library of hundreds of thousands of sounds. Common phrases are reproduced verbatim, but more complex sentences are blended from natural sounds on the database.
The same techniques have been used to recreate Mr Ebert’s voice and he already has three hours of sounds on his computer. CereProc say they will double that amount and could even include the sound of Mr Ebert laughing, sighing or screaming.
In commercial applications, such as an avatar devised for the Scottish Qualifications Authority website, the software has to overcome difficulties with words such as “permit”, in which intonation changes the meaning between a noun and verb.
Mr Ebert finds it easier to overcome these problems himself, said Dr Aylett. “He can type, listen and modify. He can control emphasis and pitch. He can modify the way he speaks, when he speaks. We want to give people more control so they can use the synthesis like a musical instrument.”
You can find this article, in abbreviated form, at the timesonline website. Readers may be interested to know that after a redesign of the Times pages, most news stories in the paper and online will be considerably shorter than before, around 500 words, rather than 650-900. A shame in my view, but I am merely the monkey, not the organ-grinder.