rSpeak Technologies provides embedded, server and cloud-based high-quality text-to-speech applications with different footprint levels for businesses, developers, OEMs and partners. It gives developers the tools to integrate fluent, natural text-to-speech voices and increase accessibility for all users. Through its online platform, it also offers an end-to-end customer feedback loop for pronunciation improvements, which none of the current TTS providers offer.

In its freshly released 2016 Text-to-Speech Accuracy Testing report, the independent speech industry observer ASR News recognized rSpeak Technologies’ text-to-speech engine as the most accurate text-to-speech engine available on the market. Among the 12 TTS products tested, each representing its vendor’s latest available release, rSpeak Technologies came first with an overall accuracy rate of 98.6 out of 100.

rSpeak’s research and development efforts have built advanced expertise in deep neural networks, deep learning for speech analysis, prosody models and pronunciation modelling. In its most recent R&D projects, rSpeak has been investigating the use of deep learning in Statistical Parametric Speech Synthesis (SPSS) to develop new voices.

SPSS has two main advantages. First, developing several new voices is much easier because recording and segmenting the speech data take much less time. Second, acoustic parameters such as pitch, duration and spectrum can be adapted to fit specific applications. The presentation will describe preliminary results on creating personalized voices with SPSS. As a side benefit, the work is expected to advance part-of-speech tagging and prosody prediction using deep learning methods.
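
To illustrate why a parametric representation makes such adaptation straightforward, here is a minimal sketch in Python. It assumes a voice is rendered from a sequence of acoustic frames, each holding a pitch value and a spectral envelope. The `Frame` type and the `scale_pitch` and `stretch_duration` helpers are hypothetical names chosen for illustration; they are not rSpeak's implementation, and a real system would pass the modified frames to a vocoder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """One acoustic frame (hypothetical layout, for illustration only)."""
    f0_hz: float            # fundamental frequency; 0.0 marks an unvoiced frame
    spectrum: List[float]   # spectral envelope coefficients


def scale_pitch(frames: List[Frame], factor: float) -> List[Frame]:
    """Raise or lower pitch by multiplying F0; unvoiced frames (0.0) stay unvoiced."""
    return [Frame(f.f0_hz * factor, list(f.spectrum)) for f in frames]


def stretch_duration(frames: List[Frame], factor: float) -> List[Frame]:
    """Slow down (factor > 1) or speed up (factor < 1) by nearest-frame resampling."""
    if not frames:
        return []
    n_out = max(1, round(len(frames) * factor))
    out = []
    for i in range(n_out):
        # Map each output frame back to a source frame, clamped to the last one.
        src = min(len(frames) - 1, int(i / factor))
        out.append(Frame(frames[src].f0_hz, list(frames[src].spectrum)))
    return out
```

Because pitch and duration are explicit parameters rather than properties baked into recorded audio units, each can be changed independently, which is the kind of flexibility the paragraph above describes.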

At a business level, this revolutionary development could deliver very high-end custom voices at a fraction of the cost and recording time of today’s synthetic voice development, leading to a proliferation of custom voices for man-machine interfaces and branding. At present, only a select number of companies worldwide use deep learning in developing their speech technology.