VOYSIS envisions a world where people communicate with machines by voice in a truly interactive way, as easily as they communicate with humans.
This future is now in our grasp.
VOYSIS deep learning technology enables machines to learn voices and languages fully automatically for the first time.
This now opens the door for machines and devices to have virtually unlimited voices, languages, dialects, accents, and even the subtleties of human voice: character, gender, age and personality.
Now we're talking.
Current voice generation technology requires large amounts of very carefully crafted data in order to produce a synthetic voice. The manual process of obtaining and preparing appropriate data often takes teams of people years, just for a single language.
At VOYSIS we enable computers to learn voices automatically.
We don’t need large, laboriously crafted datasets; instead we use machine learning to create a broader range of voices and languages at a consistent quality.
Specifically, the VOYSIS engine learns from raw datasets of people reading text aloud. Audiobooks are a good example of the type of data we use, where we have both audio and the corresponding book text. From these datasets the VOYSIS engine automatically learns to speak.
VOYSIS envisions a world where people easily communicate with machines by voice in a truly interactive way, almost as easily as they communicate with humans. The traditional barriers of technical skills, literacy, and even language are greatly lowered when we interact by voice.
Interface design is all about real communication: not just one-way digital instruction, but two-way intelligent interactive communication. Smart devices are now everywhere, and even old machines and appliances are seeing a new lease of “smart life”. Machines are becoming more powerful, more intelligent, cheaper, and smaller, and they are increasingly identifiable, locatable, connected, and integrated. The world is becoming truly networked.
Despite decades of investment, the standard voice industry approach of finely tuned manual expert systems has resulted in coverage for less than 2% of the world's languages.
VOYSIS deep learning technology overcomes these limitations, enabling the world to communicate with machines by voice in a truly interactive way.
Dr. Peter Cahill, who has worked in text-to-speech for the past 12 years, leads the team at VOYSIS. During this time he was part of a group of scientists who attracted a total of $117M in funding for the CNGL/ADAPT projects.
Dr. Cahill is an active member of the text-to-speech research community, where he chairs SynSIG, the global speech synthesis special interest group.
Peter’s specific expertise is in text-to-speech engines that learn language automatically from human voice recordings.
His work has resulted in the technology at the core of the VOYSIS engine, which enables the system to learn new voices and languages fully automatically from hearing real people speak.
The VOYSIS team is primarily composed of speech scientists and software engineers. In addition to their core skills, which are primarily machine learning and software development, everyone on the team speaks several languages.
If you are an exceptional speech scientist, machine learning researcher, or software engineer and want to be part of a world-class team working on exciting, ground-breaking technology in an inspiring and collaborative environment, then please get in touch.