Today’s technological advancements have changed the way people communicate with their devices, and with each other. Automatic Speech Recognition (ASR), when it works, is an effortless means of interfacing with devices of all types. However, laptop and desktop users alike often find themselves in environments that are not especially hospitable to ASR. Home and office settings have relatively stationary noise profiles, yet they are often less-than-perfect environments for ASR; public spaces add entirely new noise challenges that make it nearly impossible for ASR engines to function. That’s why Waves created MaxxSpeech, a truly pioneering ASR performance enhancement solution. MaxxSpeech is a suite of advanced technologies that improve the performance of Automatic Speech Recognition applications, for flawless hands-free, voice-controlled communication between users and their devices. Comprising three noise reduction processors, each of which addresses a specific ASR challenge, MaxxSpeech increases command acceptance and significantly reduces word error rates, so users can communicate with their devices as naturally as talking with a friend.
MaxxSpeech is powered by three primary technologies:
The Waves MaxxEC Stereo echo canceller overcomes one of the most challenging scenarios for ASR: multimedia being played back through a device’s internal loudspeakers, which are usually located much closer to the microphones than the user is. Unlike monophonic echo cancellers, which are designed for voice calls, MaxxEC Stereo was designed specifically for ASR, eliminating intrusive sound from stereophonic media content like music, movies and games. It cancels the sound produced by the computer’s speakers, handling true stereo leaks at fast convergence rates and deep cancellation levels. Truly adaptive, MaxxEC Stereo makes continuous real-time adjustments in response to the media being played, as well as changes in the user environment. With MaxxEC Stereo, users can listen to music, watch movies and play games while simultaneously communicating with their devices and running commands in real time via ASR.
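To make the idea of adaptive stereo echo cancellation concrete, here is a minimal sketch using the textbook normalized LMS (NLMS) algorithm. It is not the MaxxEC Stereo algorithm, which is not described here. The function name, tap count, step size and the two short echo paths are all assumptions chosen for illustration, and the demo uses independent noise on the two channels; real stereo content is usually correlated between channels, which makes the problem considerably harder.

```python
import random

def nlms_stereo_echo_cancel(mic, ref_left, ref_right, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    Illustrative NLMS sketch only; all parameter values are assumptions.
    """
    w_l = [0.0] * taps  # estimated echo path, left speaker -> microphone
    w_r = [0.0] * taps  # estimated echo path, right speaker -> microphone
    out = []
    for n in range(len(mic)):
        # Most recent `taps` reference samples, newest first, zero padded.
        x_l = [ref_left[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        x_r = [ref_right[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_est = (sum(w * x for w, x in zip(w_l, x_l))
                    + sum(w * x for w, x in zip(w_r, x_r)))
        err = mic[n] - echo_est  # what remains after removing the estimated echo
        power = sum(x * x for x in x_l) + sum(x * x for x in x_r) + eps
        step = mu * err / power  # normalized step keeps adaptation stable
        w_l = [w + step * x for w, x in zip(w_l, x_l)]
        w_r = [w + step * x for w, x in zip(w_r, x_r)]
        out.append(err)
    return out

# Demo: the microphone hears only the echo of two independent noise channels.
random.seed(0)
n = 2000
left = [random.uniform(-1, 1) for _ in range(n)]
right = [random.uniform(-1, 1) for _ in range(n)]
path_l, path_r = [0.5, 0.3], [0.4, -0.2]  # hypothetical echo paths
mic = [sum(h * left[i - k] for k, h in enumerate(path_l) if i - k >= 0)
       + sum(h * right[i - k] for k, h in enumerate(path_r) if i - k >= 0)
       for i in range(n)]
residual = nlms_stereo_echo_cancel(mic, left, right)
tail = slice(n - 500, n)
echo_energy = sum(v * v for v in mic[tail])
residual_energy = sum(v * v for v in residual[tail])
print(residual_energy / echo_energy)  # ratio of leftover echo to original echo
```

Because the filter weights are updated on every sample, the same loop keeps tracking the echo path if it changes, which is the sense in which such a canceller is adaptive.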
In speech systems, “babble” is the commonly used term for the noise encountered when a crowd or a group of people is talking. The problem is that accurate speech recognition requires precise capture of a single voice. Noisy environments can (and do) reduce the accuracy of speech recognition engines. In fact, any background noise or interference hampers your computer’s ability to identify speech correctly and carry out the intended action or command. In many settings, ASR software simply does not work as well: words are transcribed incorrectly, or the software has difficulty hearing commands at all.
Waves DeBabble is a patent-pending diffused noise attenuator that leverages Waves’ renowned noise reduction and source localization technologies to provide a purer, more precise signal, so the ASR engine can accurately identify user commands. With a direction-sensitive filter optimized for automatic speech recognition performance, DeBabble detects and tracks the main speech source and then assumes all other speech-like sources are unwanted environmental babble. DeBabble operates in real time by applying a combination of a linear directional filter and a non-linear directional filter, both controlled by the detected direction of speech. Together they suppress some of the frequencies in the unwanted noise and speech sources, minimizing their effect on the ASR engine’s performance. DeBabble’s directional filter has been trained on ASR engines for maximum Command Acceptance Rate and minimum Word Error Rate. DeBabble works like a dedicated compass that automatically knows where the user is.
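For readers unfamiliar with the metric, Word Error Rate is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between the recognizer's output and a reference transcript, divided by the number of reference words. The sketch below shows that standard calculation; the function name and the example phrases are illustrative assumptions, not part of MaxxSpeech.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# One deleted word ("on") and one substituted word ("lights" -> "light")
# out of four reference words.
print(word_error_rate("turn on the lights", "turn the light"))  # 0.5
```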
To ensure accurate, effective Automatic Speech Recognition, MaxxSpeech incorporates Waves MaxxBeam microphone-array technology, which uses two microphones to create a “beam,” then differentiates between signals inside the beam and signals outside of it. This allows MaxxBeam to suppress stationary and non-stationary noise from outside the beam, while providing optimal sound quality and clarity. MaxxBeam’s Real-time Configurable Beam Direction lets users instantly adjust the array from a narrow directional beam for a single user to a wider beam that allows multiple users to take advantage of ASR capabilities when necessary. Perfect for consumer-grade microphones, MaxxBeam’s Automatic Microphone Calibration compensates for differences in microphone pairs, and is compatible with a wide variety of microphone types.
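The general principle behind a two-microphone beam can be illustrated with delay-and-sum beamforming, the simplest textbook approach. This is a sketch of that generic technique, not of MaxxBeam itself, and the function name, signal frequencies and delays are assumptions. A sound from the steered direction reaches both microphones in step and adds up, while a sound from another direction arrives with a relative delay and is attenuated. The interferer frequency below is deliberately chosen so that its 4-sample delay is exactly half a period, giving complete cancellation; other frequencies would only be partly reduced.

```python
import math

def delay_and_sum(mic_a, mic_b, delay_samples):
    """Average mic_a with mic_b advanced by an integer steering delay."""
    n = len(mic_a)
    out = []
    for i in range(n):
        j = i + delay_samples
        b = mic_b[j] if 0 <= j < n else 0.0  # zero pad outside the recording
        out.append(0.5 * (mic_a[i] + b))
    return out

n = 400
# Target talker straight ahead: reaches both microphones at the same time.
target = [math.sin(2 * math.pi * i / 50) for i in range(n)]

def interferer(i):
    # Off-axis source with a period of 8 samples (illustrative choice).
    return math.sin(2 * math.pi * i / 8)

mic_a = [target[i] + interferer(i) for i in range(n)]
mic_b = [target[i] + interferer(i - 4) for i in range(n)]  # arrives 4 samples later

beam = delay_and_sum(mic_a, mic_b, delay_samples=0)  # steer straight ahead
worst_error = max(abs(b - t) for b, t in zip(beam, target))
print(worst_error)  # how far the beam output deviates from the clean target
```

Changing `delay_samples` steers the beam toward a different direction, which is one simple way to think about a configurable beam direction.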
Jack Joseph Puig, eleven-time GRAMMY® award-winning producer/engineer (Lady Gaga, U2 and many others), realized that the sound he creates—using cutting-edge processors in a world-class studio, and drawing on his years of experience—often ends up on a smartphone with small, tinny speakers, or through inexpensive headphones. Listeners weren’t hearing the music at its best, so he decided to do something about it. He joined forces with Waves Audio to bring studio sound to listeners everywhere. Now, with MaxxAudio, the magic of the recording studio can be experienced on personal consumer devices.