Voice Enhancement
EasyWatch™

Understand fast talkers on TV, in movies, and in streaming video as never before

Introduction

 

Do you like watching movies, TV shows, or debates?

Have you ever felt that you were missing some of the dialogue because people were speaking too quickly?

EasyWatch™ is a patent-pending, real-time technology that dynamically slows down speech segments in TV sound, making fast speech easier to understand. Video is slowed in parallel, ensuring synchronization of audio and video streams.

EasyWatch™ improves the intelligibility of speech for all viewers, and especially for those who struggle to follow fast dialogue, foreign-language speech, or heavily accented speech.

We invite you to experience EasyWatch for yourself by downloading and installing the EasyWatch Player, a hands-on technology demo that enables EasyWatch processing on your own video files.

 

 

Download EasyWatch Player (for Microsoft Windows)

The EasyWatch Player allows you to experience and test EasyWatch for yourself on any MP4 video file.

(Microsoft Windows is required to install and run the EasyWatch Player.)

 

USER GUIDE (PDF):

Download EasyWatch Player User Guide

About EasyWatch™

 

Driven by sophisticated speech analysis, EasyWatch™ is a real-time audio/video processing technology that dynamically slows down fast speech and accompanying video to make fast talkers more understandable.

Sometimes, people on TV speak so quickly that it’s difficult to understand what’s being said. Fast dialogue undermines comprehension and enjoyment of our favorite programs. This is further complicated by the fact that different shows and movies have distinct acoustic profiles, or are in a second (foreign) language, so intelligibility varies from one program to the next. With so many variables, it makes sense for TV manufacturers to offer features that let viewers personalize the audio and video to better understand what they hear and watch.

EasyWatch™ improves the intelligibility of speech for all viewers, and particularly for those with impaired auditory or cognitive function. Problems understanding speech occur more often with movies and talk shows than with newscasts and documentaries, which place a priority on speech cadence and enunciation. Hence, EasyWatch may be especially beneficial for comprehending movies and talk shows. It is language independent and works equally well with Western languages and Eastern tonal languages.

SYNCHRONIZATION OF AUDIO/VIDEO

In today’s systems the speed of video playback defines the speed of the audio. In other words, video is the master (see Figure 1 below). Almost all video players and online video services allow the user to change the video playback speed relative to the original video speed. The audio speed is modified according to the video playback speed.

In a system where video is the master, the audio content is not analyzed: silence, noise, and music are treated the same as speech. To improve intelligibility, only the speech segments need to be slowed down. Slowing down other sounds needlessly lengthens the program and may even produce artifacts in music and other non-speech segments. Because humans are more sensitive to audio artifacts than to video artifacts, changing speed with audio as the slave exacerbates distortion and degrades intelligibility.

 
 
 
Figure 1
 
 

With EasyWatch the audio is the master (see Figure 2). Audio and the corresponding video speed can be controlled according to the audio content. The audio and video streams are split, the audio is analyzed and adjusted, the video (the slave) is correspondingly adjusted, and then the streams are recombined into the modified audio/video stream. This allows for flexible speed modifications:

  1. Only the segments of the audio/video stream that contain voice can be slowed.

  2. Non-voice segments can be sped up while voice segments remain unaltered, for cases where one wants to speed up a program without compromising dialogue clarity.

  3. Voice segments can be slowed while non-voice segments are sped up to compensate for the delay accumulated during the voice segments. This improves the intelligibility of real-time video streaming without accumulating large delays.

Furthermore, the algorithm slows down audio and voice segments only where there is no danger of producing artifacts.
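As a rough illustration of audio-as-master timing, the sketch below maps original timestamps to timestamps in the processed stream, given a list of audio segments and their playback rates. Video frames would be presented at the warped times so that they stay in sync with the adjusted audio. The `Segment` class, the `warp_time` function, the convention that a rate of 0.8 means 80% of original speed, and the example values are all assumptions made for illustration; they are not taken from the EasyWatch implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds in the original stream
    end: float    # seconds in the original stream
    kind: str     # e.g. "voice", "silence", "music"
    rate: float   # playback rate: < 1 slows down, > 1 speeds up


def warp_time(t, segments):
    """Map an original timestamp t to its position in the processed stream.

    Assumes the segments are sorted, contiguous, and start at time 0.
    """
    out = 0.0
    for seg in segments:
        if t >= seg.end:
            # The whole segment lies before t: add its stretched duration.
            out += (seg.end - seg.start) / seg.rate
        elif t > seg.start:
            # t falls inside this segment: add the stretched part and stop.
            out += (t - seg.start) / seg.rate
            break
        else:
            break
    return out


# Hypothetical example: voice is slowed, the silence in between is sped up.
segments = [
    Segment(0.0, 4.0, "voice", 0.8),
    Segment(4.0, 6.0, "silence", 2.0),
    Segment(6.0, 10.0, "voice", 0.8),
]

# Presentation times for one video frame per second of original video.
frame_times = [warp_time(float(n), segments) for n in range(11)]
```

In this example the original timestamps 4.0, 6.0, and 10.0 seconds map to about 5.0, 6.0, and 11.0 seconds. The one-second delay built up during the first voice segment is fully recovered during the silence, which corresponds to the third mode in the list above.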

 
 
 
Figure 2
 
 

ADJUSTING AUDIO SPEED

The Audio Speed Changer block is shown in Figure 2. This block is expanded to show its constituent parts in Figure 3: Audio Content Analyzer, Audio Speed Controller, and Time-scale Audio Processor.

The purpose of the Audio Content Analyzer is to detect and identify time segments corresponding to different types of audio signal or mixtures of them (e.g., silence, noise, music, voice). This classification may be extended. For example, voice segments can be subdivided into monologues and dialogues, and new segment types such as “sound effect” may be introduced. A special “unknown” type can be used to designate segments that cannot be clearly identified and labelled. The Audio Content Analyzer receives the original audio signal extracted from the video and produces time stamps marking the beginning and end of each segment, together with the segment type.
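To make the analyzer's output concrete, here is a minimal sketch of how per-frame labels could be merged into time-stamped segments. It assumes that some classifier (not shown, and not described in the source) has already assigned a label to every fixed-length audio frame; the function name `frames_to_segments` and the 20 ms frame length are hypothetical.

```python
def frames_to_segments(labels, frame_ms=20):
    """Merge runs of identical frame labels into (start_ms, end_ms, kind) tuples."""
    segments = []
    start = 0  # index of the first frame of the current run
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the input or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start * frame_ms, i * frame_ms, labels[start]))
            start = i
    return segments


# Hypothetical classifier output: 5 silent frames, 10 voice frames, 5 music frames.
labels = ["silence"] * 5 + ["voice"] * 10 + ["music"] * 5
segments = frames_to_segments(labels)
# segments == [(0, 100, "silence"), (100, 300, "voice"), (300, 400, "music")]
```

Integer milliseconds are used for the time stamps so that segment boundaries are exact and adjacent segments share the same boundary value.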

The Audio Speed Controller determines what time-scale modification is to be performed by the Audio Speed Changer block. For example, silence or noise segments can be sped up (shortened), voice segments slowed down (stretched), and music segments left at their original duration. The decision may depend on user preference, the type of application, and the accumulated delay produced by the Audio Speed Changer block.
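One possible shape for such a controller is sketched below. It slows voice until a delay budget is used up, speeds up silence and noise just enough to recover the accumulated delay, and leaves everything else untouched. The function names, the default rates, and the delay budget are illustrative assumptions, not values from EasyWatch.

```python
def choose_rate(kind, duration, delay, slow_rate=0.8, max_speedup=2.0, max_delay=2.0):
    """Pick a playback rate for one segment (rate < 1 slows, rate > 1 speeds up)."""
    if kind == "voice":
        # Slow speech down unless the delay budget is already exhausted.
        return slow_rate if delay < max_delay else 1.0
    if kind in ("silence", "noise") and delay > 0.0:
        # Shorten the segment, but never recover more than the current delay
        # and never exceed the maximum speed-up.
        shortest = max(duration / max_speedup, duration - delay)
        return duration / shortest
    return 1.0  # music, unknown, or nothing to catch up on


def plan_rates(segments):
    """Return the rate chosen for each (start, end, kind) segment and the final delay."""
    delay = 0.0
    rates = []
    for start, end, kind in segments:
        duration = end - start
        rate = choose_rate(kind, duration, delay)
        delay += duration / rate - duration  # positive when the segment is stretched
        rates.append(rate)
    return rates, delay


# Hypothetical program: voice, silence, voice, music (times in seconds).
example = [(0.0, 4.0, "voice"), (4.0, 6.0, "silence"),
           (6.0, 10.0, "voice"), (10.0, 12.0, "music")]
rates, final_delay = plan_rates(example)
```

For this input the planner slows both voice segments to 0.8, doubles the speed of the silence to cancel the first second of delay, and leaves the music alone, ending about one second behind the original timeline. User preference and application type could be modeled by passing different values for `slow_rate`, `max_speedup`, and `max_delay`.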

The purpose of the Time-scale Audio Processor block is to speed up or slow down an input audio signal without affecting its frequency content, such as the perceived pitch of any tonal components. For example, processed speech should sound as if the speaker is talking at a slower or faster pace, without distortion of the spoken vowels.
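The source does not say which time-scale modification method EasyWatch uses. As a minimal sketch of the general idea, the code below uses plain overlap-add (OLA): short windowed frames are read from the input at one spacing and written to the output at another, so the duration changes while each frame keeps its original local waveform. Plain OLA is the simplest member of this family and can produce audible artifacts; practical systems typically use refinements such as WSOLA or a phase vocoder. The function name `ola_stretch` and its parameters are assumptions made for illustration.

```python
import math


def ola_stretch(samples, rate, frame=512):
    """Time-stretch a list of samples by plain overlap-add.

    rate < 1 slows the audio down (longer output); rate > 1 speeds it up.
    """
    hop_out = frame // 2       # output spacing between frames (50% overlap)
    hop_in = rate * hop_out    # input spacing between frames
    # Periodic Hann window.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    n_frames = max(1, int((len(samples) - frame) / hop_in) + 1)
    out = [0.0] * ((n_frames - 1) * hop_out + frame)
    norm = [0.0] * len(out)    # sum of window weights at each output sample
    for k in range(n_frames):
        src = int(round(k * hop_in))
        dst = k * hop_out
        for n in range(frame):
            if src + n < len(samples):
                out[dst + n] += samples[src + n] * window[n]
                norm[dst + n] += window[n]
    # Divide by the accumulated window weight to keep the amplitude unchanged.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


# Hypothetical input: one second of a 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]
slowed = ola_stretch(tone, rate=0.8)     # roughly 1.25 times as long
unchanged = ola_stretch(tone, rate=1.0)  # reproduces the input up to trimming
```

With a rate of 1.0 the input and output frame spacings coincide, so the function returns the input samples (minus a short trimmed tail), which is a convenient sanity check for the windowing and normalization.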

 
 
 
Figure 3
 

 

 
Video Examples
 
Example 1
 

 

UNPROCESSED

 
EasyWatch OUTPUT: SLOWED 25%

     
EasyWatch OUTPUT: SLOWED 50%

   

 

 
Example 2
 

 

UNPROCESSED

 
EasyWatch OUTPUT: SLOWED 25%

     
EasyWatch OUTPUT: SLOWED 50%

   

 

 
Example 3
 

 

UNPROCESSED

 
EasyWatch OUTPUT: SLOWED 25%

     
EasyWatch OUTPUT: SLOWED 50%

   
 

 

 
 

Applications

 
  • Televisions
  • TV set-top boxes
  • Mobile phones
  • Video decoder chips
  • Video streaming services
 

 
