Automated Speech Recognition

Automatic Speech Recognition, or ASR for short, is the technology that lets people use their voices to interact with a computer interface in a way that, in its most sophisticated forms, resembles normal human conversation.

We may still be at least a couple of decades away from truly autonomous artificial intelligence systems that communicate with us in a genuinely “human-like” way. In the meantime, ASR lets users of information systems speak their entries rather than punch numbers on a keypad. It is used primarily to provide information and to forward telephone calls.

The most advanced ASR systems currently available are built around Natural Language Processing, or NLP for short. This variant of ASR comes closest to allowing real conversation between people and machines. It still has a long way to go, but it is already producing remarkable results in smartphone interfaces such as Siri on the iPhone and in other systems used in business and advanced technology settings. Sophisticated ASR systems let the user enter direct queries or responses, such as a request for driving directions or the telephone number of a hotel in a particular town. This shortens menu navigation by reducing the number of decision points, and it reduces the number of instructions the user must hear and understand.
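To make the idea of "decision points" concrete, here is a minimal sketch that compares a nested phone menu with a direct spoken query. The menu layout, the service names and the keyword sets are all hypothetical, and simple keyword overlap stands in for real NLP, which is far more involved.

```python
# Hypothetical phone menu: each nested level is one decision point for the caller.
MENU = {
    "travel": {
        "hotels": {"phone numbers": "hotel_phone_lookup"},
        "driving": {"directions": "driving_directions"},
    },
}

# Hypothetical keywords that identify each service in a spoken query.
INTENT_KEYWORDS = {
    "driving_directions": {"directions", "drive", "driving"},
    "hotel_phone_lookup": {"hotel", "phone", "number"},
}


def menu_decisions(menu, target, depth=0):
    """Return how many menu choices are needed to reach target, or None."""
    for value in menu.values():
        if value == target:
            return depth + 1
        if isinstance(value, dict):
            found = menu_decisions(value, target, depth + 1)
            if found is not None:
                return found
    return None


def route_query(transcript):
    """Pick the service whose keywords overlap most with the transcript."""
    words = set(transcript.lower().split())
    scores = {name: len(words & kws) for name, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # No keyword matched at all: report that the query was not understood.
    return best if scores[best] > 0 else None
```

In this toy menu, reaching driving directions takes three choices, while `route_query("I need driving directions to Boston")` reaches the same service in a single step.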

In recent years, ASR has become popular in the customer service departments of large corporations. It is also used by some government agencies and other organizations. Basic ASR systems recognize single-word entries such as yes-or-no responses and spoken numerals. This makes it possible for people to work their way through automated menus without having to key in dozens of numerals manually, where a single mistake cannot be corrected. In a manual-entry situation, a customer might hit the wrong key after having already entered 20 or 30 numerals at various points in the menu, and give up rather than call again and start over. ASR virtually eliminates this problem.
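As an illustration of how a basic system might handle single-word entries, the sketch below maps an already-recognized transcript to a digit string or a yes/no answer. The word lists and function names are assumptions made for this example and do not describe any particular product.

```python
# Assumed vocabulary: spoken numerals, including "oh" as a common way to say zero.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}
YES_WORDS = {"yes", "yeah", "yep"}
NO_WORDS = {"no", "nope"}


def words_to_digits(transcript):
    """Convert a transcript such as 'four one five' into the string '415'."""
    digits = []
    for word in transcript.lower().split():
        if word not in DIGIT_WORDS:
            # Reject anything outside the numeral vocabulary instead of guessing.
            raise ValueError(f"not a spoken numeral: {word!r}")
        digits.append(DIGIT_WORDS[word])
    return "".join(digits)


def parse_yes_no(transcript):
    """Return True for yes, False for no, and None if the answer is unclear."""
    word = transcript.strip().lower()
    if word in YES_WORDS:
        return True
    if word in NO_WORDS:
        return False
    return None
```

Returning `None` for an unclear answer lets the menu ask the caller to repeat a single response rather than start over.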

Predict Customer Headaches

Analysis and reporting are no longer just a resource for hindsight. The Conversation Analyzer monitors every contact your customers have with your business and assesses their state of mind during these interactions. It not only highlights potential problems but predicts them, so that your agents can resolve queries before they become complaints.

Monitor Every Customer Interaction

Currently, your Training Managers and QA teams are actively listening to a random cross-section of calls and interactions, but what if they could listen to every conversation, be it via chat, phone or email?

Powered by AI, the Conversation Analyzer builds a full picture of each customer across all contact channels. Alerts are triggered by positive or negative emotional states, keywords or behaviors. Live reporting at both the micro and macro level lets the on-shift manager oversee the department as a whole and drill down into specific conversations.

Combat Sales Objections

Armed with this live knowledge and emotional analysis, your sales team has a much stronger chance of overcoming sales objections, because agents understand the customer’s journey before the call is connected. For example, knowing that a particular customer has had problems with excessive downtime in the past enables the sales agent to defuse this concern before the customer has even mentioned it, leaving the customer feeling understood and cared for.

How Automatic Speech Recognition Works

Regardless of its sophistication, any Automatic Speech Recognition software picks up and breaks down your words for analysis and response in the same basic sequence:

  • You speak to the software via an audio feed.
  • The device you’re speaking to creates a wave file of your words.
  • The wave file is cleaned by removing background noise and normalizing volume.
  • The resulting filtered waveform is then broken down into what are called phonemes. (Phonemes are the basic building-block sounds of a language. English has about 44 of them, including sounds such as “wh”, “th”, “k” and “t”.)
  • Each phoneme is like a link in a chain. By analyzing them in sequence, starting from the first phoneme, the ASR software uses statistical probability analysis to deduce whole words and, from there, complete sentences.
  • Your ASR, now having “understood” your words, can respond to you in a meaningful way.
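The steps above can be sketched in a few lines of code. This is a toy illustration under stated assumptions, not how a production recognizer is built: real systems use trained acoustic and language models, while here a crude noise gate, peak normalization and a tiny hypothetical lexicon with made-up probabilities stand in for them.

```python
def normalize_volume(samples, target_peak=0.9):
    """Scale samples so that the loudest one reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def noise_gate(samples, threshold=0.05):
    """Crude noise removal: silence any sample quieter than threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


# Hypothetical lexicon: phoneme sequence -> candidate words with probabilities.
LEXICON = {
    ("y", "eh", "s"): {"yes": 1.0},
    ("n", "ow"): {"no": 0.7, "know": 0.3},
    ("t", "uw"): {"two": 0.5, "to": 0.3, "too": 0.2},
}


def decode(phonemes):
    """Greedy left-to-right decoding of a phoneme sequence into words.

    At each position, take the longest phoneme run found in the lexicon
    and emit its most probable word.
    """
    words = []
    i = 0
    while i < len(phonemes):
        for length in range(len(phonemes) - i, 0, -1):
            chunk = tuple(phonemes[i:i + length])
            if chunk in LEXICON:
                candidates = LEXICON[chunk]
                words.append(max(candidates, key=candidates.get))
                i += length
                break
        else:
            # No lexicon entry starts here: mark it unknown and move on.
            words.append("<unk>")
            i += 1
    return words
```

For example, `decode(["y", "eh", "s", "n", "ow"])` yields `["yes", "no"]`. Choosing between "no" and "know" by probability alone ignores context, which is one reason real systems also score whole word sequences.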

