Monday, January 4, 2010

Voice browsing technology is a rapidly growing field

Voice browsing technology is a rapidly growing field. Whether or not it proves to be the next internet, it deserves a careful examination in its present form.
Level One: Voice Browsing

The easiest way to understand voice browsing and the voice web is to imagine the Web itself. People visit websites and receive visual information. The voice web has voice sites where the information is conveyed through speech. The most basic example is calling an airline or financial portal, where speech recognition software gives the caller a series of options - buy or sell stocks, book airline tickets - and the caller interacts with a computer system. Financially, voice browsing makes great sense for companies: it costs around a dollar a minute for human operators to interact with customers but only around 10 cents a minute for an automated system.
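
To make the cost gap concrete, here is a rough sketch in Python. The two per-minute rates are the approximate figures quoted above; the call volume and the share of calls handled automatically are hypothetical numbers chosen only for illustration, and the function name is an assumption of this sketch.

```python
# Approximate per-minute costs quoted in the article (US dollars).
HUMAN_COST_PER_MINUTE = 1.00
AUTOMATED_COST_PER_MINUTE = 0.10


def monthly_savings(call_minutes: int, automated_share: float) -> float:
    """Estimate savings when a share of call minutes moves to an automated system."""
    if not 0.0 <= automated_share <= 1.0:
        raise ValueError("automated_share must be between 0 and 1")
    automated_minutes = call_minutes * automated_share
    return automated_minutes * (HUMAN_COST_PER_MINUTE - AUTOMATED_COST_PER_MINUTE)


# Hypothetical call center: 100,000 minutes a month, 60% handled by the voice system.
print(f"${monthly_savings(100_000, 0.6):,.2f}")
```

Under these assumptions, every minute moved to the automated system saves roughly 90 cents, so the savings scale directly with call volume.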

Corporate voice browsing is found primarily at banks and airlines. SpeechWorks, a voice application provider, has over 70 partners, including United Airlines, Continental Airlines, AOL, FedEx, and Hewlett-Packard. Internationally, SpeechWorks partners with E-Plus in Germany, Singapore Telecom, and Crédit Lyonnais in France. "We haven't scratched the surface of corporate adoption," promises Pete Settles, Media Manager of SpeechWorks.

Wireless carriers, like Qwest, are building general portals that will offer traffic updates, taxi services, pizza orders, and movie reservations. In this field, BeVocal is the national leader with 10 million potential customers, as the exclusive voice browser for Qwest and Sprint. Worldwide, Philips Speech Processing supports the largest voice portal, that of Italy's Omnitel mobile carrier. Philips also supports voice portals for KG Telecom in Taipei - in Mandarin - and Cegetel, the second-largest mobile provider in France.
Level Two: Voice Browsing the World Wide Web

The next level of technology is voice browsing of websites that offer voice portals. In an attempt to maintain customer loyalty, portals like myYahoo are offering voice browsing of personal content. AOL recently bought Quack.com in an attempt to claim this market too. Look for Excite and Netscape to become involved, as they cannot afford to lose customers to churn.

Europe has taken a more aggressive approach to the voice portal market. The United States, with over 40% of homes connecting to the web by PC, has no immediate demand for web connection by phone. In Europe, however, PC penetration rates are much lower, and people are purchasing internet-ready phones at a rate of 5 to 1. Italy has only twenty percent PC penetration but over fifty percent wireless penetration. If you are in Italy and you need to use the web, voice browsing is becoming the most common option. "The phone is the great social equalizer," says Kris Hopkins, CEO of NewFound, a company that offers voice browsing for search engines over wireless devices. "Everybody has a phone and everybody will have the ability to get on the Web." In Asia, people are buying phones over PCs at a 9 to 1 rate.
Level Three: The Voice Web

The next big step in voice technology is the voice web - an entirely voice-based network of sites. To increase interest in voice browsing and speech recognition, several companies have introduced forums for programmers to set up voice sites. These forums become voice webs, sprouting voice auction sites and voice-based chat rooms. Nuance, BeVocal, and TellMe have all introduced forums. TellMe, a builder of web-based voice applications for companies, pioneered the developers' forum with the TellMe Studio. At the Studio, programmers have their voice applications posted for free. Call 1-800-555-TELL to voice-browse through the creations of 10,000 developers. BeVocal has its own active developers' program, the BeVocal Cafe, where third parties develop applications that BeVocal can host for its carrier customers (games, supposedly, are in big demand). In September, Nuance, the premier speech recognition software manufacturer, waived the $495 membership fee for its developers' network, and new members are joining at a rate of 20 per day. Steve Elrich, spokesperson for Nuance, calls this "The birth of the internet all over again." In the spirit of creating a voice community, SpeechWorks has launched its Open Speech Web, with an open source voice browser based on the Linux model.
Voice Browsing Technology

The programming language responsible for connecting voicemail, live agents, and speech-enabled sites is called VoiceXML, which was devised by Lucent, IBM, Motorola, and AT&T. Although there is one de facto programming language, the software that converts text to speech and back varies. IBM has several patents on its natural language understanding engine, which uses probability to guess what people mean if the words are unclear. SpeechWorks' technology is based on research conducted at the Massachusetts Institute of Technology. Philips offers SpeechPearl with a 200,000-word vocabulary, and Speechwave offers text to speech in 35 languages. The leader in this field is Nuance, working with around 1,500 application companies and 1,500 platform companies.

For another programming option, VocalPoint converts regular HTML text into a voice platform. "If you've got an HTML structure on your web site - you've got a platform for our voiceplatform," says Kurt Losert, the company's CEO. VocalPoint needs a day to set up and about 10 hours of programming. VoiceXML takes 15 days to optimize a web site and over 150 hours of programming.

How well all this technology works depends on whom you talk to. Yahoo-by-Voice claims to be a voice portal, but it requires touch-tone responses. Much better are technologies like IBM's natural language understanding engine, which converts complete sentences and can even ask for missing information. SpeechWorks and United Airlines recorded 97% to 99% accuracy for their service. The most fun I had with voice browsing was calling the Philips Speech Processing center and talking to its browser, which did not recognize the words "Public Relations," "Media Relations," or "Press." Finally, I asked for the operator.
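
The idea of a system that "can even ask for missing information" can be illustrated with a toy sketch in Python. This is not IBM's engine, which the article says relies on probability; it is a minimal pattern-matching stand-in, and the slot names, patterns, and function names are all hypothetical.

```python
import re

# Hypothetical slots for a flight-booking request, each with a toy pattern.
# A real engine would use statistical language understanding, not regexes.
SLOT_PATTERNS = {
    "origin": re.compile(r"\bfrom (\w+)"),
    "destination": re.compile(r"\bto (\w+)"),
    "day": re.compile(r"\bon (\w+)"),
}


def fill_slots(utterance, slots=None):
    """Return a copy of slots updated with anything found in the utterance."""
    slots = dict(slots or {})
    text = utterance.lower()
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match and name not in slots:
            slots[name] = match.group(1)
    return slots


def next_prompt(slots):
    """Return a question for the first missing slot, or None when complete."""
    for name in SLOT_PATTERNS:
        if name not in slots:
            return f"What is the {name}?"
    return None


slots = fill_slots("Book a flight from Boston to Denver")
print(next_prompt(slots))          # the day is still missing, so the system asks
slots = fill_slots("on Friday", slots)
print(next_prompt(slots))          # None: every slot is filled
```

The design point is that the dialog keeps state between turns: the caller can speak a full sentence, and the system only prompts for whatever it could not extract.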
The Future of Voice Technology

The predicted growth of the speech technology market is phenomenal, possibly too much so. Cahners In-Stat Group predicts a $1.2 billion voice portal market by 2005. The Kelsey Group estimates that by 2005 the voice browsing market will be worth $6.5 billion and generate $5 billion in e-commerce. UK-based OVUM predicts a world market of $26 billion by 2005. And Allied Business Intelligence ups the figure to 56 million mobile voice portal users in North America alone by the end of 2005, with 250,000 voice sites and a $50 billion v-commerce market!

The variation in these figures makes the actual growth of the voice technology industry anyone's guess, especially as the big three handheld manufacturers (Ericsson, Motorola, and Nokia) have recently downgraded their sales predictions. Forrester Research is now forecasting European mobile retail revenue of just 5 billion euros for 2005, exactly one percent of Lehman Brothers' prediction of 500 billion euros.
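
The size of the disagreement is easy to check with a few lines of Python. The figures are the ones quoted above; the dictionary labels and function names are assumptions of this sketch, and because the forecasts measure somewhat different markets, the spread is only a rough indication.

```python
# 2005 forecasts quoted in the article, in billions of US dollars.
# They cover different things (portal market, browsing market, v-commerce),
# so the ratio below is a rough measure of disagreement, not a strict comparison.
FORECASTS_USD_BILLIONS = {
    "Cahners In-Stat (voice portal market)": 1.2,
    "Kelsey Group (voice browsing market)": 6.5,
    "OVUM (world market)": 26.0,
    "Allied Business Intelligence (v-commerce market)": 50.0,
}


def spread_ratio(forecasts):
    """Ratio of the largest forecast to the smallest."""
    values = list(forecasts.values())
    return max(values) / min(values)


def share_percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


print(round(spread_ratio(FORECASTS_USD_BILLIONS), 1))  # highest vs. lowest forecast
print(share_percent(5, 500))  # Forrester vs. Lehman Brothers, in billions of euros
```

The highest forecast is more than forty times the lowest, and the Forrester figure works out to one percent of the Lehman Brothers figure, as stated.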

On the technology side, the future of voice browsing will be visual. Multi-modal is the buzzword, naming the combination of voice direction and visual data. "Right now navigation is very difficult on a WAP phone, to scroll through a variety of lists," Peter Settles of SpeechWorks says. "It will be much easier to do hands-free interaction."

In the wireless world a phone serves two purposes: to relay data using sound and to relay data visually. Ignoring the visual or sound side of the phone is like amputating a leg and trying to run. Multi-modal is problematic, however: getting the two legs to work together has been difficult. The multi-modal technology is not prevalent today for two reasons. Steve Elrich, spokesperson for Nuance, a manufacturer of speech recognition software in 22 languages, tells WirelessDevNet: "[Multi-modal] is not a constraint on the software side. It is a constraint on the networks. People say wait for 3G technology, but others say that bandwidth will be filled up as soon as the devices can use it. Also, most phones are not capable of supporting two channels at the same time, just one for speech and one for visuals." Nuance showcased its voice browser, in which a phone directed a PC browser, in October of 1998.

Sunil Soares, Program Director of Product Management for IBM Voice Services, agrees that multi-modal is the future. "Customers are going to start demanding this [multi-modal] functionality," he claims. "In two to three years' time this will take hold." BeVocal is the first portal to enable "voice in/WAP out": callers to 18004BVOCAL can ask for driving directions and see them shown as WAP output. In January, NewFound began beta testing direct voice browsing of Google, AltaVista, or Fast Search. The public launch of this technology is expected in the third quarter.

Steve Elrich of Nuance pictures the future with an extended vision: Nuance is working on the Intelligent Dial Tone. "We will be replacing the dial tone with a voice - really a voice browser. For example it will ask what you want to do, and you can answer, 'Call Dad at home,' or 'Check email messages,' or 'Check sports scores.' This is a full-blown ecosystem forming!"
