
Mar 20 2011

Using Voice Recognition In The Cloud


In the world of operating systems and usability, and in the realm of voice recognition for the general population, Microsoft, Google and Nuance are the only ones competing. Microsoft is the worst of the bunch: their operating system does have a good record of voice recognition, but the GUI is so bad… it’s realistically unusable.

Google has voice recognition as well, and two years ago, I would’ve said it wasn’t very good, but the fact is that it’s gotten a lot better.

And lastly, the best voice recognition engine is from Nuance. Their target is the PC/Mac market, but they do offer options for Smartphones, and their performance in these areas is stellar. Nuance’s flagship offering is Dragon NaturallySpeaking, but they have their hands in all kinds of other voice recognition projects; they’ve had years of R&D to play with [and it shows in their results].

For the desktop, Microsoft’s voice recognition isn’t even going to make the cut in this article. It’s barely worth mentioning. Microsoft doesn’t even ‘talk’ about their voice recognition… it’s not something they even want to compare. So, I’m dropping them from any desktop contention right here. They do have the option to add something later, if someone else thinks of it first. And if they do, they will be using the Tellme options.

If you use voice recognition as much as I do, then you already know about Google’s Chrome and the plug-in called Voice Search. This plug-in lets the user enter search terms with Google’s voice recognition technology, which makes searching a bit faster. It uses a function called ‘form speech input’, which is something you can also turn on via the command-line.

The voice recognition is strictly English right now, and how useful the option is depends on how well you speak and the quality of the sound input… but this is true of most current voice recognition systems/features.

And all this is great: the more Google can use their voice recognition engine, the better it will be, and what better way to improve your engine than to have millions of people sending you tons of voice samples and text corrections…

 

But focusing on this point, I was wondering about an option like this from Nuance. Nuance does have a way to get data back to the Nuance data farms to be examined, but it’s not live; it has to be composed, compressed and transferred to Nuance at a random time, and that means there’s a delay.

In the technology world, the word ‘delay’ means loser… technology moves fast, and you can’t have ‘delays’. Delays mean you’re not first, or you’re following. Ask Microsoft about that; they’ve known all about it for the last 10 years.

 

So, I posed a question to Nuance…

Question:

‘…Do you think Nuance would ever offer the services of their voice recognition for the PC/MAC with a direct connection back to your servers? Like the FlexT9?’

Response:

We thought about that, don’t have anything to announce…

 

Reading into this, I believe they already know that ‘delay’ is a bad thing; you need to be able to move and adjust on the fly, not like what Microsoft has done in their tenure as the leading operating system [for now].

 

The MAIN reason I asked this question was that there’s obviously a fairly large shift going on right now. Stationary operating systems are losing ground fast. Smartphones and tablet computing are taking flight at a rapid rate. The world of voice recognition for Microsoft is an empty dream. Companies like Google and Nuance are going to survive by embracing these platforms.

Nuance’s response to this general query was:

‘…We are very market-driven – we focus on the operating systems that have large consumer market share because we have somewhat limited development resources for the specialized kind of application.’

 

And that makes complete sense… from one perspective. You can’t have a small army of programmers on hand to develop applications for everything in the market of operating systems. Or are you losing a foothold because you’re NOT in those markets? Perception is everything…

But the beauty of what Google and Nuance have done is that they have removed the requirement of being a desktop-based application. Now, they can be anywhere with an Internet connection. They can be everywhere and not require a ton of system resources to function. Their design says that if the operating system can record sound, it can do voice recognition; it’s that simple.
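To make that design concrete, here is a rough sketch of what the thin-client side could look like: package the recorded sound, post it to a server, and read the text that comes back. Everything specific in it is my own assumption for illustration. The URL, the audio content type, and the JSON reply with a `hypotheses` list are hypothetical and are not the actual Google, Nuance or Microsoft interface.

```python
import json
import urllib.request

# Hypothetical endpoint, used only for illustration; not a real vendor URL.
RECOGNIZER_URL = "https://speech.example.com/recognize"


def build_request(audio: bytes, sample_rate_hz: int = 16000) -> urllib.request.Request:
    """Wrap raw recorded audio in an HTTP POST for the assumed cloud recognizer."""
    return urllib.request.Request(
        RECOGNIZER_URL,
        data=audio,
        method="POST",
        # Assumed content type: uncompressed 16-bit PCM at the given sample rate.
        headers={"Content-Type": f"audio/l16; rate={sample_rate_hz}"},
    )


def parse_response(body: bytes) -> str:
    """Return the highest-confidence transcript from an assumed JSON reply.

    Assumed reply shape: {"hypotheses": [{"text": "...", "confidence": 0.9}, ...]}
    """
    reply = json.loads(body.decode("utf-8"))
    hypotheses = reply.get("hypotheses", [])
    if not hypotheses:
        return ""
    best = max(hypotheses, key=lambda h: h.get("confidence", 0.0))
    return best.get("text", "")


def recognize(audio: bytes) -> str:
    """Send the audio to the server and return the recognized text (needs a network)."""
    with urllib.request.urlopen(build_request(audio), timeout=10) as response:
        return parse_response(response.read())
```

The point of the sketch is how little the client has to do: no acoustic models and no heavy processing on the local machine, just a recording and one round trip.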

Microsoft has an option like this too, but Microsoft is losing market share quickly. They can’t seem to hold the Smartphone market with a Windows operating system, and with iOS and Android tablets flooding the market, it’s not looking good for them.

Reminds me of a fat cop chasing an Olympic track runner. 

Another BIG difference between Nuance and Google is the development cost. Google offers an API for their voice recognition for free and Nuance has a fee attached to their SDK… and like other stories [Netscape vs. Internet Explorer], it’s hard to compete with free. Every Android device can have voice recognition if developers just write to the API; this encourages the development and proliferation of voice recognition based applications for Android. But Nuance charges for the Dragon NaturallySpeaking interface, and at a steep cost; so, that’s a roadblock for a lot of developers.

 

I believe some people reading this would scream ‘privacy’, but there are plenty of people already using these services and I haven’t heard of any fouls yet. Plus, there’s a disclaimer for using it [all providers have theirs], so you have to agree before you use it. If it’s that private, yeah, use a keyboard.

 

I’d like to see an option from Google and from Nuance [even Microsoft] to offer voice recognition to all computers based on cloud voice recognition. Not everyone knows it yet, but voice recognition is where it’s going to be for user input. Offering voice recognition to everyone is the right thing to do, and requiring a high-end computer to do voice recognition isn’t the answer. The answer is to have the ‘cloud’ handle the voice recognition work and just send back the results.

If it can be done with 1 GHz Smartphones, it can be done on old 1 GHz computers.

 

If you had voice recognition on every device you used, would you use it?

 

Thank you,
Larry Henry Jr.
LEHSYS.com 


2 comments


  1. Chris

    I'm a little bemused by your antipathy for Windows Speech Recognition. I use speech with my medical applications, utilizing Dragon Dictate Medical ($1500), but I also use WSR and Voice Search. DNS wins hands down but WSR is very good.

    1. lehenryjr

      Windows speech recognition is good but with the existing GUI and operation it is barely workable. To be productive. Microsoft would have to completely revamp the GUI. Thanks for your comments. They are appreciated.

      Larry

