Kristan | The Monopoly
2008/1/19 Voice Activation
Voice activation is the next big thing. It is already practical for many large organizations, such as governments and vehicle security companies. There is still one small hurdle to clear, and it will be cleared when artificial intelligence becomes more widely usable.
In a world of sight, sound, taste, and touch, we may lose track of our other senses and become trapped in the consumerism bubble. It is something you always know is there but never pay attention to, usually not even after you are trapped. This is what many large companies run into when they try to implement voice recognition software. Voice recognition is the ultimate outsource. It can understand whatever language or dialect it needs to, simply through guided input. And if a computer is not programmed properly, you simply re-program it to work correctly, which you cannot do with a human being. The only problem comes when the consumer becomes so annoyed that he or she gives up instead of reporting a glitch in the program, since many voice-activated technologies stay semi-functional even when you can't do what you are trying to do.
Voice recognition software is not only extremely useful; it is, as I said earlier, the next big thing. I expect it to take over the jobs of telephone operators all over the world. Although the perfect voice-activated program is probably still years away, the science behind it is here, and has been since the fifties. The voice programs you may already be familiar with give you a list of options and ask you to say one of them. They give you a short amount of time to answer, and then either move on to the next option or ask you to repeat what you said. Newer voice-activated programs let you choose a word to use as a command, such as a family member's name or some unique password. When you say a word, the program tries to figure out which word you used based on what you said before.
I am not familiar with the programming behind the more recent voice recognition software, but I can tell you how I think it should be done. First, all of the valid sounds that a person can make must be stored in a phonetic database. Each sound must have a list of alternates, based on similarity checks against every other sound in the database. Secondly, all words in the target language must be phonetically mapped. This does not mean recording every word and getting the computer to recognize it. It means spelling each word based on how it sounds. For example, here would look like hEr and dictionary would look like dic-shun-erE. This is because what a person says is not spelled the way it is written; it is spelled in a phonetic code. Once this is done, the words are much easier for the computer to understand.
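To make the two steps above concrete, here is a minimal Python sketch, not a real speech engine. The sound symbols, the `CONFUSABLE` pairs, the `LEXICON` entries, and the function names are all made up for illustration, and a real system would work on audio features rather than ready-made symbols.

```python
from itertools import product

# Hypothetical pairs of sounds that are easy to mistake for each other.
CONFUSABLE = [("E", "i"), ("r", "l"), ("e", "a"), ("s", "z"), ("n", "m")]

def build_alternates(pairs):
    """Build the list of alternates for each sound, in both directions."""
    alternates = {}
    for a, b in pairs:
        alternates.setdefault(a, [a]).append(b)
        alternates.setdefault(b, [b]).append(a)
    return alternates

ALTERNATES = build_alternates(CONFUSABLE)

# Hypothetical phonetic map: phonetic spelling -> correctly spelled words.
LEXICON = {
    ("h", "E", "r"): ["here", "hear"],
    ("y", "e", "s"): ["yes"],
    ("n", "O"): ["no", "know"],
}

def candidate_words(heard):
    """Return every word whose phonetic spelling matches the heard sounds,
    allowing each heard sound to be swapped for one of its alternates."""
    options = [ALTERNATES.get(sound, [sound]) for sound in heard]
    found = []
    for spelling in product(*options):
        for word in LEXICON.get(spelling, []):
            if word not in found:
                found.append(word)
    return found
```

With this toy data, a slightly misheard `["y", "a", "z"]` still leads back to yes, because "a" is listed as an alternate of "e" and "z" as an alternate of "s".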
The phonetic dictionary takes all possible meanings of the sounds and puts them in a list of correctly spelled words. This includes words that may be unique to a dialect, such as yeah for yes. Each dialect-specific word is translated into the standard word, and a new list is made. Next, the words that couldn't possibly make sense in the context of the sentence must be thrown out. For example, if the computer wants a yes or no answer and the list is HELLO GOODBYE YES GANGSTER PEA, the words that don't make sense are thrown out, and the new list contains only the term YES. Since it is entirely possible that several words remain in the list after this step, the program should never say it does not understand the answer. It should instead read the possible words back and ask the user to say 1, 2, or 3, or to make a sound when the correct word is read out.
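The filtering described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `DIALECT_TO_STANDARD` table, the function names, and the wording of the prompts are hypothetical. Falling back to the expected answers when nothing survives the filter is also my own choice, since the idea here only covers the case where several words remain.

```python
# Hypothetical table translating dialect words into standard words.
DIALECT_TO_STANDARD = {"yeah": "yes", "yep": "yes", "nope": "no", "nah": "no"}

def normalize(candidates):
    """Translate dialect words to standard words, keeping order and dropping repeats."""
    result = []
    for word in candidates:
        standard = DIALECT_TO_STANDARD.get(word.lower(), word.lower())
        if standard not in result:
            result.append(standard)
    return result

def filter_by_context(candidates, expected):
    """Throw out every word that cannot make sense as an answer here."""
    return [word for word in normalize(candidates) if word in expected]

def respond(candidates, expected):
    """Accept a single surviving word, otherwise read the choices back by number."""
    remaining = filter_by_context(candidates, expected)
    if len(remaining) == 1:
        return f"Understood: {remaining[0]}"
    # Assumption: if nothing survives, offer the expected answers instead.
    choices = remaining or sorted(expected)
    numbered = ", ".join(f"{i} for {word}" for i, word in enumerate(choices, start=1))
    return f"Did you mean one of these? Say {numbered}."
```

For the example list, `respond(["HELLO", "GOODBYE", "YES", "GANGSTER", "PEA"], {"yes", "no"})` keeps only yes and accepts it, while a list that normalizes to both yes and no produces the numbered question instead of an "I don't understand" message.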
If at all possible, an interactive screen should be present to help with misunderstandings and new word entry. I also did not cover the differences between voices, because that is already a main focus of voice recognition software. It is possible to simply measure the average tone of a voice and raise or lower the voiceprint automatically.
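As a rough sketch of that last idea, assume the voice has already been reduced to a list of pitch measurements in Hz, one per slice of audio. The reference value of 150 Hz and the function name are my own assumptions, and real software would have to shift the audio itself, not just a list of numbers.

```python
from statistics import mean

# Hypothetical pitch that every voice is shifted toward before matching.
REFERENCE_PITCH_HZ = 150.0

def normalize_pitch(pitches_hz, reference_hz=REFERENCE_PITCH_HZ):
    """Scale the pitch measurements so their average equals the reference pitch."""
    if not pitches_hz:
        return []
    factor = reference_hz / mean(pitches_hz)
    # A factor below 1 lowers a high voice; a factor above 1 raises a low one.
    return [pitch * factor for pitch in pitches_hz]
```

Scaling by a single factor keeps the ups and downs of the speaker's intonation intact while moving the voice as a whole toward the reference.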