Speech
The 1990s saw the first commercialization of spoken language understanding systems. Computers can now understand and react to humans speaking naturally, in ordinary language, within a limited domain. Basic and applied research in signal processing, computational linguistics, and artificial intelligence has been combined to open up new possibilities in human-computer interfaces.
Common sense boosts speech software. By Eric Smalley. Technology Research News (March 23/30, 2005). "Speech recognition software matches strings of phonemes -- the sounds that make up words -- to words in a vocabulary database. The software finds close matches and presents the best one. The software does not understand word meaning, however. This makes it difficult to distinguish among words that sound the same or similar. The Open Mind Common Sense Project database contains more than 700,000 facts that MIT Media Lab researchers have been collecting from the public since the fall of 2000. These are based on common sense like the knowledge that a dog is a type of pet rather than the knowledge that a dog is a type of mammal. The researchers used the phrase database to reorder the close matches returned by speech recognition software. ... 'One surprising thing about testing interfaces like this is that sometimes, even if they don't get the absolutely correct answer, users like them a lot better,' said [Henry] Lieberman. 'This is because they make plausible mistakes, for example 'tennis clay court' for 'tennis player', rather than completely arbitrary mistakes that a statistical recognizer might make, for example 'tennis slayer',' he said."
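The reranking idea described above can be sketched in a few lines. This is an illustrative toy, not the Open Mind code: the phrase set, candidate hypotheses, and scores are all invented.

```python
# Toy n-best reranking with a common-sense phrase score.
# COMMON_PHRASES stands in for the Open Mind phrase database (invented here).
COMMON_PHRASES = {"tennis player", "clay court", "a dog is a pet"}

def plausibility(hypothesis):
    """Count common-sense phrases that appear in the hypothesis."""
    return sum(1 for phrase in COMMON_PHRASES if phrase in hypothesis)

def rerank(nbest):
    """Reorder (hypothesis, acoustic_score) pairs, boosting plausible ones."""
    return sorted(nbest, key=lambda h: h[1] + plausibility(h[0]), reverse=True)

# The recognizer's acoustics slightly prefer the nonsense hypothesis...
nbest = [("tennis slayer", 0.9), ("tennis player", 0.8)]
# ...but the common-sense score promotes the plausible one.
print(rerank(nbest)[0][0])  # tennis player
```

The point of the sketch is the division of labor: the acoustic score stays untouched, and the common-sense knowledge only reorders the recognizer's close matches.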
Spoken Language Systems Group, MIT Computer Science and Artificial Intelligence Laboratory.
Conversations control computers. By Eric Smalley. Technology Research News (January 12/19, 2005). "Because information from spoken conversations is fleeting, people tend to record schedules and assignments as they discuss them. Entering notes into a computer, however, can be tedious -- especially when the act interrupts a conversation. Researchers from the Georgia Institute of Technology are aiming to decrease day-to-day data entry and to augment users' memories with a method that allows handheld computers to harvest keywords from conversations and make use of relevant information without interrupting the personal interactions. ... The researchers' system protects privacy by only using speech from the user's side of the conversation, said [Kent] Lyons."
Making Computers Talk - Say good-bye to stilted electronic chatter: new synthetic-speech systems sound authentically human, and they can respond in real time. By Andy Aaron, Ellen Eide and John F. Pitrelli. Scientific American Explore (March 17, 2003). "Scientists have attempted to simulate human speech since the late 1700s, when Wolfgang von Kempelen built a 'Speaking Machine' that used an elaborate series of bellows, reeds, whistles and resonant chambers to produce rudimentary words."
Ernestine, Meet Julie - Natural language speech recognition is markedly improving voice-activated self-service. By Karen Bannan. CFO Magazine (January 1, 2005). "A new technology, called natural language speech recognition, is markedly improving voice-activated self-service. Powered by artificial intelligence, these speech-recognition systems are altering consumer perceptions about phone self-service, as calls for help no longer elicit calls for help. That, in turn, is spurring renewed corporate interest in the concept of phone self-service. In 2004, sales of voice self-service systems topped $1.2 billion. 'We've seen voice systems move from emerging technology to applied technology over the last few years,' says Steve Cramoysan, principal analyst at Stamford, Connecticut-based research firm Gartner. 'It's still fairly immature. But it's proven and moving toward the mainstream.'" The Futurist - The Intelligent Internet. The Promise of Smart Computers and E-Commerce. By William E. Halal. Government Computer News Daily News (June 23, 2004). "Scientific advances are making it possible for people to talk to smart computers, while more enterprises are exploiting the commercial potential of the Internet. ... [F]orecasts conducted under the TechCast Project at George Washington University indicate that 20 commercial aspects of Internet use should reach 30% 'take-off' adoption levels during the second half of this decade to rejuvenate the economy. Meanwhile, the project's technology scanning finds that advances in speech recognition, artificial intelligence, powerful computers, virtual environments, and flat wall monitors are producing a 'conversational' human-machine interface. These powerful trends will drive the next generation of information technology into the mainstream by about 2010. ... 
The following are a few of the advances in speech recognition, artificial intelligence, powerful chips, virtual environments, and flat-screen wall monitors that are likely to produce this intelligent interface. ... IBM has a Super Human Speech Recognition Program to greatly improve accuracy, and in the next decade Microsoft's program is expected to reduce the error rate of speech recognition, matching human capabilities. ... MIT is planning to demonstrate their Project Oxygen, which features a voice-machine interface. ... Amtrak, Wells Fargo, Land's End, and many other organizations are replacing keypad-menu call centers with speech-recognition systems because they improve customer service and recover investment in a year or two. ... General Motors OnStar driver assistance system relies primarily on voice commands, with live staff for backup; the number of subscribers has grown from 200,000 to 2 million and is expected to increase by 1 million per year. The Lexus DVD Navigation System responds to over 100 commands and guides the driver with voice and visual directions." From Your Lips to Your Printer. By James Fallows. The Atlantic (December 2000). "First, the computer captures the sound waves the speaker generates, tries to filter them from coughs, hmmmms, and meaningless background noise, and looks for the best match with the phonemes available. (A phoneme is the basic unit of the spoken word.)" Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. By Daniel Jurafsky and James H. Martin. Prentice-Hall, 2000. Both the Preface and Chapter 1 are available online as are the resources for all of the chapters. FAQs.
Topics covered include: general information, signal processing, speech coding and compression, natural language processing, speech synthesis, and speech recognition.
Speech Recognition Using Neural Networks. By John-Paul Hosom, Ron Cole, and Mark Fanty at the Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology. "There are four basic steps to performing recognition. ... First, we digitize the speech that we want to recognize; for telephone speech the sampling rate is 8000 samples per second. Second, we compute features that represent the spectral-domain content of the speech (regions of strong energy at particular frequencies). ... Third, a neural network (also called an ANN, multi-layer perceptron, or MLP) is used to classify a set of these features into phonetic-based categories at each frame. Fourth, a Viterbi search is used to match the neural-network output scores to the target words (the words that are assumed to be in the input speech), in order to determine the word that was most likely uttered." This tutorial also includes several diagrams that clarify many of the concepts.
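The fourth step above, the Viterbi search, can be sketched as a small dynamic program over the per-frame phoneme scores that step three produces. A minimal sketch, with invented phoneme labels and log-scores; a real recognizer would also fold in transition and language-model probabilities:

```python
def viterbi_word_score(frame_scores, word_phonemes):
    """Best log-score of aligning a word's phoneme sequence to the frames.
    frame_scores: one dict {phoneme: log_score} per frame (e.g. per 10 ms).
    Each phoneme must cover one or more consecutive frames, in order."""
    NEG_INF = float("-inf")
    n = len(word_phonemes)
    best = [NEG_INF] * n  # best[j]: best score so far ending in phoneme j
    for t, scores in enumerate(frame_scores):
        new = [NEG_INF] * n
        for j, ph in enumerate(word_phonemes):
            if t == 0:
                prev = 0.0 if j == 0 else NEG_INF  # must start in phoneme 0
            else:
                prev = max(best[j],                            # stay in phoneme j
                           best[j - 1] if j > 0 else NEG_INF)  # or advance to it
            if prev != NEG_INF:
                new[j] = prev + scores.get(ph, NEG_INF)
        best = new
    return best[-1]  # must finish in the word's last phoneme

# Invented per-frame log-scores, as step three's network might emit for "cat"
frames = [{"k": -0.1, "ae": -3.0, "t": -4.0},
          {"k": -2.0, "ae": -0.2, "t": -3.0},
          {"k": -4.0, "ae": -0.5, "t": -1.0},
          {"k": -5.0, "ae": -3.0, "t": -0.1}]
print(viterbi_word_score(frames, ["k", "ae", "t"]))  # best path: k, ae, ae, t
```

Scoring every vocabulary word this way and keeping the highest scorer is the essence of "determining the word that was most likely uttered."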
Experts Use AI to Help GIs Learn Arabic. By Eric Mankin. USC News (June 21, 2004). "To teach soldiers basic Arabic quickly, USC computer scientists are developing a system that merges artificial intelligence with computer game techniques. The Rapid Tactical Language Training System, created by the USC Viterbi School of Engineering's Center for Research in Technology for Education (CARTE) and partners, tests soldier students with videogame missions in animated virtual environments where, to pass, the students must successfully phrase questions and understand answers in Arabic." Read the story and then watch the video! Speech in Education. By Phillip Britt. Speech Technology Magazine (June / July 2005). "Speech-enabled applications and hardware are increasingly finding their way into the classroom and into the offices of educators at all levels of education, but educational applications still represent a small, though growing, segment of the speech technology market, according to industry analysts."
A Short Introduction to Text-to-Speech Synthesis. Capitalize on Customer Conversations with Speech Analytics. By Donna Fluss. Speech Technology Magazine (September / October 2005). "For years, speech analytics have been used worldwide by security organizations to help government agencies identify potential risks and threats. In the past two years, contact centers have begun to use speech analytics applications to capture and structure customer communications. The applications analyze the structured data to identify customer trends and insights for the purpose of improving service quality, customer satisfaction, and generating new revenue. There are three major analysis techniques and outputs from speech analytics: Keyword or Key Phrase Identification ... Emotion Detection ... Talk Analysis." Automatic Speech Recognition, Spring 2003. Staff Instructors: Dr. James Glass and Professor Victor Zue. Available from MIT OpenCourseWare. "6.345 is a course in the department's 'Bioelectrical Engineering' concentration. This course offers a full set of lecture slides with accompanying speech samples, as well as homework assignments and other materials used in the course. 6.345 introduces students to the rapidly developing field of automatic speech recognition. Its content is divided into three parts. Part I deals with background material in the acoustic theory of speech production, acoustic-phonetics, and signal representation. Part II describes algorithmic aspects of speech recognition systems including pattern classification, search algorithms, stochastic modelling, and language modelling techniques. Part III compares and contrasts the various approaches to speech recognition, and describes advanced techniques used for acoustic-phonetic modelling, robust speech recognition, speaker adaptation, processing paralinguistic information, speech understanding, and multimodal processing." IBM gets smart about Artificial Intelligence. By Pamela Kramer. IBM Think Research (June 2001).
"Computer vision is important to speech recognition, too. Visual cues help computers decipher speech sounds that are obscured by environmental noise. Chalapathy Neti, manager of IBM's audiovisual speech technologies (AVST) group at Watson, often cites HAL's lip-reading ability in 2001 in promoting the group's work."
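Donna Fluss's first speech-analytics technique above, keyword or key-phrase identification, reduces at its simplest to scanning call transcripts for a watch-list of phrases. A toy sketch; the phrase list and transcripts are invented:

```python
from collections import Counter

# Invented watch-list of key phrases a contact center might track
KEY_PHRASES = ["cancel my account", "speak to a manager", "thank you"]

def spot_phrases(transcripts):
    """Tally occurrences of each key phrase across a set of transcripts."""
    counts = Counter()
    for text in transcripts:
        lowered = text.lower()
        for phrase in KEY_PHRASES:
            counts[phrase] += lowered.count(phrase)
    return counts

calls = ["I want to cancel my account right now.",
         "Let me speak to a manager. I said, speak to a manager!"]
print(spot_phrases(calls)["speak to a manager"])  # 2
```

Production systems spot phrases in the recognizer's errorful output rather than in clean text, often searching at the phonetic level instead of by exact string match.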
Men all ears as health technology gets hearing. The Northern Daily Leader & tamworth.yourguide (June 16, 2004). "A revolutionary hearing aid was just one of a number of new technological exhibits on show at the Men's Health Expo in Tamworth yesterday to coincide with Men's Health Week. The hearing aid allows the person wearing it to focus on a specific conversation more clearly while drowning out any other noises in the room. It has been designed to select the best speech over noise using parallel processing through a new concept called Syncro. ... Spokesman James Battersby for Oticon, which manufactures the hearing aid, said ... 'Its design has been created by using artificial intelligence and allows the wearer to cancel out up to four different noises simultaneously.'"
The Power of Speech. By Lawrence Rabiner, Center for Advanced Information Processing, Rutgers University. Science (September 12, 2003; Volume 301, Number 5639: 1494-1495). "In the multimedia world of future communications, speech will play an increasingly important role. From speaker verification to automatic speech recognition and the understanding of key phrases by computers, the spoken word will replace keyboards and pointing devices like the mouse. In his Perspective, Rabiner discusses recent advances and remaining challenges in the processing of speech by communication devices. The key challenge is to make the user interface for 21st-century services and devices as easy to learn and use as a telephone is today for voice conversations."
Computers That Speak Your Language. By Wade Roush. Technology Review (June 2003). Be sure to see the illustration in the article: Inside a Conversational Computer. Linguistic Knowledge and Empirical Methods in Speech Recognition. By Andreas Stolcke. (1997). AI Magazine 18 (4): 25-32. Is There a Future for Speech in Vehicles? By Kenneth White, Harvey Ruback and Roberto Sicconi. Speech Technology Magazine (November / December 2004). "Today, speech recognition technology is becoming an important component in how people are using and interacting with their cars. ... Many people associate speech in cars with science fiction movies and television shows where the cars act like R2D2 robots on wheels. In today’s world the main reason for using speech is less Hollywood and more pragmatic. In fact, it usually boils down to safety. ... The car represents a very challenging environment for voice technologies. The challenges range from creating optimal operation in an unpredictable and noisy environment to dealing with very limited system resources, such as memory/CPU." "The Institute for Signal and Information Processing (ISIP) [at Mississippi State University] has been established to launch a multidisciplinary program to develop next generation information processing techniques. Research at ISIP is centered on intelligent information processing, perhaps the most important technology of the next century. ISIP draws upon a wide range of research experience in areas such as signal processing, communications, natural language, database query, intelligent systems, and discrete controls. Its present vision is to develop systems capable of intelligent interactions with users by the integration of a multiplicity of interface technologies including speech, natural language, database query, and imaging."
The Centre for Speech Technology Research at the University of Edinburgh [CSTR]: "Founded in 1984, CSTR is concerned with research in all areas of speech technology including speech recognition, speech synthesis, speech signal processing, information access, multimodal interfaces and dialogue systems. We have many collaborations with the wider community of researchers in language, cognition and machine learning for which Edinburgh is renowned." Be sure to see their collection of current research projects. HP SpeechBot - audio search using speech recognition. From Hewlett-Packard.
The Meeting Recorder Project at ICSI [The International Computer Science Institute]. "Despite recent advances in speech recognition technology, successful recognition is limited to co-operative speakers using close-talking microphones. There are, however, many other situations in which speech recognition would be useful - for instance to provide transcripts of meetings or other archive audio. Speech researchers at ICSI, UW, SRI, and IBM are very interested in new application domains of this kind, and we have begun to work with recorded meeting data." - from the Introduction ModelTalker. From the Speech Research Laboratory, duPont Hospital for Children and University of Delaware. Not only can you pick a voice for the demo, but you can also pick an emotion! Quantifying Room Acoustic Quality Using Artificial Neural Networks Project. Salford Acoustics Audio and Video at the University of Salford. "This project was concerned with spaces where good acoustics are required for speech. Such spaces include shopping malls and railway stations where announcements need to be intelligible, and theatres where the quality of sound plays a crucial role in the enjoyment of a performance. The project researched a novel measurement technique intended to increase understanding of acoustics by enabling in-use, non-invasive evaluation of room acoustics to be made. ... The measurement system proposed derives the acoustic quality from a speech signal as received by a microphone in a room. Neural networks learn how to extract the determining characteristics from the speech signals that lead to the objective parameters. In this way, the neural networks predict the reverberation time, early decay time, STI (Speech Transmission Index) and RASTI (RApid Speech Transmission Index). 
In addition to enabling occupied measurements, the development of the neural network sensing system is of academic interest, as it is forming an artificial intelligence system to mimic the behaviour of human perception." Speech at CMU Web Page. An extensive collection of speech resources from Carnegie Mellon University with links to many exciting projects (both at CMU and around the world). Talking Heads. "This website provides an overview of the rapidly growing international effort to create talking heads (physiological / computational / cognitive models of audio-visual speech), the historical antecedents of this effort, and related work. Links are provided (where possible) to the sites of many researchers and commercial entities working in this diverse and exciting area." "This site is also designed as a working outline for a book presently being written by Philip Rubin and Eric Vatikiotis-Bateson," who maintain the site. Here's a peek at what awaits you:
Dennis Klatt's History of Speech Synthesis. "Audio clips of synthetic speech illustrating the history of the art and technology of synthetically produced human speech."
Aaron, Andy and Ellen Eide, John F. Pitrelli. June 2005: Conversational Computers. Scientific American (subscription req'd). "Call a large company these days, and you will probably start by having a conversation with a computer. Until recently, such automated telephone speech systems could string together only prerecorded phrases. ... Computer-generated speech has improved during the past decade, becoming significantly more intelligible and easier to listen to. But researchers now face a more formidable challenge: making synthesized speech closer to that of real humans--by giving it the ability to modulate tone and expression, for example--so that it can better communicate meaning. This elusive goal requires a deep understanding of the components of speech and of the subtle effects of a person's volume, pitch, timing and emphasis. That is the aim of our research group at IBM and those of other U.S. companies, such as AT&T, Nuance, Cepstral and ScanSoft, as well as investigators at institutions including Carnegie Mellon University, the University of California at Los Angeles, the Massachusetts Institute of Technology and the Oregon Graduate Institute." Erman, Lee D. and Frederick Hayes-Roth, Victor R. Lesser, D. Raj Reddy. 1980. The Hearsay-II Speech-Understanding System: Integrating Knowledge to Resolve Uncertainty. ACM Computing Surveys 12(2): 213-253. "The Hearsay-II speech-understanding system ... recognizes connected speech in a 1000-word vocabulary with correct interpretations for 90 percent of test sentences. Its basic methodology involves the application of symbolic reasoning as an aid to signal processing. A marriage of general artificial intelligence techniques with special acoustic and linguistic knowledge was needed to accomplish satisfactory speech-understanding performance." Nii, Penny H. 1986. Blackboard Systems: The Blackboard Model of Problem Solving and the Evolution of Blackboard Architectures. AI Magazine 7 (2): 38-64.
"The first blackboard system was the HEARSAY-II speech understanding system (Erman et al., 1980) that evolved between 1971 and 1976. Subsequently, many systems have been built that have similar system organization and run-time behavior. The objectives of this article are (1) to define what is meant by 'blackboard systems' and (2) to show the richness and diversity of blackboard system designs."
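The blackboard organization Nii describes can be sketched in miniature: independent knowledge sources watch a shared blackboard and post new hypotheses when they can, under a simple control loop, until no source has anything left to add. An illustrative toy, not Hearsay-II; all names and data are invented.

```python
def phoneme_ks(bb):
    """Knowledge source: turn raw 'sounds' into phoneme hypotheses."""
    if "sounds" in bb and "phonemes" not in bb:
        bb["phonemes"] = bb["sounds"].split("-")
        return True
    return False

def word_ks(bb):
    """Knowledge source: turn phoneme hypotheses into a word hypothesis."""
    if "phonemes" in bb and "word" not in bb:
        bb["word"] = "".join(bb["phonemes"])
        return True
    return False

def control_loop(blackboard, knowledge_sources):
    """Let sources contribute until a full pass changes nothing."""
    progress = True
    while progress:
        progress = any(ks(blackboard) for ks in knowledge_sources)
    return blackboard

# word_ks is listed first but cannot fire until phoneme_ks has posted --
# the blackboard's contents, not the listing order, decide what runs when.
bb = control_loop({"sounds": "k-a-t"}, [word_ks, phoneme_ks])
print(bb["word"])  # kat
```

The opportunistic scheduling shown here, with each knowledge source triggered by the state of the shared data rather than by a fixed call order, is the "run-time behavior" the excerpt says later blackboard systems share.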