Natural Language Processing

Good Places to Start

Readings Online

Related Web Sites

Related Pages

More Readings
(see FAQ)

Recent News about THE TOPICS (annotated)



 

 

Progress on building computer systems that process natural language in any meaningful sense requires considering language as part of a larger communicative situation.

Regarding language as communication requires consideration of what is said (literally), what is intended, and the relationship between the two.

-Barbara Grosz, Utterance and Objective

The value to our society of being able to communicate with computers in everyday "natural" language cannot be overstated. Imagine asking your computer "Does this candidate have a good record on the environment?" or "When is the next televised National League baseball game?" Or being able to tell your PC "Please format my homework the way my English professor likes it." Commercial products can already do some of these things, and AI scientists expect many more in the next decade. One goal of AI work in natural language is to enable communication between people and computers without resorting to memorization of complex commands and procedures. Automatic translation---enabling scientists, business people and just plain folks to interact easily with people around the world---is another goal. Both are just part of the broad field of AI and natural language, along with the cognitive science aspect of using computers to study how humans understand language.


Good Places to Start

What is Computational Linguistics? Hans Uszkoreit, CL Department, University of the Saarland, Germany. 2000. A short, non-technical overview of this exciting field.

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. By Daniel Jurafsky and James H. Martin. Prentice-Hall, 2000. Both the Preface and Chapter 1 are available online, as are the resources for all of the chapters.

Natural Language. A summary by Patrick Doyle. Very informative, though there are some spots that are quite technical.

NLP Tutorials. From Dave Inman, School of Computing, South Bank University, London. The topics covered include: Can computers understand language?; What kinds of ambiguity exist and why does ambiguity hinder NLP?; and A simple Prolog parser to analyse the structure of language.
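The Prolog parser mentioned in that tutorial analyses sentence structure against a small grammar. The same idea can be sketched in Python as a recursive-descent parser for a toy grammar (S -> NP VP, NP -> Det N, VP -> V NP); the lexicon and rules here are invented for illustration, not taken from the tutorial:

```python
# Toy lexicon mapping words to parts of speech
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N",
           "chased": "V", "saw": "V"}

def parse_np(tokens):
    # NP -> Det N
    if (len(tokens) >= 2 and LEXICON.get(tokens[0]) == "Det"
            and LEXICON.get(tokens[1]) == "N"):
        return ("NP", tokens[0], tokens[1]), tokens[2:]
    return None, tokens

def parse_vp(tokens):
    # VP -> V NP
    if tokens and LEXICON.get(tokens[0]) == "V":
        np, rest = parse_np(tokens[1:])
        if np:
            return ("VP", tokens[0], np), rest
    return None, tokens

def parse_s(tokens):
    # S -> NP VP; succeed only if every token is consumed
    np, rest = parse_np(tokens)
    if np:
        vp, rest = parse_vp(rest)
        if vp and not rest:
            return ("S", np, vp)
    return None

print(parse_s("the dog chased a cat".split()))
# -> ('S', ('NP', 'the', 'dog'), ('VP', 'chased', ('NP', 'a', 'cat')))
```

A real parser must also handle the ambiguity the tutorial discusses (multiple parses for one sentence), which this deterministic sketch sidesteps.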

Natural Language Processing. Lecture Notes from Associate Professor John Batali's course in Artificial Intelligence Modeling at the University of California at San Diego's Department of Cognitive Science.

Glossary of Linguistic Terms. Compiled by Dr. Peter Coxhead of The University of Birmingham School of Computer Science for his students.

The Futurist - The Intelligent Internet. The Promise of Smart Computers and E-Commerce. By William E. Halal. Government Computer News Daily News (June 23, 2004). "Scientific advances are making it possible for people to talk to smart computers, while more enterprises are exploiting the commercial potential of the Internet. ... [F]orecasts conducted under the TechCast Project at George Washington University indicate that 20 commercial aspects of Internet use should reach 30% 'take-off' adoption levels during the second half of this decade to rejuvenate the economy. Meanwhile, the project's technology scanning finds that advances in speech recognition, artificial intelligence, powerful computers, virtual environments, and flat wall monitors are producing a 'conversational' human-machine interface. These powerful trends will drive the next generation of information technology into the mainstream by about 2010. ... The following are a few of the advances in speech recognition, artificial intelligence, powerful chips, virtual environments, and flat-screen wall monitors that are likely to produce this intelligent interface. ... IBM has a Super Human Speech Recognition Program to greatly improve accuracy, and in the next decade Microsoft's program is expected to reduce the error rate of speech recognition, matching human capabilities. ... MIT is planning to demonstrate their Project Oxygen, which features a voice-machine interface. ... Amtrak, Wells Fargo, Land's End, and many other organizations are replacing keypad-menu call centers with speech-recognition systems because they improve customer service and recover investment in a year or two. ... General Motors OnStar driver assistance system relies primarily on voice commands, with live staff for backup; the number of subscribers has grown from 200,000 to 2 million and is expected to increase by 1 million per year. 
The Lexus DVD Navigation System responds to over 100 commands and guides the driver with voice and visual directions."

Experts Use AI to Help GIs Learn Arabic. By Eric Mankin. USC News (June 21, 2004). " To teach soldiers basic Arabic quickly, USC computer scientists are developing a system that merges artificial intelligence with computer game techniques. The Rapid Tactical Language Training System, created by the USC Viterbi School of Engineering's Center for Research in Technology for Education (CARTE) and partners, tests soldier students with videogame missions in animated virtual environments where, to pass, the students must successfully phrase questions and understand answers in Arabic." Read the story and then watch the video!

Natural Language Processing: She Needs Something Old & Something New (maybe something borrowed and something blue, too.) Karen Sparck Jones, University of Cambridge, UK. Her 1994 Presidential Address to the Assn. for Computational Linguistics (ACL). "I want to assess where we are now, in computational linguistics and natural language processing, compared with where we started, and to put my view of what we need to do next. ... Computational linguistics, or natural language processing (NLP), is nearly as old as serious computing. Work began more than forty years ago, and one can see it going through successive phases...."

Natural Language Processing FAQ. Maintained by Dragomir R. Radev. Dept. of Computer Science, Columbia University.

An Overview of Empirical Natural Language Processing. By Eric Brill and Raymond J. Mooney (1997). AI Magazine 18 (4): 13 - 24. "In recent years, there has been a resurgence in research on empirical methods in natural language processing. These methods employ learning techniques to automatically extract linguistic knowledge from natural language corpora rather than require the system developer to manually encode the requisite knowledge. The current special issue reviews recent research in empirical methods in speech recognition, syntactic parsing, semantic processing, information extraction, and machine translation. This article presents an introduction to the series of specialized articles on these topics and attempts to describe and explain the growing interest in using learning methods to aid the development of natural language processing systems."
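The empirical approach Brill and Mooney describe, extracting linguistic knowledge automatically from corpora rather than hand-coding it, can be illustrated with a toy bigram model learned from a tiny corpus (the corpus and task here are invented for illustration):

```python
from collections import Counter, defaultdict

# A tiny "corpus"; real empirical NLP trains on millions of words
corpus = ("the cat sat on the mat . "
          "the dog sat on the rug .").split()

# Learn bigram counts from data instead of writing rules by hand
bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def most_likely_next(word):
    # The word most often observed after `word` in the corpus
    return bigrams[word].most_common(1)[0][0] if word in bigrams else None

print(most_likely_next("sat"))  # -> 'on'
```

The same count-and-predict pattern, at vastly larger scale and with smoothing, underlies the statistical speech recognition and parsing work the special issue surveys.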

Chatbots / Chatterbots:

  • Alice, the Chat Robot. Winner of the 2000, 2001 & 2004 Loebner Prize.
  • Brian is a computer program that thinks it's an 18-year-old college student. It was written as an entry in the 1998 Loebner Competition, where it won third place out of six entries. From Joe Strout.
  • Chatter Bots. From BotSpot.
  • Chatterbots. Hosted by Simon Laven. An extensive collection that includes classic chatterbots, complex chatterbots, friendly chatterbots, non-English bots (German, Spanish, French, and other languages), and much more.
  • Chatterbot links from the Open Directory project.
  • Deepti -- A Hindi Speaking Chat Robot. Be sure to see our news archive for the BBC article, Hindi chatbot breaks new ground (August 27, 2002).
  • Eliza - "a friend you could never have before". One of the earliest chatterbots.
  • George - winner of the 2005 Loebner Prize. [See these related articles.]
  • "Jabberwacky is an artificial intelligence - a chat robot, often known as a 'chatbot' or 'chatterbot'. It aims to simulate natural human chat in an interesting, entertaining and humorous manner. Jabberwacky is different. It learns. In some ways it models the way humans learn language, facts, context and rules." [Also see a related article.]
  • "John Lennon Artificial Intelligence Project (JLAIP) is recreating the personality of the late Beatle, John Lennon, by programming an Artificial Intelligence (AI) engine with Lennon's own words and thoughts. Triumph PC's breakthroughs take the field of AI to an entirely new level, thus making Persona-Bots™ (robots inhabited with unique and authentic human personalities) possible and further blurring the precarious line between man and machine."
  • Mr. Mind: "Turning the Turing Test upside down, MRMIND challenges you to prove to him that you are human. Can you claim that your 'human' attributes will forever be exclusively human? The Blurring Test is about human progress: Someday it might be important to convince our computers (and each other) that we are human."
  • "Pandorabots.com is a software robot (also known as a bot) hosting service. From any browser, you may create and publish your own robots to anyone via the web. We believe that our technology yields the fastest bots available on the Internet. The bots are based on AIML and spring entirely from the work of Dr. Richard Wallace and the A.L.I.C.E. and AIML free software community based at www.AliceBot.Org." As of this posting (8/02), there is no charge to create your own Chat Bot.
  • The Personality Forge. "Come on in, and chat with bots and botmasters, then create your own artificial intelligence personality, and turn it loose to chat with both real people and other chat bots. Here you'll find thousands of AI personalities...." Maintained by Benji Adams.
  • Valerie and Marion (Tank) LeFleur, the roboceptionists from the lobby of Carnegie Mellon University's Newell-Simon Hall. [See this related article.]
  • dialogues with colorful personalities of early ai
  • Why did the chicken cross the road? See our NewsToon!
  • To find out who coined the term, chatterbot, see our Namesakes page (which is where you'll also meet the man who designed the Turing Test).
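Most of the classic chatterbots above descend from ELIZA's core trick: match the input against a pattern, "reflect" the pronouns, and echo a fragment back as a question. A minimal sketch (the rules and reflections here are hypothetical, not Weizenbaum's original script):

```python
import re

# Swap first- and second-person words when echoing input back
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

# Ordered pattern/response rules; the last rule is a catch-all
RULES = [
    (re.compile(r"i am (.*)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"i feel (.*)", re.I), "How long have you felt {0}?"),
    (re.compile(r".*"), "Please tell me more."),
]

def reflect(text):
    return " ".join(REFLECTIONS.get(w, w) for w in text.lower().split())

def respond(utterance):
    for pattern, template in RULES:
        m = pattern.match(utterance)
        if m:
            return template.format(*(reflect(g) for g in m.groups()))

print(respond("I am worried about my homework"))
# -> Why do you say you are worried about your homework?
```

Learning bots like Jabberwacky go further by growing their rule base from conversation, but the match-and-respond loop is the same.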

Whatever happened to machines that think? By Justin Mullins. New Scientist (April 23, 2005; Issue 2496: pages 32 - 37). "Clever computers are everywhere. From robotic lawnmowers to intelligent lighting, washing machines and even car engines that self-diagnose faults, there's a silicon brain in just about every modern device you can think of. But can you honestly call any machine intelligent in a meaningful sense of the word? One rainy afternoon last February I decided to find out. I switched on the computer in my study, and logged on to www.intellibuddy.com, home to one of the leading artificial intelligences on the planet, to see what the state-of-the-art has to offer. ..."

Readings Online

"Computational Linguistics is the only publication devoted exclusively to the design and analysis of natural language processing systems. From this unique quarterly, university and industry linguists, computational linguists, artificial intelligence (AI) investigators, cognitive scientists, speech specialists, and philosophers get information about computational aspects of research on language, linguistics, and the psychology of language processing and performance. Published by The MIT Press for: The Association for Computational Linguistics." Abstracts are available online.

Natural Language Understanding. By Avron Barr (1980). AI Magazine 1(1): 5-10. "This is an excerpt from the Handbook of Artificial Intelligence, a compendium of hundreds of articles about AI ideas, techniques, and programs being prepared at Stanford University by AI researchers and students from across the country." Don't miss the fascinating section: Early History.

Empirical Methods in Information Extraction. By Claire Cardie (1997). AI Magazine 18 (4): 65-79. "This article surveys the use of empirical, machine-learning methods for a particular natural language-understanding task-information extraction. The author presents a generic architecture for information-extraction systems and then surveys the learning algorithms that have been developed to address the problems of accuracy, portability, and knowledge acquisition for each component of the architecture."

Duo-Mining - Combining Data and Text Mining. By Guy Creese. DMReview.com (September 16, 2004). "As standalone capabilities, the pattern-finding technologies of data mining and text mining have been around for years. However, it is only recently that enterprises have started to use the two in tandem - and have discovered that it is a combination that is worth more than the sum of its parts. First of all, what are data mining and text mining? They are similar in that they both 'mine' large amounts of data, looking for meaningful patterns. However, what they analyze is quite different. ... Collections and recovery departments in banks and credit card companies have used duo-mining to good effect. Using data mining to look at repayment trends, these enterprises have a good idea on who is going to default on a loan, for example. When logs from the collection agents are added to the mix, the understanding gets even better. For example, text mining can understand the difference in intent between, 'I will pay,' 'I won't pay,' 'I paid' and generate a propensity to pay score - which, in turn, can be data mined. To take another example, if a customer says, 'I can't pay because a tree fell on my house;' all of a sudden it is clear that it's not a 'bad' delinquency - but rather a sales opportunity for a home loan."
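The "propensity to pay" scoring described in the article can be caricatured with a few keyword rules (purely illustrative; the phrase lists are invented, and real text mining uses far richer models than substring matching):

```python
# Toy polarity lists for collection-agent notes (hypothetical)
POSITIVE = {"will pay", "paid", "can pay"}
NEGATIVE = {"won't pay", "can't pay", "refuse"}

def propensity(note):
    # Crude propensity-to-pay score: +1 per positive phrase,
    # -1 per negative phrase found in the note
    note = note.lower()
    score = sum(p in note for p in POSITIVE)
    score -= sum(n in note for n in NEGATIVE)
    return score

print(propensity("I will pay"))   # -> 1
print(propensity("I won't pay"))  # -> -1
```

The resulting numeric score is exactly the kind of derived field that, as the article notes, can then be fed back into conventional data mining.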

A Short Introduction to Text-to-Speech Synthesis. By Thierry Dutoit. The Circuit Theory and Signal Processing Lab of the Faculte Polytechnique de Mons. "I try to give here a short but comprehensive introduction to state-of-the-art Text-To-Speech (TTS) synthesis by highlighting its Digital Signal Processing (DSP) and Natural Language Processing (NLP) components."

At I.B.M., That Google Thing Is So Yesterday. By James Fallows. The New York Times (December 26, 2004; reg. req'd.). "Suddenly, the computer world is interesting again. ... The most attractive offerings are free, and they are concentrated in the newly sexy field of 'search.' ... [T]oday's subject is the virtually unpublicized search strategy of another industry heavyweight: I.B.M. ... I.B.M. says that its tools will make possible a further search approach, that of 'discovery systems' that will extract the underlying meaning from stored material no matter how it is structured (databases, e-mail files, audio recordings, pictures or video files) or even what language it is in. The specific means for doing so involve steps that will raise suspicions among many computer veterans. These include 'natural language processing,' computerized translation of foreign languages and other efforts that have broken the hearts of artificial-intelligence researchers through the years. But the combination of ever-faster computers and ever-evolving programming allowed the systems I saw to succeed at tasks that have beaten their predecessors. ... ... Jennifer Chu-Carroll of I.B.M. demonstrated a system called Piquant, which analyzed the semantic structure of a passage and therefore exposed 'knowledge' that wasn't explicitly there. After scanning a news article about Canadian politics, the system responded correctly to the question, 'Who is Canada's prime minister?' even though those exact words didn't appear in the article."

dialogues with colorful personalities of early ai. By Guven Guzeldere and Stefano Franchi. (1995). From Constructions of the Mind: Artificial Intelligence and the Humanities, a special issue of the Stanford Humanities Review, Volume 4, Issue 2. "Of all the legacies of the era of the sixties, three colorful, not to say garrulous, "personalities" that emerged from the early days of artificial intelligence research are worth mentioning: ELIZA, the Rogerian psychotherapist; PARRY, the paranoid; and (as part of a younger generation) RACTER, the "artificially insane" raconteur. All three of these "characters" are natural language processing systems that can "converse" with human beings (or with one another) in English."

LifeCode: A Deployed Application for Automated Medical Coding. By Daniel T. Heinze, Mark Morsch, Ronald Sheffer, Michelle Jimmink, Mark Jennings, William Morris, and Amy Morsch. AI Magazine 22(2): 76-88 (Summer 2001). This paper is based on the authors' presentation at the Twelfth Innovative Applications of Artificial Intelligence Conference (IAAI-2000). "LifeCode is a natural language processing (NLP) and expert system that extracts demographic and clinical information from free-text clinical records."

Natural Language Processing and Knowledge Representation: Language for Knowledge and Knowledge for Language. Edited by Lucja M. Iwanska and Stuart C. Shapiro. AAAI Press. The following excerpt is from the Preface which is available online: "The research direction of natural language-based knowledge representation and reasoning systems constitutes a tremendous change in how we view the role of natural language in an intelligent computer system. The traditional view, widely held within the artificial intelligence and computational linguistics communities, considers natural language as an interface or front end to a system such as an expert system or knowledge base. In this view, inferencing and other interesting information and knowledge processing tasks are not part of natural language processing. By contrast, the computational models of natural language presented in this book view natural language as a knowledge representation and reasoning system with its own unique, computationally attractive representational and inferential machinery. This new perspective sheds some light on the actual, still largely unknown, relationship between natural language and the human mind. Taken to an extreme, such approaches speculate that the structure of the human mind is close to natural language. In other words, natural language is essentially the language of human thought."

"I'm sorry Dave, I'm afraid I can't do that": Linguistics, Statistics, and Natural Language Processing circa 2001. By Lillian Lee, Cornell Natural Language Processing Group. To appear in the National Academies' Study on Fundamentals of Computer Science. "A brief, general-audience overview of the history of natural language processing, focusing on data-driven approaches."

A Performance Evaluation of Text-Analysis Technologies. By Wendy Lehnert and Beth Sundheim (1991). AI Magazine 12 (3): 81-94. "A performance evaluation of 15 text-analysis systems conducted to assess the state of the art for detailed information extraction from unconstrained continuous text. ... Based on multiple strategies for computing each metric, the competing systems were evaluated for recall, precision, and overgeneration. The results support the claim that systems incorporating natural language-processing techniques are more effective than systems based on stochastic techniques alone."
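The recall, precision, and overgeneration metrics used in such evaluations can be sketched over sets of extracted facts (a simplified reading of the MUC-style definitions; the example facts are invented):

```python
def extraction_scores(system, gold):
    """Recall, precision, and overgeneration for sets of extracted facts.

    Simplified versions of the metrics discussed in the article:
    recall = correct / gold, precision = correct / produced,
    overgeneration = spurious / produced.
    """
    correct = len(system & gold)
    spurious = len(system - gold)
    recall = correct / len(gold)
    precision = correct / len(system)
    overgeneration = spurious / len(system)
    return recall, precision, overgeneration

gold = {"ACME:buyer", "2001-05-01:date", "$2M:amount"}
system = {"ACME:buyer", "$2M:amount", "Bob:buyer"}
print(extraction_scores(system, gold))  # 2 correct, 1 spurious
```

Reporting all three numbers matters: a system can trade recall for precision, and overgeneration exposes systems that guess prolifically.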

Natural Language Lecture Slides & Accompanying Transcripts from Professors Tomás Lozano-Pérez & Leslie Kaelbling's Spring 2003 course: Artificial Intelligence. Available from MIT OpenCourseWare.

  • Also available from OpenCourseWare: 6.881 Natural Language Processing (Fall 2004), Professor Regina Barzilay's graduate-level introductory course.

Natural Language Understanding and Semantics. Section 1.2.4 of Chapter One (available online) of George F. Luger's textbook, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 5th Edition (Addison-Wesley; 2005). "One of the long-standing goals of artificial intelligence is the creation of programs that are capable of understanding and generating human language. Not only does the ability to use and understand natural language seem to be a fundamental aspect of human intelligence, but also its successful automation would have an incredible impact on the usability and effectiveness of computers themselves. ... Understanding natural language involves much more than parsing sentences into their individual parts of speech and looking those words up in a dictionary. Real understanding depends on extensive background knowledge about the domain of discourse and the idioms used in that domain as well as an ability to apply general contextual knowledge to resolve the omissions and ambiguities that are a normal part of human speech."

Computers That Speak Your Language - Voice recognition that finally holds up its end of a conversation is revolutionizing customer service. Now the goal is to make natural language the way to find any type of information, anywhere. By Wade Roush. Technology Review (June 2003). "Building a truly interactive customer service system like Nuance’s requires solutions to each of the major challenges in natural-language processing: accurately transforming human speech into machine-readable text; analyzing the text’s vocabulary and structure to extract meaning; generating a sensible response; and replying in a human-sounding voice." And be sure to see the illustration in the article: Inside a Conversational Computer.

Chatbot bids to fool humans - A computer program designed to talk like a human is preparing for its biggest test in its bid to be truly "intelligent". By Jo Twist. BBC (September 22, 2003). "Jabberwacky lives on a computer hard drive, tells jokes, uses slang, sometimes swears and can be quite a confrontational conversationalist. What sets this chatty AI (artificial intelligence) chatbot apart from others is the more it natters, the more it learns. The bot is the only UK finalist in this year's Loebner Prize and is hoping to chat its way to a gold medal for its creator, Rollo Carpenter. The Loebner Prize is the annual competition to find the computer with the most convincing conversational skills and started in 1990. Jabberwacky will join eight other international finalists in October, when they pit their wits against flesh and blood judges to see if they can pass as one of them. It is the ultimate Turing Test, which was designed by mathematician Alan Turing to see whether computers 'think' and have 'intelligence'."

Related Web Sites

The Association for Computational Linguistics (ACL) is the "international scientific and professional society for people working on problems involving natural language and computation."

ACL NLP/CL Universe. Choose "Browse" to see menus of what is offered (introductory materials, research groups, conferences, bibliographies, etc.) or choose "Search" for a keyword search engine. ["The NLP/CL Universe is a Web catalog/search engine that is devoted to Natural Language Processing and Computational Linguistics Web sites. It has existed since March 18, 1995." Maintained by Dragomir R. Radev for ACL.]

AI on the Web: Natural Language Processing. A resource companion to Stuart Russell and Peter Norvig's "Artificial Intelligence: A Modern Approach" with links to reference material, people, research groups, books, companies and much more.

Natural Language Group. Information Sciences Institute, University of Southern California. Be sure to see What It's All About: "Natural Language Processing (or Human Language Technology, or Computational Linguistics) is about the treatment of human languages by computer, dating back to the early 1950s. NLP has experienced unprecedented growth over the past few years. ..."

Natural Language Learning at UT Austin. "Natural language processing systems are difficult to build, and machine learning methods can help automate their construction significantly. Our research in learning for natural language mainly involves applying inductive logic programming and other relational learning techniques to constructing database interfaces and information extraction systems from supervised examples. However, we have also conducted research in learning for syntactic parsing, machine translation, word-sense disambiguation, and morphology (past tense generation)." Check out the 3 demos of learning natural-language interfaces: Geoquery; RestaurantQuery; and JobQuery.

Natural Language Processing Course Listing, part of the 2004 NLP Course Survey conducted by ACL (Association for Computational Linguistics).

The Natural Language Processing Dictionary (NLP Dictionary). Compiled by Bill Wilson, Associate Professor in the Artificial Intelligence Group, School of Computer Science and Engineering, University of NSW. "You should use The NLP Dictionary to clarify or revise concepts that you have already met. The NLP Dictionary is not a suitable way to begin to learn about NLP."

Natural Language Processing Group. Department of Artificial Intelligence, University of Edinburgh.

"The goal of the [Microsoft] Natural Language Processing (NLP) group is to design and build a computer system that will analyze, understand, and generate languages that humans use naturally, so that eventually you can address your computer as though you were addressing another person. This goal is not easy to reach. ... The challenges we face stem from the highly ambiguous nature of natural language."

Natural Language Processing Resource Sites. From Mary D. Taffet. "Please note: This webpage was created primarily for the use of the students in the Natural Language Processing course taught by Elizabeth Liddy at Syracuse University's School of Information Studies. Students of other NLP or Computational Linguistics courses are more than welcome to make use of this page as well. The primary purpose of this page is to point to: 1. NLP-related demos that are available online ... 2. Resources relevant to the various levels of language processing 3. Other useful links for NLP students, relating to any aspect of Natural Language Processing that might be encountered in an academic course, from the lowest levels of language processing to the highest levels."

  • She also offers an instructor version: "This page for instructors differs from the student version by also including links to online course descriptions and syllabi and online degree program descriptions."

Natural Language Program. Artificial Intelligence Center, SRI. "The SRI AI Center Natural Language Program does research on natural language processing theory and applications. The Program has three subgroups. Multimedia/Multimodal Interfaces ... Spoken Language Systems ... Written Language Systems." Be sure to follow their links to projects, applications, and more!

"The Natural Language Software Registry (NLSR) [fourth edition] is a concise summary of the capabilities and sources of a large amount of natural language processing (NLP) software available to the NLP community. It comprises academic, commercial and proprietary software with specifications and terms on which it can be acquired clearly indicated." From the Language Technology Lab of the German Research Centre for Artificial Intelligence (DFKI GmbH).

START. "The START Natural Language System is a software system designed to answer questions that are posed to it in natural language. START parses incoming questions, matches the queries created from the parse trees against its knowledge base and presents the appropriate information segments to the user. In this way, START provides untrained users with speedy access to knowledge that in many cases would take an expert some time to find."

Stanford NLP Group. "A distinguishing feature of the Stanford NLP Group is our effective combination of sophisticated and deep linguistic modeling and data analysis with innovative probabilistic and machine learning approaches to NLP."

  • Be sure to see Christopher Manning's annotated list of statistical natural language processing and corpus-based computational linguistics resources.

Related Pages

More Readings

Aikins, Janice, Rodney Brooks, William Clancey, et al. 1981. Natural Language Processing Systems. In The Handbook of Artificial Intelligence, Vol. I, ed. Barr, Avron and Edward A. Feigenbaum, 283-321. Stanford/Los Altos, CA: HeurisTech Press/William Kaufmann, Inc.

Allen, J. F. 1994. Natural Language Understanding. Redwood City, CA: Benjamin/Cummings. A new edition of a classic work.

Bobrow, Daniel. 1968. Natural Language Input for a Computer Problem Solving System. In Semantic Information Processing, ed. Minsky, Marvin, 133-215. Cambridge, MA: MIT Press.

Charniak, E. 1993. Statistical Language Learning. Cambridge, MA: MIT Press.

Cohen, P., J. Morgan, and M. Pollack. 1990. Intentions in Communication. Cambridge, MA: MIT Press.

Grosz, Barbara J., Martha E. Pollack, and Candace L. Sidner. 1989. Discourse. In Foundations of Cognitive Science, ed. Posner, M., 437-468. Cambridge, MA: MIT Press.

Grosz, Barbara J., Karen Sparck Jones, and Bonnie L. Webber, editors. 1986. Readings in Natural Language Processing. San Mateo, CA: Morgan Kaufmann.

Mahesh, Kavi, and Sergei Nirenburg. 1997. Knowledge-Based Systems for Natural Language. In The Computer Science and Engineering Handbook, ed. Allen B. Tucker, Jr., 637-653. Boca Raton, FL: CRC Press, Inc.

McKeown, K., and W. Swartout. 1987. Language Generation and Explanation. In Annual Review of Computer Science, Vol. 2, Palo Alto, CA: Annual Reviews.

Patterson, Dan W. 1990. Natural Language Processing. In Introduction to Artificial Intelligence and Expert Systems by Dan W. Patterson, 227-270. Englewood Cliffs, NJ: Prentice Hall.

Schank, Roger C. 1975. The Structure of Episodes in Memory. In Computation and Intelligence: Collected Readings, ed. Luger, George F., 236-259. Menlo Park/Cambridge, MA: AAAI Press/The MIT Press, 1995.

Weizenbaum, J. 1966. ELIZA--A Computer Program for the Study of Natural Language Communication Between Man and Machine. Communications of the ACM, 9 (1): 36-45. A pioneering work.

Winograd, T. 1972. Understanding Natural Language. New York: Academic Press. A pioneering work.