Information Retrieval & Extraction (a subtopic of Applications)
How in the world can anyone find just the right bit of information that they need, out of the available ocean of information, an ocean that continues to expand at an astonishing rate? Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after another, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

AI Knows It’s Out There. Red Herring (August 22, 2005 print issue). "Intelligent Search ... More upscale, with costs in the hundreds of thousands of dollars, are the intelligent search systems sold by InQuira of San Bruno, California. The systems are based on natural language processing, a branch of AI that enables the system to comprehend what a person is really asking, at least if the question is posed in standard English. 'Pointing customers at documents does not approach the productivity of being able to understand a request and pull the right paragraph up to their screen,' says Bob Macdonald, chief marketing officer at InQuira."
Information Service Agent Research. "The Information Service Agent Lab at Simon Fraser University develops novel techniques for interactive information gathering and integration. The research applies artificial intelligence planning and learning techniques and database technologies to create knowledge bases from large collections of dynamically changing, potentially inconsistent and heterogeneous data sources, permitting users access to information at the right abstraction level."

Projects. Software Agents Group, MIT Media Lab. Wide-ranging approaches to information retrieval that include user profiling, information filtering, privacy, recommender systems, communityware, negotiation mechanisms and coordination.

Inside Google - From the Labs, Google Labs [audio]. Presentation by Peter Norvig at the 2005 O'Reilly Emerging Technology Conference. Available from IT Conversations. "Google has expanded from searching webpages to searching videos, books, places and even files on your own desktop. This expansion is made possible through Google's understanding and classification of information, facilitated by the application of algorithms in the domains of Machine Learning, Natural Language Processing and Artificial Intelligence. ... Peter Norvig is the Director of Search Quality at Google Inc. He is a Fellow and Councilor of the American Association for Artificial Intelligence and co-author of Artificial Intelligence: A Modern Approach, the leading textbook in the field."
Information Agents Group at the Information Sciences Institute, University of Southern California.

CIRES - Content Based Image REtrieval System developed by Qasim Iqbal at the Computer and Vision Research Center (CVRC) in the Department of Electrical and Computer Engineering at The University of Texas at Austin. "CIRES is a robust content-based image retrieval system based upon a combination of higher-level and lower-level vision principles. Higher-level analysis uses perceptual organization, inference and grouping principles to extract semantic information describing the structural content of an image. Lower-level analysis employs a channel energy model to describe image texture, and utilizes color histogram techniques. ... The system is able to serve queries ranging from scenes of purely natural objects such as vegetation, trees, sky, etc. to images containing conspicuous structural objects such as buildings, towers, bridges, etc." Be sure to check out the sample queries.

CIIR. The Center for Intelligent Information Retrieval at UMass. "The scope of the CIIR's work is broad and goes significantly beyond traditional areas of information retrieval such as search strategies and information filtering. The research includes both low-level systems issues such as the design of protocols and architectures for distributed search, as well as more human-centered topics such as user interface design, visualization and data mining with text, and multimedia retrieval."
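The lower-level color-histogram technique mentioned in the CIRES description can be sketched in a few lines. This is a minimal illustration, not CIRES code: the toy "images" (flat lists of intensity values), the bin count, and histogram intersection as the similarity measure are all assumptions made here for the sketch.

```python
def color_histogram(pixels, bins=4):
    """Quantize 8-bit intensity values into a normalized histogram."""
    counts = [0] * bins
    for v in pixels:
        counts[min(v * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Two predominantly dark "images" score higher against each other
# than a dark/bright pair does.
dark_a = [10, 20, 30, 40, 50, 60]
dark_b = [15, 25, 35, 45, 55, 65]
bright = [200, 210, 220, 230, 240, 250]

sim_same = histogram_intersection(color_histogram(dark_a), color_histogram(dark_b))
sim_diff = histogram_intersection(color_histogram(dark_a), color_histogram(bright))
```

A real system like CIRES combines such low-level signals with texture models and the higher-level structural analysis the description mentions; a histogram alone cannot tell a forest from a green building.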
Information/Internet Agents: An Overview. An extensive FAQ from British Telecommunications. "Information agents have come about because of the sheer demand for tools to help us manage the explosive growth of information we are experiencing currently, and which we will continue to experience henceforth. Information agents perform the role of managing, manipulating or collating information from many distributed sources."

Seeking Better Web Searches - Deluged with superfluous responses to online queries, users will soon benefit from improved search engines that deliver customized results. By Javed Mostafa. Scientific American (February 2005). "New search engines are improving the quality of results by delving deeper into the storehouse of materials available online, by sorting and presenting those results better, and by tracking your long-term interests so that they can refine their handling of new information requests. In the future, search engines will broaden content horizons as well, doing more than simply processing keyword queries typed into a text box. They will be able to automatically take into account your location--letting your wireless PDA, for instance, pinpoint the nearest restaurant when you are traveling. New systems will also find just the right picture faster by matching your sketches to similar shapes. They will even be able to name that half-remembered tune if you hum a few bars."

Academia's quest for the ultimate search tool. By Stefanie Olsen. CNET News.com (August 15, 2005). "The University of California at Berkeley is creating an interdisciplinary center for advanced search technologies and is in talks with search giants including Google to join the project, CNET News.com has learned. ... The principal areas of focus: privacy, fraud, multimedia search and personalization. ... The success of the $5 billion-a-year search-advertising business is fueling Internet research and development in many ways. ... 
The search problems of today are different from those of five years ago. ... Jaime Carbonell, director of CMU's Language Technologies Institute, said his research team is perfecting a technology for personalized search that would solve some of the privacy concerns surrounding the wide-scale collection of sensitive data, such as names and query histories. ... CMU is also working under a government grant on a longer-term project called Javelin, focused on question-and-answer search technology. ... The universities of Texas and Pennsylvania are also exploring different approaches to the same problem. Stanford continues in its role as a breeding ground for search projects. ... Stanford associate professor Andrew Ng, among others, is working on artificial-intelligence techniques for extracting knowledge from text in a search index. ... Stanford, the Massachusetts Institute of Technology and many other universities are working to solve problems presented by the library of tomorrow, which will be largely digitized. Sifting through and organizing billions of digital documents will require new search technology." ... and here are some more articles from our AI in the news collection:
Learning Probabilistic User Profiles. By Mark Ackerman et al. (1997). AI Magazine 18 (2): 47-56. Applications for finding interesting web sites and notifying users of changes.

The Web as a Database: New Extraction Technologies and Content Management. Katherine C. Adams (2001). Online Magazine; Volume 25, Number 2. "Information extraction research in the United States has a fascinating history. It is a product of the Cold War. In the late 1980s, a number of academic and industrial research sites were working on extracting information from naval messages in projects sponsored by the U.S. Navy. To compare the performance of these software systems, the Message Understanding Conferences (MUC) were started. These conferences were the first large-scale effort to evaluate natural language processing (NLP) systems and they continue to this day."

Moving Up the Information Food Chain. By Oren Etzioni (1997). AI Magazine 18 (2): 11-18. A look at deploying softbots on the World Wide Web.

When the web starts thinking for itself. By David Green. vnunet's Ebusinessadvisor (December 20, 2002). "The so-called semantic web is an extension of the current web in which data is given meaning through the use of a series of technologies. ... Ontologies provide a deeper level of meaning by providing equivalence relations between terms (i.e. term A on my web page is expressing the same concept as term B on your web page). An ontology is a file that formally defines relations among terms, for example, a taxonomy and set of inference rules. By providing such 'dictionaries of meaning' (in philosophy ontology means 'nature of existence') ontologies can improve the accuracy of web searches by allowing a search program to seek out pages that refer to a specific concept rather than just a particular term as they do now. While XML, RDF and ontologies provide the basic infrastructure of the semantic web, it is intelligent agents that will realise its power." 
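The ontology mechanism Green describes, a taxonomy plus inference rules, can be sketched with a toy is-a hierarchy: a search for a concept matches pages that mention only its subconcepts. The taxonomy, pages, and matching rule below are invented for illustration; real semantic-web ontologies (e.g. in RDF/OWL) are far richer.

```python
# Toy taxonomy: each term maps to its immediate parent concept.
IS_A = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}

def is_subconcept(term, concept):
    """Inference rule: follow is-a links transitively up the taxonomy."""
    while term is not None:
        if term == concept:
            return True
        term = IS_A.get(term)
    return False

def concept_search(concept, pages):
    """Return pages mentioning the concept or any of its subconcepts."""
    return [p for p in pages
            if any(is_subconcept(w, concept) for w in p.lower().split())]

pages = ["my dog barks", "a sparrow landed", "the engine stalled"]
```

A keyword engine querying "animal" would miss all three pages; `concept_search("animal", pages)` finds the first two, because the inference rule knows that dogs and sparrows are animals.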
Is There an Intelligent Agent in Your Future? By James A. Hendler (1999). (This wonderful paper received the AAAI-2000 Effective Expository Writing Award.)

Savvysearch... By Adele Howe and Daniel Dreilinger (1997). AI Magazine 18 (2): 19-25. Description of a metasearch engine that learns which search engines to query.

Designing Systems That Adapt to Their Users. An AAAI-02 Tutorial by Anthony Jameson, Joseph Konstan, and John Riedl. "Personalized recommendation of products, documents, and collaborators has become an important way of meeting user needs in commerce, information provision, and community services, whether on the web, through mobile interfaces, or through traditional desktop interfaces. This tutorial first reviews the types of personalized recommendation that are being used commercially and in research systems. It then systematically presents and compares the underlying AI techniques, including recent variants and extensions of collaborative filtering, demographic and case-based approaches, and decision-theoretic methods. The properties of the various techniques will be compared within a general framework, so that participants learn how to match recommendation techniques to applications and how to combine complementary techniques."

Microsoft Research seeks better search. By Michael Kanellos. CNET News (April 17, 2003). "Microsoft Research is plugging away at one of the growing dilemmas in computing: so much data, so little time. Scientists in the Redmond, Wash.-based software giant's labs are experimenting with new types of search and user interface technology that will let individuals and businesses tap into the vast amounts of data on the Internet, or inside their own computers, that increasingly will be impractical or impossible to find."

18th century theory is new force in computing. By Michael Kanellos. ZDNet (February 19, 2003). 
"Search giant Google and Autonomy, a company that sells information retrieval tools, both employ Bayesian principles to provide likely (but technically never exact) results to data searches. ... Probabilistic thinking changes the way people interact with computers. ... 'The idea is that the computer seems more like an aid rather than a final device,' said Peter Norvig, director of search quality at Google. 'What you are looking for is some guidance, not a model answer.' Search has benefited substantially from this shift. A few years ago, common use of so-called Boolean search engines required queries submitted in the 'if, and, or but' grammar to find matching words. Now search engines employ complex algorithms to comb databases and produce likely matches."

IBM aims to get smart about AI. By Michael Kanellos. CNET News (January 20, 2003). "In the coming months, IBM will unveil technology that it believes will vastly improve the way computers access and use data by unifying the different schools of thought surrounding artificial intelligence. The Unstructured Information Management Architecture (UIMA) is an XML-based data retrieval architecture under development at IBM."
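The shift Kanellos describes, from all-or-nothing Boolean matching to likelihood-ranked results, can be contrasted in a toy example. The documents, query, and smoothed log-frequency scoring below are illustrative assumptions made for this sketch, not any engine's actual algorithm.

```python
import math

def boolean_match(query_terms, doc):
    """All-or-nothing: the document matches only if every term is present."""
    words = set(doc.lower().split())
    return all(t in words for t in query_terms)

def probabilistic_score(query_terms, doc):
    """Sum of smoothed log term frequencies: a likelihood-style ranking."""
    words = doc.lower().split()
    n = len(words)
    score = 0.0
    for t in query_terms:
        tf = words.count(t)
        score += math.log((tf + 1) / (n + 2))  # add-one smoothing
    return score

docs = [
    "bayesian networks model uncertainty",
    "search engines rank pages",
    "bayesian search ranks likely pages",
]
query = ["bayesian", "search"]

# Boolean retrieval keeps only the exact conjunctive hit; the
# probabilistic scorer ranks every document, best match first.
boolean_hits = [d for d in docs if boolean_match(query, d)]
ranked = sorted(docs, key=lambda d: probabilistic_score(query, d), reverse=True)
```

The first two documents each contain one query term: Boolean retrieval discards them entirely, while the probabilistic scorer still ranks them as partial, "likely but never exact" matches behind the document containing both terms.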
The Hidden Web. By Henry Kautz, Bart Selman, and Mehul Shah (1997). AI Magazine 18 (2): 27-35. A project that helps users locate experts on the Web.

Lifestyle Finder: Intelligent User Profiling Using Large-Scale Demographic Data. By Bruce Krulwich (1997). AI Magazine 18 (2): 37-56.

In Search of a Lost Melody - Computer assisted music: identification and retrieval. By Kjell Lemstrom. Finnish Music Quarterly Magazine 3-4/2000.

The Search Engine That Could. Reported by Spencer Michels. The NewsHour (PBS; November 29, 2002). Also available in audio and video formats. Hear/see Larry Page and Sergey Brin, co-founders of Google, Skip Battle, the new CEO at Ask Jeeves, and others.

Diagnosing Delivery Problems in the White House Information-Distribution System. By Mark Nahabedian and Howard Shrobe (1996). AI Magazine 17 (4): 21-29. Use of AI in selective information distribution.
Search engines try to find their sound. By Stefanie Olsen. CNET News (May 27, 2004). "Most 'spiders' that crawl and index the Web are effectively blind to audio and video content, making NPR's highly regarded radio programming all but invisible to mainstream search engines. ... Consumers armed with broadband connections at home are driving new demand for multimedia content and setting off a new wave of technology development among search engine companies eager to extend their empires from the static world of text to the dynamic realm of video and audio. ... Most ambitiously of all, a handful [of search engines] are bent on searching inside the files to extract meaning and relevance by examining audio and video features directly. StreamSage is starting to make waves with its audio and video search technology, introduced late last year. The Washington, D.C.-based company developed software after roughly three years of research that uses speech recognition technology to transcribe audio and video. It then uses contextual analysis to understand the language and parse the themes of the content. As a result, it can generate a kind of table of contents for the topics discussed in the files."

The Revolution in Legal Information Retrieval or: The Empire Strikes Back. By Erich Schweighofer (1999). The Journal of Information, Law and Technology 1999(1). "The issue is how to deal with the Artificial Intelligence (AI)-hard problem of making sense of the mass of legal information."

Text Mining Technology - Turning Information Into Knowledge. A white paper from IBM (1998), Daniel Tkach, editor.
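The StreamSage pipeline described above, transcription followed by contextual analysis that yields a table of contents, can be caricatured in a few lines. Here the transcription step is stubbed out as ready-made text, and "contextual analysis" is reduced to picking each segment's most frequent content word; both are crude stand-ins for speech recognition and real language understanding.

```python
# Minimal stopword list; a real system would use linguistic analysis,
# not word counting.
STOPWORDS = {"the", "a", "of", "to", "and", "in", "is", "on"}

def segment_topic(transcript_segment):
    """Pick the most frequent non-stopword as the segment's topic label."""
    counts = {}
    for word in transcript_segment.lower().split():
        if word not in STOPWORDS:
            counts[word] = counts.get(word, 0) + 1
    return max(counts, key=counts.get)

def table_of_contents(segments):
    """Map (start_seconds, transcript) pairs to (start_seconds, topic)."""
    return [(start, segment_topic(text)) for start, text in segments]

# Invented transcript segments standing in for speech-recognizer output.
segments = [
    (0, "the budget the budget debate continues in congress"),
    (120, "weather weather forecast calls for rain in the northwest"),
]
toc = table_of_contents(segments)
```

The output pairs each timestamp with a topic label, which is the essence of making audio "visible" to a text search engine: once a table of contents exists, ordinary keyword indexing can take over.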
The Role of Intelligent Systems in the National Information Infrastructure. The American Association for Artificial Intelligence. Edited by Daniel S. Weld. A cure for info overload.

ACM Special Interest Group on Information Retrieval (SIGIR). "ACM SIGIR addresses issues ranging from theory to user demands in the application of computers to the acquisition, organization, storage, retrieval, and distribution of information." Be sure to check out their collection of Information Retrieval Resources.

Brainboost Answer Engine. "Brainboost uses Machine Learning and Natural Language Processing techniques to go the extra mile, by actually answering questions, in plain English."

The British Computer Society Information Retrieval Specialist Group.

CMU Text Learning Group. "Our goal is to develop new machine learning algorithms for text and hypertext data. Applications of these algorithms include information filtering systems for the Internet, and software agents that make decisions based on text information." Among their many projects you'll find:
HP SpeechBot - audio search using speech recognition. From Hewlett-Packard.
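The kind of machine-learned text filtering described in the CMU Text Learning Group blurb above can be sketched as a naive Bayes classifier over word counts. The training examples and labels below are invented for illustration, and the bare-bones model stands in for the group's far more sophisticated algorithms.

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (text, label). Returns per-label word statistics."""
    counts, totals, priors = {}, {}, Counter()
    for text, label in examples:
        priors[label] += 1
        bag = counts.setdefault(label, Counter())
        for w in text.lower().split():
            bag[w] += 1
            totals[label] = totals.get(label, 0) + 1
    return counts, totals, priors

def classify(text, model):
    """Return the label with the highest smoothed log-probability."""
    counts, totals, priors = model
    vocab = {w for bag in counts.values() for w in bag}
    best_label, best_score = None, float("-inf")
    for label in counts:
        score = math.log(priors[label] / sum(priors.values()))
        for w in text.lower().split():
            tf = counts[label][w]
            score += math.log((tf + 1) / (totals[label] + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented two-class training set for the sketch.
examples = [
    ("stock market rally", "finance"),
    ("market shares fall", "finance"),
    ("team wins final match", "sports"),
    ("coach praises team", "sports"),
]
model = train(examples)
```

An information filter built this way would route incoming documents to whichever category their vocabulary makes most probable, learning the routing entirely from labeled examples rather than hand-written rules.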
Introduction to Information Extraction Technology. IJCAI-99 Tutorial by Douglas E. Appelt and David Israel, Artificial Intelligence Center, SRI International. In addition to the notes from the tutorial, you'll find these collections of links: Research Projects and Systems, Papers, and Resources and Tools for building information extraction systems.

MARVEL: "The Intelligent Information Management Department at IBM Research is developing a multimedia analysis and retrieval system called MARVEL. MARVEL helps organize the large and growing amounts of multimedia data (e.g., video, images, audio) by using machine learning techniques to automatically label its content. The system recently won the Wall Street Journal 2004 Innovation Award in the multimedia category." A demo is available.

The National Centre for Text Mining (NaCTeM). "We provide text mining services in response to the requirements of the UK academic community. Our initial focus is on applications in the biological and medical domains, where the major successes in the mining of scientific texts have so far occurred."

"NewsInEssence is a system for finding and summarizing clusters of related news articles from multiple sources on the Web. It is under development by the CLAIR group at the University of Michigan." You can see it in action here.

Phibot. A research project of the University of Mainz, the German Research Center of Artificial Intelligence (DFKI) and brainbot technologies AG. "Phibot is an intelligent internet information retrieval tool for scientists. As part of the Adaptive Read Project, phibot is a web-based experiment for collaborative information retrieval." Also see:
"START, the world's first Web-based question answering system, has been on-line and continuously operating since December, 1993. It has been developed by Boris Katz and his associates of the InfoLab Group at the MIT Computer Science and Artificial Intelligence Laboratory. Unlike information retrieval systems (e.g., search engines), START aims to supply users with 'just the right information,' instead of merely providing a list of hits."

SUMMARIST: Automated Text Summarization project from The Natural Language Processing group at the Information Sciences Institute of the University of Southern California (USC/ISI). "Summarization is a hard problem of Natural Language Processing because, to do it properly, one has to really understand the point of a text. This requires semantic analysis, discourse processing, and inferential interpretation (grouping of the content using world knowledge)."

Sun Microsystems Conceptual Indexing Project. "How often have you failed to find what you wanted in an online search because the words you used failed to match words in the material that you needed? Concept-based retrieval systems attempt to reach beyond the standard keyword approach of simply counting the words from your request that occur in a document. The Conceptual Indexing Project is developing techniques that use knowledge of concepts and their interrelationships to find correspondences between the concepts in your request and those that occur in text passages. Our goal is to improve the convenience and effectiveness of online information access. The central focus of this project is the 'paraphrase problem,' in which the words used in a query are different from, but conceptually related to, those in material that you need."
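Sun's "paraphrase problem" can be illustrated with the simplest possible remedy: expanding each query word into a hand-built set of conceptually related words before matching. The tiny lexicon and documents below are invented for this sketch; the Conceptual Indexing Project uses much richer knowledge of concepts and their interrelationships.

```python
# Invented concept lexicon: each word maps to its set of related words.
RELATED = {
    "physician": {"physician", "doctor", "md"},
    "cost": {"cost", "price", "fee"},
}

def expanded_search(query, documents):
    """Match documents against the union of each query word's concept set."""
    terms = set()
    for w in query.lower().split():
        terms |= RELATED.get(w, {w})  # unknown words match only themselves
    return [d for d in documents if terms & set(d.lower().split())]

docs = ["the doctor charges a fee", "the train was late"]
```

A plain keyword search for "physician cost" would return nothing here, since neither word appears in any document; the expanded query still finds the first document because "doctor" and "fee" express the same concepts.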
Aluri, Rao, and Donald E. Riggs, editors. 1990. Expert Systems in Libraries. Norwood, NJ: Ablex Pub. Corp.

Association of Research Libraries. 1991. Expert Systems in ARL Libraries. Washington, DC: ARL.

Davies, Peter. 1991. Artificial Intelligence: Its Role in the Information Industry. Medford, NJ: Learned Information, Inc.

Ford, Nigel. 1991. Expert Systems and Artificial Intelligence: An Information Manager's Guide. London: Library Association Pub.

Hovy, Eduard and Dragomir Radev, Cochairs. Intelligent Text Summarization: Papers from the 1998 AAAI Spring Symposium.

Jacobs, Paul S., editor. 1992. Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval. Hillsdale, NJ: L. Erlbaum Associates.

Jones, Karen Sparck. 1999. Information Retrieval and Artificial Intelligence. Artificial Intelligence 114(1-2): 257-281.

Kautz, Henry, Chair. 1998. Recommender Systems: Papers from the AAAI Workshop. Technical Report WS-98-08. "Over the past few years a new kind of application, the 'recommender system,' has appeared, based on a synthesis of ideas from artificial intelligence, human-computer interaction, sociology, information retrieval, and the technology of the WWW. Recommender systems assist and augment the natural process of relying on friends, colleagues, publications, and other sources to make the choices that arise in everyday life. Examples of the kinds of questions that could be answered by a recommender system include: What kind of car should I buy? What web-pages would I find most interesting? What people in my company would be best assigned to a particular project team?"

Lyons, Daniel. 1997. The Buzz About Firefly. The New York Times Magazine (June 29, 1997): 36-37+.

Maybury, Mark T., editor. 1993. Intelligent Multimedia Interfaces. Menlo Park and Cambridge: AAAI Press/MIT Press. This book covers the ground where artificial intelligence, multimedia computing, information retrieval and human-computer interfaces all overlap.
Michelson, Avra. 1991. Expert Systems Technology and its Implication for Archives. Washington, DC: National Archives and Records Administration.

Special Libraries Association. 1991. Expert Systems and Library Applications: An SLA Information Kit. Washington, DC: Special Libraries Assn.

van Rijsbergen, Keith. 1979. Information Retrieval, 2nd Edition. London: Butterworths.

Verity, John W. 1997. Coaxing Meaning Out of Raw Data. Business Week (February 3, 1997): 134+.