February 5, 2004
Nippon Telegraph and Telephone Corp.
NTT-X, Inc. (NTT-X)
NTT Begins Joint Tests of "Web Answers" Japanese Natural Language Search Service on the Portal Site "goo"
--A new search method that instantly analyzes questions posed in natural language expressions, and provides accurate search results--
Nippon Telegraph and Telephone Corporation (NTT; Head Office: Chiyoda-ku, Tokyo; President: Norio Wada) and NTT-X, Inc. (NTT-X; Head Office: Chiyoda-ku, Tokyo; President: Takao Nakajima), have announced that as of today, they will begin joint tests of the "Web Answers" Japanese Natural Language Search Service, which uses an advanced text analysis technology developed by NTT Cyber Space Laboratories. The tests, which will be conducted on the "goo" (*1) portal site operated by NTT-X, are designed to verify and evaluate the new search service in a commercial environment.
"Web Answers" instantly analyzes questions written in natural spoken language expressions, such as "Where will the 2008 Olympics be held?" After executing a "goo" search, the system quickly analyzes and extracts from the search results words and expressions that are potential answers (in this case, the correct answer is "Beijing), ranks the Web pages that contain these words and expressions at the top of the list, and presents these to the user (ref. Attachment 1). This enables the user to acquire the desired information more efficiently, and makes it possible to use the ever-increasing information assets on the information more effectively than ever before.
Web Answers ( http://labs.goo.ne.jp/ ) can be accessed from February 5 to March 31, 2004 (tentative). It will be linked from the "goo Lab" (*2) test site, which was opened within the "goo" site, as the second test conducted there, following the test of a comprehensive Web search engine using "Info Lead" Net Space cruising technology in October 2003.
1 Background and goals of joint tests
"goo" and other similar Web search services are essential as a means of acquiring desired information from among the huge volumes of diverse information that exists on the Internet. Up to now, however, if the user wanted to acquire information in response to a question like "Where will the 2008 Olympics be held?" he had to select and enter keywords himself, like "2008" and "Olympics," and then search for the appropriate information as a response from the Web pages containing those keywords, which are displayed as search results.
The new "Web Answers" service offers an advanced search using natural language - one that is far superior to conventional keyword searches - and provides the most appropriate search results. This tool will be useful in increasing the value of portal sites, of which search functions are an important component element. By offering test service on the portal site "goo Lab" targeting general Internet users, NTT and NTT-X will verify the technical aspects of this function, and will evaluate its business portal.
2 Role of each company
NTT has proposed the concept of the Japanese Natural Language Search Service, and will provide the advanced text analysis function required for this service. In this way, it will verify the technical aspects of this function in a commercial environment, and acquire data required for improving accuracy, which will be reflected in future development.
NTT-X will evaluate the business potential of this service based on increases in the "goo" usage rate and verification of service effects. This service is the second to be offered on the "goo Lab" site for studies of commercial service implementation after the completion of tests, following comprehensive 3D Web search service tests. NTT-X will also use these tests to emphasize the innovative qualities of the NTT Group's "goo" portal site, one of the most highly recognized Internet portals in the Internet business field.
3 Keys to advanced text analysis technology
The advanced text analysis technology is comprised of the three technologies described below. The high-speed named entity (*3) extraction technology in particular is an essential technology for instantly providing information desired by the user in the context of searches targeting Web pages.
To improve accuracy and refine these technologies, users are asked to evaluate the answers returned for their questions, and these evaluations will be reflected in the service.
(1) Question type classification technology
With "Web Answers," the user inputs a question like "What is Astroboy's birthday?" and the system instantly understands that the type of information desired is the "date" of a birthday. To achieve a rapid search, it is important to quickly grasp and categorize the type of question, but because questions are so diverse, it has been difficult to categorize them quickly by hand. This technology is able to efficiently judge the type of question by automatically generating rules for identifying question types based on previously prepared question samples and their types. Furthermore, the classification of the word's meaning (the meaning attribute for "birthday" is "date") is matched against the "Goi-Taikei --- A Japanese Lexicon" (*4), a massive knowledge base of Japanese vocabulary. Because words that use different expressions, such as "birthday" and "date of birth," can be handled as having the same meaning, the system can learn automatically with even greater efficiency, so as to understand the meaning of the user's question more correctly.
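As a rough illustration only, the idea of generating question-type rules from labelled samples and a meaning lexicon can be sketched in Python as follows. This is not NTT's implementation: the tiny LEXICON dictionary is a hypothetical stand-in for "Goi-Taikei," the sample questions and type labels are assumptions, English is used in place of Japanese, and a simple majority vote stands in for whatever rule generation method the actual system uses.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for a meaning lexicon: expression -> meaning attribute.
LEXICON = {
    "birthday": "date",
    "date of birth": "date",
    "where": "location",
    "who": "person",
}

def attributes(question):
    """Return the meaning attributes of lexicon expressions found in the question."""
    text = question.lower()
    return [attr for phrase, attr in LEXICON.items()
            if re.search(r"\b" + re.escape(phrase) + r"\b", text)]

def learn_rules(samples):
    """Derive 'meaning attribute -> question type' rules by majority vote."""
    votes = defaultdict(Counter)
    for question, qtype in samples:
        for attr in attributes(question):
            votes[attr][qtype] += 1
    return {attr: counts.most_common(1)[0][0] for attr, counts in votes.items()}

def classify(question, rules, default="unknown"):
    """Pick the question type most strongly suggested by the learned rules."""
    tally = Counter(rules[a] for a in attributes(question) if a in rules)
    return tally.most_common(1)[0][0] if tally else default

# Illustrative, previously prepared question samples with their types.
SAMPLES = [
    ("What is Astroboy's birthday?", "DATE"),
    ("Where will the 2008 Olympics be held?", "LOCATION"),
    ("Who is the president of NTT?", "PERSON"),
]

rules = learn_rules(SAMPLES)
question_type = classify("What is Astroboy's date of birth?", rules)
```

Because "birthday" and "date of birth" share the meaning attribute "date" in the lexicon, the rule learned from the "birthday" sample also covers the differently worded question.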
(2) High-speed named entity extraction technology (ref. Attachment 2)
When "Web Answers" receives the inputted question "What is Astroboy's birthday?" it first selects keywords like "Astroboy" and "birthday," and executes a keyword search on "goo." At this point, the search results are ranked based on relevancy with the input keywords, but in some cases the desired information is not included in the Web pages ranked at the top of the search results.
The information that would provide the answer, however, has already been narrowed down, in that it must be related to a "date." The system thus instantly extracts named entity candidates from the excerpt text from the Web pages displayed as a part of search results. These named entities are people's names, company names, or dates that would apply to the type of information desired by the user (in this example, "dates"). Because the Web page search covers such a huge volume of data, high-speed processing is essential, but this technology learns the word order patterns that make up named entities in advance and stores these patterns in the computer in a compact format, thus enabling the system to extract named entities from the word strings quickly.
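One possible way to picture word order patterns stored in a compact format is a trie over coarse token classes, sketched below in Python. This is an assumption made for illustration, not a description of the actual technology: the token classes, the hand-written PATTERNS (the real system is described as learning its patterns in advance), and the sample snippet are all hypothetical.

```python
import re

MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}

def token_class(token):
    """Map a token to a coarse class (illustrative class inventory)."""
    if token.lower() in MONTHS:
        return "MONTH"
    if token.isdigit():
        return "YEAR" if len(token) == 4 else "NUM"
    if token == ",":
        return "COMMA"
    return "OTHER"

# Hypothetical class-sequence patterns and the entity type each one yields.
PATTERNS = {
    ("MONTH", "NUM", "COMMA", "YEAR"): "DATE",
    ("MONTH", "NUM"): "DATE",
    ("MONTH", "YEAR"): "DATE",
}

def build_trie(patterns):
    """Store the patterns in a nested-dict trie so shared prefixes are kept once."""
    root = {}
    for classes, label in patterns.items():
        node = root
        for cls in classes:
            node = node.setdefault(cls, {})
        node["$label"] = label  # marks the end of a complete pattern
    return root

def extract_entities(text, trie):
    """Scan left to right, emitting the longest pattern match at each position."""
    matches = list(re.finditer(r"\w+|[^\w\s]", text))
    classes = [token_class(m.group()) for m in matches]
    found, i = [], 0
    while i < len(matches):
        node, best, j = trie, None, i
        while j < len(matches) and classes[j] in node:
            node = node[classes[j]]
            j += 1
            if "$label" in node:
                best = (j, node["$label"])
        if best:
            end, label = best
            found.append((text[matches[i].start():matches[end - 1].end()], label))
            i = end
        else:
            i += 1
    return found

TRIE = build_trie(PATTERNS)
entities = extract_entities("Astroboy was born on April 7, 2003 in Tokyo.", TRIE)
```

Each token is examined a bounded number of times, so the scan stays fast even over many search result excerpts; a question typed as "date" would then keep only the candidates labelled DATE.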
(3) Web page re-ranking technology
Web Answers extracts a named entity (in this example, a "date") using the high-speed named entity extraction technology described in (2) above. It then gives a high score to Web pages in which expressions related to the search keyword and dates appear in close proximity, and which appear with a high frequency, because it is highly likely that these Web pages will contain the desired answer. The system then re-ranks these pages, so that the information required by the user can be displayed in a higher position.
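The proximity and frequency scoring described above might look something like the following Python sketch. The scoring formula, the window size, the function names, and the example pages are assumptions chosen for illustration; the release does not specify how the actual score is computed.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def find_positions(tokens, phrase_tokens):
    """Return the start index of every occurrence of phrase_tokens in tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return []
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_tokens]

def page_score(snippet, keywords, candidates, window=5):
    """Sum, over every candidate occurrence, a bonus that grows as the
    nearest search keyword gets closer (assumed formula)."""
    tokens = tokenize(snippet)
    keyword_pos = [i for i, tok in enumerate(tokens) if tok in keywords]
    score = 0.0
    for candidate in candidates:
        for pos in find_positions(tokens, tokenize(candidate)):
            nearest = min((abs(pos - k) for k in keyword_pos), default=None)
            if nearest is not None and nearest <= window:
                score += 1.0 / (1 + nearest)
    return score

def rerank(pages, keywords, candidates):
    """Sort (url, snippet) pairs by descending score; ties keep their original order."""
    kw = {k.lower() for k in keywords}
    return sorted(pages, key=lambda page: page_score(page[1], kw, candidates),
                  reverse=True)

# Made-up search results in their original keyword-relevance order.
PAGES = [
    ("http://example.com/a", "Astroboy fan page with news and merchandise."),
    ("http://example.com/b", "Astroboy's birthday is April 7, 2003, according to the story."),
]

reranked = rerank(PAGES, ["Astroboy", "birthday"], ["April 7, 2003"])
```

In this example the second page moves to the top because it contains a date candidate within a few words of the search keywords, while the first page contains no candidate at all.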
4 Future developments
NTT will continue to promote development aimed at improving the functions of Internet search services in order to further increase the added value of portal sites in the broadband era. Based on the data acquired through these tests, NTT-X plans to examine the business potential of Web Answers, with a view toward adding this service to the "goo" site.
*1 goo:
The most highly recognized Internet portal site in Japan, operated by NTT-X. The search engine, which is the core service of the site, offers not only a Web page search service, but also searches of extensive and diverse databases, dictionaries, maps, and other useful information.
*2 goo Lab:
A test site that utilizes new technologies developed by NTT Laboratories to demonstrate to society the potential of advanced Internet services.
*3 Named entity:
Noun expressions that provide answers to questions such as "who, when, and where." Specifically, named entities are proper nouns, such as the names of people, places, or companies, or quantitative expressions such as dates or monetary amounts. For example, the expression "Nippon Telegraph and Telephone Corp." is made up of a combination of words like "Nippon," "Telegraph," and "Telephone," which together become a named entity expressing the name of an organization.
*4 "Goi-Taikei --- a Japanese Lexicon":
A dictionary of Japanese meanings developed by the NTT Laboratories as a dictionary for use in the Japanese-English machine translation system ALT-J/E. Includes meaning attributes and other vocabulary and knowledge information regarding some 400,000 Japanese words. This dictionary of Japanese meanings has been reedited for human use, and has been published by Iwanami Shoten, Publishers under the title "Nihongo Goi-Taikei."
- Attachment 1: Image of Japanese Natural Language Search Service
- Attachment 2: Outline of high-speed named entity extraction technology
For further information, please contact:
Nippon Telegraph and Telephone Corp.
NTT Cyber Solutions Laboratories
PR Section; Sadakata / Yamashita
Public Relations; Suzuki, Tabata, Kuriyama
Copyright (c) 2004 Nippon Telegraph and Telephone Corporation