SEARCH ENGINES

In general, the term "search engines" refers to the software programs that work together to support the indexing and retrieval of data. The explosive growth of computer-generated data and the creation of nationwide data networks have led to concurrent growth in search engine technology during the past several years. Numerous private companies, research institutes, and public sector organizations have become engaged in developing products for the search engine market. This is likely to be an evolving and continually improving field of technology for several years. By comparison, the systems now in operation in the Senate, House, and Library were created over 20 years ago and had to be built almost entirely "from scratch." It is these systems, sometimes referred to as "legacy systems," that the new legislative information system will eventually replace, once it can provide the same data and system capabilities or comparable functionality. While the House, Senate, and Library have continually improved their systems within the limits of the original technology used in their construction, it is clear that achieving the goals of the new legislative system requires building it with today's tools, which have been designed from the beginning to take advantage of distributed processing, graphical user interfaces, and high-performance, integrated communication systems.

The capabilities of today's search engines vary depending upon whether they are constructed as complete and independent systems or as "tool kits" which skilled programmers can use to construct a complete system. Prices vary accordingly, and the most cost-effective choice depends upon a variety of factors, such as the skill and experience of in-house staff, the projected need for continuing development, and the specific capabilities required. The House, Senate, GPO, and Library computer centers have each begun to use some of these new retrieval systems to meet several of their unique institutional requirements.

One of the most important advantages of the technology now being brought into the Legislative Branch is that it is built upon open rather than proprietary standards. In a distributed technical environment, open standards are critical because they make it possible to integrate systems so that duplication of effort can be avoided. At the same time, open standards make it unnecessary to require that each technical organization utilize the same search engine or data collection system, so long as they can all meet the essential requirements for interoperability. In today's competitive market environment, this principle is especially important because it allows the Congress to continue to take advantage of technical and price competition as such opportunities arise without having to insist that every organization within the Legislative Branch acquire and use the same software at the same time. This promotes continued competition among commercial companies in developing technology that meets Congress' requirements.
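The interoperability principle described above can be illustrated with a minimal sketch. It assumes a hypothetical shared contract (the SearchEngine interface below) standing in for an open standard; the class names, the Hit record, and the sample titles are all invented for illustration and do not describe any system actually used in the Legislative Branch. The point is only that two differently built engines can be queried together so long as both meet the same essential interface.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    source: str   # hypothetical label for the organization whose engine answered
    doc_id: str
    title: str


class SearchEngine(Protocol):
    """Hypothetical shared contract; stands in for an open standard."""

    def search(self, query: str) -> list[Hit]: ...


class KeywordEngine:
    """Toy engine: matches titles that contain every word of the query."""

    def __init__(self, source: str, docs: dict[str, str]) -> None:
        self.source = source
        self.docs = docs  # maps doc_id -> title

    def search(self, query: str) -> list[Hit]:
        words = query.lower().split()
        return [
            Hit(self.source, doc_id, title)
            for doc_id, title in self.docs.items()
            if all(word in title.lower() for word in words)
        ]


class PrefixEngine:
    """A differently built toy engine: matches titles that start with the query."""

    def __init__(self, source: str, docs: dict[str, str]) -> None:
        self.source = source
        self.docs = docs  # maps doc_id -> title

    def search(self, query: str) -> list[Hit]:
        prefix = query.lower()
        return [
            Hit(self.source, doc_id, title)
            for doc_id, title in self.docs.items()
            if title.lower().startswith(prefix)
        ]


def federated_search(engines: list[SearchEngine], query: str) -> list[Hit]:
    """Send one query to every engine and concatenate the results in order."""
    hits: list[Hit] = []
    for engine in engines:
        hits.extend(engine.search(query))
    return hits
```

Because federated_search depends only on the shared search method, either toy engine could be replaced by a different product without changing the calling code, which is the practical benefit the paragraph above attributes to open standards.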

Diversity and competition in the field of search engines are also important because of the range of requirements inherent in certain kinds of data and among certain kinds of users. Some systems may be optimized for searching and displaying text; others may be optimized for searching and displaying video. Similarly, some systems may have advantages for novice users, and others for expert searchers. Because the legislative information system will have to accommodate data from a variety of sources, it will also have to accommodate the needs of staff who might be using a variety of search engines. The Library recommends that the Working Group form a subteam to address issues related to the integration of search engines, to continually assess the state of the art in this technical field, and to regularly evaluate the effectiveness of the search engines used in Congress' legislative information system.
